Follow Slashdot stories on Twitter

 



Forgot your password?
typodupeerror
×
Programming

Ask Slashdot: Changing Career From OLTP To OLAP Dev 129

First time accepted submitter xby2_arch writes "After spending over 12 years writing OLTP applications (Java EE/JDBC/ORMs), I decided to dabble in the OLAP world. I had decent DB skills, considering most of my previous projects had involved data modeling and coding using Stored Procs, etc. Yet I hadn't designed or implemented any dimensional databases. Luckily for me, I had enough relevant domain knowledge to land a developer job in a data warehousing project. The work was enjoyable enough that it motivated me to spend that extra time and effort I needed to cope with the different dynamics of coding in the OLAP realm. In my past life, data volumes weren't the primary concern (instead, transaction volumes were), here, everything was about data. ETL/Integrations present another set of problems you generally skirt in a typical web/app-tier developer role. All in all, it turned out to be a non-trivial, yet worthwhile transition. I am certain that there are plenty of seasoned developers out there who plan to make a similar move (or have made already), who see data as the next chapter in their careers evolving toward becoming Enterprise Architects. I want to hear what's holding them back, or what helped them move forward. What should be considered a prerequisite to make this switch, and what are the risks, etc.?"
This discussion has been archived. No new comments can be posted.

Ask Slashdot: Changing Career From OLTP To OLAP Dev

Comments Filter:
  • Re:Define, please? (Score:5, Informative)

    by betterunixthanunix ( 980855 ) on Sunday January 15, 2012 @04:10PM (#38707512)
    You know, we can always check Wikipedia for definitions:

    https://en.wikipedia.org/wiki/OLTP [wikipedia.org]
    https://en.wikipedia.org/wiki/OLAP [wikipedia.org]

    It might have been nice if the editors had included these links in the summary, but it is not as though you do not know how to use Wikipedia.
  • Re:Define, please? (Score:5, Informative)

    by Koen Lefever ( 2543028 ) on Sunday January 15, 2012 @04:32PM (#38707694)
    OLTP: Online Transaction Processing: you buy a ticket, you want it immediatly. The seller types it in the computer and prints your ticket, a database checked if there were free seats and immediatly reserves one. "Immediatly" is the important word, the customer is not waiting. OLAP: Online Analytical Processing: How many seats from the US to EMEA have been sold by that kind of sellers with such produyt code. Managements wants the results by the end of the month, it is OK if the query runs a couple of hours/days. Many real life systems contain two databases, one tuned for speed (containing only the current tickets) and one for reference later (containing all tickets sold in the past n months/years). The difference between them is database tuning and SQL tuning. All the rest (such as path to architect: yeah, the more different systems you know, the more chance you will have to design new ones) is hype.
  • Re:Define, please? (Score:4, Informative)

    by UnknowingFool ( 672806 ) on Sunday January 15, 2012 @04:52PM (#38707820)

    The difference between them is database tuning and SQL tuning. All the rest (such as path to architect: yeah, the more different systems you know, the more chance you will have to design new ones) is hype.

    That's like saying web coding is the same as application coding and the main difference is just languages used. There is a huge difference in the goals and methodology of OLTP and OLAP. DB and SQL tuning is a small part of the larger scheme of things. Take for instance your example of tickets; you can get an OLTP to show how many seats from US to EMEA by seller. It may not be very easy especially as you add conditions (by quarter, by day of week of travelled, by day of week sold, etc). In OLAP these questions become important and you have to design your DB structure to accommodate them and other queries. It also matters how this information is reported by tool as not all tools can take advantage of DB features. After all that then DB and SQL tuning comes into play.

  • by Dishwasha ( 125561 ) on Sunday January 15, 2012 @04:52PM (#38707828)

    The prerequisites to making the switch is first and most importantly having an appropriate business case for OLAP. The second prerequisite is that you've tried doing analytics in a traditional RDMS, perhaps jumped on to the NoSQL bandwagon, and you've failed at it (i.e. success for a little while but then your data eventually brings your queries down to its knees). Don't worry, failure isn't necessarily wrong, it's just you and your team needed the experience before you could make the next leap.

    The risks are a knowledge jump in to an OLAP mindset from a traditional SQL mindset. Invest in you and your fellow developer's knowledge. Push back on management and sales when they want more immediate results and let them know that it will take 3-5 months to replace your current system. Do your proper technology evaluations. Learn FoodMart [microsoft.com] and Adventureworks [codeplex.com] and let them guide you down the path of good fact and dimension design. Don't snub your nose at Microsoft as they absorbed the company in the 80's that basically pioneered this stuff and made billions, but also don't take their stuff too literally as there are several products out there and some that do things better.

    Read The Data Warehouse Toolkit [amazon.com] thoroughly and practice using Mondrian [pentaho.com] which is an open source Java OLAP engine that can sit on top of PostgreSQL, MySQL, and others. Find a good ETL tool rather than trying to write your own at first and don't be afraid to force your internal users to use this tool to create their facts. Don't worry if you don't get it the first time, but keep trying and keep discussing with your fellow developers as it takes a team to work out all the kinks. Later on you'll probably end up seeing how you did things wrong, but hopefully you can get most things right in the beginning.

  • Re:Define, please? (Score:5, Informative)

    by Anonymous Coward on Sunday January 15, 2012 @05:58PM (#38708300)

    If it's on the main page the target audience is pretty general, so you really shouldn't have to check Wikipedia. You should ALWAYS tell people what acronyms mean before using them exclusively. And if you have time enough to be an arrogant *&$# while posting links to Wikipedia you could also take the time to spell out what the acronyms mean. At the very least give the long form, e.g. Online Transaction Processing (OLTP) [wikipedia.org], Online Analytical Processing (OLAP) [wikipedia.org]. See it's not that fucking hard.

    And since I am taking the time to rant (and because any technical article in Wikipedia gets hijacked by propellor heads who like to inject as much as of their industry specific double speak so that they sound important and a layman can't get at least an understandable overview in the summary):

    OLTP - Online Transaction Processing. An application/database designed for larger, equal, or nearly equal volumes of database inserts, updates, and deletes, as there are reads. The database is generally more (and often highly) normalized, meaning that there is less data duplication across tables and/or within tables.
    OLAP - Online Analytical Processing. Essentially a data warehouse designed for larger volumes of reads than there will be inserts, updates, or deletes (often relatively very, very few deletes). Less normalized meaning that there may be duplicated data across and/or within tables in order to increase query speed at the cost of possible consistency issues due to the data duplication.

  • Re:Define, please? (Score:4, Informative)

    by kurthr ( 30155 ) on Sunday January 15, 2012 @06:08PM (#38708406)

    OLTP - Online Transaction Processing. An application/database designed for larger, equal, or nearly equal volumes of database inserts, updates, and deletes, as there are reads. The database is generally more (and often highly) normalized, meaning that there is less data duplication across tables and/or within tables.
    OLAP - Online Analytical Processing. Essentially a data warehouse designed for larger volumes of reads than there will be inserts, updates, or deletes (often relatively very, very few deletes). Less normalized meaning that there may be duplicated data across and/or within tables in order to increase query speed at the cost of possible consistency issues due to the data duplication.

    Mark this informative comment the hell up!
    And smash my Karma if you want to, but the (four letter) acronyms really didn't help the article much.

  • Re:Define, please? (Score:4, Informative)

    by bWareiWare.co.uk ( 660144 ) on Monday January 16, 2012 @09:06AM (#38712510) Homepage

    A=Analytical: the problems of creating useful reports from vast amounts of mainly read-only data.
    T=Transactional: the problems of highly concurrent read-write transactions.
    Wikipedia simply states that the terms are derived from each other, the techniques they embody are very different.
    If you only have a small amount of data and a small number of transactions then you don't care about the difference, but once you have a large number of one or other (or worse both) then it is a very significant distinction.

  • by Anonymous Coward on Monday January 16, 2012 @12:35PM (#38714442)

    Development is development. There are differences, but its still development. I am a DBA and a database developer. I have worked on very large OLAP and OLTP systems. Here are the main differences. My experience is primarily with oracle.

    1. You need to really learn SQL well to handle OLAP well. You often get very complex requirements. Processing large amounts of records always performs exponentially better using straight sql if possible. You want to use as little procedure code as possible. Don't write loops, use java, shell, or procedure database language.

    2. you need to dig into the DB's features. In particular, features for processing large numbers of records. For example, analytic functions are critical. They are somewhat confusing if you have not used them. They are part of the ANSI 1999 standard. Most people do not know them. I think most SQL databases have them. Each database will have an OLAP guide. It is not just about using the cool buzzword features. You have to dig into regular things. Understanding table partitioning is useful.

    3. you need to test this stuff out. The docs are not always 100% correct.

    4. There are alot of things you need to do to improve performance that will hurt performance in an OLTP. For example with Oracle, you may end up using parallel processing and alot of create table as nologging. Both of which do not scale with large numbers of users. OLAPs typically have less users, but larger batch processes. These are invaluable in OLAP processing. For example, if you have a very complex requirement, you may do a series of create table as statements to get to the report you want. This allows you to break up the logic so its simpler to follow, it is also fast. You can't do this with OLTP, since this hurts performance with alot of users. Create table as statements also allow you to break down very complex requirements into simpler chunks. I often do create table as (and many of them) to chunk my steps. Then when I figure out how to answer the requirement, I combine and simplify what I did.

    5. you have more freedom to do more outside the box coding approaches. As stated above, since you have less users, you can do alot of stuff that doesn't work in volume. Google around. Alot of these things are on the web. However, you need to often combine them to come up with better approaches.

    6. ETL... this means dumping, loading, and changing around large volumes of data between databases. This is large batch processing. This follows what I said above. Avoid using procedural logic as much as possible. Try to use as much straight sql as possible. Running a loop that calls sql for each row is the absolute slowest way to do this by alot. However, that is how most developers write their ETL because that is all they know how to do. You can look like a genius, by taking this code and getting rid of loops and sub loops. Once you do it a few times its easy, but you will look smart.

    7. if you use an ETL tool, make sure you understand what it does underneath. figure out how it is generating the sql and what the code all looks like.

    8. if you use the approach that java developers use (they are basically the only ones who do this) that the database is a data dump and then talk in mumbo jumbo and do all processing in java. your code will suck. someone like me will come in and do what I said to do in step 6. Ill look like a genius. you will look stupid. done that lots of times... the hardest part is getting past the java guys mumbo jumbo. IF your a python, C, .net, or any other language developer you will have an easier time doing OLAP because your not as caught up in the silly cliches I hear from java developers.

    9. let me add one more time. This is a mistake virtually everyone makes. Loop based procedural logic is the absolute slowest as absolute worst way to do OLAP, ETL. SQL is written in C. So it is optimized for array processing under the covers. If you use loop based logic, you eliminate that. Sometimes you have to, but avoid it as much as pos

E = MC ** 2 +- 3db

Working...