Horizontal Scaling of SQL Databases?

Posted by timothy on Thursday November 18, 2010 @04:48PM from the side-to-side dept.

still_sick writes "I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases. We've been looking at various NoSQL stores and I've been following Adrian Cockcroft's blog at Netflix which compares the various options. I was intrigued by the most recent entry, about Translattice, which purports to provide many of the same scaling advantages for SQL databases. Is this even possible given the CAP theorem? Is anyone using a system like this in production?"

XML (Score:2, Funny)

by Anonymous Coward writes:

Just store everything in a big XML file.
- Re:XML (Score:5, Funny)
  
  by Anonymous Coward writes: on Thursday November 18, 2010 @04:53PM (#34273496)
  
  XXXML
  
  - Re: (Score:2)
    
    by suomynonAyletamitlU ( 1618513 ) writes:
    
    "Extra-extra-extra-medium" large? I'm not sure how you get "extra-medium" in the first place, much less even moreso.
  - Re: (Score:2)
    
    by MichaelSmith ( 789609 ) writes:
    
    Make it four X [xxxx.com.au] and you've got a deal.
- Have you tried Perl? (Score:2)
  
  by goombah99 ( 560566 ) writes:
  
  Perl seems to work well for me. You may want to try it.
- Rick Cattell's work on scalable datastores (Score:5, Informative)
  
  by MoxFulder ( 159829 ) writes: on Thursday November 18, 2010 @09:19PM (#34277306) Homepage
  
  I recently came across Rick Cattell's site [cattell.net] which addresses just the questions you're asking.
  Rick Cattell has written an excellent comparison guide of horizontally scalable datastores [cattell.net] of different types (RDBMS as well as a variety of NoSQL systems).
  Cattell has also written an academic paper with database expert Mike Stonebraker, which weighs the system design factors required to make a datastore scalable [cattell.net].
  Executive summary of Cattell's work: although NoSQL may be a huge fad, the things that make a datastore scalable can be implemented in SQL RDBMS systems as well. Also, implementing do-it-yourself ACID in NoSQL systems is extremely difficult and error-prone, and is a significant advantage of most RDBMS systems. Stonebraker is the author of VoltDB, which is an open-source RDBMS designed for horizontal scalability, but they give a very fair and thorough look at competing datastores as well.
  
- - Re: (Score:2)
    
    by JustOK ( 667959 ) writes:
    
    If it was really good, it would create itself, if it hasn't already.
What limitations are you running into? (Score:5, Insightful)

by Anonymous Coward writes: on Thursday November 18, 2010 @04:55PM (#34273524)

It would be a lot easier to talk about solutions if you said which limitations you run into.
Is your dataset too large (large tables), do you have too many joins, or too many transactions per second? In short, what is the problem we're trying to solve here?

- Re:What limitations are you running into? (Score:5, Interesting)
  
  by Anonymous Coward writes: on Thursday November 18, 2010 @05:04PM (#34273660)
  
  It would be a lot easier to talk about solutions if you said which limitations you run into.
  Is your dataset too large (large tables), do you have too many joins, or too many transactions per second? In short, what is the problem we're trying to solve here?
  My money is on "No one here likes SQL" and "There aren't any experts on RDBMSs to help us get things set up properly".
  
  - Re:What limitations are you running into? (Score:5, Insightful)
    
    by DarkOx ( 621550 ) writes: on Thursday November 18, 2010 @06:11PM (#34274878) Journal
    
    I would have to agree; it's really hard to imagine that a "start up" can't make anything work on traditional SQL RDBMS(es). If you put the right hardware underneath it, even SQL Server 2000 (64-bit anyway) will scale just fine to terabyte-size databases at thousands of transactions per second. That is not impossible hardware for a successful start-up to buy either; we are talking about a dedicated storage controller with a gigabyte or so of cache and a few dozen SAS drives. I know; I have worked on such projects.
    You need the schema right, and if it's more reads than writes you might even de-normalize a little, and you will need to partition the data appropriately, but it can be done. This is why real DBAs still make the big bucks. There is a lot to know in that domain. You probably should hire someone who is an expert on whatever stuff you are using now to consult before you go down the path of NoSQL. All you told us is that you are a growing start-up, which is not much to go on, but without knowing what you are doing it's hard for me to believe you are doing anything on a scale that can't be done well with a relational database; but maybe I am wrong and maybe you are doing something huge. Remember, as soon as you go down the NoSQL path you are going to have to do a great deal of heavy lifting, because the quantity of libraries and off-the-shelf stuff out there is not great.
    
  - Re: (Score:2)
    
    by Skal Tura ( 595728 ) writes:
    
    Which is the most likely scenario, as SQL scales tremendously given some thought.
Relational stuff scales (Score:5, Insightful)

by Anonymous Coward writes: on Thursday November 18, 2010 @05:00PM (#34273586)

Learn partitioning principles, get a database product that does partitioning properly, learn normalization, never worry again about not being able to scale with relational databases. It just requires some real skills but relational databases really do scale all the way up.
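
As a rough illustration of the partitioning advice above, here is a minimal sketch of hash partitioning done by hand. Everything in it is an assumption made for the example: the `orders` table, the `customer_id` partition key, the `NUM_PARTITIONS` value, and the use of in-memory SQLite databases as stand-ins for separate servers or for a database product's native partitions.

```python
import sqlite3

NUM_PARTITIONS = 4  # hypothetical partition count, chosen only for the example


def partition_for(customer_id: int) -> int:
    """Map the partition key to a partition number (simple modulo hashing)."""
    return customer_id % NUM_PARTITIONS


# One in-memory database per partition; real deployments would use separate
# servers or the native partitioning feature of the database product.
partitions = [sqlite3.connect(":memory:") for _ in range(NUM_PARTITIONS)]
for conn in partitions:
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, "
        "customer_id INTEGER NOT NULL, "
        "total REAL NOT NULL)"
    )


def insert_order(order_id: int, customer_id: int, total: float) -> None:
    """Route the row to the single partition that owns this customer."""
    conn = partitions[partition_for(customer_id)]
    conn.execute(
        "INSERT INTO orders (order_id, customer_id, total) VALUES (?, ?, ?)",
        (order_id, customer_id, total),
    )


def orders_for_customer(customer_id: int):
    """A query that includes the partition key touches exactly one partition."""
    conn = partitions[partition_for(customer_id)]
    return conn.execute(
        "SELECT order_id, total FROM orders WHERE customer_id = ? ORDER BY order_id",
        (customer_id,),
    ).fetchall()


def total_revenue() -> float:
    """A query without the partition key has to fan out to every partition."""
    return sum(
        conn.execute("SELECT COALESCE(SUM(total), 0) FROM orders").fetchone()[0]
        for conn in partitions
    )


for i in range(1, 21):
    insert_order(order_id=i, customer_id=i % 7, total=10.0)
```

The design point is the choice of partition key: lookups that carry the key stay on one partition, while anything else (like `total_revenue`) pays for a fan-out across all of them.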

- Re: (Score:2, Interesting)
  
  by ani23 ( 899493 ) writes:
  
  Partitioning does complicate backups and HA/DR scenarios as the entire system is dependent on all machines being up and running. Also in most commercial db's (I know about db2) this feature takes you to the enterprise tier of software which is usually very expensive.
  - Re:Relational stuff scales (Score:5, Informative)
    
    by h4rr4r ( 612664 ) writes: on Thursday November 18, 2010 @05:13PM (#34273782)
    
    Postgres seems to not charge extra for that.
    
    - Re: (Score:2)
      
      by TooMuchToDo ( 882796 ) writes:
      
      OH SNAP
      - Re: (Score:2)
        
        by Steeltoe ( 98226 ) writes:
        
        Ditto on Postgres :-)
        And if someone's having performance problems on Postgres, learn:
        A) Indexes
        B) CLUSTER
        C) RTFM
        Really, that's all there is to it! I'm sure more advanced setups can be made, but Postgres will scale fine for a small startup just using the basics. However, if you never CLUSTER or VACUUM (not preferable) Postgres, it can become a dog if you have a lot of UPDATEs.
        Basically, the poster should just RTFM. It is time spent educating yourself, and making it better next time.. Asking Slashdot a generic
    - Re: (Score:2)
      
      by Splab ( 574204 ) writes:
      
      Postgres doesn't have clustering, so how exactly are you achieving this?
      The new "hot" standby option for Postgres is a step in the right direction, but most can't live with "eventually consistent" in their hot standby environment.
- Re: (Score:3, Insightful)
  
  by atomic777 ( 860023 ) writes:
  
  Right. "We are hitting limitations in what we can do with X" means they cannot solve an underlying difficult problem Z, and are hoping that by swapping X with magic fairy dust Y, that somehow Z will go away. Sales people owe their BMWs to this simple fallacy.
- Re:Relational stuff scales - not around the world! (Score:3, Interesting)
  
  by mikehoskins ( 177074 ) writes:
  
  Can you shard the same SQL data store in Chicago, London, and Tokyo? Not with standard SQL databases, unless you write your own complicated replication techniques or pay through the nose. (See CAP Theorem).
  Yes, the company I work for has expressed the world-wide SQL database need, so this is not just a thought experiment.
  Have you heard of GemFire/GemStone, VoltDB, or Xeround?
  If you can get rid of the SQL requirement, try
  XML (or other format) on Amazon's S3
  or try one of the No
Consider scaling via other layers? (Score:4, Interesting)

by mlts ( 1038732 ) * writes: on Thursday November 18, 2010 @05:01PM (#34273608)

Another idea is to scale using other layers, if there are problems at the SQL server level.
At the lower areas, one can go with a mainframe (parallel sysplex) and have geographically separate pieces of hardware acting coherently.
At the higher layers, have the app use multiple SQL servers and handle the redundancy in this layer.
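
To make the "handle the redundancy in the app" idea concrete, here is a minimal sketch under stated assumptions. The `ReplicatedStore` class, its `kv` table, and the use of in-memory SQLite connections as stand-ins for separate SQL servers are all hypothetical; it is not a description of any particular product.

```python
import sqlite3


class ReplicatedStore:
    """Hypothetical app-layer wrapper: write to every server, read from the first that answers."""

    def __init__(self, connections):
        self.connections = list(connections)
        for conn in self.connections:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT NOT NULL)"
            )

    def put(self, key: str, value: str) -> None:
        # Naive fan-out write. It does NOT handle the case where one server
        # accepts the write and another fails, which is the hard part in practice.
        for conn in self.connections:
            conn.execute(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)", (key, value)
            )
            conn.commit()

    def get(self, key: str):
        last_error = None
        for conn in self.connections:
            try:
                row = conn.execute(
                    "SELECT v FROM kv WHERE k = ?", (key,)
                ).fetchone()
                return row[0] if row else None
            except sqlite3.Error as exc:
                # This server is unavailable; fall through to the next one.
                last_error = exc
        raise RuntimeError("no server available") from last_error


store = ReplicatedStore([sqlite3.connect(":memory:"), sqlite3.connect(":memory:")])
store.put("user:1", "alice")
store.connections[0].close()  # simulate losing the first server
value = store.get("user:1")   # served by the surviving server
```

Reads survive the loss of one server here, but the write path is where the consistency trade-offs raised in the story summary (the CAP theorem question) actually bite.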

- - Re: (Score:2)
    
    by mlts ( 1038732 ) * writes:
    
    That is what mainframes are for. Yes, the technology is old and not exciting, but one of the strong points of mainframes is I/O, which is critical to most database architectures.
Call me skeptical (Score:5, Insightful)

by Kjella ( 173770 ) writes: on Thursday November 18, 2010 @05:05PM (#34273674) Homepage

Call me skeptical, but there are companies out there with massive amounts of data in relational databases; if you as a startup are "constantly hitting limitations" you're either a very odd startup or using it very wrong. As long as the volume is small you can make almost anything happen on SQL. Hell, most small businesses I've known run mostly on Excel. Somehow I don't see a startup needing NoSQL unless they specialized in processing huge amounts of data, in which case trying to make slashdot work on your core business seems stupid. But maybe I missed something...

- Re:Call me skeptical (Score:5, Funny)
  
  by Squeebee ( 719115 ) writes: <{squeebee} {at} {gmail.com}> on Thursday November 18, 2010 @05:16PM (#34273828)
  
  Agreed, we have massive sites serving millions of requests a day using Open Source relational databases and yet it seems everyone wants to use NoSQL because it's the hip new thing.
  Naturally I start thinking of this: http://xtranormal.com/watch/6995033 [xtranormal.com]
  
  - Re: (Score:2)
    
    by suso ( 153703 ) * writes:
    
    Naturally I start thinking of this: http://xtranormal.com/watch/6995033 [xtranormal.com]
    Thank you for posting that. I'm so sick of the NoSQL shit. Learn to design schemas.
    - Re: (Score:2)
      
      by nschubach ( 922175 ) writes:
      
      Are you sick of the NoSQL talk because you now specialize in SQL and feel as if it's a competitor, because it's gained a lot of attention recently and happens to be talked about more than SQL, or is there some other reason for the sick feeling?
      (I do very light SQL development and have not touched a "NoSQL" solution, but I do not find myself sickened by people investigating alternatives.)
      - Re: (Score:2)
        
        by gfody ( 514448 ) writes:
        
        It's a sickening display of ignorance coming from people who are supposed to be professionals. Nobody takes issue with people investigating alternatives to SQL but SQL has come under heavy fire by NoSQL proponents and yes one can become very sick of hearing the same old fallacious arguments again and again.
        
        Re: (Score:2)
        
        by cratermoon ( 765155 ) writes:
        
        Could you be specific about which fallacious arguments you have in mind? Preferably, cite 3 different fallacies with multiple sources for each one.
      - Re:Call me skeptical (Score:5, Informative)
        
        by vadim_t ( 324782 ) writes: on Thursday November 18, 2010 @06:25PM (#34275154) Homepage
        
        A lot of people don't understand how a database really works, so they do it horribly wrong. As a result, it's dreadfully slow. So they go and use some key/value lookup system because "they're fast". There you often get one of two things:
        They still don't understand the problem, so they recreate it yet again. If you don't understand what's wrong with reading an entire table with a million records, and discarding all but 5 of them client-side, then replacing the SQL DB with a key/value system just isn't going to make things better.
        Or, they improve performance, but since they don't understand what ACID is for, they eventually end up with weird inconsistencies. In some cases this might be acceptable, but you really don't want to see it happening in an order tracking system.
        The sickening feeling people get is not because it's a competitor. In a large part it isn't a competitor, but a different class of system with different tradeoffs. The sickening feeling comes from seeing people not understand what they're doing, and then run towards the latest technology because it's what $BIG_COMPANY uses without understanding it any better, and generally making an even bigger mess.
        The performance of specialized solutions like key/value systems doesn't come from magic. They're not really new, and don't use anything very groundbreaking. They simply use different tradeoffs at the cost of sacrificing quite a lot of what is present in a RDBMS. It's important to understand first whether you can really afford to discard those things, because if you can't, it's either not going to work right, or you'll have to graft all that you removed on top of it anyway.
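
The "read the whole table, discard all but 5 client-side" anti-pattern described above can be sketched as follows. The `orders` table, the `status` column, and the row counts are invented for the illustration (100,000 rows rather than a million, to keep it quick), and SQLite stands in for whatever database is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("late" if i % 20_000 == 0 else "shipped",) for i in range(100_000)],
)


def late_orders_client_side():
    """Anti-pattern: pull every row over the wire, then filter in the application."""
    rows = conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
    matching = [row_id for row_id, status in rows if status == "late"]
    return matching, len(rows)


def late_orders_server_side():
    """Let the database do the filtering; only matching rows are transferred."""
    rows = conn.execute(
        "SELECT id FROM orders WHERE status = ? ORDER BY id", ("late",)
    ).fetchall()
    return [row_id for (row_id,) in rows], len(rows)


client_ids, client_rows_fetched = late_orders_client_side()
server_ids, server_rows_fetched = late_orders_server_side()
```

Both functions return the same ids, but the first one transfers every row to get them. Swapping the SQL database for a key/value store does nothing about that if the application still asks for everything.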
        
        
        Re: (Score:2)
        
        by Skal Tura ( 595728 ) writes:
        
        Yup, someone simply made up a buzzword and it caught on.
        "NoSQL" or key/value datastore is good for caching data. but i don't see much value beyond that if there's any relations.
        Thing is, most people are ignorant and clueless. Probably 90% of code i see (and i see it a lot) i discard as bad quality. Ironically, those with most buzzwords in marketing about code quality seems to have the lowest code quality in practical terms, and most idiotic software architecture. And yes, i do mean Magento.
        Magento is a modern marv
        
        Argh-stories (Score:2)
        
        by Steeltoe ( 98226 ) writes:
        
        Heh, just wait until you try SugarCRM.. Reading your post made me realize there are other projects out there with the _exact_ same flaws / annoyances as Sugar, love it or hate it.
        I'm sure Sugar is nice if you have the 6-12 months for poring over all the code and making designs for new modules and layout / DB. But for _efficiency_ it's a mirage in the desert of hopeless "open source" projects, which in reality are paywalled spaghetti monsters.
        Same with Typo3.
        Don't get me wrong. It CAN work. I've been on success
        
        Re: (Score:2)
        
        by Klinky ( 636952 ) writes:
        
        ...and your hatred of Magento has what to do with NoSQL?
        
        Re:Call me skeptical (Score:4, Insightful)
        
        by Klinky ( 636952 ) writes: on Thursday November 18, 2010 @09:57PM (#34277562)
        
        NoSQL is not just key-value lookups. Take a look at Redis or MongoDB, there are novel ideas in both of them & yes they do bring new things to the table. They are NOT memcache. I am also not sure people are "sacrificing a lot of what is present in an RDBMS" by choosing NoSQL over an RDBMS. I think your gripe is with people who don't know what the hell they're doing, but you project that griping on to NoSQL in general. There are some things that RDBMSs are really good at, there are some things RDBMSs aren't so great at. The huge majority of people in the NoSQL communities and the users of these solutions know that loading a million objects client-side and discarding all but 5 is stupid & no one would suggest that is a failing of either RDBMS or NoSQL solutions, but squarely on the user.
        I would have to say that NoSQL is more relatable to how people think about objects and their relation to each other. People don't easily boil their objects down into relational tables and how each of those tables should interact with each other. This takes skill & talent & can be a bit of a pain to dive into, which is why we have a bunch of ORM solutions which add another layer of cruft on top of RDBMSs. NoSQL is basically getting rid of the ORM & the tables (though some still use table-like structures). For apps that would normally use an ORM extensively (a lot of web apps), that's great. For newbies who don't have years and years under their belt designing, tweaking normalization, and sharding/partitioning, it can be easier to pick up. Some of the NoSQL solutions have clustering/horizontal scaling and/or replication built in, with no or very few schema/query changes required.
        So for some NoSQL will be a better solution and for others an RDBMS will be a better solution. I wouldn't knock either. Just because you can do something in an RDBMS doesn't mean it's better than a NoSQL solution & vice versa.
        
        
        Re: (Score:2)
        
        by Bodrius ( 191265 ) writes:
        
        This is all true, but ignores the fact that for a lot of applications and teams RDBMS were overkill in the first place, so they are hardly sacrificing anything by switching to NoSQL.
        It's the same reason a lot of people in the early dot-com days believed MySQL was awesome precisely because it was such a crappy RDBMS ('who needs transactions or referential integrity anyway? it just slows things down')... arguably with robust simpler storage now there is more awareness of which facilities are sacrificed, and w
        
        Re: (Score:2)
        
        by weicco ( 645927 ) writes:
        
        If you don't understand what's wrong with reading an entire table with a million records, and discarding all but 5 of them client-side
        I think the more common case is to read an entire table with a million records and discard all but 5 of them on the server side, just because the one who designed the db didn't know squat about indexing. I've seen databases without a single clustered key. This forces a full table scan even if you return just a single record to the client side.
  - Re: (Score:2)
    
    by ADRA ( 37398 ) writes:
    
    Wow, that was a great video. Thanks for the link.
  - - Re: (Score:3, Funny)
      
      by Squeebee ( 719115 ) writes:
      
      Would you have preferred I have said bazillions?
  - - Re:Call me skeptical (Score:5, Interesting)
      
      by Natural Join ( 1711970 ) writes: on Thursday November 18, 2010 @06:46PM (#34275540)
      
      The small startups are using NoSQL because there is, more and more, a push in the web app market to store data which does not fit into any schema.
      There is no such thing as "data which does not fit into any schema", just like there is no such thing as data which cannot be encoded into binary. All data necessarily has a schema. However much or little of the schema you may choose to model in your (SQL or other type of) schema is, like the rest of software engineering, a design tradeoff.
      The various NoSQL approaches do not solve the full generality of data management problems the way SQL databases do. They are narrower in scope, and as is generally the case, they can achieve better performance by virtue of doing less. They can be much faster with certain data access paths, but at a cost of the fact that other data access paths become prohibitive.
      The frustrating thing for many of us is that the NoSQL spin on data management is about where mainstream data management was in the 1960s. As the field matured, it learned many important lessons, all of which are now being tossed out the window by people saying "oh we don't need that" but of course, they just haven't needed it yet. As these problems become apparent to them, they will spend the next decades of their lives reinventing what the data management field figured out in the 80s and 90s. Until then, they'll be making beginner mistakes, like thinking that their data somehow doesn't fit into any schema.
      
      - Re: (Score:2)
        
        by mini me ( 132455 ) writes:
        
        You can, of course, create a key/value table in a SQL database, but then you are just creating your own NoSQL database on top of SQL. Why wouldn't you use a database designed to store data in that format, in that case?
    - Re: (Score:2)
      
      by 19thNervousBreakdown ( 768619 ) writes:
      
      The small startups are using NoSQL because there is, more and more, a push in the web app market to store data which does not fit into any schema.
      R'lyeh is apparently the new Silicon Valley.
    - Re: (Score:2)
      
      by Skal Tura ( 595728 ) writes:
      
      Roflmao! "Push to save data that does not fit into any schema."
      Excuse me, if you cannot comprehend the data (no schema can be created), then let someone else do it who actually knows what they are doing.
      All data is just data; it has relations, it has types, it has patterns, etc. There's no magical data that no one can comprehend well enough to put it into a DB schema.
      - Re: (Score:2)
        
        by mini me ( 132455 ) writes:
        
        Perhaps my wording was poor, but if you have knowledge of the structure of the data in advance, you are not thinking about the kinds of applications I am. Only the user knows what the structure of each record is going to be as they enter it.
        Yes, you can still do it with SQL, but it is very, very ugly. Why would you try to shoehorn the problem into SQL when databases exist that are designed for the job?
- Re: (Score:3, Interesting)
  
  by craftycoder ( 1851452 ) writes:
  
  My thoughts exactly. I have a couple 100 GB in a MsSQL database with extensive normalization and it is lightning fast. It's all about indexes and appropriate design.
- Re: (Score:3, Insightful)
  
  by Ruke ( 857276 ) writes:
  
  I think the real problem is that people are seeing inconsistencies in their growing systems, and looking to grow to a system that doesn't have inconsistencies. Which is basically impossible. It's not that the big players don't ever have inconsistent data - Amazon's Dynamo relies on reaching a quorum, rather than a totally consistent state. Rather, the big players have a much better idea of exactly how inconsistent their data can be, while still giving their system good performance.
  - Me too (Score:2)
    
    by Steeltoe ( 98226 ) writes:
    
    If people are getting inconsistencies, they're not following good CS practices in the first place. Any proper course on the subject will teach algorithms and database-design, among other subjects that alleviate problems before they arise. This is nothing new, but has been maturing for the last 30-40 years.
    I don't think people will automatically "get it", by just doing an exam on the subject. However, respect and a bit of dedication for such knowledge is required, to further refine one's skills.
    I believe it'
- Re:Call me skeptical (Score:5, Insightful)
  
  by RobertM1968 ( 951074 ) writes: on Thursday November 18, 2010 @05:24PM (#34273954) Homepage Journal
  
  Agreed... the biggest limitation I see with SQL (My, DB2, Postgres anyway... found plenty in MS) are people who don't know how to lay out a database, people who don't know how to install and configure the server daemon(s), people who have no idea how to properly select appropriate hardware, and people who don't know how the heck to do a query (as a for instance, I worked on some code done by someone else, where on massive records, they were always selecting "*" instead of the needed or anticipated values. Big waste when one needs (by ID#) last and first name and selects a whole row instead - then wonders why it's not scaling upwards).
  
  - Re:Call me skeptical (Score:5, Funny)
    
    by Cylix ( 55374 ) * writes: on Thursday November 18, 2010 @05:34PM (#34274118) Homepage Journal
    
    I just select * from * and then sort it out with grep and cut.
    
    - Re: (Score:2)
      
      by Hoi Polloi ( 522990 ) writes:
      
      Good thing all my db's are massive, flat text files.
    - Re: (Score:2)
      
      by Monkeedude1212 ( 1560403 ) writes:
      
      This made my day, thank you.
  - Re: (Score:2)
    
    by nschubach ( 922175 ) writes:
    
    Is it people that don't know how to lay out a database or that you need to know how to lay out a database so it does fit with their need?
    I see a lot of hate around alternatives to SQL and most of them blame the design of data retention rather than accepting that there may be another way to achieve what is needed. It sounds to me like people trying to justify their job (which may not be necessary under a different model that doesn't need someone to "design" anything.)
    Honest question there...
    - Re: (Score:2)
      
      by Skal Tura ( 595728 ) writes:
      
      I use the so-called NoSQL stuff extensively, but for caching. The actual, real data and its relations are still stored in an RDBMS, which is accessed if there's no cache hit.
      NoSQL for most part is just key/value pairs, nothing special.
      Try to map out in NoSQL reliably a very complex data structure (think 50+ interconnected relations of one to many or many to many)
  - - Re: (Score:3, Informative)
      
      by Skal Tura ( 595728 ) writes:
      
      So you've not worked on anything like that, where actually someone knew how to make a relational database.
      Ty very much, but our DBs are running fine with over 100 million rows of almost purely textual data being searched (relational full text searches) at 500+ q/s, and double that in hits per sec, with a single modern server still having plenty of free resources.
      Ok that doesn't change that much, but then we got this one thing which over 100x the size, runs even way heavier searches (exponentially more co
- Re: (Score:2, Insightful)
  
  by Anonymous Coward writes:
  
  I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases...
  Call me skeptical, but there are companies out there with massive amounts of data in relational databases; if you as a startup are "constantly hitting limitations" you're either a very odd startup or using it very wrong.
  Agreed. My knee-jerk response once I saw the sentence in the article's summary was "No, you're not [hitting the limitations in
- Re: (Score:2)
  
  by GWBasic ( 900357 ) writes:
  
  Somehow I don't see a startup needing NoSQL unless they specialized in processing huge amounts of data, in which case trying to make slashdot work on your core business seems stupid. But maybe I missed something...
  It depends on the kinds of queries and/or feature set you're trying to do. Don't assume that NoSQL is all about scalability, I chose MongoDB for my startup because we have a requirement that's very difficult to address in MySQL, but trivial in MongoDB.
- - Re: (Score:3, Funny)
    
    by PRMan ( 959735 ) writes:
    
    MySpace is also slower than maple syrup in January.
    - Re: (Score:3, Insightful)
      
      by nschubach ( 922175 ) writes:
      
      It's rather fast now that nobody uses it anymore.
      (sorry, I couldn't resist.)
  - Re: (Score:2)
    
    by thetoadwarrior ( 1268702 ) writes:
    
    Have you seen how "fast" MySpace is? It's certainly no Google.
- - - Re: (Score:2)
      
      by mini me ( 132455 ) writes:
      
      NoSQL is not about ACID. Several NoSQL databases are ACID compliant. NoSQL is about not using SQL to query a database. Some NoSQL databases are even relational, just like SQL.
Is it a technical or a budget problem? (Score:5, Insightful)

by ducomputergeek ( 595742 ) writes: on Thursday November 18, 2010 @05:12PM (#34273772)

Given my past 12 years between working at consultancies and start ups, I've seen this a few times. It's usually not a technical hurdle, it's a "We can't solve this problem within our budget" problem. Either by going out and hiring someone who is an expert at performance tuning with their DB of choice or moving from certain db's to real databases that could handle the work like MSSQL, DB2, Oracle, or in some cases Teradata if dealing with Data warehousing.
Because I've worked around some very large database installs in my day. Every time the scaling question/problem came up, it was solvable with RDBMS's, but the solution wasn't cheap.

- Re: (Score:2)
  
  by Cylix ( 55374 ) * writes:
  
  There are a few other players in the field next to teradata, but when you move to that format there is nothing that would be associated with the word cheap.
  However, generally when it gets to that level of field the amount of data in storage usually makes it very obvious.
  In some scenarios, we have avoided going to those rather massive solutions by really digging down and seeing if we really needed to store everything.
  - Re: (Score:3, Insightful)
    
    by doofusclam ( 528746 ) writes:
    
    There are a few other players in the field next to teradata, but when you move to that format there is nothing that would be associated with the word cheap.
    However, generally when it gets to that level of field the amount of data in storage usually makes it very obvious.
    In some scenarios, we have avoided going to those rather massive solutions by really digging down and seeing if we really needed to store everything.
    In a previous job at the start of my career, my company bought a Teradata system which came with the requisite sharp suited consultant, who told us how to lay out the DB schema.
    Being Teradata all the hashed indexes were in vogue, so it was lightning fast.
    Until the day they realised the users mainly did substring searches, which don't really work on a hashed index. Table scans aplenty = unhappy users.
    It doesn't mean a RDBMS is bad, it means that technology misapplied always sucks.
- Re:Is it a technical or a budget problem? (Score:4, Interesting)
  
  by PRMan ( 959735 ) writes: on Thursday November 18, 2010 @05:47PM (#34274372)
  
  My experience is that there is a lot you can do that is very cheap.
  One time, I walked into a mortgage company (I'm a developer, not a DBA) and they were complaining that they couldn't run a required government report breaking down their fee codes because it would time out after 2 minutes. The table had millions of records. I looked at the table and immediately noticed that they didn't have an index on fee code, which the report was trying to sort and total by. I told the manager that I would add an index on the fee code column after hours and run the report. He wasn't sure it would work so he said, "Go ahead and add it now."
  I added the index (which took about 30 seconds) and ran the report again. It finished in 45 seconds.
  I looked at the report. Whoever wrote it for them was concatenating strings all over the place. Millions of them. I switched the app to StringBuilder using a search-and-replace.
  I ran the report again. 8 seconds. In less than an hour I took a report that wasn't finishing in 2 minutes down to 8 seconds. That wasn't expensive for them and it wasn't hard to do.
  At another client, they were complaining about database slowness and the DBA wasn't having much luck fixing it. They fired him and asked me to look at it. I simply recorded a profiler log (a little slower for that day, but it's already dog slow so who would notice), found the longest duration and most common queries and then searched the source code repository and rewrote them. Many of these queries were cross-joins, missing indexes on the joined field or other really obvious problems. One was doing a data conversion on every record instead of data converting the passed in input once. It took me about 2-3 days to solve massive slowness problems. At the end, the employees were saying, "I'm glad they finally bought a new database server." This was at one of the country's largest mortgage companies with tens of millions of records in the database. And the fixes should have been brain-dead obvious to anyone with a few years of SQL experience.
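
The missing-index fix from the first anecdote above can be sketched with SQLite's `EXPLAIN QUERY PLAN`. The `fees` table, its columns, the index name, and the row counts are assumptions invented for the illustration, and the query is simplified to a lookup on one fee code rather than the sort-and-total report from the story.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fees ("
    "id INTEGER PRIMARY KEY, fee_code TEXT NOT NULL, amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO fees (fee_code, amount) VALUES (?, ?)",
    [(f"F{i % 50:02d}", 1.0) for i in range(10_000)],
)

QUERY = "SELECT SUM(amount) FROM fees WHERE fee_code = ?"


def plan(sql: str, params: tuple) -> str:
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Without an index the planner has no choice but to scan the whole table.
plan_before = plan(QUERY, ("F07",))

# The one-line fix: index the column the report filters on.
conn.execute("CREATE INDEX idx_fees_fee_code ON fees (fee_code)")
plan_after = plan(QUERY, ("F07",))

total = conn.execute(QUERY, ("F07",)).fetchone()[0]
```

Printing `plan_before` and `plan_after` shows the change from a table scan to an index search; the exact wording of the plan text varies between SQLite versions. Most databases have an equivalent tool (for example `EXPLAIN` in Postgres and MySQL), and checking the plan of the slowest queries is usually the cheapest first step.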
  
  - Re: (Score:3, Insightful)
    
    by Hoi Polloi ( 522990 ) writes:
    
    I wish most tuning efforts only required fixing glaring index issues. You eventually find yourself dealing with large dbs with all the basic tuning done and now they want to get app X to return in 8 secs instead of 10. Then you go down the rabbit hole of initialization params, hints, etc. Sadly design considerations are almost always off the plate at this point.
    - Re: (Score:2)
      
      by Skal Tura ( 595728 ) writes:
      
      Things get interesting when you have already done all that, run benchmark after benchmark for days upon days, even tried changing the design, and are still missing the last 2-5%, the customer rejects the work, and no new hardware is allowed.
      What to do then is the real trick. Well, I finally managed to get more performance out of it, only for someone else to break it a little later without any sign in version control of how it was broken.
- Re: (Score:2)
  
  by lakeland ( 218447 ) writes:
  
  Interesting that you list only commercial DBs, do you have any trouble using postgres on very large databases?
  - Re: (Score:3, Informative)
    
    by ducomputergeek ( 595742 ) writes:
    
    I like PostgreSQL a lot. We use it now as the database that runs all of our company's software and the software we deploy to clients. It's overkill for our point of sale product, but it's fast and stable. But PostgreSQL has lacked some features, which made deploying it for very large databases not that attractive. There were three things that kept it out of the running: lack of built-in clustering, lack of Hot-Standby, no vendor that could support both hardware and software under one roof (and could be sued if
- Re: (Score:2)
  
  by Skal Tura ( 595728 ) writes:
  
  "We can't solve this by adding new hardware" is a technical problem.
  Throwing hardware at a problem is usually not a good choice, and the pain of solving it technically now pays dividends for the rest of the software's lifecycle.
you're doing something wrong (Score:5, Insightful)

by Surt ( 22457 ) writes: on Thursday November 18, 2010 @05:13PM (#34273790) Homepage Journal

"I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases. "
Relational databases scale to pretty amazing heights. The notion that you are hitting some limit of relational databases at a startup stretches the imagination. I mean, really, you've already hit exabyte data sizes? That's typically where relational starts to struggle.
You really need to define your problem with much greater specificity to get a valuable answer.

- Re: (Score:3, Insightful)
  
  by Stradenko ( 160417 ) writes:
  
  Relational databases scale to pretty amazing heights
  Horizontally?
  - Re: (Score:2)
    
    by C_Kode ( 102755 ) writes:
    
    Some people use sharding to scale horizontally.
    - Re:you're doing something wrong (Score:4, Informative)
      
      by mlyle ( 148697 ) writes: on Thursday November 18, 2010 @05:48PM (#34274382)
      
      And that's what Translattice does, actually: for the database part of the system, we transparently shard large tables behind the scenes, and figure out how to store them on the available computing resources, taking into account historical usage patterns and administrators' policies on how data must be stored (for redundancy and compliance purposes). A different population of nodes is used to store each shard and the redundancy is effectively loosely coupled, so when a failure or partition occurs, the work involved in re-establishing redundancy is shared fairly across all nodes. This provides linear scalability for many workloads and better redundancy properties, and as a side benefit can also position data closer to where it's consumed.
      When it comes time to access the data, the query planner in our database figures out how to efficiently dispatch the query to the minimal necessary population of nodes, introducing map and reduce steps to provide for data reduction and efficient execution.
      All of the table storage is directly attached to the nodes, eliminating much of the need for a storage area network and scaling beyond where shared-disk database clusters can go.
      
  - Re: (Score:2)
    
    by Surt ( 22457 ) writes:
    
    Admittedly, a poor choice of words, but yes.
- Re:you're doing something wrong (Score:4, Insightful)
  
  by camperdave ( 969942 ) writes: on Thursday November 18, 2010 @05:30PM (#34274062) Journal
  
  Relational databases scale to pretty amazing heights. The notion that you are hitting some limit of relational databases at a startup stretches the imagination. I mean, really, you've already hit exabyte data sizes? That's typically where relational starts to struggle.
  You really need to define your problem with much greater specificity to get a valuable answer.
  Given that the title of the story is "Horizontal Scaling of SQL Databases?" the notion that relational databases are able to scale to pretty amazing heights is irrelevant.
  You really need to define your problem with much greater specificity to get a valuable answer.
  That's definitely true. It may be, in fact, that an RDBMS is not what is needed at all.
  
  - Re: (Score:3, Informative)
    
    by Surt ( 22457 ) writes:
    
    I meant heights of performance and size, but admittedly, that was a poorly chosen phrase. But yes, you can scale SQL very wide.
- Re: (Score:2)
  
  by Civil_Disobedient ( 261825 ) writes:
  
  You really need to define your problem with much greater specificity to get a valuable answer.
  The OP said they were using NoSQL. That alone explains everything.
  Solution (to the OP, not the parent who clearly understands what they're talking about): go learn how to use relational databases properly. Normalize your data. Nine times out of ten, if you're repeating information in multiple tables, you're doing something wrong. DO NOT USE BUSINESS KEYS. Surrogate keys only. Why? Because you do not own a cr
- Re: (Score:2)
  
  by Chriscypher ( 409959 ) writes:
  
  "I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases. "
  Relational databases scale to pretty amazing heights. The notion that you are hitting some limit of relational databases at a startup stretches the imagination. I mean, really, you've already hit exabyte data sizes? That's typically where relational starts to struggle.
  You really need to define your problem with much greater specificity to get a valuable answer.
  Duh!
  Maybe this guy's startup company specializes in exabyte database optimization... ... which explains why they are constantly struggling with relational database limitations !!!!
Wow (Score:5, Informative)

by mlyle ( 148697 ) writes: on Thursday November 18, 2010 @05:24PM (#34273960)

I didn't expect we'd be on Slashdot just yet. I'm Michael Lyle, CTO and cofounder of Translattice.
With regards to the original submitter's question, we'd love to talk to him. How much we can help, of course, depends on the specific scenario he's hitting.
What we've built is an application platform constituted from identical nodes, each containing a geographically decentralized relational database, a distributed (J2EE compatible) application container, and distributed load balancing and management capabilities. Massive relational data is transparently sharded behind the scenes and assigned redundantly to the computing resources in the cluster, and a distributed consensus protocol keeps all of the transactions in flight coherent and provides ACID guarantees. In essence, we allow existing enterprise applications to scale out horizontally while keeping the benefits of the existing programming model for transactional applications, by letting computing resources from throughout an organization combine to run enterprise workloads.
Current stacks are really complicated, multi-vendor, and require extensive integration/custom engineering for each application install. We're striving to create a world where massively performing infrastructure can be built from identical pieces.
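Transparent sharding of this general kind can be pictured with a toy model. To be clear, this is not Translattice's implementation, which isn't shown anywhere in this thread. It is a minimal sketch that assumes plain hash partitioning on a key, with an aggregate computed per shard ("map") and then merged by a coordinator ("reduce"). `ShardedTable` and its methods are hypothetical names, and redundancy, placement policy, and consensus are all left out.

```python
from collections import defaultdict
from zlib import crc32

class ShardedTable:
    """Toy sharded table: each row lives on the shard chosen by hashing its key."""

    def __init__(self, num_shards):
        self.shards = [[] for _ in range(num_shards)]

    def shard_for(self, key):
        # crc32 is stable across runs, unlike Python's salted hash() for strings.
        return crc32(str(key).encode("utf-8")) % len(self.shards)

    def insert(self, key, row):
        self.shards[self.shard_for(key)].append((key, row))

    def get(self, key):
        # Point lookup: only the single shard that can hold the key is touched.
        shard = self.shards[self.shard_for(key)]
        return [row for k, row in shard if k == key]

    def sum_by(self, group_field, value_field):
        # "Map": every shard computes its own partial sums locally.
        partials = []
        for shard in self.shards:
            local = defaultdict(float)
            for _, row in shard:
                local[row[group_field]] += row[value_field]
            partials.append(local)
        # "Reduce": the coordinator merges the (small) partial results.
        merged = defaultdict(float)
        for local in partials:
            for group, subtotal in local.items():
                merged[group] += subtotal
        return dict(merged)

table = ShardedTable(num_shards=4)
for i in range(1000):
    table.insert(i, {"region": "east" if i % 2 else "west", "amount": 1.0})

totals = table.sum_by("region", "amount")
print(totals)
```

Even in the toy version, the shape of the trade-off shows: a lookup by shard key touches one node, while an aggregate has to visit every shard and only ships back the reduced partials.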

- Re:Wow (Score:5, Insightful)
  
  by Cylix ( 55374 ) * writes: on Thursday November 18, 2010 @05:39PM (#34274194) Homepage Journal
  
  He posted to slashdot.... do you really think he can afford you?
  
- Re: (Score:2)
  
  by joib ( 70841 ) writes:
  
  So you're claiming ACID; IOW you are saying your system provides consistency as per the definition used in CAP?
  How do you deal with network partitions? That is, per the CAP theorem, if you have C, is your system CA or CP?
  Thanks,
  - Re: (Score:2)
    
    by joib ( 70841 ) writes:
    
    Replying to myself, TFA contains some info about this. Hey, this is slashdot, who has time to read TFA?
    - Re:Wow (Score:5, Interesting)
      
      by mlyle ( 148697 ) writes: on Thursday November 18, 2010 @06:00PM (#34274656)
      
      The short answer is, CA/CP/AP on a transaction-by-transaction basis depending on application requirements. Also of note: network delay is effectively a special "partition", requiring an engine that can have massive workloads in flight and reconcile/order non-commutative changesets in a distributed fashion.
      
- Re:Wow (Score:5, Funny)
  
  by Squeebee ( 719115 ) writes: <{squeebee} {at} {gmail.com}> on Thursday November 18, 2010 @05:59PM (#34274622)
  
  Congratulations, you just won Slashdot's buzzword bingo, please collect your prize at the cashier window in the back of the hall.
  
- - Re: (Score:2)
    
    by aclarke ( 307017 ) writes:
    
    I think it's hiding behind the giant "I think everything is a conspiracy" badge I just awarded you.
Justification for new toys? (Score:5, Insightful)

by StuartHankins ( 1020819 ) writes: on Thursday November 18, 2010 @05:25PM (#34273978)

The post is so vaguely worded, I imagine the author is merely trying to find some justification to purchase some new toys. "See, Slashdot people think this is a good idea!"

I agree with most of the posts so far -- if you're truly hitting a limit, you are most likely doing something wrong. Hire an outside DBA to make recommendations if you don't have the resources in-house. I strongly suspect this is the real issue.

hbase is an option to NoSQL and Cassandra. (Score:4, Informative)

by ooglek ( 98453 ) writes: <beckman@angry[ ]com ['ox.' in gap]> on Thursday November 18, 2010 @05:31PM (#34274068) Homepage Journal

I recently read that someone moved their large operation from Cassandra to HBase, a database built on top of Hadoop. http://hbase.apache.org/ [apache.org]
HBase is the Hadoop database. Use it when you need random, realtime read/write access to your Big Data. This project's goal is the hosting of very large tables -- billions of rows X millions of columns -- atop clusters of commodity hardware.
HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable: A Distributed Storage System for Structured Data by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase includes:
* Convenient base classes for backing Hadoop MapReduce jobs with HBase tables
* Query predicate push down via server side scan and get filters
* Optimizations for real time queries
* A high performance Thrift gateway
* A REST-ful Web service gateway that supports XML, Protobuf, and binary data encoding options
* Cascading, hive, and pig source and sink modules
* Extensible jruby-based (JIRB) shell
* Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia; or via JMX
HBase 0.20 has greatly improved on its predecessors:
* No HBase single point of failure
* Rolling restart for configuration changes and minor upgrades
* Random access performance on par with open source relational databases such as MySQL

What you should really be doing... (Score:3, Funny)

by ADRA ( 37398 ) writes: on Thursday November 18, 2010 @05:40PM (#34274200)

Is to write better queries, I mean how hard can it be:
select * from (select * from A,B,C,D,E,F,G WHERE A.ID=B.AID(+) AND B.ID=C.BID(+) AND C.ID=D.CID(+) AND D.ID=E.DID(+) AND E.ID=F.EID(+) AND F.ID=G.FID(+) order by F.name ASC) where F.name='zzzzz'
Everything will work out, I swear.

- Re:What you should really be doing... (Score:4, Insightful)
  
  by Nerdfest ( 867930 ) writes: on Thursday November 18, 2010 @06:17PM (#34274994)
  
  I think I've seen SQL written by you before. I realize your post is a joke, but I see people aliasing bad table names down to even less readable single letters. It's a maintenance nightmare. Treat SQL like a language and write it so it's readable and maintainable. It even frequently helps when you're trying to resolve performance problems ... they're much easier to spot in well written SQL.
  
  - Re:What you should really be doing... (Score:5, Funny)
    
    by ErikZ ( 55491 ) * writes: on Thursday November 18, 2010 @07:28PM (#34276144)
    
    I could do that, but your tears are delicious.
    
Is this a slashvertisement or so? (Score:2)

by guruevi ( 827432 ) writes:

What limits are you hitting? And why are you mentioning only one of the many solutions to your problem, one which is probably mighty expensive compared to the others?
If you're genuinely hitting a limit, you're doing it wrong. You're probably not Google so most likely you're having issues scaling your proprietary and expensive SQL database (Oracle, MSSQL) but don't want to buy more $10-20k licenses. Most likely you can fix it by simply throwing better and more hardware at it (SSD, more hard drives and
- Re: (Score:2)
  
  by cheesedog ( 603990 ) writes:
  
  Google isn't the only company in the world that has to deal with petabytes of data. It's also not the only company that has to deal with incredibly large volumes of structured data.
  I speak from experience, son. Your relational DB can't handle successful internet-scale loads, no matter how many awesome dbas you hire, and no matter how much money you fork over to Oracle.
MySQL scales just fine. (Score:5, Interesting)

by poptix_work ( 79063 ) writes: on Thursday November 18, 2010 @05:50PM (#34274432) Homepage

I work with some very high traffic sites, storing large data sets (100GB+).
Depending on the application (if it allows for different write-only/read-only database configurations) we'll have a master-master replication setup, then a number of slaves hanging off each MySQL master. In front of all of this is haproxy* which performs TCP load balancing between all slaves, and all masters. Slaves that fall behind the master are automatically removed from the pool to ensure that clients receive current data.
This provides:
* Redundancy
* Scaling
* Automatic failover
The whole NoSQL movement is as bad as the XML movement. I'm sure it's a great idea in some cases, but otherwise it's a solution looking for a problem.
(*) http://haproxy.1wt.eu/ [1wt.eu]
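The routing idea above (writes to the masters, reads spread over slaves, laggards dropped from the pool) can be sketched at the application level. This is not an HAProxy configuration and not how HAProxy itself works internally. It is a toy model under stated assumptions: `ReplicaPool`, `report_lag`, and the 5-second threshold are all hypothetical, and in a real setup the lag figure would come from a health check against each slave.

```python
import itertools

class ReplicaPool:
    """Toy router: writes go to a master, reads rotate over non-lagging replicas."""

    def __init__(self, masters, replicas, max_lag_seconds=5.0):
        self.masters = list(masters)
        self.replicas = list(replicas)
        self.max_lag_seconds = max_lag_seconds
        self.lag = {name: 0.0 for name in self.replicas}
        self._counter = itertools.count()

    def report_lag(self, replica, seconds):
        # Stand-in for a periodic health check measuring replication delay.
        self.lag[replica] = seconds

    def healthy_replicas(self):
        return [r for r in self.replicas if self.lag[r] <= self.max_lag_seconds]

    def route(self, sql):
        # Crude read/write split: anything starting with SELECT counts as a read.
        is_read = sql.lstrip().lower().startswith("select")
        if is_read:
            # If every replica is lagging, fall back to the masters.
            pool = self.healthy_replicas() or self.masters
        else:
            pool = self.masters
        # Simple round-robin over whichever pool was chosen.
        return pool[next(self._counter) % len(pool)]

pool = ReplicaPool(masters=["m1", "m2"], replicas=["r1", "r2", "r3"])
pool.report_lag("r2", 30.0)  # r2 has fallen behind and leaves the read pool
reads = {pool.route("SELECT * FROM orders") for _ in range(10)}
write_target = pool.route("UPDATE orders SET paid = 1 WHERE id = 7")
print(sorted(reads), write_target)
```

Doing this in a TCP load balancer instead, as described above, keeps the application unaware of the topology. The price is that the balancer can't see individual statements, so the application still has to connect to a separate read endpoint and write endpoint.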

- Re: (Score:2, Informative)
  
  by cheesedog ( 603990 ) writes:
  
  100GB+ is not a large dataset.
  - Re: (Score:2)
    
    by dkf ( 304284 ) writes:
    
    100GB+ is not a large dataset.
    A dataset is large when the quickest way of getting it across the country involves Fedexing a box of harddrives.
- Re: (Score:2)
  
  by clockwise_music ( 594832 ) writes:
  
  The whole NoSQL movement is as bad as the XML movement. I'm sure it's a great idea in some cases, but otherwise it's a solution looking for a problem.
  Excellent quote.
  
  Timothy, you're asking the wrong question. "Is anyone using this system in production?" bzzzz, wrong. The correct question is "What systems are people _using_ in production?"
- - Re: (Score:2)
    
    by poptix_work ( 79063 ) writes:
    
    By 'high traffic' I mean sites pushing in excess of 100gbit/s. The sites could function fine with 2x Dell PER710 (Quad-core Xeon E5520 2.266GHz 16.00GB RAM 6x SAS 147gb PERC6/i), but we require redundancy and failover capacity.
fast and extremely scalable (Score:2)

by bhcompy ( 1877290 ) writes:

The fastest DB I've ever used is based on PICK OS/DB. Reality is the retail name for it now (essentially an emulator with an API for *nix/Windows). The military used it for inventory tracking and various companies still use it today for a great many things. ADP uses it for extremely large databases with tons of history for accounting, financials, inventory, etc. Even very old systems with 20+ years of data are very responsive/quick (these systems are running Digital Unix 4 with Alpha processors). Pick/
You're hitting the 4th normal form limit (Score:2)

by crovira ( 10242 ) writes:

You're going to stay stuck there (and getting progressively worse) until the designers of your database start to implement 5th normal form.
That means taking into account the relationships between data elements and implementing them as something other than aggregated tuples.
The aggregation problem is getting worse as you try to implement new relationships.
Answer... (Score:2)

by TheSync ( 5291 ) writes:

MSSQL...Oracle....
The question is so uselessly phrased (Score:2)

by obarthelemy ( 160321 ) writes:

that I have trouble imagining you have any kind of skill for tackling the issue.
You do realize you give us ZERO info on what the problem is, but do push a very specific (if not fringe) approach to your non-question?
If your problem-solving skills are on a par with your problem-describing skills, you're in for hard times.
- Re:What company? (Score:4, Insightful)
  
  by jlusk4 ( 2831 ) writes: on Thursday November 18, 2010 @05:51PM (#34274454)
  
  Geez, you guys. There's a real person behind the question. Do you HAVE to be an asshole?
  
- Re: (Score:2)
  
  by lakeland ( 218447 ) writes:
  
  So does every other database nowadays, even MySQL.
