Databases

Horizontal Scaling of SQL Databases? 222

still_sick writes "I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases. We've been looking at various NoSQL stores and I've been following Adrian Cockcroft's blog at Netflix which compares the various options. I was intrigued by the most recent entry, about Translattice, which purports to provide many of the same scaling advantages for SQL databases. Is this even possible given the CAP theorem? Is anyone using a system like this in production?"
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Thursday November 18, 2010 @04:55PM (#34273524)

    It would be a lot easier to talk about solutions if you said which limitations you run into.

    Is your dataset too large (large tables), do you have too many joins, or too many transactions per second? In short, what is the problem we're trying to solve here?

  • by Anonymous Coward on Thursday November 18, 2010 @05:00PM (#34273586)

    Learn partitioning principles, get a database product that does partitioning properly, learn normalization, never worry again about not being able to scale with relational databases. It just requires some real skills but relational databases really do scale all the way up.
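
    As a rough illustration of the partitioning idea mentioned above, here is a minimal sketch of hash partitioning: each row is routed to a shard by hashing its partition key. It is not tied to any particular database product; `shard_for`, `partition_rows`, and the `customer_id` field are hypothetical names chosen for the example.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a partition key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the result is the same on every run.
    return int.from_bytes(digest[:8], "big") % num_shards


def partition_rows(rows, key_field, num_shards):
    """Split rows into num_shards lists according to their partition key."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_for(str(row[key_field]), num_shards)].append(row)
    return shards
```

    Every row with the same key lands on the same shard, so a query that filters on the partition key only has to touch one shard. Queries that do not filter on the key still have to visit all of them, which is why the choice of key matters.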

  • Call me skeptical (Score:5, Insightful)

    by Kjella ( 173770 ) on Thursday November 18, 2010 @05:05PM (#34273674) Homepage

    Call me skeptical, but there are companies out there with massive amounts of data in relational databases. If you as a startup are "constantly hitting limitations", you're either a very odd startup or using it very wrong. As long as the volume is small you can make almost anything happen on SQL. Hell, most small businesses I've known run mostly on Excel. Somehow I don't see a startup needing NoSQL unless they specialized in processing huge amounts of data, in which case trying to make slashdot work on your core business seems stupid. But maybe I missed something...

  • by ducomputergeek ( 595742 ) on Thursday November 18, 2010 @05:12PM (#34273772)

    Given my past 12 years working at consultancies and startups, I've seen this a few times. It's usually not a technical hurdle, it's a "We can't solve this problem within our budget" problem. The fix is either going out and hiring someone who is an expert at performance tuning with their DB of choice, or moving from certain DBs to real databases that could handle the work, like MSSQL, DB2, Oracle, or in some cases Teradata if dealing with data warehousing.

    I've worked around some very large database installs in my day. Every time the scaling question/problem came up, it was solvable with RDBMSs, but the solution wasn't cheap.

  • by Surt ( 22457 ) on Thursday November 18, 2010 @05:13PM (#34273790) Homepage Journal

    "I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases. "

    Relational databases scale to pretty amazing heights. The notion that you are hitting some limit of relational databases at a startup stretches the imagination. I mean, really, you've already hit exabyte data sizes? That's typically where relational starts to struggle.

    You really need to define your problem with much greater specificity to get a valuable answer.

  • by MatthiasF ( 1853064 ) on Thursday November 18, 2010 @05:16PM (#34273832)
    In Cloud scenarios, a distributed relational database is cumbersome or even impossible to maintain. Hence why lots of web companies have moved over to NoSQL solutions tailored to their processes.

    So, you're describing centralized, local databases whereas the OP is focusing on decentralized, cloud databases.
  • by Ruke ( 857276 ) on Thursday November 18, 2010 @05:23PM (#34273932)
    I think the real problem is that people are seeing inconsistencies in their growing systems, and looking to grow to a system that doesn't have inconsistencies. Which is basically impossible. It's not that the big players don't ever have inconsistent data - Amazon's Dynamo relies on reaching a quorum, rather than a totally consistent state. Rather, the big players have a much better idea of exactly how inconsistent their data can be, while still giving their system good performance.
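
    To make the quorum point concrete, here is a minimal sketch of the usual Dynamo-style rule: with N replicas, reads that contact R nodes and writes that contact W nodes are guaranteed to share at least one node whenever R + W > N. This is a simplified model for illustration, not Amazon's implementation; the function names are made up.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """The quorum rule: any read set and any write set must share a replica."""
    return r + w > n


def every_read_sees_a_write(n: int, r: int, w: int) -> bool:
    """Brute-force check: does every possible read set intersect every write set?"""
    nodes = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(nodes, r)
        for write_set in combinations(nodes, w)
    )
```

    Picking smaller R or W buys latency and availability at the cost of possibly reading stale data, which is the "how inconsistent can we afford to be" trade-off described above.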
  • by Stradenko ( 160417 ) on Thursday November 18, 2010 @05:23PM (#34273940) Homepage

    Relational databases scale to pretty amazing heights

    Horizontally?

  • by RobertM1968 ( 951074 ) on Thursday November 18, 2010 @05:24PM (#34273954) Homepage Journal

    Agreed... the biggest limitation I see with SQL (My, DB2, Postgres anyway... found plenty in MS) are people who don't know how to lay out a database, people who don't know how to install and configure the server daemon(s), people who have no idea how to properly select appropriate hardware, and people who don't know how the heck to do a query (as a for instance, I worked on some code done by someone else, where on massive records, they were always selecting "*" instead of the needed or anticipated values. Big waste when one needs (by ID#) last and first name and selects a whole row instead - then wonders why it's not scaling upwards).
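
    A small sketch of the `SELECT *` complaint, using Python's built-in `sqlite3` module and an in-memory table. The `people` table, its columns, and the oversized `bio` value are all invented for the example; the point is only that the wide query drags along data the caller never asked for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, bio TEXT)"
)
# One row with a deliberately large column that the caller does not need.
conn.execute(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    (1, "Ada", "Lovelace", "x" * 100_000),
)

# Fetches every column, including the large bio.
wide = conn.execute("SELECT * FROM people WHERE id = ?", (1,)).fetchone()

# Fetches only the two values that are actually needed.
narrow = conn.execute(
    "SELECT last_name, first_name FROM people WHERE id = ?", (1,)
).fetchone()


def payload_size(row) -> int:
    """Rough size of a result row, counted in characters."""
    return sum(len(str(column)) for column in row)
```

    Comparing `payload_size(wide)` with `payload_size(narrow)` shows how much extra data the first query moves per row. On a real server that difference is multiplied by every row and every request.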

  • by StuartHankins ( 1020819 ) on Thursday November 18, 2010 @05:25PM (#34273978)
    The post is so vaguely worded, I imagine the author is merely trying to find some justification to purchase some new toys. "See, Slashdot people think this is a good idea!"

    I agree with most of the posts so far -- if you're truly hitting a limit, you are most likely doing something wrong. Hire an outside DBA to make recommendations if you don't have the resources in-house. I strongly suspect this is the real issue.
  • by camperdave ( 969942 ) on Thursday November 18, 2010 @05:30PM (#34274062) Journal

    Relational databases scale to pretty amazing heights. The notion that you are hitting some limit of relational databases at a startup stretches the imagination. I mean, really, you've already hit exabyte data sizes? That's typically where relational starts to struggle.

    You really need to define your problem with much greater specificity to get a valuable answer.

    Given that the title of the story is "Horizontal Scaling of SQL Databases?", the notion that relational databases are able to scale to pretty amazing heights is irrelevant.

    You really need to define your problem with much greater specificity to get a valuable answer.

    That's definitely true. It may be, in fact, that an RDBMS is not what is needed at all.

  • by Anonymous Coward on Thursday November 18, 2010 @05:38PM (#34274170)

    The real problem is scale. Any SQL DB server will cope with most applications fine, but add live data on a public-facing site with a decent volume of users, and they'll crawl to a slow death. This is why non-crucial sites use denormalization and do the dastardly deed of data duplication to speed up their bad queries and suspect table design.

    So what's the solution? Very large and expensive boxes for the simplest method (no one likes this these days), and then lots of boxes performing certain tasks. Which has its own huge costs, because skilled people in this field are very few and far between, and are already working for Yahoo, Google and Fartybook.

  • Re:Wow (Score:5, Insightful)

    by Cylix ( 55374 ) * on Thursday November 18, 2010 @05:39PM (#34274194) Homepage Journal

    He posted to slashdot.... do you really think he can afford you?

  • by nschubach ( 922175 ) on Thursday November 18, 2010 @05:42PM (#34274256) Journal

    It's rather fast now that nobody uses it anymore.

    (sorry, I couldn't resist.)

  • by Anonymous Coward on Thursday November 18, 2010 @05:43PM (#34274272)

    I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases...

    Call me skeptical, but there are companies out there with massive amounts of data in relational databases. If you as a startup are "constantly hitting limitations", you're either a very odd startup or using it very wrong.

    Agreed. My knee-jerk response once I saw the sentence in the article's summary was "No, you're not [hitting the limitations in what we can do with relational databases]. You're hitting the limits of what you know about performance tuning and scalability with the relational databases you have."

    NoSQL, BigTable, and Cassandra are designed for extremely fast key-value pair lookups over enormous datasets (as one poster puts it, > exabyte-sized.) With these solutions alone, you lose:

    a) ACID
    b) FK relations/semantic modeling

    which is huge. (If you don't know why losing ACID and FK relations is such a bad thing, you might as well stop here, hit the library for a good database textbook, read and understand it, then come back in 3-6 months and rephrase your question.)

    If you *really* have > exabyte-sized data in a table or two and you really are hitting the limits of what current RDBMS engines can provide (and if you haven't looked at DB2 or Oracle, maybe you should - their optimizers are better than Postgres or (laugh) MySQL), you'd probably want to work around (a) and (b) by using some sort of enterprise transaction management system (e.g. JTA if you're using Java EE), then incorporate the tables you need into NoSQL, Cassandra, or BigTable by providing middleware to interface with these hash stores that provides support for two-phase distributed commit and fakes the FK relationship to cross datastore boundaries.

    And if you think that doesn't sound too bad, think again: what I just described is a HUGE undertaking. Are you really sure you haven't exhausted all other options to stick with proven database technology that performs well up to exceptionally large-sized datasets? Maybe it's time to hire, you know, a real DBA - this type of analysis is what they get paid the big bucks for.
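
    For a sense of what the two-phase distributed commit mentioned above involves, here is a deliberately tiny sketch of the protocol's happy and unhappy paths. It is an assumption-laden toy: the `Participant` class and `two_phase_commit` function are hypothetical, and it ignores coordinator crashes, timeouts, and durable logging, which are the parts that make a real implementation a huge undertaking.

```python
class Participant:
    """A stand-in for one datastore taking part in a distributed transaction."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote yes only if the local work can be made durable.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants) -> bool:
    """Commit everywhere only if every participant votes yes in phase 1."""
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2a: unanimous yes, so tell everyone to commit.
        for p in participants:
            p.commit()
        return True
    # Phase 2b: at least one no vote, so everyone rolls back.
    for p in participants:
        p.rollback()
    return False
```

    A single "no" vote aborts the whole transaction across every store, which is the all-or-nothing behaviour that has to be rebuilt by hand once the data leaves a single RDBMS.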

  • Re:What company? (Score:4, Insightful)

    by jlusk4 ( 2831 ) on Thursday November 18, 2010 @05:51PM (#34274454)

    Geez, you guys. There's a real person behind the question. Do you HAVE to be an asshole?

  • FIFY (Score:1, Insightful)

    by Anonymous Coward on Thursday November 18, 2010 @05:53PM (#34274522)

    "I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what [OUR OUTSOURCED INDIAN DEVELOPERS] can do with relational databases.

    FIXED IT FOR YOU MY PRETTY LITTLE OPERATIONS MANAGER. (Just using all caps to make you feel more at home)

  • by DarkOx ( 621550 ) on Thursday November 18, 2010 @06:11PM (#34274878) Journal

    I would have to agree, it's really hard to imagine a "startup" can't make anything work on traditional SQL RDBMS(es). If you put the right hardware underneath it, even SQL Server 2000 (64-bit anyway) will scale just fine to terabyte-size databases at thousands of transactions per second. That is not impossible hardware for a successful startup to buy either; we are talking a dedicated storage controller with a gigabyte or so of cache and a few dozen SAS drives. I know; I have worked on such projects.

    You need the schema right, and if it's more reads than writes you might even denormalize a little, and you will need to partition the data appropriately, but it can be done. This is why real DBAs still make the big bucks. There is a lot to know in that domain. You probably should hire someone who is an expert on whatever stuff you are using now to consult before you go down the path of NoSQL. All you told us is that you are a growing startup, which is not much to go on, but without knowing what you are doing it's hard for me to believe you are doing anything on a scale that can't be done well with a relational database; but maybe I am wrong and maybe you are doing something huge. Remember, as soon as you go down the NoSQL path you are going to have to do a great deal of heavy lifting, because the quantity of libraries and off-the-shelf stuff out there is not great.

  • by Nerdfest ( 867930 ) on Thursday November 18, 2010 @06:17PM (#34274994)
    I think I've seen SQL written by you before. I realize your post is a joke, but I see people aliasing bad table names down to even less readable single letters. It's a maintenance nightmare. Treat SQL like a language and write it so it's readable and maintainable. It even frequently helps when you're trying to resolve performance problems ... they're much easier to spot in well written SQL.
  • by Hoi Polloi ( 522990 ) on Thursday November 18, 2010 @06:28PM (#34275190) Journal

    I wish most tuning efforts only required fixing glaring index issues. You eventually find yourself dealing with large dbs with all the basic tuning done and now they want to get app X to return in 8 secs instead of 10. Then you go down the rabbit hole of initialization params, hints, etc. Sadly design considerations are almost always off the plate at this point.

  • by doofusclam ( 528746 ) <slash@seanyseansean.com> on Thursday November 18, 2010 @06:50PM (#34275626) Homepage

    There are a few other players in the field next to teradata, but when you move to that format there is nothing that would be associated with the word cheap.

    However, generally when it gets to that level of field the amount of data in storage usually makes it very obvious.

    In some scenarios, we have avoided going to those rather massive solutions by really digging down and seeing if we really needed to store everything.

    In a previous job at the start of my career, my company bought a Teradata system which came with the requisite sharp suited consultant, who told us how to lay out the DB schema.

    Being Teradata all the hashed indexes were in vogue, so it was lightning fast.

    Until the day they realised the users mainly did substring searches, which don't really work on a hashed index. Table scans aplenty = unhappy users.

    It doesn't mean an RDBMS is bad, it means that technology misapplied always sucks.
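
    The hashed-index problem can be sketched with a plain Python dict standing in for the index. This is only an analogy, not how Teradata works internally, and the names are invented: an exact key is found in one probe, while a substring has nothing to hash against and forces a look at every row.

```python
names = ["anderson", "sanders", "smith", "andersen"]

# A hash index maps the full key to its row position.
hash_index = {name: position for position, name in enumerate(names)}


def exact_lookup(name: str):
    """One hash probe: works only when the whole key is known."""
    return hash_index.get(name)


def substring_search(fragment: str):
    """The index cannot help here, so every row has to be scanned."""
    return [position for position, name in enumerate(names) if fragment in name]
```

    Looking up `"smith"` costs a single probe, but searching for the fragment `"ander"` walks the whole list, which is the table scan the users ended up waiting on.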

  • by atomic777 ( 860023 ) on Thursday November 18, 2010 @07:58PM (#34276538)
    Right. "We are hitting limitations in what we can do with X" means they cannot solve an underlying difficult problem Z, and are hoping that by swapping X with magic fairy dust Y, that somehow Z will go away. Sales people owe their BMWs to this simple fallacy.
  • by Klinky ( 636952 ) on Thursday November 18, 2010 @09:57PM (#34277562)

    NoSQL is not just key-value lookups. Take a look at Redis or MongoDB, there are novel ideas in both of them & yes they do bring new things to the table. They are NOT memcache. I am also not sure people are "sacrificing a lot of what is present in an RDBMS" by choosing NoSQL over an RDBMS. I think your gripe is with people who don't know what the hell they're doing, but you project that griping on to NoSQL in general. There are some things that RDBMSs are really good at, there are some things RDBMSs aren't so great at. The huge majority of people in the NoSQL communities and the users of these solutions know that loading a million objects client-side and discarding all but 5 is stupid & no one would suggest that is a failing of either RDBMS or NoSQL solutions, but squarely on the user.

    I would have to say that NoSQL is more relatable to how people think about objects and their relation to each other. People don't easily boil their objects down into relational tables and how each of those tables should interact with each other. This takes skill & talent & can be a bit of a pain to dive into, which is why we have a bunch of ORM solutions which add another layer of cruft on top of RDBMSs. NoSQL is basically getting rid of the ORM & the tables (though some still use table-like structures). For apps that would normally use an ORM extensively (a lot of web apps), that's great. For newbies who don't have years and years under their belt designing, tweaking normalization, and sharding/partitioning, it can be easier to pick up. Some of the NoSQL solutions have clustering/horizontal scaling and/or replication built in, with no or very little schema/query changes required.

    So for some NoSQL will be a better solution and for others an RDBMS will be a better solution. I wouldn't knock either. Just because you can do something in an RDBMS doesn't mean it's better than a NoSQL solution & vice versa.
