Databases

Horizontal Scaling of SQL Databases?

still_sick writes "I'm currently responsible for operations at a software-as-a-service startup, and we're increasingly hitting limitations in what we can do with relational databases. We've been looking at various NoSQL stores and I've been following Adrian Cockcroft's blog at Netflix which compares the various options. I was intrigued by the most recent entry, about Translattice, which purports to provide many of the same scaling advantages for SQL databases. Is this even possible given the CAP theorem? Is anyone using a system like this in production?"
This discussion has been archived. No new comments can be posted.

  • by mlts ( 1038732 ) * on Thursday November 18, 2010 @05:01PM (#34273608)

    Another idea is to scale using other layers, if there are problems at the SQL server level.

    At the lower areas, one can go with a mainframe (parallel sysplex) and have geographically separate pieces of hardware acting coherently.

    At the higher layers, have the app use multiple SQL servers and handle the redundancy in this layer.
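
    The app-layer redundancy described above can start as simple ordered failover across replicas. A minimal sketch (the suppliers here are stand-ins for real JDBC data sources; names are invented for illustration):

```java
import java.util.List;
import java.util.function.Supplier;

public class FailoverQuery {
    // Try each backend in order; return the first successful result.
    // In a real app each Supplier would wrap a query against one SQL server.
    static <T> T withFailover(List<Supplier<T>> backends) {
        RuntimeException last = null;
        for (Supplier<T> backend : backends) {
            try {
                return backend.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure, try the next replica
            }
        }
        throw new IllegalStateException("all backends failed", last);
    }

    public static void main(String[] args) {
        List<Supplier<String>> backends = List.of(
            () -> { throw new RuntimeException("primary down"); },
            () -> "42 rows" // secondary answers
        );
        System.out.println(withFailover(backends)); // prints "42 rows"
    }
}
```

    Note this only covers reads cleanly; redundant writes at the app layer raise the consistency questions the rest of this thread is about.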

  • by Anonymous Coward on Thursday November 18, 2010 @05:04PM (#34273660)

    It would be a lot easier to talk about solutions if you said which limitations you run into.

    Is your dataset too large (large tables)? Are you doing too many joins, or too many transactions per second? In short, what is the problem we're trying to solve here?

    My money is on "No one here likes SQL" and "There aren't any experts on RDBMSs to help us get things set up properly".

  • by ani23 ( 899493 ) on Thursday November 18, 2010 @05:07PM (#34273712)
    Partitioning does complicate backups and HA/DR scenarios as the entire system is dependent on all machines being up and running. Also in most commercial db's (I know about db2) this feature takes you to the enterprise tier of software which is usually very expensive.
  • Re:Call me skeptical (Score:3, Interesting)

    by craftycoder ( 1851452 ) on Thursday November 18, 2010 @05:19PM (#34273862)

    My thoughts exactly. I have a couple hundred GB in an MS SQL database with extensive normalization and it is lightning fast. It's all about indexes and appropriate design.

  • by PRMan ( 959735 ) on Thursday November 18, 2010 @05:47PM (#34274372)

    My experience is that there is a lot you can do that is very cheap.

    One time, I walked into a mortgage company (I'm a developer, not a DBA) and they were complaining that they couldn't run a required government report breaking down their fee codes because it would time out after 2 minutes. The table had millions of records. I looked at the table and immediately noticed that they didn't have an index on fee code, which the report was trying to sort and total by. I told the manager that I would add an index on the fee code column after hours and run the report. He wasn't sure it would work so he said, "Go ahead and add it now."

    I added the index (which took about 30 seconds) and ran the report again. It finished in 45 seconds.

    I looked at the report. Whoever wrote it for them was concatenating strings all over the place. Millions of them. I switched the app to StringBuilder using a search-and-replace.

    I ran the report again. 8 seconds. In less than an hour I took a report that wasn't finishing in 2 minutes down to 8 seconds. That wasn't expensive for them and it wasn't hard to do.

    At another client, they were complaining about database slowness and the DBA wasn't having much luck fixing it. They fired him and asked me to look at it. I simply recorded a profiler log (a little slower for that day, but it's already dog slow so who would notice), found the longest-duration and most common queries, and then searched the source code repository and rewrote them.

    Many of these queries were cross-joins, missing indexes on the joined field, or other really obvious problems. One was doing a data conversion on every record instead of converting the passed-in input once. It took me about 2-3 days to solve massive slowness problems. At the end, the employees were saying, "I'm glad they finally bought a new database server."

    This was at one of the country's largest mortgage companies, with tens of millions of records in the database. And the fixes should have been brain-dead obvious to anyone with a few years of SQL experience.
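
    The two fixes described above can be sketched as follows. The original code isn't shown, so the report format, table, and column names here are invented for illustration:

```java
public class ReportFixes {
    // Fix 1: build the report with StringBuilder instead of repeated string
    // concatenation. Each "+" on a String copies the whole buffer (O(n^2)
    // overall for n rows); StringBuilder appends in amortized O(n).
    static String buildReport(String[] rows) {
        StringBuilder sb = new StringBuilder();
        for (String row : rows) {
            sb.append(row).append('\n');
        }
        return sb.toString();
    }

    // Fix 2 (the per-record data conversion): convert the passed-in input
    // once rather than every record. In SQL terms the same idea keeps the
    // predicate sargable, so an index on the column can actually be used.
    static final String SLOW_SQL =
        "SELECT fee_code, SUM(amount) FROM fees "
      + "WHERE CONVERT(varchar(10), close_date, 120) = ? GROUP BY fee_code";
    static final String FAST_SQL =
        "SELECT fee_code, SUM(amount) FROM fees "
      + "WHERE close_date = CONVERT(datetime, ?, 120) GROUP BY fee_code";

    public static void main(String[] args) {
        System.out.print(buildReport(new String[] {"fee=A 10", "fee=B 20"}));
    }
}
```

    The SLOW_SQL form wraps the column in a conversion, which forces a full scan; the FAST_SQL form converts the single input parameter instead.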

  • by poptix_work ( 79063 ) on Thursday November 18, 2010 @05:50PM (#34274432) Homepage

    I work with some very high traffic sites, storing large data sets (100GB+).

      Depending on the application (if it can send reads and writes to different database connections) we'll have a master-master replication setup, with a number of slaves hanging off each MySQL master. In front of all of this is haproxy*, which performs TCP load balancing across all slaves, and across all masters. Slaves that fall behind the master are automatically removed from the pool to ensure that clients receive current data.

      This provides:
      * Redundancy
      * Scaling
      * Automatic failover

      The whole NoSQL movement is as bad as the XML movement. I'm sure it's a great idea in some cases, but otherwise it's a solution looking for a problem.

    (*) http://haproxy.1wt.eu/ [1wt.eu]
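
    A minimal haproxy TCP front-end for such a pool might look like the following sketch. The addresses, ports, and check user are invented; note that haproxy's built-in mysql-check only verifies the server answers a handshake, so pulling lagging slaves out of the pool (as described above) typically requires an external agent that checks replication lag and disables the server:

```
# Hypothetical config: read pool over the slaves, write pool over the masters
listen mysql-read
    bind *:3307
    mode tcp
    balance leastconn
    option mysql-check user haproxy_check
    server slave1 10.0.1.11:3306 check
    server slave2 10.0.1.12:3306 check

listen mysql-write
    bind *:3306
    mode tcp
    option mysql-check user haproxy_check
    server master1 10.0.1.1:3306 check
    server master2 10.0.1.2:3306 check backup
```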

  • Re:Wow (Score:1, Interesting)

    by Crimey McBiggles ( 705157 ) on Thursday November 18, 2010 @05:53PM (#34274518)
    The problem with identical pieces is that, in order for them to be interoperable among myriad applications, they must be very small, and there must be a great number of them. No one business operates in a manner identical to another's. If relational databases aren't solving the problem, it is more than likely due to poor data structure. The main difference NoSQL presents to a novice database administrator is that NoSQL promotes key-value pairs. This is no different from what exists in a relational database, except that in an RDBMS the admin is allowed, and often compelled, to create tables with multiple fields. More tables with fewer fields is the solution in either case.
  • Re:Wow (Score:5, Interesting)

    by mlyle ( 148697 ) on Thursday November 18, 2010 @06:00PM (#34274656)

    The short answer is, CA/CP/AP on a transaction-by-transaction basis depending on application requirements. Also of note: network delay is effectively a special "partition", requiring an engine that can have massive workloads in flight and reconcile/order non-commutative changesets in a distributed fashion.

  • Re:Call me skeptical (Score:5, Interesting)

    by Natural Join ( 1711970 ) on Thursday November 18, 2010 @06:46PM (#34275540)

    The small startups are using NoSQL because there is, more and more, a push in the web app market to store data which does not fit into any schema.

    There is no such thing as "data which does not fit into any schema", just like there is no such thing as data which cannot be encoded into binary. All data necessarily has a schema. However much or little of the schema you may choose to model in your (SQL or other type of) schema is, like the rest of software engineering, a design tradeoff.

    The various NoSQL approaches do not solve the full generality of data management problems the way SQL databases do. They are narrower in scope, and as is generally the case, they can achieve better performance by virtue of doing less. They can be much faster with certain data access paths, but at a cost of the fact that other data access paths become prohibitive.

    The frustrating thing for many of us is that the NoSQL spin on data management is about where mainstream data management was in the 1960s. As the field matured, it learned many important lessons, all of which are now being tossed out the window by people saying "oh we don't need that" but of course, they just haven't needed it yet. As these problems become apparent to them, they will spend the next decades of their lives reinventing what the data management field figured out in the 80s and 90s. Until then, they'll be making beginner mistakes, like thinking that their data somehow doesn't fit into any schema.

  • by mikehoskins ( 177074 ) on Thursday November 18, 2010 @11:54PM (#34278306)

    Can you shard the same SQL data store in Chicago, London, and Tokyo? Not with standard SQL databases, unless you write your own complicated replication techniques or pay through the nose. (See CAP Theorem).

    Yes, the company I work for has expressed the world-wide SQL database need, so this is not just a thought experiment.

    Have you heard of GemFire/GemStone, VoltDB, or Xeround?

    If you can get rid of the SQL requirement, try
        XML (or other format) on Amazon's S3
        or try one of the NoSQL databases, such as MongoDB, Riak, or CouchDB.

    All of the above scale horizontally, most even scale in a geographically diverse environment.
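
    A hand-rolled geographic shard router, the kind of do-it-yourself routing the parent warns about, might start out as simply as this (region names and the hashing scheme are invented for illustration; it says nothing about cross-region consistency, which is where the CAP theorem bites):

```java
import java.util.List;

public class GeoShardRouter {
    // Hypothetical: pin each customer to one region so most reads and
    // writes stay local; any cross-region query must fan out to all shards.
    static final List<String> REGIONS = List.of("chicago", "london", "tokyo");

    static String shardFor(String customerId) {
        // floorMod keeps the index non-negative even for negative hashCodes
        int idx = Math.floorMod(customerId.hashCode(), REGIONS.size());
        return REGIONS.get(idx);
    }

    public static void main(String[] args) {
        System.out.println(shardFor("acme-ltd"));
    }
}
```

    Even this trivial router illustrates the hidden costs: adding a region rehashes most customers, and nothing here handles replication or failover between regions.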
