Ask Slashdot: Which NoSQL Database For New Project?

Posted by Soulskill
from the mo-sql-mo-problems dept.
DorianGre writes: "I'm working on a new independent project. It involves iPhones and Android phones talking to PHP (Symfony) or Ruby/Rails. Each incoming call will be a data element POST, and I would like to simply write that into the database for later use. I'll need to be able to pull by date or by a number of key fields, as well as do trend reporting over time on the totals of a few fields. I would like to start with a NoSQL solution for scaling, and ideally it would be dead simple if possible. I've been looking at MongoDB, Couchbase, Cassandra/Hadoop and others. What do you recommend? What problems have you run into with the ones you've tried?"

  • Use PostgreSQL (Score:5, Informative)

    by Anonymous Coward on Wednesday April 09, 2014 @05:17AM (#46702803)

    If you need to store less than a few hundred million rows just use PostgreSQL.
    It supports JSON and transactions.

  • Short Intro (Score:5, Informative)

    by emblemparade (774653) on Wednesday April 09, 2014 @05:51AM (#46702933)

    It's a mistake to think that "NoSQL" is a silver bullet for scalability. You can scale just fine using MySQL (FlockDB) or PostgreSQL if you know what you're doing. On the other hand, if you don't know what you're doing, NoSQL may create problems where you didn't have them.

    An important advantage of NoSQL (which has its costs) is that it's schema-free. This can allow for more rapid iteration in your development cycle. It pays off to plan document structures carefully, but if you need to make changes at some point (or just want to experiment), you can handle it at the code level. You can also support older "schemas" if you plan accordingly: for example, adding a version tag or something similar that can tell your code how to handle it. So, even ignoring the dubious potential of better scalability, NoSQL can still be beneficial for your project.
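    The version-tag idea can be sketched in a few lines. This is only an illustration in plain Python with no particular database behind it: the field names (`schema_version`, `name`, `first_name`, `last_name`, `steps`) and the v1-to-v2 change are made up for the example.

```python
# Hypothetical document shapes; every field name here is an illustrative assumption.
CURRENT_VERSION = 2


def upgrade_v1_to_v2(doc):
    """v1 stored one 'name' string; v2 splits it into first and last name."""
    first, _, last = doc["name"].partition(" ")
    upgraded = {key: value for key, value in doc.items() if key != "name"}
    upgraded.update({"first_name": first, "last_name": last, "schema_version": 2})
    return upgraded


# One upgrade step per old version, applied in sequence.
UPGRADES = {1: upgrade_v1_to_v2}


def load(doc):
    """Bring a stored document up to the current shape, one version at a time."""
    doc = dict(doc)
    # Documents written before the tag existed are treated as version 1.
    version = doc.get("schema_version", 1)
    while version < CURRENT_VERSION:
        doc = UPGRADES[version](doc)
        version = doc["schema_version"]
    return doc


old_doc = {"name": "Ada Lovelace", "steps": 4200}
new_doc = {"schema_version": 2, "first_name": "Alan", "last_name": "Turing", "steps": 100}
print(load(old_doc))
print(load(new_doc))
```

    The rest of the code only ever sees the current shape, and old documents can be rewritten lazily the next time they are saved.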

    More so than SQL, NoSQL databases are designed for different kinds of applications, and have different strengths:

    MongoDB is a really good backend engine that gives programmers a lot of control over performance and its costs: if you need faster writes, you can allow for eventual integrity, or if you need faster reads, you can allow for data not being the absolute freshest. For many massive multiuser applications, not having immediately up-to-date data is a reasonable compromise. It also offers an excellent set of atomic operations, which from my experience compensate well for the lack of transactions. Furthermore, MongoDB is by far the most feature-rich of these, supporting aggregate queries and map-reduce, which again can make up for the lack of joins. It also offers good sharding tools, so if you do need to scale, you can. Again, I'll emphasize that you need a good understanding of how MongoDB works in order to properly scale. For example, map-reduce locks the database, so you don't want to rely on it too much. The bottom line is that MongoDB can offer similar features to SQL databases (though they work very differently), so it's good for first-timers.

    Couchbase is very good at dispersed synchronization. For example, if parts of your database live in your clients (mobile applications come to mind), it does a terrific job at resynching itself and handling divergences. This is also "scalable," but in a quite different meaning of the term than in MongoDB.

    I would also take a look at OrientDB: it's not quite as feature-rich as MongoDB (and has no atomic operations), but it can work in schema-mode, and generally offers a great set of tools that can make it easy to migrate from SQL. Its query language, for example, looks a lot like SQL.

    The above are all "document-oriented" databases, where your data is not opaque: the database actually does understand how your data is structured, and can allow for deep indexing and updating of your documents. Cassandra and REDIS (and Tokyo Cabinet, and BerkeleyDB) are key-value stores: much simpler databases offering fewer querying features, because your data is simply a blob as far as the engine is concerned. I would be less inclined to recommend them unless your use case is very specific. Where appropriate, of course, simpler is better. With these kinds of databases, there are actually very few ways in which you can create an obstacle for scalability, simply because they don't do very much from a programming perspective.

    There are also in-between databases that are sometimes called "column-oriented": Google and Amazon's hosted big data services are both of this type. Your data is structured, but the structure is flat. Generally, I would prefer full-blown "document-oriented" databases, such as MongoDB and OrientDB. However, if you're using a hosted service, you might not have a choice.

    It's also entirely possible to mix different kinds of databases. For example, use MongoDB for your complex data and use REDIS for a simple data store. I've even seen sophisticated deployments that very smartly archive data from one DB to another, and migrate it back again when necessary.

  • by Raumkraut (518382) on Wednesday April 09, 2014 @06:27AM (#46703029)

    MongoDB has indexes.
    MongoDB also lets you store and query arbitrary data, in addition to any "key fields", without having to pre-define all the possible fields. Which it seems is what the submitter asked for.

    Where has this idea that "NoSQL" means "not a database" come from?

  • by DarkOx (621550) on Wednesday April 09, 2014 @06:31AM (#46703047) Journal

    I disagree; he is concerned about scaling. The last thing in the world he should do is use a bunch of flat files, unless he really just needs to store the data, but he already said he needs to do reports and totals on it.

    Also he is working in Ruby. The smart thing for him to do IMHO is write his program against ruby/DBI. It isn't the prettiest database API, but it supports plenty of different backend options, and it does not sound like his program needs especially complex database operations or queries. He can start working with something like SQLite as the database "server", and move up to something else, perhaps Postgres (which can be every bit as fast as the NoSQL solutions unless you are getting highly custom), without needing to alter his program.
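    To make the "start on SQLite, swap the backend later" idea concrete, here is a rough sketch. It uses Python's DB-API (PEP 249) instead of ruby/DBI, purely to have something runnable with only a standard library. The `readings` table, its columns, and the function names are assumptions for illustration; the submitter never described his actual fields.

```python
import sqlite3


def open_db(connect=sqlite3.connect, target=":memory:"):
    """Open a connection through a pluggable DB-API connect function."""
    conn = connect(target)
    cur = conn.cursor()
    # Hypothetical table: one row per incoming POST.
    cur.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "device_id TEXT NOT NULL, recorded_on TEXT NOT NULL, amount REAL NOT NULL)"
    )
    cur.execute("CREATE INDEX IF NOT EXISTS idx_readings_date ON readings (recorded_on)")
    conn.commit()
    return conn


def store(conn, device_id, recorded_on, amount):
    """Write one incoming data element."""
    cur = conn.cursor()
    cur.execute("INSERT INTO readings VALUES (?, ?, ?)", (device_id, recorded_on, amount))
    conn.commit()


def by_date(conn, day):
    """Pull rows by date."""
    cur = conn.cursor()
    cur.execute(
        "SELECT device_id, amount FROM readings WHERE recorded_on = ? ORDER BY device_id",
        (day,),
    )
    return cur.fetchall()


def daily_totals(conn):
    """Trend report: total of one field per day."""
    cur = conn.cursor()
    cur.execute(
        "SELECT recorded_on, SUM(amount) FROM readings "
        "GROUP BY recorded_on ORDER BY recorded_on"
    )
    return cur.fetchall()


conn = open_db()
store(conn, "phone-a", "2014-04-08", 2.0)
store(conn, "phone-b", "2014-04-08", 3.0)
store(conn, "phone-a", "2014-04-09", 5.0)
print(by_date(conn, "2014-04-08"))
print(daily_totals(conn))
```

    One caveat on "without needing to alter his program": DB-API drivers differ in placeholder style (sqlite3 uses `?`, psycopg2 uses `%s`), so in practice a thin layer or a query builder is needed to hide that difference.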

  • by Anonymous Coward on Wednesday April 09, 2014 @08:41AM (#46703613)

    >NoSQL is a good solution for horizontal scaling, CSV and SQL DB are not.

    I'd like to dispute this. Based on the OP's description of his application, two things come to mind:


    • His application is mostly write-only. He probably does not need instant query ability, but may need to be able to handle a very large number of inserts per second (assuming he's justified in his assertion that he needs scalability). For this kind of application, logging your incoming data to a plain text file (or sequentially-appended binary data file, or any other write-only plain file approach) can be a significant performance improvement. These files can then periodically (e.g. every hour, every minute, whatever time frame suits) be pulled off local storage, merged, and inserted into a central database as a batch from which read queries are performed. Single batched updates are much more efficient than large numbers of small updates.

    • His queries are easily parallelized. He needs to perform only two operations: selecting data based on simple criteria, simple numerical summarization. Both of these are trivially scaled horizontally by using systems with local SQL databases and a simple service running on the machines as nodes in a map/reduce architecture.
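    The first point (append to a plain file, load in batches) might look roughly like this. It is a sketch, not a hardened loader: the JSON-lines format, the `readings` table, the field names, and the `.batch` rotation suffix are all assumptions, and SQLite stands in for the central database.

```python
import json
import os
import sqlite3
import tempfile


def append_record(log_path, record):
    """Request path: append one incoming POST as a JSON line, no database involved."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps(record) + "\n")


def load_batch(log_path, conn):
    """Periodic job: rotate the log, then insert its rows in a single transaction."""
    if not os.path.exists(log_path):
        return 0
    batch_path = log_path + ".batch"
    # After the rename, new appends land in a fresh file.
    os.replace(log_path, batch_path)
    with open(batch_path, encoding="utf-8") as batch:
        rows = [
            (rec["device_id"], rec["recorded_on"], rec["amount"])
            for rec in map(json.loads, batch)
        ]
    with conn:  # one commit for the whole batch
        conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    # A real loader would also have to cope with a batch file left over after a crash.
    os.remove(batch_path)
    return len(rows)


with tempfile.TemporaryDirectory() as workdir:
    log_path = os.path.join(workdir, "incoming.log")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (device_id TEXT, recorded_on TEXT, amount REAL)")
    append_record(log_path, {"device_id": "phone-a", "recorded_on": "2014-04-09", "amount": 2.0})
    append_record(log_path, {"device_id": "phone-b", "recorded_on": "2014-04-09", "amount": 3.0})
    print(load_batch(log_path, conn), "rows loaded")
    conn.close()
```

    Renaming the file before reading it is what keeps the batch job from racing against new appends.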

    Blanket statements like yours above can't really be made without reference to the intended application, as some applications scale much more easily than others, and OP's sounds like it's one of the easy kind.
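    For the second point, a toy version of "local SQL databases as nodes in a map/reduce architecture" could look like the following. In-memory SQLite connections stand in for separate machines and a thread pool stands in for the network service; the table and column names are again assumptions.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_node(rows):
    """Stand-in for one machine: an in-memory SQLite database holding its share of rows."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE readings (device_id TEXT, recorded_on TEXT, amount REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def map_node(conn, since):
    """Map step: each node filters and summarizes only its local rows."""
    count, total = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM readings WHERE recorded_on >= ?",
        (since,),
    ).fetchone()
    return count, total


def reduce_partials(partials):
    """Reduce step: combine per-node counts and sums into one summary."""
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return {"count": count, "total": total, "mean": total / count if count else None}


def summarize(nodes, since):
    """Query all nodes in parallel, then reduce the partial results."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda node: map_node(node, since), nodes))
    return reduce_partials(partials)


nodes = [
    make_node([("phone-a", "2014-04-08", 2.0), ("phone-b", "2014-04-09", 4.0)]),
    make_node([("phone-c", "2014-04-09", 6.0)]),
    make_node([]),
]
print(summarize(nodes, "2014-04-09"))
```

    Each node returns a count and a sum, not an average, because averages from nodes with different row counts cannot be combined correctly afterwards.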

  • by Anonymous Coward on Wednesday April 09, 2014 @08:59AM (#46703711)

    >For storing and querying arbitrarily-structured data, which is what the submitter seems to be wanting
    I dunno. I read TFS and it looks more like he wants rows of tabular data. Were this a STX site, I'd vote to close as too broad since he hasn't actually said anything useful about what he's storing.

    So default answer to "Which NoSQL database should I use?" is always "Don't use NoSQL."

  • by DorianGre (61847) on Wednesday April 09, 2014 @09:12AM (#46703767)

    We are looking at 99% incoming data, 10-12 fields, 1000-2000 per session per week, X as many users as we can get.
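    A back-of-envelope calculation helps put those numbers next to the "few hundred million rows" rule of thumb from the first comment. It reads "1000-2000 per session per week" as 1000 to 2000 records per user per week, which is an assumption, and the user counts are invented for illustration.

```python
WEEKS_PER_YEAR = 52
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def rows_per_year(users, records_per_user_per_week):
    """Total rows written in a year at a steady per-user rate."""
    return users * records_per_user_per_week * WEEKS_PER_YEAR


def average_inserts_per_second(users, records_per_user_per_week):
    """Average write rate; real traffic will have peaks well above this."""
    return users * records_per_user_per_week / SECONDS_PER_WEEK


# Hypothetical user counts; the submitter only said "as many users as we can get".
for users in (1_000, 10_000, 100_000):
    low = rows_per_year(users, 1_000)
    high = rows_per_year(users, 2_000)
    peak = average_inserts_per_second(users, 2_000)
    print(f"{users:>7,} users: {low:,} to {high:,} rows/year, about {peak:.1f} inserts/s at the high end")
```

    Under this reading, a thousand users stay below a few hundred million rows for a couple of years or more, while a hundred thousand users pass that mark within weeks.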

  • by Anonymous Coward on Wednesday April 09, 2014 @10:26AM (#46704499)

    >"Irregardless" is not a word, you [slur].

    Merriam-Webster:
    irregardless

    irregardless
    adverb \ˌir-i-ˈgärd-ləs\
    Definition of IRREGARDLESS

    Usage Discussion of IRREGARDLESS
    Irregardless originated in dialectal American speech in the early 20th century. Its fairly widespread use in speech called it to the attention of usage commentators as early as 1927. The most frequently repeated remark about it is that “there is no such word.” There is such a word, however. It is still used primarily in speech, although it can be found from time to time in edited prose. Its reputation has not risen over the years, and it is still a long way from general acceptance. Use regardless instead.
