
Dumping Lots of Data to Disk in Realtime?

AmiChris asks: "At work I need something that can dump sequential entries for several hundred thousand instruments in realtime. It also needs to be able to retrieve data for a single instrument relatively quickly. A standard relational database won't cut it. It has to keep up with 2000+ updates per second, mostly on a subset of a few hundred instruments active at a given time. I've got some ideas of how I would build such a beast, based on flat files and a system of caching entries in memory. I would like to know if: someone has already built something like this; and if not, would someone want to use it if I build it? I'm not sure what other applications there might be. I could see recording massive amounts of network traffic or scientific data with such a library. I'm guessing someone out there has done something like this before. I'm currently working with C++ on Windows. "
This discussion has been archived. No new comments can be posted.

  • Cluster it (Score:4, Insightful)

    by canuck57 ( 662392 ) on Saturday May 14, 2005 @09:24AM (#12528750)

    I know you're working with Windows, but when I read this I said yes.

    I'm guessing someone out there has done something like this before.

    Google has a cluster of machines far larger than you need, but their approach was a Linux cluster. Plus, with the volume of writes going on, you're going to want to avoid putting any unnecessary burdens on the system.

  • by isj ( 453011 ) on Saturday May 14, 2005 @11:36AM (#12529429) Homepage
    My current company did something like this back in 2001 with real-time rating performance [digiquant.com], which conceptually is much like what you want to do: receive a lot of items and store them in a database, in real time. But you did not mention some of the more important details about the problem:
    • How much processing has to be done per item?
    • How long can you delay committing them to a database?
    • Do the clients wait for an answer? Can you cheat and respond immediately?
    • How many simultaneous clients must you support? 1? 5? 100?
    • What is the hardware budget?

    2,000 items/sec means that you must do bulk updates. You cannot flush to disk 2,000 times per second, so your program will have to store the items temporarily in a buffer, which gets flushed by a secondary thread when a timer expires or when the buffer gets full. Use a two-buffer approach so you can still receive while committing to the database.
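    As a rough sketch of the two-buffer approach described above (the class and names are illustrative, not from any real library): the receiving thread appends into a "front" buffer, and the writer thread swaps the buffers under a lock, then flushes the "back" buffer with the lock released so appends can continue.

```cpp
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Illustrative record type; real entries would carry timestamps, etc.
struct Tick {
    int instrument;
    double price;
};

class DoubleBufferedWriter {
public:
    // Called by the receiving thread for every incoming item.
    // A real version would also signal the writer thread when the
    // buffer reaches a size threshold, in addition to the timer.
    void append(const Tick& t) {
        std::lock_guard<std::mutex> lock(mu_);
        front_.push_back(t);
    }

    // Called by the writer thread. Swap buffers under the lock, then
    // flush the back buffer without holding it, so appends continue
    // while we commit. Returns the number of items flushed.
    std::size_t flush_once() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            std::swap(front_, back_);
        }
        std::size_t n = back_.size();
        // Real code would write back_ to an append-only file or do a
        // bulk database insert here.
        back_.clear();
        total_flushed_ += n;
        return n;
    }

    std::size_t total_flushed() const { return total_flushed_; }

private:
    std::mutex mu_;
    std::vector<Tick> front_, back_;
    std::size_t total_flushed_ = 0;
};
```

    The key property is that the lock is held only for the pointer swap and the appends, never for the disk write itself.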

    Depending on your application, it may be beneficial to keep a cache of the most recent items for all instruments.

    You also have to consider the disk setup. If you have to store all the items, then any multi-disk setup will do. If you actually only store a few items per instrument and update them in place, then RAID-5 will kill you, because it performs poorly with small scattered updates.

    Do you have to back up the items? How will you handle backups while your program is running? This affects your choice between a flat-file and a database implementation.

  • just dump to disk (Score:1, Insightful)

    by Anonymous Coward on Saturday May 14, 2005 @12:41PM (#12529836)
    As others have said, just stream the data to disk with some kind of big RAM buffer in between. Each instrument can go to a separate directory, and each minute or hour of data goes to a separate file. A separate thread indexes or processes the data as needed.
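    The directory-per-instrument, file-per-hour layout could be sketched like this (the path scheme and function name are hypothetical, just one way to partition the files):

```cpp
#include <cstdio>
#include <ctime>
#include <string>

// Compute an append-target path of the form
//   <root>/<instrument>/<YYYYMMDD_HH>.dat
// One directory per instrument, one file per hour. This only builds
// the name; a real writer would open it in append mode and stream
// records into it, rolling to a new file when the hour changes.
std::string make_path(const std::string& root, int instrument,
                      std::time_t when) {
    std::tm tm_utc{};
#if defined(_WIN32)
    gmtime_s(&tm_utc, &when);
#else
    gmtime_r(&when, &tm_utc);
#endif
    char buf[64];
    std::snprintf(buf, sizeof(buf), "/%d/%04d%02d%02d_%02d.dat",
                  instrument, tm_utc.tm_year + 1900, tm_utc.tm_mon + 1,
                  tm_utc.tm_mday, tm_utc.tm_hour);
    return root + buf;
}
```

    Retrieval for a single instrument then reduces to listing one directory, which keeps the "read back one instrument quickly" requirement cheap.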

    And don't forget the magic word: striping. You should interleave your data across many disks, and the index files should be on separate disks as well.

    Do striping+mirroring for data protection: do the striping at the application level for maximum throughput, and do the mirroring at the hardware level.

    When you aren't going through layers of crap like an SQL database, you should *fly* like this on modern hardware.
  • by Glonoinha ( 587375 ) on Saturday May 14, 2005 @04:47PM (#12531295) Journal
    You will find that my imagination and abilities are limited only by my budget. Well, that, and, as I am finding, the Sarbanes-Oxley mandates that recently came down from the Productivity Prevention Team, which are quite effective at keeping me from actually getting any work done.

    I don't really care what it pays if it has anything to do with real-time systems (guidance or delivery systems a plus), if the R&D budget has enough wiggle room for better hardware (toys) than I have at home, if you promise that I will be able to participate in the production roll-out and make the production environment succeed, and especially if there are a few challenges categorized as "can't be done."

    Apollo 13 didn't get home because a bunch of mediocre guys sat around filling out paperwork requesting permission and setting up a committee to discuss business impact - Apollo 13 got home because a bunch of crack-junkie hardcore engineers decided that failure wasn't an option.

    So the stuff you do at work - is it hard? :)
