Dumping Lots of Data to Disk in Realtime?
AmiChris asks: "At work I need something that can dump sequential entries for several hundred thousand instruments in realtime. It also needs to be able to retrieve data for a single instrument relatively quickly. A standard relational database won't cut it. It has to keep up with 2000+ updates per second, mostly on a subset of a few hundred instruments active at a given time. I've got some ideas of how I would build such a beast, based on flat files and a system of caching entries in memory. I would like to know if: someone has already built something like this; and if not, would someone want to use it if I build it? I'm not sure what other applications there might be. I could see recording massive amounts of network traffic or scientific data with such a library. I'm guessing someone out there has done something like this before. I'm currently working with C++ on Windows. "
Cluster it (Score:4, Insightful)
I know you're working with Windows, but when I read this I said yes.
I'm guessing someone out there has done something like this before.
Google has a cluster of machines far larger than you need, but their approach was a Linux cluster. Plus, with the amount of writes going on, you're going to want to avoid any burdens on the system that aren't needed.
Did something like this some years ago (Score:3, Insightful)
2,000 items/sec means that you must do bulk updates. You cannot flush to disk 2,000 times per second. So your program will have to store the items temporarily in a buffer, which gets flushed by a secondary thread when a timer expires or when the buffer gets full. Use a two-buffer approach so you can still receive while committing to the database.
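The two-buffer idea above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual design: the `Entry` layout and the flush threshold are made-up placeholders, and a real flusher thread would wait on the condition variable with a timeout rather than being driven externally.

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <vector>

// Hypothetical record type: instrument id plus one tick value.
struct Entry { int instrument; double value; };

// Two-buffer writer: producers append to the active buffer; a flusher
// swaps buffers and commits the full one, so ingestion never blocks
// on disk I/O.
class DoubleBufferWriter {
public:
    explicit DoubleBufferWriter(std::size_t flush_threshold)
        : threshold_(flush_threshold) {}

    void append(const Entry& e) {
        std::unique_lock<std::mutex> lk(m_);
        active_.push_back(e);
        // Wake the flusher early if the buffer fills before the timer.
        if (active_.size() >= threshold_) cv_.notify_one();
    }

    // Swap buffers under the lock, then write the full one with the
    // lock released so append() keeps running during the commit.
    // Returns the number of entries committed. A flusher thread would
    // call this from a loop that does cv_.wait_for(lk, interval).
    std::size_t flush(std::FILE* out) {
        std::vector<Entry> full;
        {
            std::unique_lock<std::mutex> lk(m_);
            active_.swap(full);
        }
        if (out && !full.empty())
            std::fwrite(full.data(), sizeof(Entry), full.size(), out);
        return full.size();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::vector<Entry> active_;
    std::size_t threshold_;
};
```

The swap-then-write step is the key point: the lock is held only for the pointer swap, never for the disk write.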
Depending on your application it may be beneficial to keep a cache of the most recent items for all instruments.
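A recent-items cache like that could look something like this sketch, assuming a fixed per-instrument depth; the class name and value type are invented for illustration. It serves the "retrieve one instrument quickly" requirement without touching disk at all.

```cpp
#include <deque>
#include <unordered_map>

// Hypothetical in-memory cache of the last `depth` values seen per
// instrument, evicting the oldest value once the depth is exceeded.
class RecentCache {
public:
    explicit RecentCache(std::size_t depth) : depth_(depth) {}

    void record(int instrument, double value) {
        auto& q = recent_[instrument];
        q.push_back(value);
        if (q.size() > depth_) q.pop_front();  // drop oldest
    }

    const std::deque<double>& lookup(int instrument) {
        return recent_[instrument];
    }

private:
    std::size_t depth_;
    std::unordered_map<int, std::deque<double>> recent_;
};
```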
You also have to consider the disk setup. If you have to store all the items, then any multi-disk setup will do. If you actually only store a few items per instrument and update them in place, then RAID 5 will kill you, because it performs poorly with small scattered writes (each one costs extra reads plus a parity update).
Do you have to back up the items? How will you handle backups while your program is running? This affects your choice of flat-file or database implementation.
just dump to disk (Score:1, Insightful)
And don't forget the magic word: striping. You should interleave your data across many disks, and the index files should be on separate disks as well.
Do striping + mirroring for data protection. Do the striping at the app level for maximum throughput, and do the mirroring at the hardware level.
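App-level striping can be as simple as hashing the instrument id to one of N data directories, each mounted on its own spindle. A minimal sketch, with the drive paths and naming scheme entirely made up:

```cpp
#include <string>

// Map an instrument id to one of `num_disks` stripe directories.
// "D:\stripeN" is a hypothetical mount point per physical disk;
// index files would live on yet another set of disks.
inline std::string stripe_path(unsigned instrument_id, unsigned num_disks) {
    unsigned disk = instrument_id % num_disks;
    return "D:\\stripe" + std::to_string(disk)
         + "\\inst_" + std::to_string(instrument_id) + ".dat";
}
```

Because the mapping is deterministic, the read path for a single instrument needs no lookup table to find the right disk.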
When you aren't going through layers of crap like an SQL database, you should *fly* like this on modern hardware.
Re:Ramdisk database (Score:3, Insightful)
I don't really care what it pays if it has anything to do with real-time systems (guidance or delivery systems a plus), if the R&D budget has enough wiggle room for better hardware (toys) than I have at home, if you promise that I will be able to participate in the production roll-out and be allowed to make the production environment succeed, and especially if there are a few challenges that are categorized as "can't be done."
Apollo 13 didn't get home because a bunch of mediocre guys sat around filling out paperwork requesting permission and setting up a committee to discuss business impact - Apollo 13 got home because a bunch of crack-junkie hardcore engineers decided that failure wasn't an option.
So the stuff you do at work - is it hard?