
Dumping Lots of Data to Disk in Realtime?

AmiChris asks: "At work I need something that can dump sequential entries for several hundred thousand instruments in real time. It also needs to retrieve the data for a single instrument relatively quickly. A standard relational database won't cut it. It has to keep up with 2000+ updates per second, mostly on a subset of a few hundred instruments active at any given time. I've got some ideas of how I would build such a beast, based on flat files and a system of caching entries in memory. I would like to know whether someone has already built something like this, and if not, whether anyone would want to use it if I built it. I'm not sure what other applications there might be; I could see recording massive amounts of network traffic or scientific data with such a library. I'm guessing someone out there has done something like this before. I'm currently working with C++ on Windows."
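
Below is a minimal, hypothetical sketch (not production code) of the flat-file-plus-in-memory-cache approach the submitter describes: one append-only binary log per instrument for sequential writes, plus a bounded cache of recent entries so single-instrument reads for the hot subset rarely touch disk. All names are invented, error handling and locking are omitted, and with several hundred thousand instruments you would also need to limit how many file handles stay open at once.

    // Sketch of "flat files plus a cache of entries in memory" (hypothetical).
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <deque>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    struct Entry {                 // fixed-size record, so file offsets are trivial
        int64_t timestamp_us;
        double  value;
    };

    class InstrumentStore {
    public:
        explicit InstrumentStore(std::string dir) : dir_(std::move(dir)) {}

        void append(int instrument_id, const Entry& e) {
            std::FILE*& f = files_[instrument_id];
            if (!f) f = std::fopen(path(instrument_id).c_str(), "ab");
            std::fwrite(&e, sizeof e, 1, f);   // sequential append, buffered by the OS
            auto& recent = cache_[instrument_id];
            recent.push_back(e);               // keep a bounded "hot" cache per instrument
            if (recent.size() > kCacheDepth) recent.pop_front();
        }

        // Queries for recent data on a single instrument are served from memory.
        std::vector<Entry> recent(int instrument_id) const {
            auto it = cache_.find(instrument_id);
            if (it == cache_.end()) return {};
            return {it->second.begin(), it->second.end()};
        }

        void flush() {
            for (auto& kv : files_) if (kv.second) std::fflush(kv.second);
        }

    private:
        static constexpr std::size_t kCacheDepth = 1024;
        std::string path(int id) const { return dir_ + "/" + std::to_string(id) + ".log"; }

        std::string dir_;
        std::unordered_map<int, std::FILE*> files_;
        std::unordered_map<int, std::deque<Entry>> cache_;
    };

Older history for an instrument can then be read back by seeking into that instrument's log file, since every record is the same size.
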
  • by anon mouse-cow-aard ( 443646 ) on Saturday May 14, 2005 @09:58AM (#12528912) Journal
    Sure, optimize single-node performance first, but keep in mind that horizontal scaling is something to plan for. Put N machines behind a load balancer: ingest gets scattered across the N machines, and queries go to all of them simultaneously. A Redundant Array of Inexpensive Databases :-)

    Linux Virtual Server in front of several instances of your Windows box will do, with some proxying logic for queries (a rough partitioning sketch follows below). It's probably cheaper than spending months tweaking a single node to reach your scaling target, and it scales much farther out with little extra effort.
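
    As a rough illustration of that scatter/gather idea (all names here are hypothetical, and the network transport is left abstract): route each update to one of N back-end nodes by hashing the instrument id, and fan a query out to every node when it cannot be pinned to a single instrument.

        #include <cstddef>
        #include <functional>
        #include <string>
        #include <vector>

        struct Node { std::string address; /* a connection handle would live here */ };

        // Updates for a given instrument always land on the same node.
        std::size_t node_for(int instrument_id, std::size_t node_count) {
            return std::hash<int>{}(instrument_id) % node_count;
        }

        // Queries that span instruments go to every node (in practice, in parallel).
        template <typename QueryFn>
        void scatter_query(const std::vector<Node>& nodes, QueryFn&& fn) {
            for (const Node& n : nodes) fn(n);
        }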

  • by LuckyStarr ( 12445 ) on Saturday May 14, 2005 @09:59AM (#12528926)
    I agree. In fact, SQLite performs quite well on a reasonably sized machine; 3000+ SQL updates per second on an indexed table should be no problem (see the batching sketch below).
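
    A minimal sketch of the batching pattern that lets SQLite keep up with rates like this: group many inserts into one transaction so the expensive disk sync happens once per batch rather than once per row. The table and column names are made up for illustration.

        #include <sqlite3.h>

        int main() {
            sqlite3* db = nullptr;
            if (sqlite3_open("ticks.db", &db) != SQLITE_OK) return 1;

            sqlite3_exec(db,
                "CREATE TABLE IF NOT EXISTS ticks(instrument INTEGER, ts INTEGER, value REAL);"
                "CREATE INDEX IF NOT EXISTS ix_ticks ON ticks(instrument, ts);",
                nullptr, nullptr, nullptr);

            sqlite3_stmt* ins = nullptr;
            sqlite3_prepare_v2(db,
                "INSERT INTO ticks(instrument, ts, value) VALUES(?1, ?2, ?3)",
                -1, &ins, nullptr);

            sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr);
            for (int i = 0; i < 10000; ++i) {            // one batch of updates
                sqlite3_bind_int(ins, 1, i % 300);       // a few hundred active instruments
                sqlite3_bind_int64(ins, 2, 1000000LL + i);
                sqlite3_bind_double(ins, 3, 42.0 + i);
                sqlite3_step(ins);
                sqlite3_reset(ins);
            }
            sqlite3_exec(db, "COMMIT", nullptr, nullptr, nullptr);   // one sync for the whole batch

            sqlite3_finalize(ins);
            sqlite3_close(db);
            return 0;
        }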
  • Re:Wonderware InSQL (Score:3, Interesting)

    by btlzu2 ( 99039 ) * on Saturday May 14, 2005 @10:15AM (#12529014) Homepage Journal
    How does archiving work? What is the query performance on a large table (hundreds of millions of rows)? Can you hook into the database with any language or package you like, or only with proprietary tools?

    Do you actually charge a license fee PER point?

    Our company needed a smaller SCADA system, and Wonderware could not answer these questions (except for the licensing one: they really do charge PER POINT). This department is going with a different product.

    Sorry, but be very cautious of Wonderware.
  • I did some work on a DVD-Video authoring system that had some demanding file-system requirements (unsurprisingly, given video data and the typical 4 GB payload of a single DVD disc).

    The standard file API just didn't hold up, so we (the development team I was working with) had to rewrite some of the file-management routines ourselves and work with the memory-mapped file architecture directly. This gives you advantages beyond speed as well: once you establish the mapping and place the file in a memory address range, you can treat the data in the file as if it were RAM within your program, having fun with pointers and everything else you can imagine. Copying data to the file is simply a memory move, or a copy from one pointer to another.

    The thing to remember is that Windows (this is undocumented) won't let you open a memory-mapped file larger than 1 GB, and under FAT32 file systems (Windows 95/98/ME and some low-end XP systems) the total of all memory-mapped files on the entire operating system must stay below 1 GB (a requirement that really takes the breath out of some applications).

    Remember that if you are putting pointers into the file itself, it works better if they are relative offsets rather than raw memory pointers, even though raw pointers are in theory usable within a single session. (A rough Win32 mapping sketch follows this comment.)
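
    A hedged Win32 sketch of the mapping described above (the file name and record layout are invented; error handling is minimal). Records are addressed by index, i.e. by relative offset, rather than by raw pointers stored in the file.

        #include <windows.h>
        #include <cstdint>

        struct Record {
            int64_t timestamp_us;
            double  value;
        };

        int main() {
            const DWORD kBytes = 64 * 1024 * 1024;   // 64 MB region, well under 1 GB

            HANDLE file = CreateFileA("capture.dat", GENERIC_READ | GENERIC_WRITE,
                                      0, nullptr, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL,
                                      nullptr);
            if (file == INVALID_HANDLE_VALUE) return 1;

            // The mapping object extends the file to kBytes if it is smaller.
            HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READWRITE,
                                                0, kBytes, nullptr);
            if (!mapping) { CloseHandle(file); return 1; }

            void* view = MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, kBytes);
            if (!view) { CloseHandle(mapping); CloseHandle(file); return 1; }

            // The mapped region now behaves like ordinary memory.
            Record* records = static_cast<Record*>(view);
            records[0].timestamp_us = 0;
            records[0].value = 3.14;

            UnmapViewOfFile(view);
            CloseHandle(mapping);
            CloseHandle(file);
            return 0;
        }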
  • by gvc ( 167165 ) on Saturday May 14, 2005 @11:56AM (#12529574)
    "Can [the storage backend] handle 2000 random seeks per second?"

    The short answer is "no."

    A 10,000 RPM disk has a rotation period of 6 ms. That's 3 ms of rotational latency on average for a random access (not counting seek time, or the fact that a read-modify-write takes at least three times as long: read, wait one full rotation, write).

    So one disk can do, as a generous upper bound, about 333 random accesses per second. I'll spare you the details of the Poisson distribution, but if you managed to spread these updates randomly over a disk farm, you'd need about 2000/333 × e ≈ 16 independent spindles.

    The trick to high throughput is harnessing, and creating, non-randomness. You can do a much better job of this with a purpose-built solution. (The arithmetic above is reproduced in the snippet that follows.)
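
    For what it's worth, the poster's back-of-envelope numbers reproduced in code (the factor of e is taken from the Poisson argument above, not re-derived here):

        #include <cmath>
        #include <cstdio>

        int main() {
            const double rpm = 10000.0;
            const double rotation_ms = 60.0 * 1000.0 / rpm;        // 6 ms per rotation
            const double avg_latency_ms = rotation_ms / 2.0;       // 3 ms on average
            const double accesses_per_s = 1000.0 / avg_latency_ms; // ~333 per disk

            const double updates_per_s = 2000.0;
            const double spindles = updates_per_s / accesses_per_s * std::exp(1.0);
            std::printf("~%.0f random accesses/s per disk, ~%.0f spindles\n",
                        accesses_per_s, spindles);                 // ~333 and ~16
            return 0;
        }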
