
Dumping Lots of Data to Disk in Realtime?

AmiChris asks: "At work I need something that can dump sequential entries for several hundred thousand instruments in real time. It also needs to retrieve the data for a single instrument relatively quickly. A standard relational database won't cut it. It has to keep up with 2000+ updates per second, mostly on a subset of a few hundred instruments active at any given time. I've got some ideas of how I would build such a beast, based on flat files and a system of caching entries in memory. I would like to know whether someone has already built something like this, and if not, whether anyone would want to use it if I built it. I'm not sure what other applications there might be; I could see recording massive amounts of network traffic or scientific data with such a library. I'm guessing someone out there has done something like this before. I'm currently working with C++ on Windows."
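
Below is a minimal, hypothetical sketch (not production code) of the flat-file-plus-in-memory-cache approach the submitter describes: one append-only binary log per instrument for sequential writes, plus a bounded cache of recent entries so single-instrument reads for the hot subset rarely touch disk. All names are invented, error handling and locking are omitted, and with several hundred thousand instruments you would also need to limit how many file handles stay open at once.

    // Sketch of "flat files plus a cache of entries in memory" (hypothetical).
    #include <cstddef>
    #include <cstdint>
    #include <cstdio>
    #include <deque>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    struct Entry {                 // fixed-size record, so file offsets are trivial
        int64_t timestamp_us;
        double  value;
    };

    class InstrumentStore {
    public:
        explicit InstrumentStore(std::string dir) : dir_(std::move(dir)) {}

        void append(int instrument_id, const Entry& e) {
            std::FILE*& f = files_[instrument_id];
            if (!f) f = std::fopen(path(instrument_id).c_str(), "ab");
            std::fwrite(&e, sizeof e, 1, f);   // sequential append, buffered by the OS
            auto& recent = cache_[instrument_id];
            recent.push_back(e);               // keep a bounded "hot" cache per instrument
            if (recent.size() > kCacheDepth) recent.pop_front();
        }

        // Queries for recent data on a single instrument are served from memory.
        std::vector<Entry> recent(int instrument_id) const {
            auto it = cache_.find(instrument_id);
            if (it == cache_.end()) return {};
            return {it->second.begin(), it->second.end()};
        }

        void flush() {
            for (auto& kv : files_) if (kv.second) std::fflush(kv.second);
        }

    private:
        static constexpr std::size_t kCacheDepth = 1024;
        std::string path(int id) const { return dir_ + "/" + std::to_string(id) + ".log"; }

        std::string dir_;
        std::unordered_map<int, std::FILE*> files_;
        std::unordered_map<int, std::deque<Entry>> cache_;
    };

Older history for an instrument can then be read back by seeking into that instrument's log file, since every record is the same size.
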
  • by anon mouse-cow-aard ( 443646 ) on Saturday May 14, 2005 @09:58AM (#12528912) Journal
    Sure, optimize single-node performance first, but keep in mind that horizontal scaling is something to plan for. Put N machines behind a load balancer: ingest gets scattered across the N machines, and queries go to all of them simultaneously. A Redundant Array of Inexpensive Databases :-)

    Linux Virtual Server in front of several instances of your Windows box will do, with some proxying logic for queries (a rough partitioning sketch follows below). It's probably cheaper than spending months tweaking a single node to reach your scaling target, and it scales much farther out with little extra effort.
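
    As a rough illustration of that scatter/gather idea (all names here are hypothetical, and the network transport is left abstract): route each update to one of N back-end nodes by hashing the instrument id, and fan a query out to every node when it cannot be pinned to a single instrument.

        #include <cstddef>
        #include <functional>
        #include <string>
        #include <vector>

        struct Node { std::string address; /* a connection handle would live here */ };

        // Updates for a given instrument always land on the same node.
        std::size_t node_for(int instrument_id, std::size_t node_count) {
            return std::hash<int>{}(instrument_id) % node_count;
        }

        // Queries that span instruments go to every node (in practice, in parallel).
        template <typename QueryFn>
        void scatter_query(const std::vector<Node>& nodes, QueryFn&& fn) {
            for (const Node& n : nodes) fn(n);
        }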

  • by LuckyStarr ( 12445 ) on Saturday May 14, 2005 @09:59AM (#12528926)
    I agree. In fact, SQLite performs quite well on a reasonably sized machine; 3000+ SQL updates per second on an indexed table should be no problem (see the batching sketch below).
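
    A minimal sketch of the batching pattern that lets SQLite keep up with rates like this: group many inserts into one transaction so the expensive disk sync happens once per batch rather than once per row. The table and column names are made up for illustration.

        #include <sqlite3.h>

        int main() {
            sqlite3* db = nullptr;
            if (sqlite3_open("ticks.db", &db) != SQLITE_OK) return 1;

            sqlite3_exec(db,
                "CREATE TABLE IF NOT EXISTS ticks(instrument INTEGER, ts INTEGER, value REAL);"
                "CREATE INDEX IF NOT EXISTS ix_ticks ON ticks(instrument, ts);",
                nullptr, nullptr, nullptr);

            sqlite3_stmt* ins = nullptr;
            sqlite3_prepare_v2(db,
                "INSERT INTO ticks(instrument, ts, value) VALUES(?1, ?2, ?3)",
                -1, &ins, nullptr);

            sqlite3_exec(db, "BEGIN", nullptr, nullptr, nullptr);
            for (int i = 0; i < 10000; ++i) {            // one batch of updates
                sqlite3_bind_int(ins, 1, i % 300);       // a few hundred active instruments
                sqlite3_bind_int64(ins, 2, 1000000LL + i);
                sqlite3_bind_double(ins, 3, 42.0 + i);
                sqlite3_step(ins);
                sqlite3_reset(ins);
            }
            sqlite3_exec(db, "COMMIT", nullptr, nullptr, nullptr);   // one sync for the whole batch

            sqlite3_finalize(ins);
            sqlite3_close(db);
            return 0;
        }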
  • Re:Wonderware InSQL (Score:3, Interesting)

    by btlzu2 ( 99039 ) * on Saturday May 14, 2005 @10:15AM (#12529014) Homepage Journal
    How does archiving work? What is the query performance on a large table (hundreds of millions of rows)? Can you hook into the database with any language or package you like, or only with proprietary tools?

    Do you actually charge a license fee PER point?

    Our company needed a smaller SCADA system, and Wonderware could not answer these questions (except for the licensing one: they really do charge PER POINT). This department is going with a different product.

    Sorry, but be very cautious of Wonderware.
  • I did some work on a DVD-Video authoring system that had some demanding file-system requirements (unsurprisingly, given video data and the typical 4 GB payload of a single DVD disc).

    The standard file API just didn't hold up, so we (the development team I was working with) had to rewrite some of the file-management routines ourselves and work with the memory-mapped file architecture directly. This gives you advantages beyond speed as well: once you establish the mapping and place the file in a memory address range, you can treat the data in the file as if it were RAM within your program, having fun with pointers and everything else you can imagine. Copying data to the file is simply a memory move, or a copy from one pointer to another.

    The thing to remember is that Windows (this is undocumented) won't let you open a memory-mapped file larger than 1 GB, and under FAT32 file systems (Windows 95/98/ME and some low-end XP systems) the total of all memory-mapped files on the entire operating system must stay below 1 GB (a requirement that really takes the breath out of some applications).

    Remember that if you are putting pointers into the file itself, it works better if they are relative offsets rather than raw memory pointers, even though raw pointers are in theory usable within a single session. (A rough Win32 mapping sketch follows this comment.)
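
    A hedged Win32 sketch of the mapping described above (the file name and record layout are invented; error handling is minimal). Records are addressed by index, i.e. by relative offset, rather than by raw pointers stored in the file.

        #include <windows.h>
        #include <cstdint>

        struct Record {
            int64_t timestamp_us;
            double  value;
        };

        int main() {
            const DWORD kBytes = 64 * 1024 * 1024;   // 64 MB region, well under 1 GB

            HANDLE file = CreateFileA("capture.dat", GENERIC_READ | GENERIC_WRITE,
                                      0, nullptr, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL,
                                      nullptr);
            if (file == INVALID_HANDLE_VALUE) return 1;

            // The mapping object extends the file to kBytes if it is smaller.
            HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READWRITE,
                                                0, kBytes, nullptr);
            if (!mapping) { CloseHandle(file); return 1; }

            void* view = MapViewOfFile(mapping, FILE_MAP_ALL_ACCESS, 0, 0, kBytes);
            if (!view) { CloseHandle(mapping); CloseHandle(file); return 1; }

            // The mapped region now behaves like ordinary memory.
            Record* records = static_cast<Record*>(view);
            records[0].timestamp_us = 0;
            records[0].value = 3.14;

            UnmapViewOfFile(view);
            CloseHandle(mapping);
            CloseHandle(file);
            return 0;
        }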
  • by gvc ( 167165 ) on Saturday May 14, 2005 @11:56AM (#12529574)
    "Can [the storage backend] handle 2000 random seeks per second?"

    The short answer is "no."

    A 10,000 RPM disk has a rotation period of 6 ms. That's 3 ms of rotational latency on average for a random access (not counting seek time, or the fact that a read-modify-write takes at least three times as long: read, wait one full rotation, write).

    So one disk can do, as a generous upper bound, about 333 random accesses per second. I'll spare you the details of the Poisson distribution, but if you managed to spread these updates randomly over a disk farm, you'd need about 2000/333 × e ≈ 16 independent spindles.

    The trick to high throughput is harnessing, and creating, non-randomness. You can do a much better job of this with a purpose-built solution. (The arithmetic above is reproduced in the snippet that follows.)
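
    For what it's worth, the poster's back-of-envelope numbers reproduced in code (the factor of e is taken from the Poisson argument above, not re-derived here):

        #include <cmath>
        #include <cstdio>

        int main() {
            const double rpm = 10000.0;
            const double rotation_ms = 60.0 * 1000.0 / rpm;        // 6 ms per rotation
            const double avg_latency_ms = rotation_ms / 2.0;       // 3 ms on average
            const double accesses_per_s = 1000.0 / avg_latency_ms; // ~333 per disk

            const double updates_per_s = 2000.0;
            const double spindles = updates_per_s / accesses_per_s * std::exp(1.0);
            std::printf("~%.0f random accesses/s per disk, ~%.0f spindles\n",
                        accesses_per_s, spindles);                 // ~333 and ~16
            return 0;
        }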
