What Capacities Do Databases Have?

razor69 asks: "Having programmed PHP in conjunction with MySQL for some time now, I'm quite happy with the whole setup. However, I'm now working up to some more serious clients, who require larger and larger databases. Looking for database comparisons, I'm stuck with the stats MySQL gives me, and none of the database manufacturers say what the maximum number of records their system can handle is. Now I know that Oracle is unlimited, but so is their price. Which database program is suitable for what price, how many records can they hold (practically as well as theoretically), and how do they all REALLY compare?"
  • *cough* DB/2? *cough*

    (jfb)
  • Does that mean VARCHARs are also stored outside, with just a pointer in the row? If that's the case, then 64 KB sounds like enough to me. CHAR and INT fields usually aren't enough to fill 64 KB...
  • How about blob fields? If you use blobs then you obviously have some kind of large binary object to store, and 64 KB seems kinda short for that. I use MySQL on my site and store lots of images in the database itself, as this is convenient for backup/restore (one backup saves everything).

    And for those who say "don't store images in the DB": I do store them there, but I serve them through a cache, so there's no load on the database.
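
    A minimal sketch of that cache idea, assuming a hypothetical "images" table (id INT, data BLOB) holding JPEGs and a writable cache directory; all names here are invented for illustration:

    <?php
    // Serve an image: hit the filesystem cache first, fall back to MySQL once.
    $id = (int) $_GET['id'];
    $cacheFile = '/var/cache/images/' . $id;

    if (!file_exists($cacheFile)) {
        // Cache miss: pull the BLOB out of the database one time only.
        $link = mysql_connect('localhost', 'user', 'password');
        mysql_select_db('mydb', $link);
        $result = mysql_query("SELECT data FROM images WHERE id = $id", $link);
        $row = mysql_fetch_assoc($result);
        $fp = fopen($cacheFile, 'wb');
        fwrite($fp, $row['data']);
        fclose($fp);
    }

    // Every later request is served straight from disk; the DB isn't touched again.
    header('Content-Type: image/jpeg');
    readfile($cacheFile);
    ?>
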
  • Well, BLOBs let me put all the GIFs and JPEGs in the same database as the other records. This has the following advantages:
    - it is possible to back up/restore my whole web site with one single call to mysqldump
    - my picture indexing is always consistent, whereas with separate files some images could be erased/renamed/moved without the DB being updated (or the opposite, the DB updated but not the files)
    - it's cleaner, since things that go together are stored together and not in two very different places

    As for BLOBs' usefulness, there's also the TEXT type (a BLOB type specifically made to handle large texts), on which you can apply WHERE conditions (LIKE "%something%"). Although not fast, it's still useful :)
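
    A quick sketch of what that looks like from PHP, assuming a hypothetical "articles" table with a TEXT column named "body" (table and column names are made up):

    <?php
    // Search a TEXT column with LIKE. The leading wildcard means MySQL can't use
    // an index, so this scans every row; fine for small tables, slow on big ones.
    $link = mysql_connect('localhost', 'user', 'password');
    mysql_select_db('mydb', $link);

    $result = mysql_query(
        "SELECT id, title FROM articles WHERE body LIKE '%something%'", $link);
    while ($row = mysql_fetch_assoc($result)) {
        echo $row['id'] . ': ' . $row['title'] . "\n";
    }
    ?>
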
  • Universal Power Units :) With Oracle, you choose between UPUs and Named Users. Named Users is a good idea when you have few users, like a central DB with a few terminals or a webapp with slow growth and fewer than about 50 users. Beyond that, UPUs. Mainly because Oracle measures users at the front end (your web server, for example) :(

    The general idea is paying based on CPU speed and architecture. The calculation is like this:

    MHz x Multiplier x Price per UPU

    With a minimum of 200 UPUs. So, for a dual-proc Intel box with two 600MHz P3s:

    2 x 600 x 1.0 (Intel arch multiplier) = 1200

    1200 x Price per UPU
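
    In code, the arithmetic amounts to roughly this (just a sketch; the multiplier and per-UPU price below are placeholders, not actual Oracle figures):

    <?php
    // Rough sketch of the UPU arithmetic described above; numbers are placeholders.
    function oracle_upus($cpus, $mhz, $multiplier)
    {
        $upus = $cpus * $mhz * $multiplier;
        return max($upus, 200);             // Oracle's stated minimum of 200 UPUs
    }

    $upus = oracle_upus(2, 600, 1.0);       // dual 600MHz P3, Intel multiplier => 1200
    $price_per_upu = 100;                   // made-up figure; check Oracle's site
    echo 'Estimated license: $' . ($upus * $price_per_upu) . "\n";
    ?>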

    Check the Oracle website for pricing details, but Oracle 8i basically starts at a few thousand dollars for the Standard Edition; the Enterprise Edition is a *lot* more expensive.

    Another thing is the license period. Oracle offers 1-Year, 2-Year and 4-Year licenses as well as a 'Perpetual' license. The latter is a license without any time limits.

    The 1-Year license is exclusively for web hosting but is also the cheapest to start off with, since it's priced at 20% of the Perpetual license. You'll have to spend another 20% in a year, though...

    Oracle pricing has gotten cheaper lately (if I am not mistaken) but it still sucks :(

    HTH
    bBob

    --

  • > How about blob fields? If you use blobs then you obviously have some kind of large binary object to store, and then 64 KB seems kinda short for it.

    Indeed it would - Interbase/Firebird only stores a 32-bit Blob reference in the actual record; the data itself is stored externally as a number of chained segments, the total of which can be just under 32 gigabytes in length.

    VARCHARs have a limit of 32,767 bytes each.
  • > Good thing it's open... Someone email me the source for it and I'll increase the ridiculous 64K row size limit. ;-)

    Have at it!

    Firebird:

    http://sourceforge.net/projects/firebird/

    Interbase:

    http://www.borland.com/interbase/

  • SQL Server 2000 does partitioned tables.
  • It might be referring to the power unit pricing scale. There are two licensing scales: one based on the capacity of your hardware, the second based on the number of 'named' application users.

    There may be a 25k license for a certain (large) number of named users; visit the Oracle store on their website to compare prices.

    Firstly, neither Oracle nor MS posts meaningful stats. Both of them require you to sign a non-disclosure agreement with them; you are not allowed to reveal benchmarks.

    Secondly, there are the TPC [tpc.org] benchmarks, but take them with a huge pinch of salt. They are not about what the average IT shop can produce - they are about what hordes of MS/Oracle/IBM/Sun/Compaq engineers can produce with enormous budgets. They are also about finding the latest loophole in the TPC rules.

    Thirdly, Oracle at least allows you to download their RDBMS to try it for yourself, and that is exactly what you should do.

    I wouldn't worry too much about limits on the number of rows in tables, etc., unless you are storing absolutely obscene amounts of data, more than the supermarkets in the UK that keep records of every purchase made with loyalty cards.

  • OK, your database is growing fast. At some point you will find that performance starts to suffer and you need to do maintenance (i.e. rebuild indexes). This will be a problem. You need to look at your application design and think about archiving data. Alternatively, buy Oracle and read up on partitioned tables.

    I wouldn't have thought it necessary to change your RDBMS because of poor JDBC drivers - but then the problem is that MS don't supply the SQL Server ones themselves, because they're still sulking about Java.

    Oracle is complex, but so is any serious RDBMS these days. SQL Server is too; they just have a bunch of GUI tools that are supposed to simplify things. Although it pains me to tell you this, I expect SQL Server could cope with your needs at the moment; you just need to start tuning and looking at your storage in detail.

  • We use Informix here for some pretty big OLTP systems running Linux.

    Check it out:
    http://www.informix.com/evaluate/
  • Maximum row size: 64K
    And don't forget, Interbase/Firebird is absolutely free and open source!

    Good thing it's open... Someone email me the source for it and I'll increase the ridiculous 64K row size limit. ;-)
  • Really? So I can have a row with 2 columns: one unique key and one nice long text field? Bummer.
  • Oracle pricing is comical. You have to calculate your "Oracle Power Units". I'm serious. OPUs. I went through their website trying to price Oracle for a project and I was stuck. I had to call their sales number. Basically you add up the number of CPUs, multiply by their MHz, and multiply this by your number of users. Then add a few extra thousand dollars to that. No, really, you have to call. It's just too complex. To make it simple, just use MS SQL Server. It's one price regardless of CPUs or their speed. MS used to have a deal where, if your client base was large enough, you could just ask them to round off the license at $25k. They probably won't do this anymore. For one thing, MS wants to be taken seriously, and to some lame people that means asking a lot of money. So MS has jacked up the price. It's silly, but so are most database people.
  • I'm glad this question got posted. In addition to talking about what a database can hold, can anyone who has info post more about real-world performance with large databases?

    Currently, I'm using MS SQL Server 7 for a project in which the database is growing by about 800,000 records/day. It's a pretty flat DB but it's not going too fast. Also, I am using Java and JDBC to do a lot of stuff with the database. I've noticed a very large performance difference between different JDBC drivers. I switched from Weblogic to NetDirect and inserting some 9,000 rows in a table went from taking somewhere on the order of four minutes to a little over a minute and a half. Also, I've noticed that using batches for inserts where supported does not really speed things up at all. Does anyone have any info on the fastest JDBC driver/rdbms combination for fairly large and rapidly growing databases?

    Is it worth the frustration to switch to Oracle, performance-wise? I messed around with it briefly on Solaris and it seemed like an incredibly difficult system to use.

    jeb.

  • And for those who say "don't store images in the DB": I store them there, but I serve them through a cache, so there's no load on the database

    For real though, what reason could you have for using BLOBs? I haven't seen anyone use them in years. If the image files (or whatever type) are taking up space in the database, why not just leave them in an ordered filesystem and reference them from the DB? You are already taking up space on magnetic storage with the object being in your column and in cache, so why even put them in the DB to begin with? It's more efficient and more easily scalable this way, you may be able to use some compression on the object files, database backups and maintenance will be faster, etc.
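
    A minimal sketch of that filesystem approach, with a hypothetical "images" table that stores only a path and a caption (the names are invented for illustration):

    <?php
    // Store the uploaded file on disk and keep only a reference in the database.
    $link = mysql_connect('localhost', 'user', 'password');
    mysql_select_db('mydb', $link);

    $path = '/var/www/images/' . uniqid('img') . '.jpg';
    move_uploaded_file($_FILES['photo']['tmp_name'], $path);

    // The row holds just the path, so dumps stay small and backups stay fast.
    mysql_query("INSERT INTO images (path, caption) VALUES ('"
        . mysql_escape_string($path) . "', '"
        . mysql_escape_string($_POST['caption']) . "')", $link);
    ?>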

    Have you looked into some actual "imaging" systems? FileNet, OTG, or some of the slew of open-sourcers that have been developed lately?

  • sounds like it works well for you, thanks for the insight.
  • Blobs are generally stored outside of the row, with a reference to the blob in the record. I suspect that InterBase does the same thing. My guess is that 64KB is the 'page size' of the database and it cannot split a row across two pages (a common limitation), and thus it is limited to 64KB 'rows'. The catch is what is stored *in* the row and what is stored as a reference to external data.

  • Gee, DB2 doesn't count as one of the "Big" databases? C'mon, it even runs on Linux.
  • Perhaps somebody on this thread can answer my question, since it seems relevant. Most of the time when people speak of Oracle, they use expressions like "unlimited price" in the same sentence. But the few people I've talked to who are users (and therefore evangelists) of Oracle have all said $25k. That's not really *that* bad, especially for a company that knows its database needs when it makes the budget. My question is, is that really the price? What exactly does one get for that price? If not, what's the more accurate price for Oracle in the wild?
  • by lal ( 29527 )
    We run a read-only ~10 million row database in MySQL on a dual PIII 800 w/1GB RAM. Indexed lookups are fast (sub-second). MySQL was created as a data warehousing database, and when you use it for its original design purpose, it is fast.

    We found that the limiting factor is the number of indexes on each table. All indexes for each table are stored in one file. As you add indexes, reindex performance degrades; we're guessing that the cause is b-tree rebalancing. Anyway, we addressed that issue by creating and indexing sub tables. It's a query-only application, so we can afford to do things like that.
  • Maximum size of database: 32TB using multiple files; largest recorded InterBase database in production is over 200GB

    Maximum size of one file: 4GB on most platforms; 2GB on some platforms

    Maximum number of tables: 64K Tables

    Maximum size of one table: 32TB

    Maximum number of rows per table: 4G rows

    Maximum row size: 64K

    Maximum number of columns per table: Depends on the datatypes you use. (Example: 16,384 INTEGER (4 byte) values per row.)

    Maximum number of indexes per table: 64K indexes

    Maximum number of indexes per database: 4G indexes

    And don't forget, Interbase/Firebird is absolutely free and open source!
  • 1: It's not the record count, it's the data size. 2: Quite a bit depends on the underlying OS, and what it can and cannot do. 3: Quite a bit relies on what you're doing; a database optimized for transactions will have different high-water marks than a database optimized for querying. 4: That having been said, the Big Four (Oracle, Sybase, Informix, MS SQL Server) can pretty much handle what you throw at them, so long as you can shell out for the hardware.
  • In my personal experience, no. When I was doing developer support for a web app development platform, I never once saw DB2 in use. Somebody else could probably tell you more about it than I. :-)
  • Microsoft actually has an accurate price comparison [microsoft.com] of Oracle, DB2, and SQL Server. Granted, this is before you negotiate "corporate discounts", but in an 8-way 1GHz setup, Oracle costs $800,000 compared to DB2 at $180,800 and SQL Server at $159,992.
  • Since no one has mentioned it, I thought I'd dig up a few links I remember reading.

    From the PostgreSQL FAQ, linked above:

    • Maximum size for a database? unlimited (60GB databases exist)
    • Maximum size for a table? 16 TB
    • Maximum size for a row? unlimited in 7.1 and later
    • Maximum size for a field? 1GB in 7.1 and later
    • Maximum number of rows in a table? unlimited
    • Maximum number of columns in a table? 250-1600 depending on column types
    • Maximum number of indexes on a table? unlimited

    If you couldn't tell, I like Postgres, but as a business you should get what you think is best. From what I've heard, Oracle on Solaris is where it's at if you can afford it. (I can't.)

    Leknor

  • I have practical experience with Oracle on Solaris (while I was at Borland and SST), with Access on Windows NT (while I was at Actuate), with CSV flat-file databases on Solaris, and with MySQL on Linux (at SST). In addition, the employee I found thanks to the /. thread on hiring has experience with MySQL on Linux as well.

    From what I can tell, MySQL is fast, fast, fast for certain limited uses, and can handle databases into the hundreds of thousands of records, although I have not pushed it into the millions yet. At somewhere around 250,000 database calls every day, the thing starts to have problems, like the server needing a reboot. I don't know if the same problem happens if those calls are spread out over a longer period of time. Throwing more iron at it, or reworking the indexes, may help. We haven't upgraded to 3.23 stable yet, though, so that may solve the problem. In addition, we've begun using the "p" functions for databases -- you know, the "persistent" database connections. Use "mysql_pconnect" instead of "mysql_connect" for (sometimes) better response.
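
    For the curious, the switch is literally just the function name; a minimal sketch (connection details and the "members" table are placeholders):

    <?php
    // mysql_pconnect() reuses an already-open connection belonging to the same
    // Apache child instead of opening a fresh one on every page hit.
    $link = mysql_pconnect('localhost', 'user', 'password');
    mysql_select_db('mydb', $link);

    $result = mysql_query('SELECT COUNT(*) AS n FROM members', $link);
    $row = mysql_fetch_assoc($result);
    echo $row['n'] . " members\n";
    // Note: no mysql_close() here; persistent links are left open for reuse.
    ?>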

    One of the nice backups for MySQL (and few people seem to use this, but I like it a lot) is that it can write out every SQL update/insert/delete statement to a file. In the case of database corruption, you can take that file, go through the last few statements, and try to correct it. Then regenerate the ENTIRE database from that SQL file. In addition, 3.23 has something new: replication, or mirroring, something where it duplicates the database onto backup disks. I haven't used that yet. Perhaps someone reading this could share more.

    As for Oracle, yes, it can handle just about anything. I don't like the speed of older versions of the database. 8i is good, but actually not as fast as MySQL for some selects. I haven't tried 9. But it is a workhorse; it is storing hundreds of tables and millions of records for us. My current employer is trying to move away from it, because of the cost issue you cited. One of the things I like about Oracle is triggers, which MySQL doesn't have yet.

    I know that the worst database I used was back at Borland in 1996; I don't remember what the database was called, but it was part of a Lotus Domino system. In any case, with 100,000 member records in the database, it took a good 20 minutes to add a new record. It was miserable. We had to put big bold text on the "become a member" page that warned of the wait, and I had to write a JavaScript (onclick) handler for the submit button that returned false on subsequent clicks, so that people didn't abort the procedure by accident. I hated that. I felt like we were this technology company that didn't understand Web technology.

    As an interesting aside, back then Borland had a product called IntraBuilder. It was one of the first Cold Fusion-like systems, but its database system was just poor. That's partly why they named it as they did: they needed to brand it as an intranet tool, because they knew it wasn't fast enough for a full Internet deployment. But once the product was out there, the pressure was on to use it on Borland's site. We tried, Chris Malatesta and I. But I recall showing the IntraBuilder team some pages it served up, and how there was a full-second delay per user. Their response? "That seems acceptable to us." Chris and I tried to talk to them about scalability, about how the seconds accumulate and build up. They had no idea how hammered the product would be on a "real" Internet Web site. The product died, thankfully, when we ran a test with dozens of users and database corruption was rampant: one person's password appearing on another person's browser, and so on. Ugh. I don't miss those early days.

  • what's the more accurate price for Oracle in the wild?

    You can go to their web site and price it for yourself. As others have mentioned, their pricing-scheme-of-the-month involves adding up the speed of the processors that will be running it and multiplying by some magic numbers.

    The last Oracle installation I shepherded came to about $2000/year. That was for 8i on a single-processor, 600MHz machine. In fact, that's enough power to handle a fair amount of traffic as the back end of a web site that doesn't do anything hugely complicated. In a case like that, throw money at hardware: fast multi-channel storage in particular.

    The trick comes when you want to scale too much beyond that. To use multiple servers you need features of Oracle Enterprise Edition, which is about 5 times as much (per server).

    But the fact is, it may well be worth it. The software runs well, is pretty fast these days, can do everything under the sun, and it's the easiest serious database to find qualified staff for.

    On the other hand, it's not the sort of thing you install one morning and then start merrily using that afternoon, like MySQL. Factor in your costs that you'll either need to hire experienced people (recommended) or spend many weeks with piles of expensive books making false starts before you start getting productive return.

  • by SClitheroe ( 132403 ) on Friday February 02, 2001 @10:27PM (#459716) Homepage
    I think any large DB project should be treated as an infrastructure project...

    When you're dealing with clients who have (relatively) massive data requirements (i.e., data requirements that outstrip the smaller DB services such as MySQL that can be hosted on cheap, commodity hardware), you've got to step back and take a second look at the requirements of their business.

    Before you start looking at particular DB packages, and their relative merits, you should really start by identifying the business requirements for the database. For example, what are the uptime and recoverability requirements? If your client requires 24x365 uptime, and also wants a hot standby site, you've got to start spec'ing out the system at the hardware and network level.

    Identify the infrastructure that will sustain the requirements of your client, and then begin the process of selecting a DB package that works well within the constraints of that environment. When dealing with larger clients, always keep in mind that you are fundamentally delivering a DB infrastructure, not simply a DB.

    A couple things you might want to consider:

    Availability - does the system have to be redundant? Does it have to support automatic failover between redundant servers?

    Scalability - does the DB need to grow beyond the constraints of a single server? Can you foresee this data existing on a large array (a la EMC), and do all nodes of the DB complex need simultaneous access at some level?

    Middleware - if it's a large DB, possibly being accessed from multiple physical locations, does it look like queued middleware (a la IBM MQ) is needed, or perhaps appropriate?

    Disaster recovery - If the client is a 24x7 shop, or even if they aren't, are you going to have a sufficient window to back this beast up? Are they going to ask you to architect an off-site recovery strategy? If so, does it have to be a "hot" standby site? If so, does your hardware/network infrastructure support dynamic mirroring over a WAN?

    Once you've hammered out these kinds of questions (and there are lots more...), then you are in a position to start evaluating DB packages that will work well within the infrastructure you've designed.

    As with any project, getting the requirements defined out front is going to save you a lot of time and agony, and is going to save your client a lot of money and aggravation.
