Good Database Design Books?
OneC0de writes "I am the Director of IT for a small/medium sized marketing company, where I personally write the code that runs our applications. We use a variety of technology at our office, the majority of which relies on MS-SQL and MySQL databases. I am familiar with tables, SQL queries, and have a general understanding of how the SQL databases work. What I'm looking for is a good book, particularly a newer book, to explain general database design techniques, and maybe explain some relational tables. We have some tables that have millions of rows, and I'd like to know the best method of designing these tables."
This is (Score:1, Informative)
Don't be afraid of older books. Database theory really hasn't evolved that much.
For a good general overview you might check out "Database Design for Mere Mortals".
I'd also recommend you find a few books on specific areas
You'll probably want a book on normalization (preferably one not written by a normalization fanboy... there are times when de-normalized designs make sense) for sure as this speaks almost directly to the goal you described.
You'll probably want one on SQL tuning as well, and one on modeling and documentation / diagramming.
Database in Depth (Score:5, Informative)
Database in Depth: Relational Theory for Practitioners
Publisher: O'Reilly Media; 1 edition (May 1, 2005)
Language: English
ISBN-10: 0596100124
ISBN-13: 978-0596100124
Best DB book I have ever owned/read/seen!
O'Reilly (Score:4, Informative)
O'Reilly books are your friend. The "... in a Nutshell" books are a good place to start, and then proceed into the more advanced books. They have 25 titles related to MySQL [oreilly.com] and 53 titles related to Microsoft SQL [oreilly.com]. There are usually a few to browse through at the large chain book stores.
Re:Somewhere, a coder is polishing his resume (Score:5, Informative)
Re:Somewhere, a coder is polishing his resume (Score:3, Informative)
Why assume that there are any coders that OP manages? He's "Director of IT" for a "small/medium" company that isn't a software (or even technology) company. It's quite possible that OP manages, if anyone, a handful of desktop support technicians that aren't programmers.
In fact, I would hope that something like that is the case, as that's really the only explanation for a Director of IT that, as OP describes, personally "writes the code" (note: not "writes some of the code") for a company's applications, since otherwise he is managing coders that don't actually write any code, which would be unimaginably wasteful.
Certainly, I've known of small companies in non-computing fields where the "Director of IT" was also the whole IT department.
Re:A Few Suggestions (Score:5, Informative)
We have some tables that have millions of rows, and I'd like to know the best method of designing these tables.
I'm a developer, not a database expert. But it seems that every now and then I have to get my hands dirty with data modeling. "The best method" is probably a really vague concept. If you have serious hardware constraints then the best method changes from an easily maintainable system to something more complex. There's give and take in database design, and I guess a million rows is really something that a traditional relational database should be able to handle. So I'd suggest any book that teaches data modeling will suit you here.
eldavojohn makes some excellent points and gives some great suggestions. Keep in mind, like elda suggests, nothing is cut and dry. Configuration, resources, numbers of connections for specific data, etc; all will have an impact (or should) on what you should do and how you should design.
Re:Somewhere, a coder is polishing his resume (Score:3, Informative)
I would say you are being a little paranoid. There is such a thing as a good boss, you know. I find that these are the guys who are still heavily involved in some sort of 'research'. Which is probably what he/she is doing. Probably a smart cookie, does some coding but by no means all of it. Knows enough to recognise a good text to buy for his group so they can all learn together.
I put it to you that I'd prefer to work with this guy than with your paranoid self. Do you have meetings of the secret type?
Take a university class (Score:3, Informative)
I'm not sure I'd trust a book to teach this subject as comprehensively as a good university course on the subject. Frequently, you can sit a class quite inexpensively if you're not going for credit.
For that matter, isn't MIT or someone allowing free not-for-credit access to their eLearning materials?
Re:A Few Suggestions (Score:5, Informative)
I used this book at Foothill College in an intro to data management class, and it taught me more than any of the dozen Oracle classes I took, once I got past the terminology of tuples, etc.
this one is also well-recommended:
http://www.amazon.com/Database-Systems-Design-Implementation-Management/dp/0760049041
and this one is good for people without dba or architect background:
http://www.amazon.com/Database-Design-Mere-Mortals-Hands/dp/0201752840/ref=sr_1_1?ie=UTF8&s=books&qid=1278629171&sr=1-1
I would stay away from the vendor-specific books, as good database design should be DBMS agnostic.
-I'm just sayin'
Good SQL design books: (Score:5, Informative)
IMHO: Joe Celko's SQL for Smarties (http://www.amazon.com/Joe-Celkos-SQL-Smarties-Programming/dp/0123693799/ref=sr_1_2?ie=UTF8&s=books) has proven to be a very nice book when you need to go beyond the basics to a somewhat deeper understanding of SQL.
There are many other books on the subject, all the way to source material from Date and Codd, but Celko seems to be well informed and writes fairly well, I think.
OMG (Score:3, Informative)
If you are designing anything bigger than a couple of gigabytes, you are in for some fun (or your users are). ;-)
To be a good designer, there is no substitute for a thorough understanding of the subject matter. And you are a self-confessed n00b. Get an expert. Or study. Hard.
Database in Depth: Relational Theory for Practitioners [amazon.com].
Do you know relational algebra? (Score:4, Informative)
Do you know relational algebra? If you don't, then I highly recommend:
Codd, E.F. (1990). The Relational Model for Database Management (Version 2 ed.). Addison Wesley Publishing Company. ISBN 0-201-14192-2.
It's MUCH better to know the fundamentals of database systems and then try to figure out details than vice-versa.
Normalized vs Denormalized (Score:1, Informative)
Depends on what you want to do with your database. You have two broad options:
Normalized database:
Application developers prefer this because you'll design your database where every discrete list has its own table. The main benefits are performance and maintainability. For example, if you are tracking a list of marketing promotions and they each have a status of "Started" or "Finished", the statuses "Started" and "Finished" would be in one table and the table holding the promotions would have foreign key relationships to it. Your developers could then, instead of using string matching for "Started", filter on the foreign key of 1 (or whatever the integer key is). There's a big performance boost there. Also, you'll be able to rename and/or add to those statuses without affecting the underlying data.
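For what it's worth, here is a minimal sketch of the status lookup described above. It uses Python's built-in sqlite3 module as a stand-in for MS-SQL or MySQL, and the table and column names (promotion, promotion_status, status_id) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    CREATE TABLE promotion_status (
        status_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL UNIQUE
    );
    CREATE TABLE promotion (
        promotion_id INTEGER PRIMARY KEY,
        title        TEXT NOT NULL,
        status_id    INTEGER NOT NULL REFERENCES promotion_status(status_id)
    );
    INSERT INTO promotion_status VALUES (1, 'Started'), (2, 'Finished');
    INSERT INTO promotion VALUES
        (1, 'Spring Sale', 1), (2, 'Summer Sale', 2), (3, 'Fall Sale', 1);
""")

# Filter on the integer key instead of string matching on 'Started'.
started_titles = [
    row[0]
    for row in conn.execute(
        "SELECT title FROM promotion WHERE status_id = ? ORDER BY promotion_id",
        (1,),
    )
]

# Renaming a status touches one row; the promotion rows are unchanged.
conn.execute("UPDATE promotion_status SET name = 'In progress' WHERE status_id = 1")
labels = conn.execute("""
    SELECT p.title, s.name
    FROM promotion p JOIN promotion_status s USING (status_id)
    ORDER BY p.promotion_id
""").fetchall()

# The foreign key rejects a status that does not exist in the lookup table.
try:
    conn.execute("INSERT INTO promotion VALUES (4, 'Bogus', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

Whether the integer filter is actually faster than the string match depends on the engine and the indexes, so measure before counting on it.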
De-normalized database (data warehouse):
If you are talking about 1 million+ rows you're probably going to be interested in doing analytics and reporting. The data warehouse (look for Kimball: The Data Warehouse Toolkit) is designed with report writers and analysts in mind. For a marketing example, perhaps you have a Promotions "Dimension" where all the attributes about a certain promotion are described (region, name, type, current status, client, etc.) and one or more "Fact" tables that describe the metrics you want to measure about Promotions. For example, length of promotion, units sold, etc. This type of database structure makes it easy for people who are not SQL experts to explore and analyze the data. Data warehouses are usually produced by an ETL (extract, transform, load) process that copies data from a normalized database (usually because that is what the application is using).
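A rough sketch of that dimension/fact layout, again using Python's sqlite3 as a stand-in. The names dim_promotion and fact_promotion_sales and the sample rows are invented for illustration; a real warehouse would be loaded by an ETL job rather than by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: descriptive attributes of each promotion.
    CREATE TABLE dim_promotion (
        promotion_key  INTEGER PRIMARY KEY,
        name           TEXT,
        region         TEXT,
        type           TEXT,
        current_status TEXT,
        client         TEXT
    );
    -- Fact: the numbers you want to measure, keyed to the dimension.
    CREATE TABLE fact_promotion_sales (
        promotion_key INTEGER REFERENCES dim_promotion(promotion_key),
        length_days   INTEGER,
        units_sold    INTEGER
    );
    INSERT INTO dim_promotion VALUES
        (1, 'Spring Sale', 'West', 'Coupon', 'Finished', 'Acme'),
        (2, 'Summer Sale', 'West', 'Email',  'Finished', 'Acme'),
        (3, 'Fall Sale',   'East', 'Coupon', 'Started',  'Globex');
    INSERT INTO fact_promotion_sales VALUES (1, 14, 120), (2, 30, 300), (3, 7, 50);
""")

# A typical analyst query: one join, one GROUP BY on a dimension attribute.
units_by_region = dict(conn.execute("""
    SELECT d.region, SUM(f.units_sold)
    FROM fact_promotion_sales f
    JOIN dim_promotion d USING (promotion_key)
    GROUP BY d.region
""").fetchall())
```

The point of the shape is that every report is "join the fact table to a dimension, then group by something in the dimension", which is easy to explain to non-SQL people.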
Hope that helps.
Text Book (Score:3, Informative)
Would you like to reduce your development time?.. (Score:4, Informative)
...and improve your quality and maintainability?
Back in the 70's and early 80's we learned a methodology called, "Data Structured Systems Design" and the fundamental presupposition was that everything could be expressed logically and accurately by describing it as relationships in set theory. I have not seen anything since that surpasses the quality and maintainability of database applications and systems.
Someone already mentioned Joe Celko's book "SQL for Smarties" and I would recommend you first read his, "Thinking in Sets" before any of his other books.
I would also suggest some earlier books by Ken Orr and Jean Dominique Warnier. If you learn the Warnier-Orr approach to DESIGNING the system before doing any coding, you will reduce the time necessary for maintaining the system. I have seen hundreds of small IT shops like yours, and much of the time Systems Analysis and Design is neglected and performed "off-the-cuff" by programmers who can't wait to get to the coding. I didn't originally believe Ken Orr's assertion that spending twice as much time designing the system would result in a sharp time reduction for overall project completion, but through experience and observation I became a believer.
Re:Somewhere, a coder is polishing his resume (Score:3, Informative)
Three practical lessons (Score:5, Informative)
These three lessons may not all be in any one book, but they can help in the real world:
1) Learn what SQL Injection is and how to defend against it. It will ruin your day and could severely damage your current employment situation.
2) Abstract your schema from your front-end applications. Stored procedures are easy to write, can provide security, and, if well written, stop injection attacks. They will let you change your database design without breaking your deployed apps. Just update the internal code in the stored procedure. Middleware and objects can do this, too.
3) Bergstrom's law of sailing says: "You can get away with anything in less than 5 knots of wind." Similarly, any little box or blade with 2 to 4 gigs of RAM can easily handle 5 to 10 million row tables. Dedicate the server to MySQL or MS SQL so they can cache and buffer efficiently and they will outperform much bigger boxes trying to run too many schemas and DBs concurrently. Learn to index. Don't be too puritanical about normalization. Returning a customer address should not require 6 joins. And remember that moving large recordsets across the LAN/WAN may take much more time than the server query.
You probably already know all this... but maybe someone else reading this doesn't.
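For anyone who hasn't seen lesson 1 in action, here is a small sketch using Python's sqlite3 module (the customer table and the input string are invented). It shows the same hostile input against a query built by string concatenation and against a parameterized query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customer (name) VALUES ('Alice'), ('Bob');
""")

# Input typed by a hostile user into a "search by name" box.
malicious = "nobody' OR '1'='1"

# Vulnerable: the input becomes part of the SQL text itself.
unsafe_sql = "SELECT name FROM customer WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the driver passes the input as a value, never as SQL.
safe_rows = conn.execute(
    "SELECT name FROM customer WHERE name = ?", (malicious,)
).fetchall()
```

The concatenated query returns every customer, while the parameterized one returns nothing because no customer has that literal name. MS-SQL and MySQL drivers use their own placeholder syntax, so check your driver's docs.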
Re:Somewhere, a coder is polishing his resume (Score:3, Informative)
In a small company a Director is usually someone who sits on the board.
In a large company like an S&P500 one, a director is usually a management position with responsibility for a specific business area.
The titles are the same but the meaning different. I'm assuming from the size of business the poster described (50 employees) that he is in the former category of Director.
http://en.wikipedia.org/wiki/Corporate_title [wikipedia.org] describes both types of director.
Re:Somewhere, a coder is polishing his resume (Score:3, Informative)
Actually, that's rarely the case. Even as Director, or even VP, you usually can't just say "I want to hire someone", and then go do it. Does your budget allow for hiring another employee? Would another employee on staff change the company position for taxes, insurance, or regulatory concerns?
A decision such as that usually goes up to the COO or CEO (depending on the company structure). Upon tentative approval, it would go to accounting to ensure the budget is available to sustain the prospective employee, and then over to human resources.
It can be that a Director or VP already has the authorization to add employees, which simply means it's already gone through the other steps, and then he or she can hire as needed. It would be very reasonable to believe that a Director or VP would have authorization to hire X employees as needed.
Maybe your company works in such a loose manner that the brass can hire and fire at will, but a well run organization will actually plan for such changes.
The Folks from ErWin (Score:3, Informative)
Back in the day when they were their own company they used to recommend
Designing Quality Databases with IDEF1X Information Models
I found the book VERY informative
Re:Somewhere, a coder is polishing his resume (Score:4, Informative)
Our CIO has a programming background and once fixed some database code we were having problems with. This is a 10,000 person organization with an IT staff of around 300. It's not hard to imagine a small company where the IT director takes on some programming tasks.
Re:modeling is even more important (Score:3, Informative)
In the modern days of cheap disk, big disk caches, and large ram, proper modelling is more important than strict normalization.
Back when those books were written, disk was expensive and not cached, RAM was very expensive, and machines had terrible I/O bottlenecks. Normalization is critical under these circumstances for maximum performance.
Today, these normalization techniques will increase performance but not as much as you might think. Really it is best to concentrate efforts elsewhere, especially for a one-person shop.
All of that normalization work requires coding changes and it will undoubtedly make the code much less readable and maintainable.
<facepalm/> Performance? I wasn't even thinking of that as a reason to understand normalization. I'm thinking data integrity at least in the conceptual model.
The value of normalization is not so much in performance but in considering and planning what a decent data/information model should look like.
You normalize your *model*, your blueprint, and de-normalize as you see fit with the resources available. The actual tables and tablespaces might not (will not) look 1-to-1 to the model, but you still have a model of information.
And when you do deviate from your model, you do so consciously; you know exactly where you are deviating from the model; and you know why. Without a model, or worse, with a badly crapped model, you don't know what you have.
It is good to exploit the hardware capabilities we have now, BUT without at least having a conceptual understanding of normalization, this is almost always a sure way to get into a corner where the only option is to throw more hardware to the problem.
The distinction now is that people deploy hardware not strategically, but because they have no choice: shit won't run without it. Hardware is cheap. Operational costs are not. Understanding normalization is to (relational) data modeling and building what modularity and structure are to OO design (and software building in general).
Normalization is not about performance (even if its immediate effects in the past were performance related.) It is about reduction of unnecessary data redundancy that compromises data integrity.
Not performance. Data Integrity.
Even with the hardware that we have today, I have yet to see a well-designed model that does not in great part enforce the 2nd normal form and most attributes of 1st normal form (in particular, avoiding duplicate rows and maintaining regular columns).
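To make the integrity point concrete, here is a minimal sketch of an update anomaly, using Python's sqlite3 and invented tables. orders_flat repeats the customer's city on every order; the customer/orders pair stores it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Redundant design: the city is copied onto every order row.
    CREATE TABLE orders_flat (
        order_id      INTEGER PRIMARY KEY,
        customer      TEXT,
        customer_city TEXT
    );
    INSERT INTO orders_flat VALUES (1, 'Acme', 'Boston'), (2, 'Acme', 'Boston');

    -- Normalized design: the city lives in exactly one place.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT UNIQUE,
        city        TEXT
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id)
    );
    INSERT INTO customer VALUES (1, 'Acme', 'Boston');
    INSERT INTO orders VALUES (1, 1), (2, 1);
""")

# Someone "fixes" the address on one order and forgets the other.
conn.execute("UPDATE orders_flat SET customer_city = 'Chicago' WHERE order_id = 1")
flat_cities = {
    row[0]
    for row in conn.execute(
        "SELECT customer_city FROM orders_flat WHERE customer = 'Acme'"
    )
}

# In the normalized design there is only one row to update.
conn.execute("UPDATE customer SET city = 'Chicago' WHERE customer_id = 1")
normalized_cities = {
    row[0]
    for row in conn.execute(
        "SELECT c.city FROM orders o JOIN customer c USING (customer_id)"
    )
}
```

After the updates the flat table says Acme is in both Boston and Chicago, and nothing in the data tells you which one is right. That is the redundancy problem, and it has nothing to do with speed.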
One of my favorite books (Score:2, Informative)
Database Modeling and Design: Logical Design (Score:3, Informative)
Database Modeling and Design: Logical Design, 4th Edition. Its ISBN is 0126853525. It taught me a lot about how databases work "under the hood". If you want to know the performance implications of a B+ tree index vs. a B-tree, this book will help.
Re:Can you be more precise ? (Score:3, Informative)
Exactly! I want to know, that if the business continues to boom for the next five years, my software won't fall apart, because of bad database design.
In that case you want to always be on the look out for scalability and maintainability topics. Today's million row database might need to be tomorrow's 50 million row database, and you may have to change engines to something more performance oriented.
Something you need to learn is not simply how to model your data, but how to access it through an abstraction layer (such as views or stored procedures) that will allow you to replace the database engine without rewriting the software calling it.
This practice can also help keep you safe as you learn and grow. If you always access your data through the use of stored procedures, you effectively hide your database table schema from your application. So if tomorrow you figure out that by extracting Zip Codes to their own table you can save on postage, you can rearrange the data in the tables, yet still call the old stored procedure SP_GET_CUSTOMER_ADDRESS as long as you keep the signature of the stored procedure the same. You'll be able to reap the benefits of postage savings with no application code changes required.
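Here is a sketch of the same idea. SQLite (used here through Python's sqlite3) has no stored procedures, so this swaps in a view as the stable interface; the view name v_customer_address and the table layouts are assumptions for illustration, not the SP_GET_CUSTOMER_ADDRESS procedure itself.

```python
import sqlite3

def get_customer_address(conn, customer_id):
    """Application-side call; it only knows the view's column names."""
    return conn.execute(
        "SELECT name, street, zip, city FROM v_customer_address "
        "WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name TEXT, street TEXT, zip TEXT, city TEXT
    );
    INSERT INTO customer VALUES (1, 'Acme', '1 Main St', '02108', 'Boston');
    CREATE VIEW v_customer_address AS
        SELECT customer_id, name, street, zip, city FROM customer;
""")
before = get_customer_address(conn, 1)

# Later: extract Zip Codes into their own table and rebuild the view
# with the same column names. The application function is untouched.
conn.executescript("""
    DROP VIEW v_customer_address;
    CREATE TABLE zip_code (zip TEXT PRIMARY KEY, city TEXT);
    INSERT INTO zip_code SELECT DISTINCT zip, city FROM customer;
    CREATE TABLE customer_v2 (
        customer_id INTEGER PRIMARY KEY,
        name TEXT, street TEXT,
        zip TEXT REFERENCES zip_code(zip)
    );
    INSERT INTO customer_v2 SELECT customer_id, name, street, zip FROM customer;
    DROP TABLE customer;
    CREATE VIEW v_customer_address AS
        SELECT c.customer_id, c.name, c.street, c.zip, z.city
        FROM customer_v2 c JOIN zip_code z ON z.zip = c.zip;
""")
after = get_customer_address(conn, 1)
```

The caller gets the same row before and after the schema change. On MS-SQL or MySQL the equivalent would be keeping the stored procedure's signature fixed while changing its body.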
Other things that aren't always taught include practical data access from within your code. My favorite is seeing something like "select * from MY_TABLE where KEY_VAL=2". This is a trap when it's embedded in your code or your SQL. The asterisk will always return all data columns in schema order. If you try to rearrange the order of columns in your schema (perhaps your modeling method or modeling tool encourages keeping primary key fields in sequential order, and you add a new key field, for example) you will break any code or stored procedures that use the "select *" query. "Select *" is fine for browsing a database by hand, but it doesn't belong in the code as it introduces this hidden dependency.
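A quick sketch of that hidden dependency, using Python's sqlite3. The table is rebuilt with a different column order (plus an invented region column) to simulate a schema rearrangement; the column names are made up for illustration.

```python
import sqlite3

def load(conn):
    """Fetch the same row two ways: with '*' and with named columns."""
    star = conn.execute("SELECT * FROM my_table WHERE key_val = 2").fetchone()
    named = conn.execute(
        "SELECT key_val, label FROM my_table WHERE key_val = 2"
    ).fetchone()
    return star, named

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (key_val INTEGER PRIMARY KEY, label TEXT);
    INSERT INTO my_table VALUES (2, 'two');
""")
star_before, named_before = load(conn)

# Rearrange the schema: new column first, key column last.
conn.executescript("""
    ALTER TABLE my_table RENAME TO my_table_old;
    CREATE TABLE my_table (region TEXT, label TEXT, key_val INTEGER PRIMARY KEY);
    INSERT INTO my_table (region, label, key_val)
        SELECT 'west', label, key_val FROM my_table_old;
    DROP TABLE my_table_old;
""")
star_after, named_after = load(conn)
```

Any code that assumed the first column of the "select *" result was the key now reads the region instead, with no error raised. The query with named columns returns the same tuple before and after.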
You also need to learn how to develop a "versioning" convention for naming your stored procedures, so you can continually update them in a backward compatible manner. I haven't seen that kind of advice in the books I've read, but maybe I'm just not reading the right books.
You'll also want to understand the differences between your chosen flavor of SQL and some of the big players, such as Oracle and SQL Server. There are a few annoying syntactical differences in the languages that can make porting between different vendors' databases difficult, and you may be able to avoid future update problems by avoiding certain language features.
To solve some of these problems, consider accessing the data from your application through a mapping library such as iBatis or LINQ. That way you're not writing the application portion of the SQL at all, and differences can often be resolved by updating a value in a configuration file.
Finally, you need to consider security. Unless the database is never actually used by anyone, it's very easy to write code that is vulnerable to SQL injection attacks.
I was there once, (Score:2, Informative)
the 3rd member of the staff, hired by a friend who was the second member of the staff. Eventually we wound up with nearly 2 dozen people, many better than me or my friend.
But even when I was Application Development manager, I designed table structures and wrote custom queries to reply to FOIA requests for data.
I took some graduate school classes after getting my BSCS, so as to have access to a computer while looking for my first job, which tells you something about when this was. The best class was Relational Data Base using "An Introduction to Database Systems" by C. J. Date. ISBN 0-201-14471-9.
Mr. Date, along with Mr. Codd, invented relational calculus, including normal forms. In later classes at work we were strongly advised to use 3rd normal form, as even mainframes of the day couldn't really support 4th or 5th. That instructor had participated in a project to rebuild a 5th normal form system into 3rd for Westinghouse, whose mainframe choked on the small (low column count) tables and huge keys required by 5th normal form.
The book covers other styles of databases, network and hierarchical, but both are antique now. So I'd skip or at most skim those chapters. They show how Relational DB design grew out of experience with shortcomings of Multics and IMS, early network and hierarchical DBs, respectively.
Other commenters are correct: which DB software you use isn't terribly important for good table structure design. Learning how to select keys for uniqueness and design tables to be non-redundant are not database-specific solutions.
Do good backups, and practise restoring from them regularly. It doesn't matter how well-designed a DB is if the hardware fails and you can't recover the data.
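As a toy illustration of "practise restoring", here is a sketch using the backup method on Python's sqlite3 connections. This is SQLite only; MS-SQL and MySQL have their own backup and restore tools, and the customer table here is invented.

```python
import sqlite3

# The "production" database.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO customer (name) VALUES ('Alice'), ('Bob');
""")

# Take the backup (normally this target would be a file on other hardware).
backup = sqlite3.connect(":memory:")
source.backup(backup)

# Restore drill: pretend the original is gone and rebuild from the backup.
source.close()
restored = sqlite3.connect(":memory:")
backup.backup(restored)

# Verify the restored copy, not just that the backup job "succeeded".
restored_count = restored.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
integrity = restored.execute("PRAGMA integrity_check").fetchone()[0]
```

The part people skip is the last step: actually opening the restored copy and checking that the data is there.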
Re:Database in Depth (Score:3, Informative)
I agree wholeheartedly that Database In Depth is one of the best DB books in print - but would recommend for this reader instead Date's slightly later book SQL and Relational Theory, which replaces the Tutorial D examples with SQL and goes more in depth into how to use SQL relationally.
Wow. (Score:2, Informative)
Basically, none of your comment is right.
Re:A Few Suggestions (Score:3, Informative)
I agree with the CASE Method ER book; Barker is the king of data modeling. In the book he walks through some real world scenarios (airline ticketing, manufacturing bill-of-materials) that are fundamental to relational databases.
You may find some implementation differences with SQL Server, like not using cursors (a common PL/SQL construct in Oracle) and some limitations on using "join on" SQL syntax, but I used the book when I went from writing single user applications to enterprise apps.
Whatever book you get, expect it to be a tedious learning experience. I had been working with relational databases (dBase IV, Infos) for about five years at the time and it took some serious re-ordering of my brain to really "get it".
Good Luck
Get a book to help you use the features of each (Score:2, Informative)
Re:Database in Depth (Score:3, Informative)
Software Trainers recommendations (Score:2, Informative)
The best handbook (Score:2, Informative)
Re:Database in Depth (Score:1, Informative)
Here's the link to the book in O'Reilly's catalog [oreilly.com]. It's 240 pages and priced at $29.95USD for print, $23.99 e-book (PDF). The author is C.J. Date.
Anti-patterns (Score:2, Informative)
Lots of good suggestions on how to learn what to do.
This is a good book to show you common ways you can get yourself in trouble and how to avoid them: http://pragprog.com/titles/bksqla/sql-antipatterns [pragprog.com]
J
Re:A Few Suggestions (Score:4, Informative)
Yeah, no one said that people can't strangle themselves and do foolish stuff. Big production apps need to be written by people that know what they're doing.
But your citations are at the edge of the curve. You're a black belt, and your stuff better run fast and cleanly or your creds are dirt. These are civilians. They learn, and hopefully know when it's time to get a pro into the equation before they hurt themselves.
Some won't, but the same can be said for car repair and even nuclear physics (viz the LHC forehead slappers).
Your kewl self knows this stuff cold. Let other people learn, even if they get hurt. If they have to pay to get stuff fixed, it's a risk that they likely knowingly take from the onset.
I wish there were real tools with real front ends that you could give to a civilian, knowing they couldn't hurt themselves. But like a chainsaw, you have to hope that when they fire things up, they know a little about what they're doing.