With XML, is the Time Right for Hierarchical DBs? 276
"There have been some pushes to create pure XML databases (info on XML in connection to databases is here and info on XML database products is here) with claims that as they support XML natively, they can offer many advantages over relation databases.
Some of these claims include speed, better handling of audio, graphic and other digital files, easier administration, and handling of unexpected elements. Software AG, a German firm, produce and sell a suite of XML products, including Tamino, a native XML database. They have lots of information on why they think there database is great, not surprisingly, but no benchmarks. So, do the Slashdot community think that with XML the time has come for hierarchical databases? Or is it better simply to use a relational database that can output in XML, or script your way to achieve the same goal?"
SQL queried XML database in PHP (Score:2, Informative)
Discussions (Score:3, Informative)
There is lots said on this over at Database Debunkings [firstsql.com]
XML is sometimes useful (Score:1, Informative)
Also, you get the full advantage of the XML technologies developed by the W3C and others - the ability to do a simple query, transform that data and then send to a web browser with very little coding involved is a great bonus.
(i've forgotten my login, time to go create a new one i think)
All about databases (Score:1, Informative)
Pros and Cons (Score:2, Informative)
XML is a great way for exchanging data, but the term XML databases is very misleading. If the database engine actually stores data in native XML, it's going to be *very* slow. I think the point behind XML is that nobody should really have to care what your backend is as long as you can export reasonable XML. Note that I say reasonable XML. And XML export that simple encodes the rows and fields in a table to XML with <row> and <col> tags is NOT reasonable. It conveys no actual knowledge of the real structure of the data.
Storing XML data in a relation DB can either be a very hard problem or a very easy one. Let me explain.You could look at some XML and define a DB schema for it, not too hard to do. Problem? It's not generic; a human has to re do it each time the XML structure changes. The alternative is to store it all in one big table and index the hell out of it. Problem? It's slow. At that point you aren't using any structure of the XML or the power of relational DBs.
I'm a firm believer that efficient XML storage, querying and retrieval will require a hierarchical database. The problem is that there's several features (bugs IMHO) in XML (and XPath) that, in a way, are throwbacks to relational DBs. IDREFs and the notion of document order particularly bug me. I ran into these this summer when I was on a team trying to build a XPath and XQuery front end for CCM.
We're gradually seeing the XML world change. Early XML documents were similar to the type mentioned above. They were flat. When you start adding depth the information inherent in the structure of the data becomes apparent. Another thing I'm glad to see the industry move away from is the notion that XML resides in files. Many (if not all) of the early XML parsers made this assumption. It was a pain in the ass to parse from some other source, like a buffer in memory.
Repeat after me ... (Score:5, Informative)
1) No record occurrences except root records can exist without being related to a parent record occurrence. This means that
a) a child record cannot be inserted unless it is linked to a parent record.
b) a child record may be deleted independently of its parent however, deletion of the parent record automatically results in the deletion of all its child and descendent records.
c) the above rules do not apply to virtual child records and virtual parent records.
2) If a child record has 2 or more parent records from the SAME record type, the child record must be duplicated once under each parent record.
3) A child record having 2 or more parent records of DIFFERENT record types can do so only by having at most 1 real parent, with all the others represented as virtual parents. IMS limites the number of virtual parents to 1.
In addition to these flaws, relational databases have had over a decade to become mature, optimized, and enterprise scalable. Harddrive partitioning for such databases as oracle work out perfectly with the cylinder, sector, and tracks of a hard drive to allow for the fastest read/write times as can be possible.
Too often people see that XML "can" do so many things and decides that it should be the way things are done but XML is NOT a magic bullet and just because it has the potential to do something does not make it the best methodology for doing so.
Re:I don't think so. (Score:3, Informative)
You've already given up the possibility of normalizing your phone numbers in the heirarchical model (my roomates home phone is the same as mine and it shows up in LDAP twice, once for me and once for him), so a simple many to one join to the telephone number table will allow you to list a home phone twice, once for each of us.
Now, if the data you are modeling truely requires a many to many relationship (your model needs to handle the real world, you can't change the world to fit the limitations of your tools), you have no way of representing that information in a normalized fashion in a heirarchical model. The so called "kludge" of an x-ref table from the relational world is not even an option.
The heirarchical model is so limited and simplistic that it can be implemented in a single, self-referential table in a relational database, and can even be queried in a recursive manner (oracle has had 'connect by prior' for dealing with these models since I started with the product 10 years ago).
From my view as a mathematician, and not a computer programmer, the relational model is so much more robust and powerful than a heirarchical model it hardly warrants discussion.
persistence layer (Score:2, Informative)
Basically, as I read your question, you are using a logical design that is hierarchical (an object structure experessed in XML) and wondering if it would not make more sense to store it in a hierarchical database. Maybe.
However, relational databases form the current state of the art and have been highly optimized such that any theoretical performance gains from better matching of logical structure to physical lay-out in the database are likely outweighed. More generally, by insisting on a match between logical and physical lay-out, you would potentially be limiting yourself to a specific physical implementation, one that may not provide good performance relative to others.
A better solution to your problem might be something referred to as a persistence layer. This adds another layer of abstraction to your application, in the form of a mapping, between your logical design and your actual physical mode of storage. There now exist publically available free (as in beer, and in some cases open-source) tools that will automate this mapping. Generally, any performance hit from the abstraction should be made up in the speed of the superior physical implementation, and the freedom to switch later is also important.
Two that exist for java are castor available from exolab [exolab.org] and a pilot implementation for Sun's emerging Java Data Objects standard (see http://java.sun.com [sun.com] for that tool).
Experience with XML over ER engines (Score:3, Informative)
Maybe its just me, but the goal today is integration and having a special database for XML and special database for this and that just because its faster for this particular problem creates such a level of complexity, which prevents accomplishing even of the most trivial tasks.
Forgive me for tooting my own horn on this one, but I believe that (for once on
I summarize the answer in a paper written for VLDB 2001 (www.vldb.org [vldb.org]). The paper presents joint work between Stanford, Berkeley, and RightOrder, Inc. It can be found online here [vldb.org] (in PDF).
What we found is that relational systems, with appropriate indexes for XML data, give the advantages of both worlds. XML is a hierarchical representation in only the loosest sense. It's written linearly in a flat text document, just as a child learns to write things down on a piece of paper. However, you wouldn't convince anyone but that same child that something written on paper can only represent two-dimensional objects just because the paper itself is flat. XML in many variants is plainly richer in concept than its simple hierarchical representation and thus quite suited to ER. I believe a previous poster mention RDF... a perfect example.
Punchline: XML is neat, XML is tasty, but XML is not inherently more or less expressive than ER; it just requires a little critical thinking (and index tweaking) to tune ER engines to deal with it. (Once tuned, the ER engines dominate all others in performance.)
Re:XML Data Bloat (Score:3, Informative)
I suppose you don't mind it when someone send you mail, and you see a bunch of tags all over the place because it's in HTML. XML is just the same kind of thing ... all cluttered with tags. The computer can read XML easier and more quickly than humans. Sure it could read it even faster if it didn't have to parse all those tags. But I wouldn't call this a design intended for humans to read.
The XML isn't human readable, but browsers and other applications can make pretty good guesses at a nice human readable representation.
Further, you can define style sheets to produce different views, with data that would be unimportant to a particular human (or application) elided.
It may be oversold, but the point is that the data definition is well defined such that writers and readers (often human readers, also applications) can interact more easily. It's about portability of data, which readability is a subset.
Re:Both Worlds (Score:2, Informative)
For example, take a table representing a parent-child relationship. Now try to sort the persons in the table by their number of descendants. SQL has only recently been extended to allow this query to be posed. Perhaps your relational database can handle this kind of query, where you have arbitrary-depth path walking, ybut ou can't expect it to handle them efficiently.
LDAP is a *protocol* (Score:2, Informative)
if so, then XML is the wrong solution (Score:4, Informative)
If you're going to throw out the installed investment in relational databases, you might as well just design a common database standard per industry (rather than an XML data exchange standard) and let them all exchange native data rather than translating in and out of any exchange format. Obviously that won't happen.
Now, if you're a new firm, you might decide it's easier to go OO or heirarchical or keep your data in slips of paper in a shoe box. But most of the available tools and solutions will continue to respect that relational works real, real well for inventory, manufacturing, accounts
What design changes would be required to produce XML's relational equivalent?
Re:Reversed Question (Score:3, Informative)
That is a crock. XML was developed explicitly to fix the problems in SGML. LDAP was developed to fix the problems in X.500. In both cases it was the poor design of the predecessor that was being fixed.
Henrick F-N was working on SOAP like ideas long before he joined Microsoft. Again all SOAP does is to fix known incompetence in CORBA. Gates devised .NET to solve two problems, first how to get a foothold in the enterprise space, second how to improve on C++ without the proprietary lock that Sun had imposed on Java.
Native XML Databases (Score:4, Informative)
It's easy to dismiss a new database technology as irrelevant because of the dominance of the RDBMS, but you should really learn more about it and when it is appropriate and when it's not. It's not going to replace relational, and isn't intended to. Here's a few links where you can learn more beyond what's available on Ronald Bourret's site mentioned in the original post.
The XML:DB Initiative [xmldb.org]
The dbXML Project (open source native XML database) [dbxml.org] Soon to become an Apache XML project named Xindice
eXist (another open source native XML database) [sourceforge.net]
My blog on the subject. [xmldatabases.org]
Kimbro Staken
Lots 'o Heirarchical Databases out there... (Score:4, Informative)
A bit surprised to hear that 'Hierarchical databases were blown away by relational versions' - since I'm pretty sure they've been paying my pay check for the last three years... :-)
There are a large number of heirarchical databases out there. The big fellas are the X500 directories (X509 certs came out of this work). More common are X500's demented kid sisters, the LDAP directories ( rfc2251 [faqs.org]). The DNS system also fits the description 'heirarchical database'.
As far as XML goes, there are people storing XML in directories - although they're still fussing about exactly how to do it. There are a bunch of people trying to come up with standards - check the directory services markup language people www.dsml.org [dsml.org].
There are people trying to sell XML enable directories - Novell sells an XML directory, but most directories can be used to store XML (including our 'eTrust Directory').
As a final quicky - when do you use a directory over an RDBMS? Directories are good for naturally heirarchical data with few cross connections. They are usually optimised for slow writes/fast reads. They are *very* good for distributed data (e.g. DNS, international organisations etc.). The X500 spec defines a very fine grained security model, which can also be useful. However, if your data is closely cross-linked with lots of relationships... well, use an RDBMS!
No magic bullet... (Score:1, Informative)
Why store a database in XML when you could store it in some high performance binary layout that maps to a disk's layout?
Sure, XML is good for human-readable data, just as HTML was in a more primitive way, but that doesn't make it very efficient for enterprise-level stuff, and especially the *storage* of enterprise-level data.