With XML, is the Time Right for Hierarchical DBs?

Posted by Cliff
from the digital-evolution-of-data-storage dept.
DullTrev asks: "The hierarchical database model existed before the far more familiar relational model. Hierarchical databases were blown away by relational versions because it was difficult to model a many-to-many relationship - the very basis of the hierarchical model is that each child element has only one parent element. However, we now live in a web world that demands quick access to a variety of data on a variety of platforms. XML is being used to facilitate this, and XML has, of course, a hierarchical structure." Do you think a hierarchical database would really be a better answer for storing XML data over the existing relational counterparts?

"There have been some pushes to create pure XML databases (info on XML in connection to databases is here and info on XML database products is here) with claims that, as they support XML natively, they can offer many advantages over relational databases.

Some of these claims include speed, better handling of audio, graphic and other digital files, easier administration, and handling of unexpected elements. Software AG, a German firm, produce and sell a suite of XML products, including Tamino, a native XML database. They have lots of information on why they think their database is great, not surprisingly, but no benchmarks. So, do the Slashdot community think that with XML the time has come for hierarchical databases? Or is it better simply to use a relational database that can output in XML, or script your way to achieve the same goal?"

This discussion has been archived. No new comments can be posted.

  • by jayant_techguy (441933) on Sunday November 18, 2001 @03:44PM (#2581582) Homepage
    I found this SQL queried XML database in PHP [f2s.com]. Seems very kewl.
  • Discussions (Score:3, Informative)

    by Lozzer (141543) on Sunday November 18, 2001 @03:53PM (#2581608) Journal

    There is lots said on this over at Database Debunkings [firstsql.com]

  • by Anonymous Coward on Sunday November 18, 2001 @03:58PM (#2581618)
    Yes, I think XML databases can be useful sometimes. Even though relational is faster and better developed in some cases, native XML products have the capability to store any data without prior setup. I know I'm using dbXML (http://www.dbxml.org) in a product of mine which allows 3rd parties to store arbitrary data associated with a user.

    Also, you get the full advantage of the XML technologies developed by the W3C and others - the ability to do a simple query, transform that data and then send to a web browser with very little coding involved is a great bonus.
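
    As a rough illustration of that query-then-transform workflow, here is a minimal sketch using Python's standard xml.etree.ElementTree instead of dbXML. The sample document and its element names are made up for illustration, and ElementTree only supports a limited XPath subset, so treat this as a stand-in for a real XPath/XSLT pipeline rather than the same thing.

```python
import xml.etree.ElementTree as ET

# Hypothetical stored data: arbitrary notes attached to users.
stored = ET.fromstring(
    "<users>"
    "<user name='ann'><note>likes XML</note></user>"
    "<user name='bob'><note>likes SQL</note><note>uses LDAP</note></user>"
    "</users>"
)

# Query: pick every note belonging to one user (limited XPath subset).
notes = stored.findall("./user[@name='bob']/note")

# Transform: turn the matched elements into an HTML list for a browser.
ul = ET.Element("ul")
for note in notes:
    ET.SubElement(ul, "li").text = note.text
html = ET.tostring(ul, encoding="unicode", method="html")
print(html)
```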

    (i've forgotten my login, time to go create a new one i think)
  • All about databases (Score:1, Informative)

    by jayant_techguy (441933) on Sunday November 18, 2001 @04:02PM (#2581630) Homepage
    Extropia [extropia.com] has a detailed tutorial on databases of all types. XML:DB [xmldb.org] discusses the differences between object-oriented databases, hierarchical databases, and relational databases in detail. You may be interested in DBX [f2s.com], a DBMS that is written completely in PHP and works using XML-style text files as its native format.
  • Pros and Cons (Score:2, Informative)

    by Multispin (49784) on Sunday November 18, 2001 @04:27PM (#2581702)
    I work for a company that has been doing hierarchical DBMSs for years. The company is Applied Technical Systems [apptechsys.com]. We make a database engine called CCM.

    XML is a great way of exchanging data, but the term "XML database" is very misleading. If the database engine actually stores data in native XML, it's going to be *very* slow. I think the point behind XML is that nobody should really have to care what your backend is as long as you can export reasonable XML. Note that I say reasonable XML. An XML export that simply encodes the rows and fields in a table to XML with <row> and <col> tags is NOT reasonable. It conveys no actual knowledge of the real structure of the data.
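
    To make the contrast concrete, here is a minimal sketch of the two kinds of export. The customer/order data and both function names are hypothetical, chosen only for illustration; the code uses Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: one customer with two orders.
customer = {
    "name": "Ada",
    "orders": [{"id": "1", "item": "disk"}, {"id": "2", "item": "tape"}],
}

def flat_export(rows):
    """Encode table rows generically; the tags say nothing about the data."""
    table = ET.Element("table")
    for row in rows:
        row_el = ET.SubElement(table, "row")
        for value in row:
            ET.SubElement(row_el, "col").text = value
    return table

def structured_export(cust):
    """Nest orders under their customer so the hierarchy carries meaning."""
    cust_el = ET.Element("customer", name=cust["name"])
    for order in cust["orders"]:
        order_el = ET.SubElement(cust_el, "order", id=order["id"])
        ET.SubElement(order_el, "item").text = order["item"]
    return cust_el

# The flat form is what a joined result set would look like as rows.
rows = [(customer["name"], o["id"], o["item"]) for o in customer["orders"]]
flat_xml = ET.tostring(flat_export(rows), encoding="unicode")
structured_xml = ET.tostring(structured_export(customer), encoding="unicode")
print(flat_xml)
print(structured_xml)
```

    The flat form repeats the customer name in every row and leaves the reader to guess what each <col> means; the nested form states the ownership of orders directly.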

    Storing XML data in a relational DB can either be a very hard problem or a very easy one. Let me explain. You could look at some XML and define a DB schema for it, not too hard to do. Problem? It's not generic; a human has to redo it each time the XML structure changes. The alternative is to store it all in one big table and index the hell out of it. Problem? It's slow. At that point you aren't using any structure of the XML or the power of relational DBs.
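
    The "one big table" approach might look something like the sketch below. This is an assumption about what such a generic layout could be, not a description of any particular product: the node table, its columns, and the shred helper are all invented for illustration, using Python's sqlite3 and xml.etree.ElementTree.

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<library><book><title>SQL</title></book>"
    "<book><title>XML</title></book></library>"
)

conn = sqlite3.connect(":memory:")
# One generic table for every element, whatever the document looks like.
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT, text TEXT)"
)
conn.execute("CREATE INDEX node_parent ON node (parent_id)")
conn.execute("CREATE INDEX node_tag ON node (tag)")

def shred(element, parent_id=None):
    """Store one element as a row, then recurse into its children."""
    text = (element.text or "").strip() or None
    cur = conn.execute(
        "INSERT INTO node (parent_id, tag, text) VALUES (?, ?, ?)",
        (parent_id, element.tag, text),
    )
    node_id = cur.lastrowid
    for child in element:
        shred(child, node_id)

shred(doc)

# Resolving /library/book/title costs one self-join per path step.
titles = [row[0] for row in conn.execute(
    """SELECT t.text FROM node AS l
       JOIN node AS b ON b.parent_id = l.id AND b.tag = 'book'
       JOIN node AS t ON t.parent_id = b.id AND t.tag = 'title'
       WHERE l.parent_id IS NULL AND l.tag = 'library'
       ORDER BY t.id"""
)]
print(titles)
```

    The schema never changes when the XML does, which is the attraction. The price is visible in the query: every extra level of the path is another join against the same table.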

    I'm a firm believer that efficient XML storage, querying and retrieval will require a hierarchical database. The problem is that there are several features (bugs IMHO) in XML (and XPath) that, in a way, are throwbacks to relational DBs. IDREFs and the notion of document order particularly bug me. I ran into these this summer when I was on a team trying to build an XPath and XQuery front end for CCM.

    We're gradually seeing the XML world change. Early XML documents were similar to the type mentioned above. They were flat. When you start adding depth the information inherent in the structure of the data becomes apparent. Another thing I'm glad to see the industry move away from is the notion that XML resides in files. Many (if not all) of the early XML parsers made this assumption. It was a pain in the ass to parse from some other source, like a buffer in memory.
  • Repeat after me ... (Score:5, Informative)

    by Serpent Mage (95312) on Sunday November 18, 2001 @04:32PM (#2581717)
    XML is not a magic bullet. Relational databases won out over the hierarchical model for a lot of reasons. For instance, there exist a number of integrity constraints with the hierarchical model, such as:

    1) No record occurrences except root records can exist without being related to a parent record occurrence. This means that
    a) a child record cannot be inserted unless it is linked to a parent record.
    b) a child record may be deleted independently of its parent; however, deletion of the parent record automatically results in the deletion of all its child and descendant records.
    c) the above rules do not apply to virtual child records and virtual parent records.

    2) If a child record has 2 or more parent records from the SAME record type, the child record must be duplicated once under each parent record.

    3) A child record having 2 or more parent records of DIFFERENT record types can do so only by having at most 1 real parent, with all the others represented as virtual parents. IMS limits the number of virtual parents to 1.
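
    Rules 1a and 1b can be imitated in a relational engine, which may help in seeing what they mean in practice. The sketch below is not IMS; it uses SQLite foreign keys with ON DELETE CASCADE through Python's sqlite3, and the parent/child tables and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES parent(id) ON DELETE CASCADE,
        name TEXT)"""
)

conn.execute("INSERT INTO parent (id, name) VALUES (1, 'dept')")
conn.execute("INSERT INTO child (parent_id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO child (parent_id, name) VALUES (1, 'bob')")

# Rule 1a: a child cannot be inserted without an existing parent.
try:
    conn.execute("INSERT INTO child (parent_id, name) VALUES (99, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Rule 1b: a child can be deleted on its own, leaving the parent in place...
conn.execute("DELETE FROM child WHERE name = 'bob'")
parents_left = conn.execute("SELECT COUNT(*) FROM parent").fetchone()[0]

# ...but deleting the parent takes the remaining children with it.
conn.execute("DELETE FROM parent WHERE id = 1")
children_left = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]

print(orphan_rejected, parents_left, children_left)
```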

    In addition to these flaws, relational databases have had over a decade to become mature, optimized, and enterprise scalable. Hard drive partitioning for databases such as Oracle lines up with the cylinders, sectors, and tracks of a hard drive to allow the fastest possible read/write times.

    Too often people see that XML "can" do so many things and decide that it should be the way things are done, but XML is NOT a magic bullet, and just because it has the potential to do something does not make it the best methodology for doing so.
  • Re:I don't think so. (Score:3, Informative)

    by brunson (91995) on Sunday November 18, 2001 @05:20PM (#2581841) Homepage
    This is a terrible example. You are trying to describe a scenario that requires a many to many relationship. The intermediary "joiner" or cross-reference table is only necessary if you have a need to keep both joined tables normalized, i.e. you want each distinct telephone number, as well as each person object, to be stored in the database only once.

    You've already given up the possibility of normalizing your phone numbers in the hierarchical model (my roommate's home phone is the same as mine and it shows up in LDAP twice, once for me and once for him), so a simple many-to-one join to the telephone number table will allow you to list a home phone twice, once for each of us.

    Now, if the data you are modeling truly requires a many-to-many relationship (your model needs to handle the real world, you can't change the world to fit the limitations of your tools), you have no way of representing that information in a normalized fashion in a hierarchical model. The so-called "kludge" of an x-ref table from the relational world is not even an option.
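
    For anyone who hasn't built one, the x-ref table looks roughly like this. It is a minimal sketch using Python's sqlite3; the table names, people, and phone numbers are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE phone  (id INTEGER PRIMARY KEY, number TEXT UNIQUE);
-- The cross-reference table: one row per (person, phone) pairing.
CREATE TABLE person_phone (
    person_id INTEGER REFERENCES person(id),
    phone_id  INTEGER REFERENCES phone(id),
    PRIMARY KEY (person_id, phone_id));
INSERT INTO person VALUES (1, 'me'), (2, 'roommate');
INSERT INTO phone  VALUES (1, '555-0100'), (2, '555-0199');
INSERT INTO person_phone VALUES (1, 1), (2, 1), (1, 2);
""")

# The shared home number is stored exactly once...
phone_rows = conn.execute(
    "SELECT COUNT(*) FROM phone WHERE number = '555-0100'"
).fetchone()[0]

# ...yet both people who use it can be found through the x-ref table.
sharers = [r[0] for r in conn.execute(
    """SELECT p.name FROM person AS p
       JOIN person_phone AS pp ON pp.person_id = p.id
       JOIN phone AS ph ON ph.id = pp.phone_id
       WHERE ph.number = '555-0100'
       ORDER BY p.name"""
)]
print(phone_rows, sharers)
```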

    The hierarchical model is so limited and simplistic that it can be implemented in a single, self-referential table in a relational database, and can even be queried in a recursive manner (Oracle has had 'connect by prior' for dealing with these models since I started with the product 10 years ago).
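
    A minimal sketch of that single self-referential table follows. Note the swap: instead of Oracle's 'connect by prior', it uses a WITH RECURSIVE query in SQLite (via Python's sqlite3), which walks the tree in a comparable way. The node table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Every record points at its single parent; the root has none.
CREATE TABLE node (
    id INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES node(id),
    name TEXT);
INSERT INTO node VALUES (1, NULL, 'root'), (2, 1, 'a'), (3, 1, 'b'), (4, 2, 'a1');
""")

# Start at the root and repeatedly join children onto rows already found.
rows = conn.execute("""
WITH RECURSIVE tree(id, name, depth) AS (
    SELECT id, name, 0 FROM node WHERE parent_id IS NULL
    UNION ALL
    SELECT n.id, n.name, tree.depth + 1
    FROM node AS n JOIN tree ON n.parent_id = tree.id
)
SELECT name, depth FROM tree ORDER BY depth, id
""").fetchall()
print(rows)
```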

    From my view as a mathematician, and not a computer programmer, the relational model is so much more robust and powerful than a hierarchical model it hardly warrants discussion.
  • persistence layer (Score:2, Informative)

    by budGibson (18631) on Sunday November 18, 2001 @05:26PM (#2581855)
    In design, the logical construction of the program and its data structures should be relatively independent of the physical implementation of said.

    Basically, as I read your question, you are using a logical design that is hierarchical (an object structure expressed in XML) and wondering if it would not make more sense to store it in a hierarchical database. Maybe.

    However, relational databases form the current state of the art and have been highly optimized such that any theoretical performance gains from better matching of logical structure to physical lay-out in the database are likely outweighed. More generally, by insisting on a match between logical and physical lay-out, you would potentially be limiting yourself to a specific physical implementation, one that may not provide good performance relative to others.

    A better solution to your problem might be something referred to as a persistence layer. This adds another layer of abstraction to your application, in the form of a mapping, between your logical design and your actual physical mode of storage. There now exist publicly available free (as in beer, and in some cases open-source) tools that will automate this mapping. Generally, any performance hit from the abstraction should be made up in the speed of the superior physical implementation, and the freedom to switch later is also important.
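
    The idea can be sketched in a few lines. This is not how Castor or JDO work internally; it is a deliberately crude illustration in Python, where the Store interface, both backends, and the roundtrip helper are hypothetical names, and the "mapping" is just JSON serialization.

```python
import json
import sqlite3
from abc import ABC, abstractmethod

class Store(ABC):
    """The persistence layer: application code only ever sees this interface."""

    @abstractmethod
    def save(self, key, obj): ...

    @abstractmethod
    def load(self, key): ...

class MemoryStore(Store):
    """One physical implementation: a plain dictionary."""

    def __init__(self):
        self._data = {}

    def save(self, key, obj):
        self._data[key] = json.dumps(obj)

    def load(self, key):
        return json.loads(self._data[key])

class SqliteStore(Store):
    """Another physical implementation: a relational table."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS obj (key TEXT PRIMARY KEY, body TEXT)"
        )

    def save(self, key, obj):
        self._conn.execute(
            "INSERT OR REPLACE INTO obj VALUES (?, ?)", (key, json.dumps(obj))
        )

    def load(self, key):
        row = self._conn.execute(
            "SELECT body FROM obj WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0])

def roundtrip(store):
    """Application logic: hierarchical data in, same data out, backend unknown."""
    order = {"customer": "Ada", "items": [{"sku": "disk", "qty": 2}]}
    store.save("order-1", order)
    return store.load("order-1")

print(roundtrip(MemoryStore()) == roundtrip(SqliteStore()))
```

    Because roundtrip only talks to Store, the backend can be swapped later without touching the logical design, which is the freedom described above.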

    Two that exist for Java are Castor, available from exolab [exolab.org], and a pilot implementation of Sun's emerging Java Data Objects standard (see http://java.sun.com [sun.com] for that tool).
  • by nsample (261457) <<nsample> <at> <stanford.edu>> on Sunday November 18, 2001 @06:00PM (#2581965) Homepage
    Can anyone explain to me what is suddenly so wrong with a relational database with hierarchical indexing?

    Maybe it's just me, but the goal today is integration, and having a special database for XML and a special database for this and that, just because it's faster for this particular problem, creates such a level of complexity that it prevents accomplishing even the most trivial tasks.


    Forgive me for tooting my own horn on this one, but I believe that (for once on /.) there is a correct answer.

    I summarize the answer in a paper written for VLDB 2001 (www.vldb.org [vldb.org]). The paper presents joint work between Stanford, Berkeley, and RightOrder, Inc. It can be found online here [vldb.org] (in PDF).

    What we found is that relational systems, with appropriate indexes for XML data, give the advantages of both worlds. XML is a hierarchical representation in only the loosest sense. It's written linearly in a flat text document, just as a child learns to write things down on a piece of paper. However, you wouldn't convince anyone but that same child that something written on paper can only represent two-dimensional objects just because the paper itself is flat. XML in many variants is plainly richer in concept than its simple hierarchical representation and thus quite suited to ER. I believe a previous poster mentioned RDF... a perfect example.

    Punchline: XML is neat, XML is tasty, but XML is not inherently more or less expressive than ER; it just requires a little critical thinking (and index tweaking) to tune ER engines to deal with it. (Once tuned, the ER engines dominate all others in performance.)
  • Re:XML Data Bloat (Score:3, Informative)

    by JordanH (75307) on Sunday November 18, 2001 @06:03PM (#2581977) Homepage Journal
    • Human readable?

      I suppose you don't mind it when someone sends you mail, and you see a bunch of tags all over the place because it's in HTML. XML is just the same kind of thing ... all cluttered with tags. The computer can read XML easier and more quickly than humans. Sure it could read it even faster if it didn't have to parse all those tags. But I wouldn't call this a design intended for humans to read.


    The XML isn't human readable, but browsers and other applications can make pretty good guesses at a nice human readable representation.

    Further, you can define style sheets to produce different views, with data that would be unimportant to a particular human (or application) elided.

    It may be oversold, but the point is that the data definition is well defined such that writers and readers (often human readers, also applications) can interact more easily. It's about portability of data, of which readability is a subset.
  • Re:Both Worlds (Score:2, Informative)

    by rp (29053) on Sunday November 18, 2001 @06:13PM (#2582009)
    You can represent the structure, but you can't manipulate it using standard relational logic.

    For example, take a table representing a parent-child relationship. Now try to sort the persons in the table by their number of descendants. SQL has only recently been extended to allow this query to be posed. Perhaps your relational database can handle this kind of query, where you have arbitrary-depth path walking, but you can't expect it to handle them efficiently.
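
    For reference, the descendant-count query can be posed with a recursive common table expression, the kind of extension referred to above. The sketch below runs it in SQLite through Python's sqlite3; the parent_child table and the names in it are made up, and it says nothing about how efficiently a given engine would execute it on real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parent_child (parent TEXT, child TEXT);
INSERT INTO parent_child VALUES
    ('ann', 'bob'), ('ann', 'cat'), ('bob', 'dan'), ('dan', 'eve');
""")

ranking = conn.execute("""
WITH RECURSIVE
-- Transitive closure: every (ancestor, descendant) pair at any depth.
descends(ancestor, descendant) AS (
    SELECT parent, child FROM parent_child
    UNION
    SELECT d.ancestor, pc.child
    FROM descends AS d JOIN parent_child AS pc ON pc.parent = d.descendant
),
-- Everyone mentioned in the table, including people with no children.
people(name) AS (
    SELECT parent FROM parent_child UNION SELECT child FROM parent_child
)
SELECT p.name, COUNT(d.descendant) AS n
FROM people AS p LEFT JOIN descends AS d ON d.ancestor = p.name
GROUP BY p.name
ORDER BY n DESC, p.name
""").fetchall()
print(ranking)
```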
  • LDAP is a *protocol* (Score:2, Informative)

    by bigbird (40392) on Sunday November 18, 2001 @07:05PM (#2582179) Homepage
    Contrary to some of the comments I've read here, LDAP isn't an implementation of a database, it is a *protocol* for accessing directories. LDAP data could be stored in anything - a hierarchical database, a relational database, an object database or a flat file. Let's not confuse the issue under discussion.
  • by wytcld (179112) on Sunday November 18, 2001 @08:29PM (#2582484) Homepage
    99+% of all corporate data that isn't in a flat-file or (possibly three-dimensional) spreadsheet is in relational tables. The typical task that XML has been designed for is to standardize data exchanges between differently-structured relational systems, by providing sets of tags specific to the standards of specific industries. The whole point of XML is to enable companies to continue to use their current investment in relational databases, without the drag of having to do custom data conversions when dealing with suppliers or distant divisions in the company.

    If you're going to throw out the installed investment in relational databases, you might as well just design a common database standard per industry (rather than an XML data exchange standard) and let them all exchange native data rather than translating in and out of any exchange format. Obviously that won't happen.

    Now, if you're a new firm, you might decide it's easier to go OO or hierarchical or keep your data in slips of paper in a shoe box. But most of the available tools and solutions will continue to respect that relational works real, real well for inventory, manufacturing, accounts ... just about everything industry consists of. So if there's an impedance mismatch between relational and XML that's enough to make trouble, it's XML that should be replaced by another model.

    What design changes would be required to produce XML's relational equivalent?
  • Re:Reversed Question (Score:3, Informative)

    by Zeinfeld (263942) on Sunday November 18, 2001 @08:44PM (#2582536) Homepage
    Java, LDAP and XML were created to solve particular problems - at which they have succeeded quite well. SOAP and .NET were created purely to try and grab market share away from the previous technologies

    That is a crock. XML was developed explicitly to fix the problems in SGML. LDAP was developed to fix the problems in X.500. In both cases it was the poor design of the predecessor that was being fixed.

    Henrik F-N was working on SOAP-like ideas long before he joined Microsoft. Again, all SOAP does is fix known incompetence in CORBA. Gates devised .NET to solve two problems: first, how to get a foothold in the enterprise space; second, how to improve on C++ without the proprietary lock that Sun had imposed on Java.

  • Native XML Databases (Score:4, Informative)

    by idomeneo (196902) on Sunday November 18, 2001 @09:54PM (#2582709)
    I recently wrote an article introducing native XML databases [xml.com] for xml.com. My main point there, and it applies to this discussion too, is that native XML databases are a tool like any other. For some jobs they're right and for some they're not. I've been working on the technology in the form of dbXML [dbxml.org] for about a year and a half, and in some cases it's great and in others it really stinks. It's all about the right tool for the job.

    It's easy to dismiss a new database technology as irrelevant because of the dominance of the RDBMS, but you should really learn more about it and when it is appropriate and when it's not. It's not going to replace relational, and isn't intended to. Here are a few links where you can learn more beyond what's available on Ronald Bourret's site mentioned in the original post.

    The XML:DB Initiative [xmldb.org]
    The dbXML Project (open source native XML database) [dbxml.org], soon to become an Apache XML project named Xindice
    eXist (another open source native XML database) [sourceforge.net]

    My blog on the subject. [xmldatabases.org]
    Kimbro Staken

  • by pegacat (89763) on Sunday November 18, 2001 @10:04PM (#2582735) Homepage

    A bit surprised to hear that 'Hierarchical databases were blown away by relational versions' - since I'm pretty sure they've been paying my pay check for the last three years... :-)

    There are a large number of hierarchical databases out there. The big fellas are the X500 directories (X509 certs came out of this work). More common are X500's demented kid sisters, the LDAP directories (rfc2251 [faqs.org]). The DNS system also fits the description 'hierarchical database'.

    As far as XML goes, there are people storing XML in directories - although they're still fussing about exactly how to do it. There are a bunch of people trying to come up with standards - check the directory services markup language people www.dsml.org [dsml.org].

    There are people trying to sell XML-enabled directories - Novell sells an XML directory, but most directories can be used to store XML (including our 'eTrust Directory').

    As a final quickie - when do you use a directory over an RDBMS? Directories are good for naturally hierarchical data with few cross connections. They are usually optimised for slow writes/fast reads. They are *very* good for distributed data (e.g. DNS, international organisations etc.). The X500 spec defines a very fine grained security model, which can also be useful. However, if your data is closely cross-linked with lots of relationships... well, use an RDBMS!

  • No magic bullet... (Score:1, Informative)

    by Anonymous Coward on Monday November 19, 2001 @12:50AM (#2583202)
    Like some other posters have pointed out, XML is no magic bullet. Sometimes I really don't get the hype. I really wonder whether CSV will become the next big thing -- comma separated values!

    Why store a database in XML when you could store it in some high performance binary layout that maps to a disk's layout?

    Sure, XML is good for human-readable data, just as HTML was in a more primitive way, but that doesn't make it very efficient for enterprise-level stuff, and especially the *storage* of enterprise-level data.
