With XML, is the Time Right for Hierarchical DBs?

Posted by Cliff
from the digital-evolution-of-data-storage dept.
DullTrev asks: "The hierarchical database model existed before the far more familiar relational model. Hierarchical databases were blown away by relational versions because it was difficult to model a many-to-many relationship - the very basis of the hierarchical model is that each child element has only one parent element. However, we now live in a web world that demands quick access to a variety of data on a variety of platforms. XML is being used to facilitate this, and XML has, of course, a hierarchical structure." Do you think a hierarchical database would really be a better answer for storing XML data over the existing relational counterparts?

"There have been some pushes to create pure XML databases (info on XML in connection to databases is here and info on XML database products is here) with claims that as they support XML natively, they can offer many advantages over relational databases.

Some of these claims include speed, better handling of audio, graphic and other digital files, easier administration, and handling of unexpected elements. Software AG, a German firm, produce and sell a suite of XML products, including Tamino, a native XML database. They have lots of information on why they think their database is great, not surprisingly, but no benchmarks. So, do the Slashdot community think that with XML the time has come for hierarchical databases? Or is it better simply to use a relational database that can output in XML, or script your way to achieve the same goal?"

  • I don't think so. (Score:2, Insightful)

    by webprogrammer (518832) on Sunday November 18, 2001 @03:47PM (#2581584) Homepage
    Hierarchical databases won't take over because their relational counterparts are already so well developed. A relational database can do everything a hierarchical one can, with few exceptions. Even if there is a slight gain to using a hierarchical system, there are far fewer solutions, and consequently the ones that do exist aren't as well developed, so implementing one is more difficult.
  • Reversed Question (Score:5, Insightful)

    by devnullkac (223246) on Sunday November 18, 2001 @03:47PM (#2581585) Homepage
    From a purist perspective, I suspect the question is actually reversed: we shouldn't be talking about "XML data" as if it were somehow the core representation. Its usual intent is as a transmission format and, as such, needn't correspond directly to the organization of the source data.

    Rather than discard the advantages of relational and object databases, should we instead ask how XML can be used to represent those kinds of relationships?

  • by disarray (108) on Sunday November 18, 2001 @03:52PM (#2581603)
    Wouldn't object-oriented databases qualify as hierarchical (or some of them, at least)? A rather lengthy story [slashdot.org] ran a while back covering various reasons why object-oriented databases are useful, followed by various comments on cases where they aren't and why they aren't as common as relational ones today. The bottom line seems to be that they are in use today. One notable example comes to mind: LDAP. The aforementioned story has more. Despite the rather preachy tone, it's an interesting read.

    1337ness for sale. [ebay.com]

  • by hodeleri (89647) <drbrain@segment7.net> on Sunday November 18, 2001 @03:52PM (#2581604) Homepage Journal
    XML has, of course, a hierarchical structure

    Just because XML is a hierarchical markup language does not mean that it can only be used for hierarchical things. Perhaps you should look at RDF [w3.org] which can use many to many mappings through resources and groupings (sequences, bags, and alternates). (A resource in one grouping can refer to another grouping i.e. many to many.)

  • by Netmonger (3253) on Sunday November 18, 2001 @03:53PM (#2581605) Homepage
    I don't agree - look at LDAP. The benefits of LDAP'fying services are clear. With a hierarchical database, specific queries can target a subset of the entire database, without the overhead of having separate tables and/or databases for varying information. For keeping track of 'real world' objects: People, Printers, IPs, etc., the advantage is that the system used to organize them is similar to the actual grouping going on. Managers have employees 'underneath' them. It's basically taking the organizational concepts used for filesystems and applying them to database design. I haven't done any performance testing of LDAP vs. SQL for a similar schema setup, but from what I understand one of the other benefits is fast lookups. Sounds like a good project! To implement databases in both LDAP and SQL and measure the performance of similar queries!! :)
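
    A minimal sketch of the subtree-scoped lookup described in this comment, written in plain Python rather than against a real LDAP server. The `Entry` class, the `find` and `search` helpers, and the directory contents are all hypothetical; they only illustrate a query that starts from a search base and never touches the rest of the tree.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One node in a toy directory tree (hypothetical, not an LDAP API)."""
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def find(root, path):
    """Walk down from the root by child name to the entry used as search base."""
    node = root
    for name in path:
        node = next(c for c in node.children if c.name == name)
    return node


def search(base, predicate):
    """Return the sorted names of entries at or below `base` that match."""
    hits = []
    stack = [base]
    while stack:
        node = stack.pop()
        if predicate(node):
            hits.append(node.name)
        stack.extend(node.children)
    return sorted(hits)


directory = Entry("example", children=[
    Entry("people", children=[
        Entry("alice", {"type": "person", "manager": None}),
        Entry("bob", {"type": "person", "manager": "alice"}),
    ]),
    Entry("printers", children=[
        Entry("laser1", {"type": "printer"}),
    ]),
])

# Only the "people" subtree is visited; printers are never examined.
people = find(directory, ["people"])
managed = search(people, lambda e: e.attrs.get("manager") == "alice")
print(managed)
```

    Whether this is faster than an indexed SQL query is exactly the measurement the comment proposes; the sketch says nothing about performance.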
  • XML vs. ERwin (Score:3, Insightful)

    by imrdkl (302224) on Sunday November 18, 2001 @03:53PM (#2581609) Homepage Journal
    IANADG, but the folks that do our models still use good, old, ERwin. Something about the relationship-specification capabilities, I guess. I was not aware that XML limited the number of parents specifically. You sure that ain't just a limitation of your programming language? :)

    An afterthought, databases are about storage and speed of insertion/extraction. I honestly don't believe that fitting the database to the data structure is worth the cost or the trouble, just yet.

  • No Chance... (Score:3, Insightful)

    by augustz (18082) on Sunday November 18, 2001 @03:54PM (#2581610) Homepage
    I think these discussions come up part of the time because people want something new and sexy. In this case OO DB's, which 'XML DB's' are a variant of, may have benefits in specific and limited cases. But I have not been impressed.

    Take your classic orders table. Part NO, Customer NO, etc. etc. The number of apps with only one parent is tiny, the flexibility limited, and the whole metadata scanning business awkward.

    For anyone doing any serious larger-scale database work some of this stuff is a joke. The idea these vendors have is that we'll be storing XML data in these DB's, ignoring that even for a simple phone directory, the XML data probably takes up a significantly greater amount of space than a simple relational DB would require.

    And this ignores the significant amount of time and energy invested in toolsets and models for the existing setup. Sure, someone might come out with a chip that runs 2x as fast as an Intel at the same price, but unless it is Intel compatible how many people would buy it or care?
  • Indexing? (Score:4, Insightful)

    by aralin (107264) on Sunday November 18, 2001 @04:00PM (#2581625)
    Can anyone explain to me what is suddenly so wrong about a relational database with hierarchical indexing?

    Maybe it's just me, but the goal today is integration, and having a special database for XML and a special database for this and that, just because it's faster for one particular problem, creates a level of complexity that prevents accomplishing even the most trivial tasks.

    Still, XML is only a way to describe data, which may often be relational in structure. Why not store the data in its native form and create XML documents from the database on the fly with filters?

    This question of hierarchical databases is just plain trolling in my eyes.
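
    A rough sketch of the "store it relationally, emit XML on the fly" idea from this comment, using Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The `customer` and `orders` tables, the element names, and the `rows_to_xml` function are assumptions made up for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(conn):
    """Build an XML document from relational rows at request time."""
    root = ET.Element("customers")
    customers = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
    for cust_id, name in customers:
        cust = ET.SubElement(root, "customer", id=str(cust_id), name=name)
        orders = conn.execute(
            "SELECT part FROM orders WHERE customer_id = ? ORDER BY id",
            (cust_id,),
        )
        for (part,) in orders:
            ET.SubElement(cust, "order", part=part)
    return ET.tostring(root, encoding="unicode")


# Hypothetical schema: the data stays in ordinary tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, part TEXT);
    INSERT INTO customer VALUES (1, 'Ann'), (2, 'Raj');
    INSERT INTO orders VALUES (1, 1, 'bolt'), (2, 1, 'nut'), (3, 2, 'gear');
""")

xml_text = rows_to_xml(conn)
print(xml_text)
```

    The hierarchy here exists only in the output; a different filter could nest the same rows by part instead of by customer.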

  • by ShmakDown (536071) <jim@cs.uoregon.edu> on Sunday November 18, 2001 @04:04PM (#2581635) Homepage
    I don't think that hierarchical db's have any real chance of taking over or replacing relational dbs in the future. There may start to be more of a place for them, but many application service providers that use XML still have a fair amount of relational data that needs to be maintained. XML is mainly being used for communication protocols and not so much for internal data structure storage. I think the more likely db trend in the future will be for many users to maintain both relational and hierarchical databases.
  • by coyote-san (38515) on Sunday November 18, 2001 @04:14PM (#2581660)
    Relational databases didn't come to dominate the database market because they pushed aside equally valid alternatives, they dominate the market because relational databases implement relational calculus. Indeed, that's the very touchstone that distinguishes relational databases from something like DBM and its many descendants.

    And *that* is important because it assures the designer and user that every possible operation is well-defined and (hopefully) correctly implemented. The exact syntax for a "join" may differ, and a specific implementation may be flawed, but everyone agrees to a common baseline.

    For hierarchical databases to really take off, they need to have an equally strong mathematical underpinning. For now, AFAIK, there is none other than that you get when you map a hierarchical database into relational tables and use exactly those relational properties. That's a good start, but if you're only using the properties in relational databases, why not stick with them?

    As for XML, that's completely irrelevant. It's a good format for transferring data, but that's about it. You can store hierarchical data in an XML file, but you can also use it to store purely relational data or completely unstructured data (in some CDATA block).
  • XML Data Bloat (Score:2, Insightful)

    by trp0 (155951) on Sunday November 18, 2001 @04:21PM (#2581685) Homepage Journal
    It certainly seems like the same thing is happening with XML that happens with any new toy: "my friend told me XML was cool for stuff, so I'm going to convert everything to XML so I can be cool too."

    I was pretty sure that XML was useful in that it was a human-readable data-encoding mechanism that "average" users could get a grip on and utilize in sharing information between heterogeneous systems, but it seems like people are completely missing the point these days in how to use XML effectively.

    A lot of the benefit of using XML is quickly becoming negated by everyone coming up with their own DTDs and the lack of standard formats for encoding data that is to be shared. As an example, here at the university I attend, there is a project for sharing information about biological species' population data amongst sister organizations. The goal is to make the information possessed by all these organizations available to all the others. The trouble is that they have all come up with their own format for storing the data they collect and cannot agree on what standard should be used, so each organization is encoding all their information with a different XML labeling scheme. My first question was: "Why in the heck are you using XML to encode the data anyway?" Seems easier and saner to just store it in your relational database and make the database accessible to sister organizations, who can then encode the information however they want for their end-users through their client applications, rather than the organization holding the information imposing order on people wanting access to the information.

    To make a long story short, XML encoding doesn't help you store the information more efficiently at all and with the state of the "formatting standards" today doesn't even really provide an efficient way of sharing information between organizations or an efficient way of encoding the information for transmittal to other organizations. It seems as if people are missing the forest for the trees in how XML can be useful in its relation to data encoding and we should stick with our trusty ole relational and object-oriented database models as they have shown their usefulness and efficiency.
  • by Florian Weimer (88405) <fw@deneb.enyo.de> on Sunday November 18, 2001 @04:34PM (#2581723) Homepage
    You have a point. In addition, we should ask ourselves: "Do we really need XML if it doesn't fit in our established technology framework?"

    Often, the answer is a plain "No", from a technical standpoint. However, you have to market your product somehow, and this means that you need Java, Linux, LDAP, XML, and SOAP. (As time passes, some entries will drop off the beginning of this list, and others will show up at the end.)
  • Re:XML Data Bloat (Score:3, Insightful)

    by Skapare (16644) on Sunday November 18, 2001 @05:30PM (#2581864) Homepage
    XML is just a formatted, human-readable export file.

    Human readable?

    I suppose you don't mind it when someone sends you mail, and you see a bunch of tags all over the place because it's in HTML. XML is just the same kind of thing ... all cluttered with tags. The computer can read XML easier and more quickly than humans. Sure it could read it even faster if it didn't have to parse all those tags. But I wouldn't call this a design intended for humans to read.

  • by drodver (410899) on Sunday November 18, 2001 @05:43PM (#2581891)
    Why do you assume relational databases are more developed than hierarchical?? The company I work for has been using our own hierarchical database for 25 years. They had the potential to become what Oracle is today but decided to stay focused on the medical industry. The serious problem with relational databases is they have traditionally not handled sparse data well at all. In the case of a patient every time they come for a visit there are tens of thousands of possible data points that can be entered, but most usually are empty. For tasks such as these relational databases have been completely impractical. With the use of indexing a hierarchical database can do everything a relational database can do.
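
    A small illustration of the sparse-data point, assuming a hypothetical visit record with 20,000 possible fields of which only a couple are filled in. The field names and the `wide_row` helper are invented for the example, and it says nothing about how any particular RDBMS actually stores NULLs.

```python
# Hypothetical catalogue of every data point a visit could record.
FIELDS = [f"obs_{i}" for i in range(20_000)]


def wide_row(visit):
    """Fixed-column layout: one slot per possible field, mostly None."""
    return [visit.get(name) for name in FIELDS]


# Sparse layout: only the data points actually entered are stored.
visit = {"obs_17": 98.6, "obs_4021": "negative"}

row = wide_row(visit)
filled = sum(value is not None for value in row)
print(f"wide row slots: {len(row)}, filled: {filled}, sparse entries: {len(visit)}")
```

    The wide layout carries one slot per possible field for every visit, while the nested or keyed layout grows only with what was actually recorded.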
  • Re:Indexing? (Score:2, Insightful)

    by captredballs (71364) on Sunday November 18, 2001 @05:52PM (#2581922) Homepage

    The problems that you mention, both concerning storage space and flexibility of the data model are what XML databases are attempting to solve.

    Listing the problems in opposition to the solutions does not make for a good argument.
  • by Dasein (6110) <[tedc] [at] [codebig.com]> on Sunday November 18, 2001 @06:44PM (#2582116) Homepage Journal
    Okay, I've worked for two different network model database companies -- the network database model is just an extension of the hierarchical model to allow graph schemas instead of a strict hierarchy. I've also worked with two companies that were mapping hierarchical structures onto relational databases.

    You can think of data structures as (leaving ternary relationships and such aside) some sort of network of relationships. When you think of it this way, relational and network model databases have more similarities than they have differences, especially when you consider that using surrogate keys is the moral equivalent of a network model "pointer".

    Okay so you have this network of relationships, mapping a hierarchical structure onto that is simply picking a starting point and traversing the structure from that "viewpoint" without visiting a node via the same relationship twice (simplified algorithm but...) One of these groups used to think about this like you had a multi-legged turkey. You grab one leg and hold it up. All the other legs hang down -- you grab another leg and a different set of legs hang down.
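
    The "multi-legged turkey" can be sketched in a few lines of Python: pick a starting node, follow each relationship at most once, and see which hierarchy hangs down. The `hang_from` function and the four-entity schema are hypothetical, and this is the same simplified algorithm the comment describes, not a production traversal.

```python
def hang_from(graph, start):
    """Return the nested dict seen when the network is picked up at `start`.

    Each undirected relationship (edge) is followed at most once.
    """
    used = set()

    def walk(node):
        subtree = {}
        for neighbor in sorted(graph[node]):
            edge = frozenset((node, neighbor))
            if edge in used:
                continue
            used.add(edge)
            subtree[neighbor] = walk(neighbor)
        return subtree

    return {start: walk(start)}


# Hypothetical network of relationships between four entities.
graph = {
    "customer": {"order"},
    "order": {"customer", "part"},
    "part": {"order", "supplier"},
    "supplier": {"part"},
}

by_customer = hang_from(graph, "customer")
by_part = hang_from(graph, "part")
print(by_customer)
print(by_part)
```

    Grabbing `customer` gives a single chain down to `supplier`; grabbing `part` gives two branches. Same data, different hierarchy, which is the point being made about presentation versus storage.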

    So, if you buy that, does it really make sense to represent any sort of network of information in a hierarchical form? Well, yes and no. It makes sense from a presentation and maybe interchange perspective but not from a native storage perspective. It's simply too constrictive and you end up representing relationships that don't fit into a neat hierarchy programmatically in the application code instead of explicitly in the database schema. 25 years from now, someone is trying to reverse engineer your code and figure out how all this data is related -- blech. Ever wonder why IMS applications are generally left alone and newer applications are not usually written to IMS? This is part of the reason why. (Yes, there are some, but they are the exception.)

    Throw in to this my experience working with a bank that had hierarchical data and the extent to which they went to circumvent that restriction, and I'd say that native hierarchical storage for XML is a bad idea. Granted it's tempting but it seems ill advised since it's very likely that your data will survive long beyond the lifecycle of the system used to originally store it.

    <RANT>
    The original question didn't provoke this but I've seen a couple of responses about using XML as a native data storage format. Let me say that, unless the data is very static, it's a monumentally stupid idea to do that. XML is not a replacement for a database.

    I find that most of the people who really want to do this are ignorant of all the work that goes into real database systems. They don't understand lock management, transactions, rollback and recovery, free space management, or the scalability issues that real databases take care of under the covers. If you feel tempted, read this [amazon.com].

    Throw in the representation of non-hierarchical relationships with IDs, and sooner or later you will find yourself in a text editor tracking down ID/IDREF pairs to find out where your data is corrupted. Or writing scripts to validate your "entire data set" -- above a few megabytes it can be really painful.

    For God's sake, don't expect to use XML to store data that you are going to update with any regularity.
    </RANT>
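
    For a sense of what those validation scripts look like, here is a minimal sketch that scans a document for references pointing at no existing ID. It uses Python's standard `xml.etree.ElementTree`, which does not read a DTD, so the attribute names `id` and `ref`, the `dangling_refs` function, and the sample document are all assumptions.

```python
import xml.etree.ElementTree as ET


def dangling_refs(xml_text, id_attr="id", ref_attr="ref"):
    """Return the sorted reference values that match no element's id."""
    root = ET.fromstring(xml_text)
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    refs = {el.get(ref_attr) for el in root.iter() if el.get(ref_attr) is not None}
    return sorted(refs - ids)


# Hypothetical data set with one broken reference (a9).
sample = """<bank>
  <account id="a1"/>
  <account id="a2"/>
  <transfer ref="a1"/>
  <transfer ref="a9"/>
</bank>"""

broken = dangling_refs(sample)
print(broken)
```

    Note that this parses the whole document into memory, which is part of why the comment calls the exercise painful once the data set grows.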
  • by dgroskind (198819) on Sunday November 18, 2001 @07:00PM (#2582169)

    XML may be hierarchical but the data it is used to mark up is not necessarily hierarchical. For instance, XML can be used to mark up conventional fielded (flat file) data to serve as an interchange format.

    More importantly, XML is used to impose some structure on inherently unstructured text. The structure it provides is based on some assumptions of how the data will be used or how it will be presented. If the data is used in some other way, the markup can be useless.

    An example is a book. For XML purposes, it can be described as structured by chapter, section, subsection, and paragraph. For information purposes, tags are assigned to represent the ideas, terminology, names and other index-like content. There is virtually no structure in these index type of tags but they convey the most important information in the book.

    Or not. These tags are assigned based on assumptions about what readers are interested in. A different set of assumptions would produce a different set of tags even though the structure of the document would stay the same. If the sentences and paragraphs are shuffled and excerpted for some other publication, even the structure becomes irrelevant.

    How this inherently unstructured information is stored is relevant to how it is managed, that is, how it is backed up, how access is controlled, how changes are tracked. However, when it comes to putting the information to some useful purpose, it is the retrieval mechanisms that are important. The issues here are how easily the user can specify the type of information he wants and how accurately the mechanism can find it. This process is usually independent of the underlying structure and uses some higher level concepts of relevance and context.

    The question of whether to use hierarchical, relational or object-oriented data structures misses the point for textual data, for which XML is commonly used, because none of these structures capture meaning.

    Topic maps [topicmaps.org] make a heroic stab at capturing meaning in XML markup but still only within a set of assumptions. I suspect a true meaning markup language is theoretically impossible, or at least theoretically very far in the future.

  • by mj6798 (514047) on Sunday November 18, 2001 @07:07PM (#2582183)
    everything looks like a nail. The relational model is pretty good for its original purpose: allowing non-specialists quick access to large amounts of statistical and business data (sales records, etc.) via an easy-to-learn query language. But for many other applications, it has proven to be completely insufficient.

    Indeed, that's the very touchstone that distinguishes relational databases from something like DBM and its many descendants.

    The alternative to relational databases is not "DBM", it is object oriented, tree structured, logical, and other kinds of database models. Those are just as well defined as relational databases.

    And *that* is important because it assures the designer and user that every possible operation is well-defined and (hopefully) correctly implemented. The exact syntax for a "join" may differ, and a specific implementation may be flawed, but everyone agrees to a common baseline.

    Relational databases provide a common baseline for a primitive set of relational operations. Real-world implementations of those models have been augmented by zillions of operations that weren't part of the original relational model and that often don't even fit into the relational model. And without those extra operations, relational databases would not be useful in practice.

    For now, AFAIK, there is none other than that you get when you map a hierarchical database into relational tables and use exactly those relational properties.

    Are you kidding? It is a major pain trying to express hierarchical data in a relational database model: the relations that describe hierarchical data and the operations that you might want to execute often require complex, multiple, inefficient queries and updates, and the relational model provides few tools to ensure that the corresponding relations remain consistent.
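
    To make the "multiple queries" complaint concrete, here is a sketch of a tree stored as an adjacency list in SQLite and read back one level at a time. The `node` table, its contents, and the `descendants` helper are hypothetical; the loop deliberately avoids any recursive-query extension so that it issues one SELECT per level of the tree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
    INSERT INTO node VALUES
        (1, NULL, 'book'),
        (2, 1, 'chapter 1'),
        (3, 1, 'chapter 2'),
        (4, 2, 'section 1.1'),
        (5, 4, 'paragraph 1.1.1');
""")


def descendants(conn, root_id):
    """Collect all descendants of root_id, issuing one SELECT per tree level."""
    names, queries = [], 0
    frontier = [root_id]
    while frontier:
        marks = ",".join("?" * len(frontier))
        rows = conn.execute(
            f"SELECT id, name FROM node WHERE parent_id IN ({marks}) ORDER BY id",
            frontier,
        ).fetchall()
        queries += 1
        names.extend(name for _, name in rows)
        frontier = [node_id for node_id, _ in rows]
    return names, queries


names, queries = descendants(conn, 1)
print(names, queries)
```

    Fetching a three-level subtree takes four round trips here (the last one just discovers there is nothing deeper). Nothing in the schema stops a row from naming a missing parent or forming a cycle unless extra constraints are added, which is the consistency concern raised above.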

    The semantics of tree structures are trivial to define. People do it in programming language classes all the time. And it is trivial to formulate a database model corresponding to it. In fact, if you have an object-oriented database that respects language semantics, you get hierarchical databases automatically when you define an abstract tree datatype.

    Still, so-called "relational" databases will continue to dominate the market for a long time to come. That's not because the relational model is particularly well-suited to a lot of applications. In part, that's because "relational databases" are not purely relational anymore: they generally include numerous facilities for object-oriented and hierarchical databases, under a "relational veneer". They even include the old "navigational" database systems, combined with the widespread use of stored procedures that do whatever they want whenever they want it on the database server.

    In different words, traditionally relational databases will provide increasingly better support for hierarchical and object-oriented data, but they will continue to also support the relational model, as well as relational access to these other data types. And newly developed databases with other kinds of data models will provide an SQL or other relational frontend to their content. And marketing will continue to include "something-relational" in all the advertising because otherwise the old database hands won't buy it.

  • by Antti R (94512) on Sunday November 18, 2001 @09:05PM (#2582594)
    Gates devised .NET

    Isn't it just lovely to develop for a platform where the motivation for every development is a commercial plot to maximize platform controller's profit margin?

    [...] second how to improve on C++ without the proprietary lock that Sun had imposed on Java.

    More like, how to get a proprietary grip on language and a platform like Sun has with Java.
    And no, rubber-stamping some of the interfaces designed solely by you (to best fit into win32, of course) at ECMA while leaving the thinnest win32 wrappers (like the gui classes) merely de-facto standards, does not make C#/.NET non-proprietary.
  • by Zeinfeld (263942) on Sunday November 18, 2001 @09:39PM (#2582670) Homepage
    Relational databases didn't come to dominate the database market because they pushed aside equally valid alternatives, they dominate the market because relational databases implement relational calculus.

    That's rubbish. Back in the 1960s when the first relational databases emerged nobody had a formal specification for a relational calculus. Today we can create a formal calculus for any data model, the Entity relational model is no different in that regard.

    SQL is a very 1960s / COBOL way of looking at a data structure. Most of the people using it simply do not have the breadth of experience of other data models to know its strengths or weaknesses. Most of the posts in the thread are as empty as those in an editor choice flamewar.

    The entity relationship model has been discarded by the programming language community in favor of typed set theory. Java and C# both have representations of sets, lists, etc., the only reason to use an entity relational model is to get persistence for the data structure.

    So you get this impedance mismatch and a pile of code whose sole purpose is to rewrite the data structures used in the program so that they match the data structures used in the persistence store.

    What we need is a persistence store with a data model that matches our programming language data model. Unfortunately most of the attempts to do this are half baked. All it should take is to add transaction statements into the language so that you declare a procedure to be transactional, it will be all or nothing.
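
    As a rough approximation of "declare a procedure to be transactional", here is a Python decorator that makes a function all-or-nothing. It leans on SQLite as the store, which is an assumption of this sketch and not the language-level persistence the comment asks for; the `transactional` decorator, the `account` table, and the `transfer` function are hypothetical.

```python
import functools
import sqlite3


def transactional(func):
    """Run func(conn, ...) so that its changes commit together or not at all."""
    @functools.wraps(func)
    def wrapper(conn, *args, **kwargs):
        with conn:  # sqlite3 commits on success and rolls back on an exception
            return func(conn, *args, **kwargs)
    return wrapper


@transactional
def transfer(conn, src, dst, amount):
    conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
    (balance,) = conn.execute(
        "SELECT balance FROM account WHERE name = ?", (src,)
    ).fetchone()
    if balance < 0:
        raise ValueError("insufficient funds")
    conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

transfer(conn, "a", "b", 30)       # succeeds and commits
try:
    transfer(conn, "a", "b", 500)  # fails; the debit is rolled back
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)
```

    The function body still speaks SQL, so the impedance mismatch described above remains; only the all-or-nothing declaration is illustrated.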

    Unfortunately Sun made a pact with Oracle over Java and so they have remained stuck in the obsolete SQL world. C# looks to me to be a much better opportunity, Microsoft has little to lose from unifying the data model of the language with that of the persistence store and everything to gain.

  • by tim_maroney (239442) on Sunday November 18, 2001 @11:13PM (#2582959) Homepage
    So you get this impedance mismatch and a pile of code whose sole purpose is to rewrite the data structures used in the program so that they match the data structures used in the persistence store.

    Exactly. What's more, this pile of code takes months to write even for a few dozen object types; it doesn't understand the idea of dependencies between objects so you have to add a whole layer to make sure that objects get persisted in the right order; it's incredibly hard to change, so the system design can't iterate; and simple objects like collections proliferate tables to the point of significant performance losses. It's a terrible way to build a software system unless the user model just happens to be adequately modeled by a fill-in-the-blanks table.

    This is why serious applications traditionally roll their own file formats. It's actually less work to manage most data models from scratch than it is to map them into the straitjacket of a relational database. Custom file formats serve in essence as hand-rolled object databases. Unfortunately, the rise of the three-tier client-server architecture has made the RDBMS layer an unquestioned assumption, with the result that modeling two dozen object types winds up generating over 50,000 lines of convoluted, slow and buggy source code. Modeling the same objects from scratch on a custom B-tree would take less than one fifth the code size. Doing it in a good ODBMS would be almost as trivial as specifying the data structures in XML.

    On my latest project, we ran into a strange issue when specifying the user interface of a discussion system. The designers wanted to mark read and unread messages per user -- in other words, functionality critical to providing a friendly user experience, which rn had fifteen years ago. The engineers hit the roof and said it was impossible. It turned out the reason was that this is an intrinsically hard problem on an RDBMS, although it's a trivial problem to solve in a hand-rolled .newsrc text file. Over the course of the project we ran into tons of these issues, and the interface design took a severe beating because of compromises to the limitations of an RDBMS back-end.
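
    A sketch of the .newsrc-style bookkeeping mentioned above, where each group line carries a compact list of read article ranges. The helper names and the sample line are made up for illustration, and the parser handles only the simple `group: ranges` form.

```python
def parse_ranges(text):
    """Parse '1-5,7,9-12' into a list of inclusive (lo, hi) pairs."""
    ranges = []
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        lo, _, hi = part.partition("-")
        ranges.append((int(lo), int(hi or lo)))
    return ranges


def is_read(ranges, article):
    """True if the article number falls inside any read range."""
    return any(lo <= article <= hi for lo, hi in ranges)


def mark_read(ranges, article):
    """Add one article and merge adjacent or overlapping ranges."""
    merged = []
    for lo, hi in sorted(ranges + [(article, article)]):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def format_ranges(ranges):
    """Render ranges back into the compact text form."""
    return ",".join(str(lo) if lo == hi else f"{lo}-{hi}" for lo, hi in ranges)


line = "comp.databases: 1-5,7,9-12"
group, _, seen = line.partition(":")
ranges = mark_read(parse_ranges(seen), 6)
updated = f"{group}: {format_ranges(ranges)}"
print(updated)
```

    One short line per group per user is all the state there is, which is why the text-file version feels trivial next to a per-user, per-message table.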

    Tim
