With XML, is the Time Right for Hierarchical DBs?

Posted by Cliff
from the digital-evolution-of-data-storage dept.
DullTrev asks: "The hierarchical database model existed before the far more familiar relational model. Hierarchical databases were blown away by relational versions because it was difficult to model a many-to-many relationship - the very basis of the hierarchical model is that each child element has only one parent element. However, we now live in a web world that demands quick access to a variety of data on a variety of platforms. XML is being used to facilitate this, and XML has, of course, a hierarchical structure." Do you think a hierarchical database would really be a better answer for storing XML data than the existing relational counterparts?

"There have been some pushes to create pure XML databases (info on XML in connection to databases is here and info on XML database products is here) with claims that as they support XML natively, they can offer many advantages over relational databases.

Some of these claims include speed, better handling of audio, graphic and other digital files, easier administration, and handling of unexpected elements. Software AG, a German firm, produce and sell a suite of XML products, including Tamino, a native XML database. They have lots of information on why they think their database is great, not surprisingly, but no benchmarks. So, do the Slashdot community think that with XML the time has come for hierarchical databases? Or is it better simply to use a relational database that can output in XML, or script your way to achieve the same goal?"

  • by Anonymous Coward on Sunday November 18, 2001 @03:48PM (#2581589)
    excelon has a very full featured XML database.
    We use it exclusively and it kicks ass.

    Well, the current version does. Pre 3.0 sucked ass.

    http://www.exceloncorp.com
  • by russcoon (34224) on Sunday November 18, 2001 @04:03PM (#2581632) Homepage
    In my experience with XML and RDBMS systems, mapping one onto another is always a dicey task. The primary reason (IMHO) is that XML's ability to represent order as well as structure as data doesn't fit into an RDBMS without some work. I've seen people try to map both XML and regular DBs onto each other, and my opinion is that the results don't "feel right" on one side or the other unless great pains are taken to preserve the structure of the XML doc in the DB schema.

    That said, I'm not sure a hierarchical DB will necessarily be any better than something like an OODBMS with well-modeled objects.
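
    To picture the "some work" that order requires: a relational table has no inherent row order, so document order has to be stored as data. The following is only a minimal sketch; the `node` table and its `position` column are assumptions made up for illustration, not a schema from any product mentioned in this thread.

```python
import sqlite3
import xml.etree.ElementTree as ET

doc = "<steps><step>boil water</step><step>add tea</step><step>wait</step></steps>"
root = ET.fromstring(doc)

conn = sqlite3.connect(":memory:")
# Hypothetical schema: 'position' records each child's place in document order.
conn.execute("CREATE TABLE node (parent TEXT, position INTEGER, tag TEXT, body TEXT)")
for position, child in enumerate(root):
    conn.execute("INSERT INTO node VALUES (?, ?, ?, ?)",
                 (root.tag, position, child.tag, child.text))

# Without ORDER BY the database is free to return rows in any order.
rows = conn.execute(
    "SELECT body FROM node WHERE parent = ? ORDER BY position", (root.tag,)
).fetchall()
ordered_steps = [body for (body,) in rows]
print(ordered_steps)
```

    Every query that rebuilds the document has to remember the ORDER BY, which is part of why such mappings tend not to "feel right" on the relational side.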
  • by el_mex (175423) on Sunday November 18, 2001 @04:13PM (#2581657)
    A data format will NEVER dictate a system's design. XML is nothing other than a data format.


    The relational model has no major shortcomings. The only thing XML offers that is not already very well done is easier data interchange. As a database administrator, I can tell you there is NO chance XML will dictate a change of how we store data. There are much higher priorities in database management than easier data interchange.

  • by Carnage4Life (106069) on Sunday November 18, 2001 @04:16PM (#2581671) Homepage Journal
    Hi,
    I wrote a paper on native XML databases and SQL databases that support XML [25hoursaday.com] that appeared on Slashdot [slashdot.org] a little while ago. While doing research for that paper I asked myself the same question, whether instead of coming up with hybrid methods to store relational and hierarchical data we should store XML in already existing hierarchical databases. Unfortunately things are not so clear cut.

    First of all, a lot of data out there is relational, and people aren't ready or willing to transition all that data to XML-based storage, so a mix of relational and XML data will probably be with us for a while. The biggest problem with object-oriented databases was that their vendors didn't understand this fundamental issue. The XML database vendors seem to understand that hybrid data will be with us for quite a while, which is why Tamino supports importing data from relational sources and even ships with a SQL engine.

    Secondly, XML documents carry a lot of metadata beyond the hierarchical parent-child relationships, such as processing instructions, comments and entities, which require more intelligence from the database than just storing parent-child relationships.
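
    As a small illustration of that extra metadata, here is a sketch using Python's standard `xml.dom.minidom`. The sample document and the `render` processing instruction are invented for the example; the point is only that a parsed document holds node kinds other than elements, which a store would need to keep if it wants to give the document back unchanged.

```python
from xml.dom import minidom, Node

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?render mode="fast"?>'
    '<!-- reviewed by editor -->'
    '<order><item>tea</item></order>'
)

# Map DOM node-type constants to readable names.
names = {
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
    Node.ELEMENT_NODE: "element",
}
# The document's direct children include more than the root element.
top_level = [names.get(node.nodeType, "other") for node in doc.childNodes]
print(top_level)
```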

    Finally, all the major [commercial] relational database vendors have included some sort of native support for XML, including XML types, and there is an ANSI standard in the works [sqlx.org] for combining XML and SQL. From what I've seen, none of the hierarchical databases plan to support XML as much as the relational databases have or plan to.

    Now, if you were simply asking whether a native XML database can be built on top of a hierarchical database, then I believe the answer is yes. Then again, native XML databases can be and have been built on object-oriented and relational databases, so it makes sense that they can be implemented in a database system that is more suited to handling hierarchical data.
  • by bwt (68845) on Sunday November 18, 2001 @04:24PM (#2581693) Homepage
    Or is it better simply to use a relational database that can output in XML, or script your way to achieve the same goal?"

    I believe that RDBMSs should add functionality to read/write XML, especially as the XML Schema recommendation is basically done.

    The idea that XML should be the permanent storage format is a bad one. There is a lot of power in a normalized data model: it enforces data integrity while eliminating data fragmentation automatically, and it minimizes transaction resources.

    Consider XML representations for different entities that all share some kind of child entity. For example: people, businesses, and schools all share addresses. In XML, you want the addresses to appear in the description of the individual object. Does that mean you want to store the addresses separately that way? Absolutely not, because then when you enforce constraints or ask questions about addresses, your data is fragmented in three places. For that matter, how do you know all the entities that might use addresses? In an RDBMS, you can inspect all the foreign keys to the address entity. What's the XML analog?
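
    The "inspect all the foreign keys" step can be sketched with SQLite from Python. The table names and the helper `tables_referencing` are assumptions for illustration; other databases expose the same information through their own catalog views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address  (id INTEGER PRIMARY KEY, street TEXT, city TEXT);
    CREATE TABLE person   (id INTEGER PRIMARY KEY, name TEXT,
                           address_id INTEGER REFERENCES address(id));
    CREATE TABLE business (id INTEGER PRIMARY KEY, name TEXT,
                           address_id INTEGER REFERENCES address(id));
    CREATE TABLE school   (id INTEGER PRIMARY KEY, name TEXT,
                           address_id INTEGER REFERENCES address(id));
""")

def tables_referencing(conn, target):
    """Return the names of tables holding a foreign key to `target`."""
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    found = set()
    for table in tables:
        # Each row is (id, seq, referenced_table, from_column, to_column, ...).
        for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
            if fk[2] == target:
                found.add(table)
    return sorted(found)

address_users = tables_referencing(conn, "address")
print(address_users)
```

    The schema itself answers the question "who uses addresses?", with no need to scan the stored data.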
  • Some thoughts... (Score:5, Interesting)

    by Coventry (3779) on Sunday November 18, 2001 @05:10PM (#2581822) Journal
    I have been struggling with these issues for awhile now, for various reasons. Why? Because I like Zope [zope.org], but am, like most developers, more comfortable with relational data structures.

    Zope uses an object database known as the ZODB. Some forms of many-to-many relationships and such can be handled via the use of selection and multi-selection properties, which are designed to distinguish between a selected element and the list of available elements. The list of elements can be derived from a property on the current object, a property on a parent object, or be created via a method call - allowing for non-traditional (for OODBMS) cross-linking of objects. Of course, since this sort of thing is a workaround, no true relational links are created... 'Soft relations' may be OK for MySQL [mysql.org], but in big application development, relationships must be enforced! Thus, the big boys in RDBMS all enforce foreign keys (MySQL does not)...
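
    What "enforced" means in practice can be shown with a short sketch. It uses SQLite only because it ships with Python; the tables and names are invented. Note that SQLite enforces foreign keys only after the pragma below is switched on, which makes it a convenient way to see both behaviours.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only once this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE manufacturer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE car (
    id INTEGER PRIMARY KEY,
    model TEXT,
    manufacturer_id INTEGER NOT NULL REFERENCES manufacturer(id))""")
conn.execute("INSERT INTO manufacturer VALUES (1, 'Acme Motors')")
conn.execute("INSERT INTO car VALUES (1, 'Model A', 1)")

try:
    # Manufacturer 99 does not exist, so an enforcing database rejects the row.
    conn.execute("INSERT INTO car VALUES (2, 'Ghost', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

car_count = conn.execute("SELECT COUNT(*) FROM car").fetchone()[0]
print(rejected, car_count)
```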

    Of course, I've found that by careful creation of object hierarchies, very complex applications can be created on top of an OODBMS that are in fact more robust, in some ways, than their relational counterparts. The biggest short-term hurdle I see for OODBMS (including ones based upon XML [the ZODB can export objects as XML, but they are stored differently internally]) is the lack of a true query and data manipulation language like SQL. Sure, OQL exists, and is even technically a standard, but it A) sucks and B) is geared towards large Java applications with huge amounts of active objects, not general-purpose OODB queries. Without such a language, OODBMS are all dissimilar in how one queries and creates/updates data, and in many cases the only interface is a truly procedural one! Thus OODBMS users are forced to use proprietary tools and are locked into one system, not to mention that speed of development (something normally associated with OO development and OODBMS in general) is hindered by the excessive number of procedural calls one needs simply to query their data...

    Recently, an add-on to Zope addressed some of these issues. Called 'ZOQL', it uses a SQL-like syntax and allows for very discrete querying of the ZODB (something one had to do programmatically using the 'ZCatalog' before) with all of the familiar aggregate and comparison operators SQL users love... Of course, this _still_ doesn't address the issue of soft relationships:

    I think the biggest hurdle for OODBMS in the long term (tools like ZOQL are interfaces to existing systems, and thus can be implemented easily) is the lack of relationship handling. It seems that most RDBMS force a developer to think in relational terms about the data, and most OODBMS force you to think in terms of objects... Most problems can be mapped to either of these domains, but you are forcing the data-model type onto the problem. What is needed is a hybrid system, an 'Object-Relational' DBMS. That is to say, OODBMS makers need to let go of the traditional OO idea that relations are only of the following types:
    • Object A is a Object B
    • Object A Has a/many Object B(s)
    What RDBMS systems excelled in (and thus fell into popular use for) was ease of management and allowing common data to be moved and grouped. A 'look-up table', for instance, which simply holds a list of common data (an enumerated list) and can be centrally maintained, is a boon in the RDBMS world. For example, you have a lookup table of car manufacturers, and one of them changes its name... Instead of updating all N cars that are made by the manufacturer, you simply update the single record in the lookup table. Since each car would have something akin to a 'Manufacturer_ID' column linking it to the lookup table, the cars belonging to the manufacturer are all taken care of.
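
    The lookup-table rename can be sketched in a few lines. The manufacturer and model names are made up, and SQLite stands in for whatever RDBMS is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE manufacturer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE car (id INTEGER PRIMARY KEY, model TEXT,
                      manufacturer_id INTEGER REFERENCES manufacturer(id));
    INSERT INTO manufacturer VALUES (1, 'Acme Motors');
    INSERT INTO car VALUES (1, 'Model A', 1), (2, 'Model B', 1), (3, 'Model C', 1);
""")

# The rename touches one lookup row, however many cars point at it.
cursor = conn.execute("UPDATE manufacturer SET name = 'Apex Motors' WHERE id = 1")
rows_changed = cursor.rowcount

# Every car picks up the new name through the join.
makes = [make for (make,) in conn.execute(
    "SELECT m.name FROM car c JOIN manufacturer m ON m.id = c.manufacturer_id")]
print(rows_changed, makes)
```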

    How does one do this in a hierarchical system? Well, the easy answer would be that each manufacturer object contains all the cars that manufacturer makes. Simple, right? WRONG. Why?

    Because each car also has a body type (compact, sedan, SUV, truck, van, etc...), which in a relational database would simply be another lookup table, but in an OODBMS poses data management issues. Do we put body type higher than manufacturer? If so, then we have to maintain the list of manufacturers for each body type, causing headaches. Or do we put body type below manufacturer, causing us to need to maintain a separate list of body types for each manufacturer? These lists of course need to match exactly if we ever plan on being able to search or do reports based upon all cars of a specific body type.
    Sadly enough, this sort of separate-enumeration relationship isn't implemented (well) in any OODBMS I've found.
    Take the ZODB, for example: its selection and multi-selection lists try to handle this situation, but fail because relational integrity is not maintained! That is to say, behind the scenes it's not a true reference to a value in the enumerated list, but just a text entry representing a value in the list. If the value in the list changes, the selection property does not update, leaving you with the equivalent of MySQL's bastard children, the orphaned records.
    This sort of soft-relationship handling is ugly and BAD for maintainability, but OODBMS users are faced with two ugly choices each time they map such a relationship: do I store this as a plain-text property and just update N records each time this changes, or do I map it into the hierarchy and deal with the headaches incurred by doing so...?
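
    The difference between a true reference and a text copy can be shown in plain Python. This is only an illustration of the idea; the `BodyType` and `Car` classes are hypothetical and say nothing about how the ZODB's selection properties are actually implemented.

```python
class BodyType:
    """One entry in a centrally maintained list of body types."""
    def __init__(self, name):
        self.name = name

class Car:
    def __init__(self, model, body_type):
        self.model = model
        self.body_type = body_type            # true reference to the shared entry
        self.body_type_text = body_type.name  # soft relation: a copied string

suv = BodyType("SUV")
cars = [Car("Model A", suv), Car("Model B", suv)]

# Rename the entry once in the central list.
suv.name = "Sport Utility"

by_reference = [car.body_type.name for car in cars]
by_text_copy = [car.body_type_text for car in cars]
print(by_reference)   # follows the rename
print(by_text_copy)   # still holds the stale text
```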

    I don't think I've answered the question, but hopefully I've at least shed some light on the subject for members of both the OODBMS camps and RDBMS camps... Now if only a useful ORDBMS were to come along...

    (Note that PostgreSQL and some other RDBMS can actually be used in a semi-OO manner, but this is usually reserved for inheritable structures of data to be used for specific extensions to the data model - thus the SUV table inherits from the Cars table and adds some columns - but all other relationships SUV has will still be relational)
  • A Hierarchy of Myth (Score:2, Interesting)

    by droleary (47999) on Sunday November 18, 2001 @05:15PM (#2581831) Homepage
    While a hierarchy is often used by humans to organize and structure things, that should in no way impact how the data/information/objects are treated as individuals. Look at the common file system hierarchy and it's easy to see that burying files under a hierarchy of directories actually makes access to that information harder. It wasn't so noticeable when we were all just managing a few MB of files, but now people are beginning to store large picture, movie, and sound libraries. File managers have mistakenly stuck with the hierarchy instead of using information associated with the file itself (ID3 tags, etc.) to organize it all. What is really needed is a better approach to representing metadata so that information can be accessed directly based on those metadata attributes and not have it hidden in the hierarchy. I have a short essay on this from the work I've been doing on a Meta Object Manager (MOM), but it needs to be cleaned up before it could be published.

    The desire to impose a hierarchy on the data itself instead of considering a hierarchy as simply one view on the data is a step backwards. Nobody who manages large amounts of data is looking to jam it into a static hierarchy, and so XML is not an answer, nor is any hierarchical representation.
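
    Access by metadata attribute rather than by position in a hierarchy might look like the sketch below. It is not the Meta Object Manager design, which is not described here; the file records, attribute names and the `find` helper are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical file records: the path is where the file lives, the rest is metadata.
files = [
    {"path": "/home/me/stuff/a.mp3", "artist": "Ada", "year": 1999},
    {"path": "/mnt/backup/old/b.mp3", "artist": "Ada", "year": 2001},
    {"path": "/tmp/c.mp3", "artist": "Grace", "year": 2001},
]

# Inverted index: (attribute, value) -> paths of the files carrying that value.
index = defaultdict(set)
for record in files:
    for attribute, value in record.items():
        if attribute != "path":
            index[(attribute, value)].add(record["path"])

def find(**criteria):
    """Return paths of files matching every attribute=value pair given."""
    matches = [index[pair] for pair in criteria.items()]
    return sorted(set.intersection(*matches)) if matches else []

print(find(artist="Ada", year=2001))
```

    The directory each file sits in plays no part in the lookup; the hierarchy becomes just one more attribute that could be indexed the same way.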
  • Re:I don't think so. (Score:2, Interesting)

    by jd142 (129673) on Sunday November 18, 2001 @05:19PM (#2581837) Homepage
    Nah, this example is pretty simple. You don't even have to use a join or bridge table, which is what I've heard them called. Those are only needed when you have two objects that have a many to many relationship. For example, if you were doing a database of computer repairs, you might have a table of customers and a table of techs. Since there would be a many to many relationship here, you'd have a work order table or something, to show that tech1 worked with cust1, cust2, cust1, cust3, cust3, and that cust1 had service calls by tech1, tech2, tech2, tech1, etc.
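
    The work order bridge table described above can be sketched like this. The column names and the two helper functions are assumptions for the example; the tech and customer names follow the comment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tech     (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    -- Bridge table: one row per service call links a tech to a customer.
    CREATE TABLE work_order (
        id INTEGER PRIMARY KEY,
        tech_id INTEGER REFERENCES tech(id),
        customer_id INTEGER REFERENCES customer(id));
    INSERT INTO tech VALUES (1, 'tech1'), (2, 'tech2');
    INSERT INTO customer VALUES (1, 'cust1'), (2, 'cust2'), (3, 'cust3');
    INSERT INTO work_order (tech_id, customer_id)
        VALUES (1, 1), (1, 2), (1, 1), (1, 3), (2, 1);
""")

def customers_of(tech_name):
    """Distinct customers a given tech has worked with."""
    rows = conn.execute("""
        SELECT DISTINCT c.name FROM work_order w
        JOIN tech t ON t.id = w.tech_id
        JOIN customer c ON c.id = w.customer_id
        WHERE t.name = ? ORDER BY c.name""", (tech_name,))
    return [name for (name,) in rows]

def techs_of(customer_name):
    """Distinct techs who have served a given customer."""
    rows = conn.execute("""
        SELECT DISTINCT t.name FROM work_order w
        JOIN tech t ON t.id = w.tech_id
        JOIN customer c ON c.id = w.customer_id
        WHERE c.name = ? ORDER BY t.name""", (customer_name,))
    return [name for (name,) in rows]

print(customers_of("tech1"), techs_of("cust1"))
```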

    In this case, unless you had a table of phone number data that contained information about the number (like who paid for it, the day it was installed, the type of service available, the type of line, etc) you could get by with just one employee/number table, like this:

    bobid phone1
    bobid phone2
    bobid phone3

    which is pretty simple, with a combination key of employid/phonenumb. You could still have a separate table with the phone number info, with the phone number as the primary key if you wanted to track the other data.
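
    A sketch of that single table with its combination key follows. The column names come from the comment; the table name `employee_phone` is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names follow the comment's employid/phonenumb; the table name is assumed.
conn.execute("""CREATE TABLE employee_phone (
    employid TEXT,
    phonenumb TEXT,
    PRIMARY KEY (employid, phonenumb))""")
conn.executemany("INSERT INTO employee_phone VALUES (?, ?)",
                 [("bobid", "phone1"), ("bobid", "phone2"), ("bobid", "phone3")])

try:
    # Same employee and same number again: the composite key rejects the duplicate.
    conn.execute("INSERT INTO employee_phone VALUES ('bobid', 'phone1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

phone_count = conn.execute(
    "SELECT COUNT(*) FROM employee_phone WHERE employid = 'bobid'").fetchone()[0]
print(duplicate_rejected, phone_count)
```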

    Most people overthink relational databases and don't really break things down like they should and make well-formed tables. Of course, you can change the table structure based on how the database is going to be used. Sometimes it is better to denormalize the table for search efficiency.

    What I think is most interesting are the OODBMS, but it seems to me that they would have an increased overhead on their searches.

    bob
  • by ccf (116263) on Sunday November 18, 2001 @05:42PM (#2581884) Homepage
    The enXyme [sourceforge.net] project attempts to map XML data onto a relational database schema. The goal is to allow complex, specific queries of XML data. It's not easy to capture ALL of XML, with all its possibilities, but you can do parts of it. The project already has a basic XML schema parser, a script that takes an XML schema and generates a series of SQL CREATE statements that reflect the hierarchy described by the schema.

    I guess a pure XML database like the ones mentioned in the article would be better at this, but the advantage is that relational dbs are already in wide use.
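
    To give a feel for this kind of mapping, here is a much simplified sketch. It is not enXyme's code or algorithm: it walks a sample document rather than an XML schema, and the function `create_statements` and its naming rules (one table per tag, one TEXT column per attribute, a foreign key to the parent's table) are assumptions chosen for brevity.

```python
import xml.etree.ElementTree as ET

def create_statements(element, parent=None, seen=None):
    """Walk a sample document and emit one CREATE TABLE per distinct tag."""
    seen = {} if seen is None else seen
    if element.tag not in seen:
        columns = ["id INTEGER PRIMARY KEY"]
        if parent is not None:
            # The parent-child nesting becomes a foreign key to the parent table.
            columns.append(f"{parent}_id INTEGER REFERENCES {parent}(id)")
        columns += [f"{attr} TEXT" for attr in sorted(element.attrib)]
        seen[element.tag] = f"CREATE TABLE {element.tag} ({', '.join(columns)});"
    for child in element:
        create_statements(child, element.tag, seen)
    return list(seen.values())

sample = ET.fromstring(
    '<library><book title="t" year="y"><chapter heading="h"/></book></library>')
statements = create_statements(sample)
print("\n".join(statements))
```

    Even this toy version drops element text, mixed content and ordering, which hints at why capturing all of XML is the hard part.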
  • See RDF (Score:2, Interesting)

    by shirro (17185) on Sunday November 18, 2001 @09:41PM (#2582680) Homepage

    I don't think XML by itself carries enough metadata to understand much beyond whether a document is valid or not. I think RDF and RDFS have a big role to play in getting XML database-ready.

    Hopping on the XML database bandwagon before RDF technologies mature could be a mistake. Forget the semantic web, I want to see the semantic database.

    W3 RDF [w3.org]

    A Good RDF resource [bris.ac.uk]

  • Not so simple (Score:3, Interesting)

    by drodver (410899) on Sunday November 18, 2001 @11:21PM (#2582980)
    The problem with your scheme is that as the list of information grows, the time needed for each access grows linearly. In systems where thousands of concurrent users are all making large numbers of data accesses, your solution would grind the system to a halt. While it is technically possible to do this with a relational database, it is impractical. Take into account that 90% of the database's use is reads, and the access times of such a system would be even worse. Also take into account that the system has to be able to handle some data which does not change and some data that has to be grouped based upon a visit date. Your system would have no efficient way to group certain elements by date.
  • Re:Some thoughts... (Score:2, Interesting)

    by maxm (20632) on Monday November 19, 2001 @06:48AM (#2583940) Homepage
    It could be solved easily enough I think, and I am currently writing a module that I believe would solve most of the problems I have when using Zope.

    All that is needed is a relation "product".

    >>> relations.add([obj1, obj2], [obj3, obj4, obj42])
    >>> relations.getRelations(obj1)
    [obj3, obj4, obj42]
    >>> relations.getRelations(obj3)
    [obj1, obj2]

    Every object in Zope is defined by its id and its path, so it could be done relatively easily.

    Then you would get the advantages of a relational model in the ZODB.

    You could even use a different instance of the class for different object types, just as you make many relation tables in a traditional RDBMS.
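
    One way such a relation "product" could work is sketched below. It is an in-memory guess at the behaviour shown in the session above, not the module being written: the `Relations` class is hypothetical, it uses plain strings as stand-ins for Zope object paths, and it does nothing about persistence in the ZODB.

```python
from collections import defaultdict

class Relations:
    """Symmetric many-to-many links between hashable keys (e.g. object paths)."""

    def __init__(self):
        self._links = defaultdict(list)

    def add(self, left, right):
        """Relate every key in `left` to every key in `right`, both ways."""
        for a in left:
            for b in right:
                if b not in self._links[a]:
                    self._links[a].append(b)
                if a not in self._links[b]:
                    self._links[b].append(a)

    def getRelations(self, key):
        """Return the keys related to `key`, in the order they were added."""
        return list(self._links.get(key, []))

relations = Relations()
relations.add(["obj1", "obj2"], ["obj3", "obj4", "obj42"])
print(relations.getRelations("obj1"))
print(relations.getRelations("obj3"))
```

    Because both directions are stored when a link is added, neither side can be updated without the other, which is the integrity that the soft relations discussed in the parent comment lack.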
