With XML, is the Time Right for Hierarchical DBs? 276
"There have been some pushes to create pure XML databases (info on XML in connection to databases is here and info on XML database products is here) with claims that as they support XML natively, they can offer many advantages over relation databases.
Some of these claims include speed, better handling of audio, graphic and other digital files, easier administration, and handling of unexpected elements. Software AG, a German firm, produce and sell a suite of XML products, including Tamino, a native XML database. They have lots of information on why they think there database is great, not surprisingly, but no benchmarks. So, do the Slashdot community think that with XML the time has come for hierarchical databases? Or is it better simply to use a relational database that can output in XML, or script your way to achieve the same goal?"
You didn't mention the best native XML DB (Score:1, Interesting)
We use it exclusively and it kicks ass.
Well, the current version does. Pre 3.0 sucked ass.
http://www.exceloncorp.com
XML and RDBMS inconsistencies (Score:2, Interesting)
That said, I'm not sure a hierarchial DB will necessarialy be any better than something like an OODBMS with well-modeled objects.
The priorities are wrong.... (Score:2, Interesting)
The relational model has no major shortcomings. The only thing XML offers that is not already very well done is easier data interchange. As a database administrator, I can tell you there is NO chance XML will dictate a change of how we store data. There are much higher priorities in database management than easier data interchange.
I'm currently working on a paper about this... (Score:3, Interesting)
I wrote a paper on native XML databases and SQL databases that support XML [25hoursaday.com] that appeared on Slashdot [slashdot.org] a little while ago. While doing research for that paper I asked myself the same question, whether instead of coming up with hybrid methods to store relational and hierarchical data we should store XML in already existing hierarchical databases. Unfortunately things are not so clear cut.
First of all, a lot of data out there is relational and people aren't ready or willing to transition all that data to XML based storage so mixing of relational and XML data will probably be with us for a while. The biggest problem with object oriented databases is that they didn't understand this fundamental issue but it seems that with XMKL databases the vendors understand that hybrid data will be with us for quite a while which is why Tamino supports importing data from relational sources and even ships with a SQL engine.
Secondly, XML documents have a lot of metadata beyond the hierarchical parent-child relationships such as processing instructions, comments and entities which are require more intelligence in the support from the database than just storing parent-child relationships.
Finally all the major [commercial] relational database vendors have included some sort of native suppport for XML including XML types and there is a an ANSI standard in the works [sqlx.org] for combining XML and SQL. From what I've seen, none of the hierarchical databases plan to support XML as much as the relational databases have or plan to.
Now if you were simply asking whether a native XML database can be built on top of a hierarchical database then I believe the answer is yes. Then again native XML databases can and have been built on object oriented databases and relational databses so it makes sense that they can be implemented in a database system that is more suited to handling hierarchical data.
XML not meant as a replacement for RDBMSs (Score:4, Interesting)
I believe that RDBMS's should add functionality to read/write XML, especially as the XML Schema recommendations is basically done.
The idea that XML should be the permanent storage format is a bad one. There is a lot of power in a normalized data model -- it enforces data integrity , while eliminating data fragmentation automatically and it minimizes transaction resources.
Consider XML representations for different entities that all share some kind of child entity. For example: people, businesses, and schools all share addresses. In XML, you want the addresses to appear in the description of the individual object. Does that mean you want to store the addresses separately that way? Absolutely not, because then when you enforce constraints or ask questions about addresses, your data is fragmented in three places. For that matter, how do you know all the entities that might use addresses? In an RDBMS, you can inspect all the foreign keys to the address entitity. What's the XML analog?
Some thoughts... (Score:5, Interesting)
Zope uses an object database known as the ZODB. Some forms of many-to-many relationsships and such can be handled via the use of selection and multi-selection properties, which are designed to distinguish between a selected element and the list of available elements. The list of elements can be derived from a property on the current object, a property on a parent object, or be created via a method call - allowing for non-traditional (for OODBMS) cross-linking of objects. Of course, since this sort of thing is a workaround, no true relational links are created... 'Soft Relations' may be ok for MySQL [mysql.org], but in big application development, relationships must be enforced! Thus, the big-boys in RDBMS all enforce foreign keys (mysql does not)...
Of course, I've found that by careful creation of object heirarcies, very complex applications can be created on top of a OODBMS that are in fact more robust, in some ways, then the relational couterparts. The Bigest hurdle (Short-term) I see to OODBMS (including ones based upon XML [the ZODB can export objects as XML but they are stored differently internally]) is the lack of a true query and data manipulation language - like SQL. Sure, OQL exists, and is even technically a standard, but it A) sucks and B) is geared towards large java applications with huge amounts of active objects, not general purpose OODB queries. Thus, without such language, OODBMS are all disimilar in how one queries and creates/updates data, and in many cases, the only interface is a truely procedural one! Thus OODBMS are forced to use proprietary tools, and are locked into one system - not to mention speed of development (something normally associated with OO development and OODBMS in general) is hindered by the excessive amount of procedural calls one needs to simply query thier data...
Recently, an add-on to Zope addressed some of these issues. Called 'ZOQL' - it uses a SQL like syntax and allows for very discrete querying of the ZODB (something one had to do programatically using the 'ZCatalog' before) with all of the familar aggregate and comparison operators SQL users love... Of course, this _still_ doesn't address the issue of soft-relationships:
I think the bigest hurdle to OODBMS in the long term (tools like ZOQL are interfaces to existing systems, thus can be mplemented easily) is the lack of handling relationships. It seems that most RDBMS force a developer to think in Relational terms about the data, and most OODBMS force you to think in terms of objects... Most problems can be mapped to either of these domains, but you are forcing the data-model-type onto the problem. What is needed is a hybrid system, an 'Object-Relational' DBMS. This is to say that OODBMS system makers desist with the traditional OO idea that relations are of the following types:
How does one do this in a hierarchal system? Well, the easy answer would be that each manufacturer object contains all the cars that manufacturer makes. Simple, right? WRONG. Why?
Because each car also has a body-type (compact, sedan, SUV, truck, van, etc...) - which in a relational database would simple by another lookup table, but in an OODBMS poses data management issues. Do we put body-type higher then manufacturer? If so, then we have to maintain the list of manufacturers for each body type, causing headaches. Or do we put body-type below manufacturer, causing us to need to maintain a seperate list of body types for each manufacturer - these lists of course need to match exactly if we ever plan on being able to search or do reports based upon all cars of a specific body type.
Sadly enough, this sort of seperate-enumeration-relationship isn't implemented (well) in any OODBMS I've found.
Take the ZOBD for example, its selection and multiselection lists Try to handle this situation, but fail because relational integrety is not maintained! That is to say, behind the scenes it's not a true reference to a value in the enumerated list, but just a text entry representing a value in the list. If the value in the list changes, the selection-property does not update, leaving you with the equivilent of MySQL's bastard-children, the orphaned records.
This sort of soft-relationship handling is Ugly and BAD for maintainaility, but OODBMS users are faced with two ugly choices each time they map such a relationship: Do I store this as a plain-text property and just update N records each time this changes, or do I map it into the hierarchy and deal with the headaches incurred by doing so...?
I don't think I've answered the question, but hopefully I've at least shed some light on the subject for members of both the OODBMS camps and RDBMS camps... Now if only a useful ORDBMS were to come along...
(Note that PostgreSQL and some other RDBMS actualy can be used in a semi-OO manner, but this is usually reserved for inheritable structures of data to be used for specific extensions to the data model - thus the SUV table inherits from the Cars table and adds some columns - but all other relationships SUV has will still be relational)
A Hierarchy of Myth (Score:2, Interesting)
The desire to impose a hierarchy on the data itself instead of considering a hierarchy as simply one view on the data is a step backwards. Nobody who manages large amounts of data is looking to jam it into a static hierarchy, and so XML is not an answer, nor is any hierarchical representation.
Re:I don't think so. (Score:2, Interesting)
In this case, unless you had a table of phone number data that contained information about the number (like who paid for it, the day it was installed, the type of service available, the type of line, etc) you could get by with just one employee/number table, like this:
bobid phone1
bobid phone2
bobid phone3
which is pretty simple, with a combinatio key of employid/phonenumb. You could still have a separate table with the phone number info, with the phone number as the primary key if you wanted to track the other data.
Most people overthink relational databases and don't really break things down like they should and make well formed tables. Of course, you can chang ethe table structure based on how the database is going to be used. Sometimes is is better to denormalize the table for search efficiency.
What I think is most interesting are the OODBMS, but it seems to me that they would have an increased overhead on their searches.
bob
Mapping XML onto a relational model (Score:2, Interesting)
I guess a pure XML database like the ones mentioned in the article would be better at this, but the advantage is that relational dbs are already in wide use.
See RDF (Score:2, Interesting)
I don't think XML by itself carries enough metadata to understand much beyond whether a document is valid or not. I think RDF and RDFS have a big role to play in getting XML database ready.
Perhaps hopping on the XML database bandwagon before RDF technologies mature could be a mistake. Forget the semantic web, I want to see the sematic database.
W3 RDF [w3.org]
A Good RDF resource [bris.ac.uk]
Not so simple (Score:3, Interesting)
Re:Some thoughts... (Score:2, Interesting)
All that is needed is a relation "product".
relations.add([obj1, obj2], [obj3, obj4, obj42])
relations.getRelations(obj1)
>>>[obj3, obj4, obj42]
relations.getRelations(obj3)
>>>[obj1, obj2]
Every object in zope is defined by its id, and it's path, so it could be done relatively easily.
Then you would get the advantages of a relational model in the ZODB.
You could even use a different instance of the class for different object types. Like you make many relation tables in a traditinal rdbm.