Is Dedicated Hosting for Critical DTDs Necessary?
Posted by Cliff on Thu May 17, 2007 05:07 PM
from the might-the-W3C-be-interested dept.
pcause asks: "Recently there was a glitch when someone at Netscape took down a page that hosted an important DTD (for RSS), used by many applications and services. This got me thinking that many or all of the important DTDs that software and commerce depend on are hosted at various commercial entities. Is this a sane way to build an XML-based Internet infrastructure? Companies come and go all the time, which means that the storage and availability of those DTDs is in constant jeopardy. It strikes me that we need an infrastructure akin to the root server structure to hold the key DTDs that are used throughout the industry. What organization would be the likely custodian of such data, and what would be the best way to ensure such an infrastructure stays funded?"
Related Stories
Developers: Netscape Restores RSS DTD, Until July 134 comments
Randall Bennett writes "RSS 0.91's DTD has been restored to its rightful location on my.netscape.com, but it'll only stay there till July 1st, 2007. Then, Netscape will remove the DTD, which is loaded four million times each day. Devs, start your caching engines."

I know! (Score:5, Funny)
Mhahahahaha. Yeah. I know, I crack myself up.
Re:Catalog files? (Score:5, Insightful)
(http://www.mosehansen.dk/)
Is there some technical reason I'm not aware of that means it has to stay somewhere central?
There shouldn't be, yet I would be greatly surprised if some application didn't match on the entire DTD string, hostname and all.
I am equally baffled as to what applications need the DTD for. Except for generic XML applications, what use is a DTD? Most applications only handle a fixed few XML document types anyway.
Finally, if they really need that DTD... any distro has most major DTDs available. No reason why they couldn't carry a few extra. It should be easy to just search for them locally.
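A rough sketch of "search for them locally", assuming the DTDs are installed as plain files somewhere under a few distribution directories. The paths in DEFAULT_ROOTS are assumptions and differ between distributions. It matches only on the file name taken from the system identifier, so it is a heuristic; a catalog keyed on the public identifier is the more reliable mechanism.

```python
from pathlib import Path, PurePosixPath
from typing import Iterable, Optional
from urllib.parse import urlparse

# Assumed search roots; real locations differ between distributions.
DEFAULT_ROOTS = [Path("/usr/share/xml"), Path("/usr/share/sgml")]


def find_local_dtd(system_id: str, roots: Iterable[Path] = DEFAULT_ROOTS) -> Optional[Path]:
    """Return the first local file named like the last segment of system_id, or None."""
    name = PurePosixPath(urlparse(system_id).path).name  # "foo.dtd" from "http://host/x/foo.dtd"
    if not name:
        return None
    for root in roots:
        if not root.is_dir():
            continue  # skip roots that don't exist on this machine
        # sorted() makes the choice deterministic when several copies exist
        for candidate in sorted(root.rglob(name)):
            if candidate.is_file():
                return candidate
    return None
```

Two different DTDs that happen to share a file name would be confused by this lookup, which is exactly the ambiguity public identifiers were meant to avoid.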
Re:I know! (Score:5, Funny)
(http://shockandblog.com/blog)
Mhahahahaha. Yeah. I know, I crack myself up.
Centralization (Score:5, Insightful)
Regards.
Re:Centralization (Score:5, Informative)
This is known as a URN [wikipedia.org]. URLs and URNs are together known as URIs.
Re:Centralization (Score:4, Informative)
(http://silmaril.ie/cgi-bin/blog)
The defects of the URN/URI/URL mechanism were well known at the time this was discussed in the working groups and SIGs while XML was gestating.
The correct solution would have been to fix the outstanding problems with FPIs and use a combination of local catalog and DNS-style resolution, but this was turned down. Perhaps it's time to wake it up.
In the 1990s I did try to devise a resolution server for FPIs, in the hope that someone like the (then) GCA (now IdeAlliance) -- who were the ISO 9070 Registration Authority and theoretically still are -- would pick up the idea.
I still have the large collection of SGML DTDs used at the time, now largely redundant, but replacing it with current XML is not the problem. This is something that should probably be discussed at the Markup conference in Montreal this summer.
Don't use them (Score:5, Insightful)
Sure, DTD files are necessary for development. If your app requires that they be used to validate something in real time each time it comes in from a client or whatever, then use an internal copy of the version of the DTD file that you support. If the host makes a change to it (or drops it, or lets it get hacked), your app won't break, and you can decide when you will implement and support that change.
I really don't see what is gained by making the real-time operation of your application dependent on the availability and integrity of remotely and independently hosted files. It just makes you fragile, and you can get all the benefits you need from just checking the files during your maintenance and development cycles.
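One way to do that maintenance-time check, as a sketch: record the SHA-256 of the DTD copy you ship, and have a build or scheduled job compare it against whatever upstream currently serves. The fetch callable is an assumption (in practice it might wrap urllib.request.urlopen); the running application never calls this.

```python
import hashlib
from typing import Callable


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def upstream_status(pinned_sha256: str, fetch: Callable[[], bytes]) -> str:
    """Compare the pinned hash of the shipped DTD with the upstream copy.

    Returns "unchanged", "changed" or "unavailable". Meant for build or
    maintenance jobs only: a failure here is a report, not an outage.
    """
    try:
        upstream = fetch()
    except OSError:  # urllib's URLError is a subclass of OSError
        return "unavailable"
    return "unchanged" if sha256_hex(upstream) == pinned_sha256 else "changed"
```

When it reports "changed", you diff the two files and decide when (or whether) to adopt the new version; "unavailable" tells you the host dropped it, while your app keeps running on its own copy.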
Re:Don't use them (Score:5, Informative)
w3c (Score:5, Insightful)
(http://jkcosta.info/)
Re:w3c (Score:5, Funny)
DTD? (Score:4, Insightful)
(http://freedomsforums.com/)
Re:DTD? (Score:5, Informative)
(http://www.theinternetisboring.net/)
Re:DTD? (Score:5, Funny)
In case of death... (Score:5, Insightful)
(http://slashdot.org/)
Not only DTDs, but also ontology definitions (Score:1, Insightful)
I would suggest someone like OSTG or the Mozilla foundation...
Sane? (Score:5, Insightful)
(http://phydeauxpets.com/)
No (Score:5, Insightful)
You shouldn't be using DTDs any more. Validation is better achieved with RelaxNG, and you shouldn't use them for entity references, because non-validating parsers aren't required to read an external DTD and so won't be able to handle your documents.
For those document types that already use DTDs, either you ship the DTDs with your application, or you cache them the first time you parse a document of that type.
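The "cache them the first time" option might look roughly like this. It is a sketch: fetch is an injected, hypothetical callable (for example a wrapper around urllib.request.urlopen), and cache entries are keyed by a hash of the system identifier. It never revalidates, which is the point: once a DTD is cached, the upstream host can disappear without affecting you.

```python
import hashlib
from pathlib import Path
from typing import Callable


class DtdCache:
    """Fetch each DTD at most once, then serve it from a local directory."""

    def __init__(self, cache_dir: Path, fetch: Callable[[str], bytes]):
        self.cache_dir = cache_dir
        self.fetch = fetch  # hypothetical network fetcher
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _path_for(self, system_id: str) -> Path:
        # Hash the identifier so any URL becomes a safe, fixed-length file name.
        digest = hashlib.sha256(system_id.encode("utf-8")).hexdigest()
        return self.cache_dir / (digest + ".dtd")

    def get(self, system_id: str) -> bytes:
        path = self._path_for(system_id)
        if path.exists():
            return path.read_bytes()  # cached: no network access at all
        data = self.fetch(system_id)
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)  # write-then-rename, so a crash never leaves a half-written .dtd
        return data
```

Shipping the DTDs with the application amounts to pre-populating this directory, so both options can share the same lookup code.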
The Netscape DTD issue was caused not by the DTD being unavailable, but by some client applications not being sufficiently robust. You shouldn't be looking at the hosting to solve the problem.
DTDs, XML entities and the non-breaking space (Score:4, Funny)
(http://www.theblackforge.net/)
It's trivial to define "&nbsp;" yourself in a DTD (<!ENTITY nbsp "&#xa0;">), and many of the standard DTDs out there do define it, but under the XML 1.0 standard it has to be defined somewhere or else the XML won't parse.
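A quick way to see the rule in action with Python's standard-library parser: the same body parses once nbsp is declared (here in the internal subset, so no external DTD is involved) and is rejected as not well-formed otherwise.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# &nbsp; declared in the internal DTD subset as the hex character reference &#xa0;
WITH_DECL = '<!DOCTYPE p [<!ENTITY nbsp "&#xa0;">]><p>a&nbsp;b</p>'
# Same body, no declaration anywhere
WITHOUT_DECL = "<p>a&nbsp;b</p>"


def root_text(document: str) -> Optional[str]:
    """Return the root element's text, or None if the document is not well-formed."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


print(repr(root_text(WITH_DECL)))     # 'a\xa0b'
print(repr(root_text(WITHOUT_DECL)))  # None (undefined entity)
```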
Don't know what a DTD is? (Score:3, Informative)
(http://www.ryansimpkins.com/ | Last Journal: Tuesday September 18 2001, @01:43AM)
http://en.wikipedia.org/wiki/Document_Type_Defini
Doctypes are completely broken design. (Score:2)
Perhaps something like "pool.ntp.org"? (Score:5, Insightful)
(http://www.zocalo.uk.com/)
NTP.org [ntp.org] maintains a pool of public NTP servers that are accessible via the hostname "pool.ntp.org", so perhaps something similar would work for a global DTD repository. An industry organization with a vested interest (the W3C seems the most logical) could maintain the DNS zone, and organizations could volunteer some server space and bandwidth to host a mirror of the collected pool of DTDs. Volunteering organizations might come and go, but when that happens it's just a matter of updating the DNS zone to reflect the change, and everyone using DTDs just needs to know that a single generic hostname will always provide a copy of the required DTD.
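pool.ntp.org does its balancing in DNS itself (one name, many rotating address records), so clients need nothing special. A client of such a DTD pool could still keep its own ordered fallback list; the sketch below assumes hypothetical mirror base URLs and an injected fetch callable, and simply tries each mirror until one answers.

```python
from typing import Callable, Iterable, List

# Hypothetical mirrors; no such DTD pool exists.
MIRRORS = ["https://dtd-a.example.org", "https://dtd-b.example.org"]


def fetch_from_pool(path: str, mirrors: Iterable[str], fetch: Callable[[str], bytes]) -> bytes:
    """Try each mirror in order and return the first successful response body."""
    errors: List[str] = []
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return fetch(url)
        except OSError as exc:  # urllib network failures are OSError subclasses
            errors.append(f"{url}: {exc}")
    raise OSError("all mirrors failed: " + "; ".join(errors))
```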
Just a thought...
using non-local cached copy considered harmful (Score:5, Interesting)
(http://nagafix.co.uk/)
Doing anything else strikes me as fundamentally dangerous and insecure: it makes a remote dns vulnerability into an easy application DoS (or worse).
Call me crazy... (Score:5, Interesting)
(http://slashdot.org/ | Last Journal: Tuesday August 29 2006, @06:44PM)
Seriously, what the fuck were they thinking relying on a server to be always available?
Re:Call me crazy... (Score:5, Funny)
(http://slashdot.org/ | Last Journal: Tuesday August 29 2006, @06:44PM)
Your trust in the world is cute.
URI vs URL (Score:5, Insightful)
(http://slashdot.org/ | Last Journal: Wednesday January 21 2004, @08:36PM)
"http://my.netscape.com/publish/formats/rss-0.91.
It is silly that millions of RSS readers fetch a non-changing file from the same web site every day. It is only very slightly less silly that they fetch it from the web at all.
EXACTLY (Score:5, Insightful)
(http://slashdot.org/~wowbagger/journal/87552 | Last Journal: Monday September 03, @08:07PM)
A DTD spec SHOULD have both a PUBLIC identifier and a SYSTEM identifier. The system identifier is strongly recommended to be a URL so that a validating parser can fetch the DTD if the DTD is not found in the system catalog.
The system catalog is supposed to map from the PUBLIC identifier to a local file, so that the parser needn't go to the network.
If you are running a recent vintage Linux, look in
So:
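The public-to-local mapping a catalog provides can be sketched in a few lines. Real parsers (libxml2, for example) implement the full OASIS XML Catalogs format; this sketch reads only <public> and <system> entries, checks system entries first and then falls back to the public identifier (roughly prefer="public"), and ignores rewriteSystem, delegation, nextCatalog and relative-URI resolution. The identifiers and file paths in any catalog you feed it are your own.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

NS = "{urn:oasis:names:tc:entity:xmlns:xml:catalog}"
Catalog = Tuple[Dict[str, str], Dict[str, str]]


def _norm_public(public_id: str) -> str:
    # Public identifiers are compared after collapsing runs of whitespace.
    return " ".join(public_id.split())


def load_catalog(catalog_xml: str) -> Catalog:
    """Read only <public> and <system> entries from an OASIS XML catalog."""
    root = ET.fromstring(catalog_xml)
    public = {_norm_public(e.get("publicId")): e.get("uri") for e in root.iter(NS + "public")}
    system = {e.get("systemId"): e.get("uri") for e in root.iter(NS + "system")}
    return public, system


def resolve(public_id: Optional[str], system_id: Optional[str], catalog: Catalog) -> Optional[str]:
    """Map an external identifier to a local URI, or None if the catalog has no entry."""
    public, system = catalog
    if system_id is not None and system_id in system:  # system entries are consulted first
        return system[system_id]
    if public_id is not None:
        return public.get(_norm_public(public_id))
    return None
```

A None result is where a parser would otherwise fall back to the network; a cautious application can treat it as an error instead.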
XML Catalogs (Score:1)
(http://chris.chiasson.name/)
Uhhm.... I thought we were using XML Schema now??? (Score:2, Offtopic)
I need some Chinese food. Hmm...
Szechuan!
Not again (Score:4, Informative)
(Last Journal: Friday August 31, @07:08PM)
Supply local DTDs with your app (Score:5, Interesting)
When prototyping our "offline mode", we ran into this exact same problem because the XML APIs we used wanted to validate XML against online DTDs. We amended the validator's resolver to use locally embedded or cached DTDs for all our doctypes; problem solved.
In my app it was an obvious problem to solve because offline usage was a big scenario, but I could imagine it being "out of scope" for a less-than-robust website.
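In Python's standard library, the counterpart of amending the validator's resolver is an xml.sax EntityResolver. This is a sketch under assumptions: the public identifier, the URL and the bundled DTD bytes are made up, and external general entities have to be switched on so the parser consults the resolver at all (Python 3.7.1 and later ship with that feature off). Anything the resolver has no local copy of raises instead of quietly falling back to the network.

```python
import io
import xml.sax
from typing import Dict, List, Optional, Tuple
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical bundled DTD, keyed by public identifier.
BUNDLED_DTDS: Dict[str, bytes] = {
    "-//Example//DTD Note 1.0//EN": b'<!ELEMENT note (#PCDATA)>\n<!ENTITY nbsp "&#xa0;">\n',
}


class LocalOnlyResolver(EntityResolver):
    """Serve DTDs from memory; never let the parser go to the network."""

    def __init__(self, bundled: Dict[str, bytes]):
        self.bundled = bundled
        self.requests: List[Tuple[Optional[str], Optional[str]]] = []

    def resolveEntity(self, publicId, systemId):
        self.requests.append((publicId, systemId))
        if publicId not in self.bundled:
            raise LookupError(f"no local copy of {publicId!r} ({systemId})")
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(self.bundled[publicId]))
        return source


class TextCollector(ContentHandler):
    def __init__(self):
        super().__init__()
        self.parts: List[str] = []

    def characters(self, content):
        self.parts.append(content)


def parse_offline(document: bytes, bundled: Dict[str, bytes] = BUNDLED_DTDS):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_external_ges, True)  # ask the resolver for the external DTD...
    resolver = LocalOnlyResolver(bundled)
    parser.setEntityResolver(resolver)  # ...which only ever answers from memory
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))
    return "".join(handler.parts), resolver.requests
```

Note that expat does not validate; the resolver only supplies the DTD so that entities such as nbsp are defined. A validating stack would plug the same kind of resolver into its own API.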
The DNS root servers are run by... (Score:2)
(http://drew.intercarve.net/)
I have a server in my basement we could use. (Score:5, Funny)
(http://fyoder.com/)
HTML 5 (Score:2)
(http://robertdot.org/ | Last Journal: Friday January 23 2004, @06:02PM)
DTDs are Useless (Score:1)
(http://lachy.id.au/)
Quick, someone register http://all.your.dtds.are.belong.to.us/ [belong.to.us] :-)
Seriously though, we don't need dedicated hosting for DTDs. We need XML language spec writers, authors and user agent vendors to realise that DTDs are useless. Web browser vendors realised this a long time ago. No browser ever read HTML's SGML DTDs, and they do not use validating parsers for XHTML either (although, they use a hack to parse a subset of the DTD to handle XHTML and MathML entity references).
DTDs are bad for several reasons [hsivonen.iki.fi]:
Plus, if a UA needs to request the DTD every time it parses the file, that adds significant overhead by the time it fetches the DTD, parses it and checks the document for validity. It's just not worth it. The Netscape RSS DTD issue was a mistake, and it's time to learn from that. There are much better alternatives available for validating XML than DTDs, such as RelaxNG or Schematron.
Isn't this addressed already? (Score:1)
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitio
That whole PUBLIC thing means that the browser can have its own copy so that it doesn't have to fetch it off the website. Is there a reason that this is not the standard way of doing this?
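Pulling both identifiers out of a DOCTYPE is the first step of that kind of local lookup. A deliberately simple regex sketch follows; unlike a real parser, it would also match a DOCTYPE that appears inside a comment or CDATA section.

```python
import re
from typing import Optional, Tuple

# <!DOCTYPE root PUBLIC "pubid" "sysid">  or  <!DOCTYPE root SYSTEM "sysid">
DOCTYPE_RE = re.compile(
    r"<!DOCTYPE\s+(?P<root>[^\s>\[]+)"
    r"(?:\s+PUBLIC\s+(?P<q1>[\"'])(?P<public>.*?)(?P=q1)\s+(?P<q2>[\"'])(?P<system>.*?)(?P=q2)"
    r"|\s+SYSTEM\s+(?P<q3>[\"'])(?P<system_only>.*?)(?P=q3))?",
    re.DOTALL,
)


def doctype_ids(document: str) -> Optional[Tuple[str, Optional[str], Optional[str]]]:
    """Return (root element, public id, system id) from the first DOCTYPE, or None."""
    m = DOCTYPE_RE.search(document)
    if m is None:
        return None
    return m.group("root"), m.group("public"), m.group("system") or m.group("system_only")
```

With the public identifier in hand, the application can look up its own copy and never touch the system identifier URL.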
short answer: no (Score:4, Insightful)
(http://robotmonkeys.net/ | Last Journal: Tuesday October 26 2004, @03:23AM)
If you're reading a doc, don't bother validating it. You're probably going to have to handle "invalid" XML anyway. When you're constructing XML, you should write it according to the DTD, but if you're relying on a remote site, then you're asking for trouble. Just cache a copy locally, but seriously, your tool shouldn't really need it. Your engineers do, but not the tool.
Finally, it's trivial to reconstruct a DTD from sample documents.
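As a rough illustration of reconstructing a DTD from samples: the sketch below records, for each element name, which child elements and attributes appear and whether it ever contains text, then emits deliberately permissive declarations. It assumes no namespaces and types every attribute as optional CDATA. It cannot recover ordering or cardinality, which is why every content model comes out as a repeated choice.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set


def infer_dtd(samples: Iterable[str]) -> str:
    """Emit permissive <!ELEMENT>/<!ATTLIST> declarations covering every sample."""
    children: DefaultDict[str, Set[str]] = defaultdict(set)
    attributes: DefaultDict[str, Set[str]] = defaultdict(set)
    has_text: DefaultDict[str, bool] = defaultdict(bool)
    order: List[str] = []  # element names in first-seen order, for stable output

    for sample in samples:
        for elem in ET.fromstring(sample).iter():
            if elem.tag not in order:
                order.append(elem.tag)
            attributes[elem.tag].update(elem.attrib)
            if elem.text and elem.text.strip():
                has_text[elem.tag] = True
            for child in elem:
                children[elem.tag].add(child.tag)
                if child.tail and child.tail.strip():  # text between child elements
                    has_text[elem.tag] = True

    lines: List[str] = []
    for name in order:
        kids = sorted(children[name])
        if has_text[name]:
            # Mixed content must be written as (#PCDATA|a|b)*
            model = "(#PCDATA" + "".join("|" + k for k in kids) + ")*" if kids else "(#PCDATA)"
        elif kids:
            model = "(" + "|".join(kids) + ")*"
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {name} {model}>")
        for attr in sorted(attributes[name]):
            lines.append(f"<!ATTLIST {name} {attr} CDATA #IMPLIED>")
    return "\n".join(lines)
```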
Builtin DTDs everywhere! (Score:1)
As far as I know, any other application could just depend on built-in DTDs to validate the formats it knows, and ignore whatever it doesn't know, since it wouldn't be able to use those intelligently anyway.
Did I forget to take in account one of those nice tiny little huge details somewhere?
URLs were never sane (Score:1)
(http://trimbo.blogspot.com/)
A URL has:
The only one of those that the machine itself has any control over hiding from the user is the path, which can be virtualized. However, many aren't. DTDs certainly don't seem to be.
A distributed system for this kind of mission-critical information is what we need. Think DNS for documents, rather than just hosts.
XML catalog files let your app use local copies... (Score:3, Informative)
DNS? (Score:1)
(http://saltmiser.blogspot.com/ | Last Journal: Tuesday May 08 2007, @11:44PM)
DTD Critical Hosting (Score:2, Insightful)
(http://www.flagworx.org/)
Some might think: well, what if it changes?
Well, it's obvious: download the new one and update your XHTML/XML or application for the specific changes.
the best host is localhost (Score:1)
We use Maven, and we use DTDs, schemas, WSDL, etc. Much of the WSDL and other files refer to online locations. We download these and alter the references to be local. Otherwise we would have a build fail because of an internet issue, which is just nuts.
Same with Maven: we have our own local repository where we keep a subset of what we use. Again, same situation. In these cases this is just for building; I can't imagine doing this on a live site. This goes especially for externally referenced JavaScript... local copies are your friend.
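A minimal sketch of "alter the references to be local", assuming the remote URLs appear as quoted values (schemaLocation attributes, DOCTYPE system identifiers, WSDL imports) and that you maintain the URL-to-path map yourself. localize_references is a hypothetical helper for a build step, not part of Maven.

```python
from typing import Mapping


def localize_references(text: str, url_to_local: Mapping[str, str]) -> str:
    """Replace each quoted occurrence of a known remote URL with its local path."""
    for url, local in url_to_local.items():
        for quote in ('"', "'"):
            # Matching the quotes too avoids rewriting a URL that merely starts with this one.
            text = text.replace(quote + url + quote, quote + local + quote)
    return text
```

Running it once when the files are downloaded, rather than at build time, keeps the checked-in copies free of network references.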
Well... (Score:2)
(http://www.ancar.org/)
guess who will want control over it (Score:1)
Missing 4ml.org (Score:1)
We definitely need some sustainable way to host the DTDs.
~Sivaraj
The security implications are extremely ugly (Score:3, Insightful)
2) Remember the incident where popular "safe" Superbowl sites were compromised and laced with malware-installing code? What happens to millions of Firefox-on-Windows users when a bunch of Russian mobsters or Chinese government agents hijack a DTD host and load it with a zero-day Windows exploit?
3) Remember "pharming", where DNS servers are hijacked to redirect *CORRECTLY TYPED URLS* to malware-infested sites. Even if the bad-guys can't hijack the DTD host, they can still hijack Windows-based DNS servers (ptui!) and anybody who relies on them gets redirected to a malware-install site.
That's the problem; here's my solution. It's composed of two parts.
A) DTDs will be *LOCAL FILES ON YOUR WORKSTATION* (excepting "thin clients").
B) Browsers (or possibly Operating Systems) will include new DTDs with updates. In posix OS's (*NIX, BSD) DTDs will be stored in
Windows will have its own locations. When you get your regular update for your browser (or alternatively, your OS), part of the update will be any new DTDs. There will be a separate file for each DTD and version, so that your browser can properly handle multiple tabs opening to sites using different versions of the same base DTD.
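Whatever the distribution mechanism, the attacks in points 2 and 3 need the client to fetch the DTD; a parser that never fetches is not exposed to them. A sketch with Python's xml.sax (the URL is hypothetical): external general and parameter entities are switched off explicitly, and a recording resolver shows that nothing was requested. The trade-off is that entities declared only in that external DTD are not expanded.

```python
import io
import xml.sax
from typing import List, Optional, Tuple
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges, feature_external_pes


class RecordingResolver(EntityResolver):
    """Records every lookup; used here to show that none happen."""

    def __init__(self):
        self.calls: List[Tuple[Optional[str], Optional[str]]] = []

    def resolveEntity(self, publicId, systemId):
        self.calls.append((publicId, systemId))
        return systemId


class TextCollector(ContentHandler):
    def __init__(self):
        super().__init__()
        self.parts: List[str] = []

    def characters(self, content):
        self.parts.append(content)


def parse_without_remote_dtd(document: bytes):
    parser = xml.sax.make_parser()
    # Never load external entities or the external DTD subset, whatever the document says.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    resolver = RecordingResolver()
    parser.setEntityResolver(resolver)
    handler = TextCollector()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(document))
    return "".join(handler.parts), resolver.calls
```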
What organization? (Score:2)
(http://www.iphone.org/ | Last Journal: Friday September 07, @01:31PM)
The ITU [wikipedia.org] (also here [itu.int]).
Go ahead, laugh, but I think it's long past time for control of such functions as DNS, NTP, assigned numbers, et cetera, to be transferred out of the hands of primarily US-based corporations and loosely coupled organizations such as the IETF and IANA and into the hands of some sort of international treaty organization.
Since the ITU not only fits this description, but in fact was founded to deal with precisely these sorts of issues, why not let it do what it does for the Internet as well?
Stupid question :) (Score:2)
(http://www.genesi-usa.com/)
Is it not obvious that it may as well be the W3C? XML is their standard, and operating a registry for public-use DTDs would be a rather reasonable service to provide.
Bad idea any way you slice it (Score:1)
But maybe there are urls out there pointing to "the latest and greatest" version, rather than a specific version, and you like the idea of using "the latest and greatest". So, think for a moment what happens when the DTD/schema changes. Is your app magically going to change how it deals with the xml at the same time? Of course not!
So, until you can get out a patch, you'd be refusing XML docs your code/XSL has been built to handle, and possibly letting in XML docs that your code/XSL has not been built to handle. Whereas, if you just kept a local DTD/schema, you would have no trouble keeping it and the code/XSL behind it in sync.
No... (Score:1)
They don't need to be hosted for the same reason that there isn't a machine out there called com.sun.java.util.dates.FunnyDate
Re:hmm... (Score:1, Redundant)
Re:XML? What? (Score:2)