Is Dedicated Hosting for Critical DTDs Necessary?
pcause asks: "Recently there was a glitch, when someone at Netscape took down a page that had an important DTD (for RSS), used by many applications and services. This got me thinking that many or all of the important DTDs that software and commerce depend on are hosted at various commercial entities. Is this a sane way to build an XML-based Internet infrastructure? Companies come and go all of the time; this means that the storage and availability of those DTDs is in constant jeopardy. It strikes me that we need an infrastructure akin to the root server structure to hold the key DTDs that are used throughout the industry. What organization would be the likely custodian of such data, and what would be the best way to ensure such an infrastructure stays funded?"
I know! (Score:5, Funny)
Mhahahahaha. Yeah. I know, I crack myself up.
Re: (Score:2)
Re: (Score:2)
Google or archive.org come to mind as a more sensible choice.
Re: (Score:3, Insightful)
The point was that relying on a single entity isn't a good idea. Google is a single company; the Internet Archive is a single organisation.
I'd suggest something more along the lines of DNS, where although there would be a single ultimate authority, the day-to-day business of serving DTDs would be distributed and handled by multiple levels of servers.
Re: (Score:2, Insightful)
Catalog files? (Score:1)
Re: (Score:2)
Is there some technical reason I'm not aware of that means it has to stay somewhere central?
Re:Catalog files? (Score:5, Insightful)
Is there some technical reason I'm not aware of that means it has to stay somewhere central?
There shouldn't be, yet I would be greatly surprised if some application didn't match on the entire DTD string, hostname and all.
I am equally baffled at what applications need the DTD for anyway. Except for generic XML applications, what use is a DTD? Most applications only handle a fixed few XML document types anyway.
Finally, if they really need that DTD... any distro has most major DTDs available. No reason why they couldn't carry a few extra. Should be easy to just search for them locally.
Re: (Score:2)
The DTD (or more recently schema) should define the allowable content and structure of the XML document for validation purposes. This is supposed to be one of the selling points of XML, being able to verify that a document is valid.
Sure, but no sane RSS reader is going to read the DTD, parse it and validate the served XML. Nor will a browser chew through the (humongous) XHTML DTD and then validate the page against it. Rather, such an application will have the structure of the XHTML hardwired into the application, and ignore any unknown tags. The only check would be to check if the DTD is known, plus any structure tests that the application chooses to implement --- usually the free ones.
Authoring tools are, of course, another matter, b
Re:I know! (Score:5, Funny)
Mhahahahaha. Yeah. I know, I crack myself up.
Centralization (Score:5, Insightful)
Regards.
Re: (Score:3, Interesting)
Of course, that would eliminate the use of a UNIVERSAL RESOURCE LOCATION, since it would no longer be centralized.
There needs to be a way to refer to decentralized internet resources in a unique fashion. We need the equivalent of the URL for a file that is hosted simultaneously in many places.
Re:Centralization (Score:5, Informative)
This is known as a URN [wikipedia.org]. URLs and URNs are together known as URIs.
Re:Centralization (Score:4, Informative)
The defects of the URN/URI/URL mechanism were well known at the time this was discussed in the working groups and SIGs while XML was gestating.
The correct solution would have been to fix the outstanding problems with FPIs and use a combination of local catalog and DNS-style resolution, but this was turned down. Perhaps it's time to wake it up.
In the 1990s I did try to devise a resolution server for FPIs, in the hope that someone like the (then) GCA (now IdeAlliance) -- who were the ISO 9070 Registration Authority and theoretically still are -- would pick up the idea.
I still have the large collection of SGML DTDs used at the time, now largely redundant, but replacing it with current XML is not the problem. This is something that should probably be discussed at the Markup conference in Montreal this summer.
Re: (Score:2)
Re: (Score:2, Informative)
http://en.wikipedia.org/wiki/Magnet:_URI_scheme [wikipedia.org]
Re: (Score:2)
Of course, that would eliminate the use of a UNIVERSAL RESOURCE LOCATION, since it would no longer be centralized.
Nope, it just means that your torrent tracker would have to have a way to resolve the reference. Whether something like DNS where you have specific "go-to" hosts, or whether you just ask every host you're connected with, or something else (maybe a kind of dynamic mesh with ad-hoc gateways), the choice is up to you.
Maybe something like NTP, where you have the stratum-1 time servers, and then the designated stratum-2 servers, and everyone is encouraged to set up a stratum-3 server for their own subnet. This w
Don't use them (Score:5, Insightful)
Sure, DTD files are necessary for development. If your app requires that they be used to validate something in real time each time it comes in from a client or whatever, then use an internal copy of the version of the DTD file that you support. If the host makes a change to it (or drops it, or lets it get hacked), your app won't break, and you can decide when you will implement and support that change.
I really don't see what is gained by making the real time operation of your application dependent on the availability and pristinity of remotely and independently hosted files. It just makes you fragile, and you can get all the benefits you need from just checking the files during your maintenance and development cycles.
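A rough sketch of the maintenance-time check described above, assuming the supported DTD is bundled with the app and a fresh upstream copy is pulled only by a maintenance job, never at request time. The function names and the sample DTD bytes are hypothetical, made up for illustration.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a DTD's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def dtd_has_drifted(bundled: bytes, upstream: bytes) -> bool:
    """True when the upstream DTD no longer matches the bundled copy.

    Meant for a maintenance job; the running app only ever reads `bundled`.
    """
    return digest(bundled) != digest(upstream)


# Hypothetical sample data standing in for real DTD files.
BUNDLED = b"<!ELEMENT rss (channel)>\n"
UPSTREAM_SAME = b"<!ELEMENT rss (channel)>\n"
UPSTREAM_CHANGED = b"<!ELEMENT rss (channel, extra?)>\n"

print(dtd_has_drifted(BUNDLED, UPSTREAM_SAME))
print(dtd_has_drifted(BUNDLED, UPSTREAM_CHANGED))
```

If the check reports drift, a human decides whether and when to ship the new version; nothing changes in production until then.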
Re:Don't use them (Score:5, Informative)
Localized hosting (Score:2)
Re: (Score:2)
1.) Use the DTD URI to determine a document's type, from a list of known URI/type associations in the application. (For instance, a web browser that checks the DTD to determine whether to render in HTML or XHTML mode.)
and
2.) Validate the document against the DTD from the copy stored at the URI (given that the URI is a URL... it does not necessarily have to be.)
And, if the DTD isn't at the URL (fails on 2), it barfs from not being a
Re: (Score:2)
Re: (Score:1)
Re: (Score:3, Interesting)
Centralization of more than DTDs is good. (Score:2)
Re: (Score:2)
Re: (Score:2)
Take a look at what servers get hacked. It isn't often those that are well maintained by trained people with years of experience. It's usually people stupid enough to run a server that hasn't been updated in three years.
w3c (Score:5, Insightful)
Re:w3c (Score:5, Funny)
Re: (Score:3, Interesting)
Unfortunately, the DNS method has proven to not necessarily be the best way, with poisoning and stuff that can occur. Of course, it was des
Re: (Score:2)
Re: (Score:2)
I'd expand on that and say: whatever organization is responsible for developing the format that the DTD is for. The W3C is responsible for things like XHTML, so they should be hosting the DTD for it. The IETF should have the DTD for Atom. RSS is currently maintained by Harvard and the DTD should be maintained by them.
Re: (Score:2)
RSS 0.9x was developed by Netscape; having the originator host it, forever, is how we got in this problem in the first place.
sure there is! (Score:3, Funny)
DTD? (Score:4, Insightful)
Re:DTD? (Score:5, Informative)
Re:DTD? (Score:5, Funny)
Re: (Score:2)
Re: (Score:1)
Re: (Score:2)
"DTDTDT, that's right, Buck."
Chris Mattern
Re: (Score:2)
"BDBDBDBD"
In case of death... (Score:5, Insightful)
Re: (Score:2, Interesting)
What if XML files, for instance, are being exchanged between your application and others, and they include a DTD that doesn't reside within your domain?
I'm sure there are other scenarios as well.
Not only DTDs, but also ontology definitions (Score:1, Insightful)
I would suggest someone like OSTG or the Mozilla foundation...
Re: (Score:2)
Why?
Sane? (Score:5, Insightful)
Re: (Score:2)
Re: (Score:2, Insightful)
Re: (Score:3, Insightful)
Public hosting of schema documents should not be for application use where the application knows ahead of time what kind of document it will be parsing (like the RSS situation). In all li
No (Score:5, Insightful)
You shouldn't be using DTDs any more. Validation is better achieved with RelaxNG, and you shouldn't use them for entity references because then non-validating parsers won't be able to handle your code.
For those document types that already use DTDs, either you ship the DTDs with your application, or you cache them the first time you parse a document of that type.
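The "cache them the first time" option might look roughly like the sketch below. It assumes the caller supplies a `fetch` callable (hypothetical; in a real app it would do the HTTP request), and the demo uses a stand-in fetcher and a made-up `example.invalid` URL so that nothing touches the network.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cached_dtd(system_id: str, cache_dir: Path,
               fetch: Callable[[str], bytes]) -> bytes:
    """Return the DTD for `system_id`, fetching it at most once."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the identifier so any URL maps to a safe file name.
    name = hashlib.sha256(system_id.encode("utf-8")).hexdigest() + ".dtd"
    path = cache_dir / name
    if path.exists():
        return path.read_bytes()
    data = fetch(system_id)  # only reached on a cache miss
    path.write_bytes(data)
    return data


# Stand-in fetcher that records calls instead of using the network.
calls = []


def fake_fetch(url: str) -> bytes:
    calls.append(url)
    return b"<!ELEMENT rss (channel)>"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_dtd("http://example.invalid/rss.dtd", Path(tmp), fake_fetch)
    second = cached_dtd("http://example.invalid/rss.dtd", Path(tmp), fake_fetch)
```

The second call is served from disk, so the remote host going away after the first parse no longer matters.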
The Netscape DTD issue was caused not by the DTD being unavailable, but by some client applications not being sufficiently robust. You shouldn't be looking at the hosting to solve the problem.
Mod parent up (Score:2)
Re: (Score:3, Insightful)
Not sufficiently robust is an understatement. ****ing stupid is what I would call it. If every browser had to hit the W3C site for the HTML DTDs every time they loaded a web page, the web would collapse.
Re: (Score:1)
Okay, so what you are saying is that we ship the SAME DTD that is already defined with the application that we provide. ???? WHAT ????
This does not follow OO design methodologies! REUSE!!!! The whole point behind OO design is that we reuse existing components. If we cannot do this then what is the point of OO? If we have defined a DTD that can be used BY the community, then it should be made available FOR the community. The re-distribution of the DTD does not make sense, as it could be altered f
Re: (Score:2)
DTDs, XML entities and the non-breaking space (Score:4, Funny)
It's trivial to define "&nbsp;" yourself in a DTD (<!ENTITY nbsp "&#xA0;">), and many of the standard DTDs out there do define it, but by the XML 1.0 standard it's got to be defined somewhere or else the XML won't parse.
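A small sketch of that point using Python's standard `xml.etree.ElementTree`. The sample documents are made up; the declaration sits in an internal DTD subset and uses the hexadecimal character reference `&#xA0;` (U+00A0, the non-breaking space), so no external DTD has to be fetched.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring nbsp as U+00A0.
WITH_DECL = '<!DOCTYPE doc [<!ENTITY nbsp "&#xA0;">]><doc>a&nbsp;b</doc>'
# Same content, but nothing declares the entity.
WITHOUT_DECL = "<doc>a&nbsp;b</doc>"

# The parser expands the declared entity into the character itself.
text = ET.fromstring(WITH_DECL).text

try:
    ET.fromstring(WITHOUT_DECL)
    undeclared_ok = True
except ET.ParseError:
    # An undeclared entity makes the document fail to parse.
    undeclared_ok = False
```

This is why feed readers that pass entity references through end up caring whether the DTD is reachable, even when they never validate.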
Re: (Score:1)
A few problems with RelaxNG validation (Score:2)
Thus, while an application which expects files conforming to a specific schema can validate against that schema, it is not possible for a program to validate an arbitrary XML file. For example, there is no way xmllint can automatically find the related RelaxNG schema, in the same way that it can find the DTD.
If I am wrong, and there is a way to
Don't know what a DTD is? (Score:3, Informative)
http://en.wikipedia.org/wiki/Document_Type_Defini
Doctypes are completely broken design. (Score:2)
Perhaps something like "pool.ntp.org"? (Score:5, Insightful)
NTP.org [ntp.org] maintains a pool of public NTP servers that are accessible via the hostname "pool.ntp.org", so perhaps something similar would work for a global DTD repository. An industry organization with a vested interest (the W3C seems the most logical) could maintain the DNS zone, and organizations could volunteer some server space and bandwidth to host a mirror of the collected pool of DTDs. Volunteering organizations might come and go, but when that happens it's just a matter of updating the DNS zone to reflect the change, and everyone using DTDs just needs to know that a single generic hostname will always provide a copy of the required DTD.
Just a thought...
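To make the pool idea concrete, here is a rough client-side sketch. Note that it swaps in a simplification: failover across an explicit mirror list rather than DNS rotation behind one hostname. The mirror hostnames, the path, and the `fetch` callable are all hypothetical.

```python
from typing import Callable, Iterable


def fetch_from_pool(path: str, mirrors: Iterable[str],
                    fetch: Callable[[str], bytes]) -> bytes:
    """Try each mirror in turn and return the first successful response."""
    errors = []
    for base in mirrors:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            return fetch(url)
        except OSError as exc:  # network-style failure: try the next mirror
            errors.append(f"{url}: {exc}")
    raise OSError("all mirrors failed: " + "; ".join(errors))


# Stand-in fetcher: the first mirror is "down", the second answers.
def fake_fetch(url: str) -> bytes:
    if url.startswith("http://mirror-a.example.invalid"):
        raise OSError("connection refused")
    return b"<!ELEMENT rss (channel)>"


MIRRORS = ["http://mirror-a.example.invalid", "http://mirror-b.example.invalid/"]
result = fetch_from_pool("dtd/rss-0.91.dtd", MIRRORS, fake_fetch)
```

With a real DNS pool the loop would live in the zone rather than in the client, but the effect is the same: one volunteer dropping out does not take the DTD with it.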
using non-local cached copy considered harmful (Score:5, Interesting)
Doing anything else strikes me as fundamentally dangerous and insecure: it turns a remote DNS vulnerability into an easy application DoS (or worse).
Call me crazy... (Score:5, Interesting)
Seriously, what the fuck were they thinking relying on a server to be always available?
Re: (Score:1)
Seriously, what the fuck were they thinking relying on a server to be always available?
I've noticed the trend lately. Folks *want* some server to always be available. They want this so badly, they just go about their business as if the server in question would always be available. Even trained pros, who know better, sometimes think and/or act this way. Especially with regards to systems they can't see, and do not have to maintain. Thus, the Hard & Painful Lessons of Life(tm) still have their place in the world. ;(
Re: (Score:3, Interesting)
Re:Call me crazy... (Score:5, Funny)
Your trust in the world is cute.
URI vs URL (Score:5, Insightful)
"http://my.netscape.com/publish/formats/rss-0.91.
It is silly that millions of RSS readers fetch a non-changing file from the same web site every day. It is only very slightly less silly that they fetch it from the web at all.
MOD PARENT UP (Score:2)
EXACTLY (Score:5, Insightful)
A DTD spec SHOULD have both a PUBLIC identifier and a SYSTEM identifier. The system identifier is strongly recommended to be a URL so that a validating parser can fetch the DTD if the DTD is not found in the system catalog.
The system catalog is supposed to map from the PUBLIC identifier to a local file, so that the parser needn't go to the network.
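A toy illustration of that lookup, not a real XML Catalog implementation: the catalog here is a plain dict, the PUBLIC identifier, URLs, and local path are invented, and the regular expression only handles the simple double-quoted DOCTYPE form.

```python
import re
from typing import Optional

# Hypothetical catalog: PUBLIC identifier -> local file path.
CATALOG = {
    "-//Example//DTD Feed 1.0//EN": "local-dtds/feed.dtd",
}

# Matches <!DOCTYPE name PUBLIC "public-id" "system-id" (double quotes only).
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+\S+\s+PUBLIC\s+"([^"]*)"\s+"([^"]*)"')


def resolve_dtd(document: str) -> Optional[str]:
    """Return a local path when the PUBLIC id is catalogued, else the SYSTEM id."""
    match = DOCTYPE_RE.search(document)
    if match is None:
        return None  # no PUBLIC doctype present
    public_id, system_id = match.groups()
    # The catalog wins; the SYSTEM URL is only the fallback.
    return CATALOG.get(public_id, system_id)


KNOWN = ('<!DOCTYPE feed PUBLIC "-//Example//DTD Feed 1.0//EN" '
         '"http://example.invalid/feed.dtd"><feed/>')
UNKNOWN = ('<!DOCTYPE other PUBLIC "-//Example//DTD Other 1.0//EN" '
           '"http://example.invalid/other.dtd"><other/>')

print(resolve_dtd(KNOWN))
print(resolve_dtd(UNKNOWN))
```

Only the uncatalogued document would ever lead to a network fetch, which is the behaviour the PUBLIC/SYSTEM split was designed to give.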
If you are running a recent vintage Linux, look in
So:
Re: (Score:2)
If instead the URL just returned a page that said: "You can find a copy of the appropriate DTD at the following locations..." and listed them, it would remove the temptation to introduce a programmatic dependency on that URL being live but still give people a way to find that resource, and force
Re: (Score:2)
This reminds me of something. . .
Voice-over: Here at Luton it's a three-cornered fight between Alan Jones - Sensible Party; in the middle, Tarquin Fintimlimbimwhinbimlim Bus- stop F'tang F'tang Olè Biscuitbarrel, Silly Party, and Kevin Phillips-Bong, the Slightly Silly candidate.
XML Catalogs (Score:1)
Re: (Score:1)
HTML clients (browsers) don't go requesting the HTML DTD, so it could be said that RSS clients shouldn't either. RSS clients, though, are more pure in that they take the DTD's definition of entities literally, so we do need to access the DTD.
But you'd expect clients to cache them, using XML catalogs as you say. They should be packaged with the standard DTDs, a default DT
Uhhm.... I thought we were using XML Schema now??? (Score:2, Offtopic)
I need some chinese food. Hmm...
Schezuan!
Not again (Score:4, Informative)
Supply local DTDs with your app (Score:5, Interesting)
When prototyping our "offline mode", we ran into this exact same problem because the XML APIs we used wanted to validate XML against online DTDs. We amended the validator's resolver to use locally embedded or cached DTDs for all our doctypes; problem solved.
In my app it was an obvious problem to solve because offline usage was a big scenario, but I could imagine that being "out of scope" for a less-than-robust website.
The DNS root servers are run by... (Score:2)
I have a server in my basement we could use. (Score:5, Funny)
Re: (Score:3, Funny)
I have an old Sun Sparc 5/70 that still works. Rock solid machine and has OpenBSD loaded on it. I even have a static IP address on my dialup service I could put it on.
HTML 5 (Score:2)
DTDs are Useless (Score:1)
Quick, someone register http://all.your.dtds.are.belong.to.us/ [belong.to.us] :-)
Seriously though, we don't need dedicated hosting for DTDs. We need XML language spec writers, authors and user agent vendors to realise that DTDs are useless. Web browser vendors realised this a long time ago. No browser ever read HTML's SGML DTDs, and they do not use validating parsers for XHTML either (although, they use a hack to parse a subset of the DTD to handle XHTML and MathML entity references).
DTDs are bad for several reas [hsivonen.iki.fi]
Re: (Score:2)
That would involve them not being idiots. Not going to happen.
TWW
Isn't this addressed already? (Score:1)
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitio
That whole PUBLIC thing means that the browser can have its own copy so that it doesn't have to fetch it off the website. Is there a reason that this is not the standard way of doing this?
Re: (Score:2)
The URI there (http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitio
In other words, it's just an identifier--using a URL was just a nice easy way of making sure it was unique.
short answer: no (Score:4, Insightful)
If you're reading a doc, don't bother validating it. You're probably going to have to handle "invalid" XML anyway. When you're constructing XML, you should write it according to the DTD, but if you're relying on a remote site, then you're asking for trouble. Just cache the version locally, but seriously, your tool shouldn't really need it. Your engineers do, but not the tool.
Finally, it's trivial to reconstruct a dtd from sample documents.
Re: (Score:2)
But it won't be the same DTD as the one used to create the documents, which is probably the 'standard' one.
Re: (Score:2)
The purpose of a DTD is to define what the legal/agreed format of the data is, Not what format the data is actually in.
You can't deduce the DTD from the XML itself as the XML may be illegally/incorrectly formatted or otherwise corrupted.
Also, the XML file you create the DTD from may not happen to use all the legal variations of allowed formats, therefore you won't get a complete DTD anyway.
Re: (Score:2)
Huh?
A DTD is a computer-readable formal spec of a document structure. Why it needs to be computer-readable is beyond me. All it does is allow for pedantic software, which really isn't a desirable quality, except in a lint tool.
Practically speaking, the easiest way to spec an XML document is simply to construct an example document using every bell and whistle available in the system.
You can'
Re: (Score:3, Interesting)
I did work developing a large XML-based integration with the mortgage lender AmeriQuest. Boy, did they have interesting ideas on what valid XML is! I had to deal with fun things like:
<tag/>data</tag> - An empty tag being used for an opening tag
</tag>data</tag> - A closing tag being used for an opening tag
<tag>data<tag> - The opposite proble
Builtin DTDs everywhere! (Score:1)
To my knowledge, any other application could just depend on built-in DTDs for validating the formats it knows and not care about whatever it doesn't know, as it wouldn't be able to use them intelligently anyway.
Did I forget to take in account o
URLs were never sane (Score:1)
A URL has:
The only one of those that the machine itself has any control over hiding from the user is the path, which can be virtualized. However, many aren't. DTDs certainly don't seem to be.
A distributed system for this kind of mission-critical information is what we need. Think DNS for documents, rather than just hosts.
XML catalog files let your app use local copies... (Score:3, Informative)
DTD Critical Hosting (Score:2, Insightful)
Some might think: well, what if it changes?
Well, it's obvious: download the new one and update your XHTML/XML or application for the specific changes.
Well... (Score:2)
The security implications are extremely ugly (Score:3, Insightful)
2) Remember the incident where popular "safe" Superbowl sites were compromised and laced with malware-installing code? What happens to millions of Firefox-on-Windows users when a bunch of Russian mobsters or Chinese government agents hijack a DTD host and load it with a zero-day Windows exploit?
3) Remember "pharming", where DNS servers are hijacked to redirect *CORRECTLY TYPED URLS* to malware-infested sites. Even if the bad-guys can't hijack the DTD host, they can still hijack Windows-based DNS servers (ptui!) and anybody who relies on them gets redirected to a malware-install site.
That's the problem; here's my solution. It's composed of two parts.
A) DTDs will be *LOCAL FILES ON YOUR WORKSTATION* (excepting "thin clients").
B) Browsers (or possibly operating systems) will include new DTDs with updates. In POSIX OSes (*NIX, BSD), DTDs will be stored in
Windows will have its own locations. When you get your regular update for your browser (or alternatively, your OS), part of the update will be any new DTDs. There will be a separate file for each DTD and version, so that your browser can properly handle multiple tabs opening to sites using different versions of the same base DTD.
What organization? (Score:2)
The ITU [wikipedia.org] (also here [itu.int]).
Go ahead, laugh, but I think it's long past time for control of such functions as DNS, NTP, assigned numbers, et cetera, to be transferred out of the hands of primarily US-based corporations and loosely coupled organizations such as the IETF and IANA and into the hands of som
Stupid question :) (Score:2)
Is it not obvious that it may as well be the W3C? XML is their standard; operating a registry for public-use DTDs would be a rather reasonable service to provide.
Re: (Score:1, Redundant)
Re: (Score:1)
ICANN song. (Score:1)
Anything you can do, ICANN do better./ICANN do anything Better than you.
No, you can't./Yes, ICANN. No, you can't./Yes, ICANN. No, you can't./Yes, ICANN, Yes, ICANN!
Anything you can be ICANN be greater./Sooner or later, I'm greater than you.
No, you're not. Yes, I am./No, you're not. Yes, I am./No, you're NOT!. Yes, I am./Yes, I am!
ICANN shoot a partridge With a single cartridge./ICANN get a sparrow With a bow and arrow.
ICANN live on bread and cheese.
And only on that?/Yes./So can a rat!
Any note you can reach ICANN go higher.
ICANN sing anything Higher than you.
No, you can't. (High)
Yes, ICANN. (Higher) No, you can't. (Higher)
Yes, ICANN. (Higher) No, you can't. (Higher)
Yes, ICANN. (Higher) No, you can't. (Higher)
Yes, ICANN. (Higher) No, you can't. (Higher)
Yes, ICANN! (Highest)
Anything you can buy ICANN buy cheaper./ICANN buy anything Cheaper than you.
Fifty cents?/Forty cents! Thirty cents?/Twenty cents! No, you can't!
Yes, ICANN, Yes, ICANN!
Anything you can say ICANN say softer./ICANN say anything Softer than you.
No, you can't. (Softly)
Yes, ICANN. (Softer) No, you can't. (Softer)
Yes, ICANN. (Softer) No, you can't. (Softer)
Yes, ICANN. (Softer)
YES, ICANN! (Full volume)
ICANN drink my liquor Faster than a flicker./ICANN drink it quicker And get even sicker!
ICANN open any safe.
Without bein' caught?/Sure./That's what I thought--you crook!
Any note you can hold ICANN hold longer./ICANN hold any note Longer than you.
No, you can't.
Yes, ICANN No, you can't/Yes, ICANN No, you can't.
Yes, ICANN
Yes, I-I-I-I-I-I-I-I-I No, you C-A-A-A-A-A-A-A-A-A-A-A-A-N'T--
CA-A-A-A-N! (Cough, cough!)
Yes, you ca-a-a-an!
Anything you can wear ICANN wear better./In what you wear I'd look better than you.
In my coat?/In your vest! In my shoes?/In your hat! No, you can't!/Yes, ICANN Yes, ICANN!
Anything you say ICANN say faster./ICANN say anything Faster than you.
No, you can't. (Fast)
Yes, ICANN. (Faster) No, you can't. (Faster)
Yes, ICANN. (Faster) Noyoucan't. (Faster)
YesIcan! (Fastest)
ICANN jump a hurdle./ICANN wear a girdle.
ICANN knit a sweater./ICANN fill it better!
ICANN do most anything!/Can you bake a pie?
No./Neither can I.
Anything you can sing ICANN sing sweeter./ICANN sing anything Sweeter than you.
No, you can't. (Sweetly)
Yes, ICANN. (Sweeter) No, you can't. (Sweeter)
Yes, ICANN. (Sweeter) No, you can't. (Sweeter)
Yes, ICANN. (Sweeter) No, you can't, can't, can't (sweeter)
Yes, ICANN, CAN, CAN (Sugary)
Yes, ICANN! No, you can't!
Re: (Score:2)