The Internet

Is Dedicated Hosting for Critical DTDs Necessary?

pcause asks: "Recently there was a glitch when someone at Netscape took down the page hosting an important DTD (for RSS) that is used by many applications and services. This got me thinking that many or all of the important DTDs that software and commerce depend on are hosted at various commercial entities. Is this a sane way to build an XML-based Internet infrastructure? Companies come and go all of the time; this means that the storage and availability of those DTDs is in constant jeopardy. It strikes me that we need an infrastructure akin to the root server structure to hold the key DTDs that are used throughout the industry. What organization would be the likely custodian of such data, and what would be the best way to ensure such an infrastructure stays funded?"
This discussion has been archived. No new comments can be posted.

  • I know! (Score:5, Funny)

    by Colin Smith ( 2679 ) on Thursday May 17, 2007 @06:09PM (#19170719)
    ICANN!

    Mhahahahaha. Yeah. I know, I crack myself up.

     
    • Clearly the particular organization is not yet formed; however, there is absolutely no question that it should be hosted in Iran.
    • by rs79 ( 71822 )
      Why there? Did you want to run an MLM?

      Google or archive.org come to mind as more sensible choices.
      • Re: (Score:3, Insightful)

        by mollymoo ( 202721 )

        The point was that relying on a single entity isn't a good idea. Google is a single company; the Internet Archive is a single organisation.

        I'd suggest something more along the lines of DNS, where although there would be a single ultimate authority, the day-to-day business of serving DTDs would be distributed and handled by multiple levels of servers.

        • Re: (Score:2, Insightful)

          by UltraAyla ( 828879 )
          I think you're right on the money here. DNS-like was my first thought as well. Have a root system where all updates are made, then have organizations which check for updates to a package of multiple critical DTDs on a weekly or monthly basis or something. Then people can have a list of DTD sources in the event that one goes down (though I'm pretty sure XML only supports one DTD in each document - someone correct me if I'm wrong). This would reduce the burden on any one person, allow organizations to manage
    • Am I missing something here, or is this problem solved by catalog files? Surely any decent XML parser that can download an external DTD subset from a URI can get the DTD subset via a catalog file?
      • by gmack ( 197796 )
        Or better yet, why can't you just copy the blasted thing to your own site if you're going to use it?

        Is there some technical reason I'm not aware of that means it has to stay somewhere central?
        • Re:Catalog files? (Score:5, Insightful)

          by EsbenMoseHansen ( 731150 ) on Friday May 18, 2007 @01:57AM (#19175277) Homepage

          Or better yet, why can't you just copy the blasted thing to your own site if you're going to use it?

          Is there some technical reason I'm not aware of that means it has to stay somewhere central?

          There shouldn't be, yet I would be greatly surprised if some application didn't match on the entire DTD string, hostname and all.

          I am equally baffled as to what applications need the DTD for anyway. Except for generic XML applications, what use is a DTD? Most applications only handle a few fixed XML document types anyway.

          Finally, if they really need that DTD... any distro has most major DTDs available. No reason why they couldn't carry a few extra. It should be easy to just search for them locally.

    • Re:I know! (Score:5, Funny)

      by commodoresloat ( 172735 ) * on Thursday May 17, 2007 @07:45PM (#19172231)

      ICANN!

      Mhahahahaha. Yeah. I know, I crack myself up.

      No you cann't!
  • Centralization (Score:5, Insightful)

    by ushering05401 ( 1086795 ) on Thursday May 17, 2007 @06:09PM (#19170725) Journal
    Nothing too insightful to write, but worth saying in today's volatile political climate. Centralization makes me nervous.

    Regards.
    • Re: (Score:3, Interesting)

      by radarsat1 ( 786772 )
      Exactly. How about hosting these important files via a decentralized bittorrent tracker?
      Of course, that would eliminate the use of a UNIFORM RESOURCE LOCATOR, since it would no longer be centralized.
      There needs to be a way to refer to decentralized internet resources in a unique fashion. We need the equivalent of the URL for a file that is hosted simultaneously in many places.
      • Re:Centralization (Score:5, Informative)

        by Bogtha ( 906264 ) on Thursday May 17, 2007 @06:27PM (#19171069)

        There needs to be a way to refer to decentralized internet resources in a unique fashion. We need the equivalent of the URL for a file that is hosted simultaneously in many places.

        This is known as a URN [wikipedia.org]. URLs and URNs are together known as URIs.

        • Re:Centralization (Score:4, Informative)

          by frisket ( 149522 ) on Friday May 18, 2007 @07:45AM (#19176873) Homepage

          The defects of the URN/URI/URL mechanism were well known at the time this was discussed in the working groups and SIGs while XML was gestating.

          The correct solution would have been to fix the outstanding problems with FPIs and use a combination of local catalog and DNS-style resolution, but this was turned down. Perhaps it's time to wake it up.

          In the 1990s I did try to devise a resolution server for FPIs, in the hope that someone like the (then) GCA (now IdeAlliance) -- who were the ISO 9070 Registration Authority and theoretically still are -- would pick up the idea.

          I still have the large collection of SGML DTDs used at the time, now largely redundant, but replacing it with current XML is not the problem. This is something that should probably be discussed at the Markup conference in Montreal this summer.

      • Doctypes do not contain a URL indicating the location of the DTD; they include a URI. This URI is typically a URL, but could easily be something else.
      • Re: (Score:2, Informative)

        by kwark ( 512736 )
        You meant something like magnet URIs?
        http://en.wikipedia.org/wiki/Magnet:_URI_scheme [wikipedia.org]
      • Of course, that would eliminate the use of a UNIFORM RESOURCE LOCATOR, since it would no longer be centralized.

        Nope, it just means that your torrent tracker would have to have a way to resolve the reference. Whether something like DNS where you have specific "go-to" hosts, or whether you just ask every host you're connected with, or something else (maybe a kind of dynamic mesh with ad-hoc gateways), the choice is up to you.

        Maybe something like NTP, where you have the strata-1 time servers, and then the designated strata-2 servers, and everyone is encouraged to set up a strata-3 server for their own subnet. This w

    • Don't use them (Score:5, Insightful)

      by Anonymous Coward on Thursday May 17, 2007 @06:35PM (#19171191)
      If the absence of these files will break your app or service, then you need to make your app or service more robust.

      Sure, DTD files are necessary for development. If your app requires that they be used to validate something in real time each time it comes in from a client or whatever, then use an internal copy of the version of the DTD file that you support. If the host makes a change to it (or drops it, or lets it get hacked), your app won't break, and you can decide when you will implement and support that change.

      I really don't see what is gained by making the real-time operation of your application dependent on the availability and integrity of remotely and independently hosted files. It just makes you fragile, and you can get all the benefits you need from just checking the files during your maintenance and development cycles.
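
      To make that concrete, here's a minimal sketch in Java of the parent's suggestion: register an EntityResolver so the parser is handed a copy of the DTD bundled with the application instead of dialing out to my.netscape.com. The classpath location /dtd/rss-0.91.dtd and the input file feed.xml are invented for illustration.

      import java.io.InputStream;
      import javax.xml.parsers.SAXParserFactory;
      import org.xml.sax.EntityResolver;
      import org.xml.sax.InputSource;
      import org.xml.sax.XMLReader;
      import org.xml.sax.helpers.DefaultHandler;

      public class LocalDtdExample {
          public static void main(String[] args) throws Exception {
              XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();

              // Serve the application's own copy of the DTD; never touch the network.
              reader.setEntityResolver(new EntityResolver() {
                  public InputSource resolveEntity(String publicId, String systemId) {
                      if (systemId != null && systemId.endsWith("rss-0.91.dtd")) {
                          InputStream local =
                              LocalDtdExample.class.getResourceAsStream("/dtd/rss-0.91.dtd");
                          if (local != null) {
                              return new InputSource(local);
                          }
                      }
                      return null; // fall back to default resolution for anything else
                  }
              });

              reader.setContentHandler(new DefaultHandler());
              reader.parse(new InputSource("feed.xml"));
          }
      }

      Whether the cut-over key is the system identifier, the public identifier, or both is a design choice; the point is that the network URL never has to be dereferenced at runtime.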

      • Re:Don't use them (Score:5, Informative)

        by Skreems ( 598317 ) on Thursday May 17, 2007 @09:06PM (#19173061) Homepage
        Exactly. The only point of having a URL associated with a DTD is to assure a unique identifier for each one. It wasn't worth starting a group specifically to regulate DTD identifiers, so they hooked it to a system that's already regulated. Yeah, it's nice to have the DTD live at that location, so if you get a file with a reference to an unfamiliar DTD you can pull it down on the spot, but it shouldn't be required.
    • What's to stop someone from hosting these files locally, for their own use, on a local server? In some cases this would not be practical, with redirects for downloading, etc., but could this be done in some instances?
      • by FLEB ( 312391 )
        I think (might be wrong) that most of the problems come from some apps which:

        1.) Use the DTD URI to determine a document's type, from a list of known URI/type associations in the application. (For instance, a web browser that checks the DTD to determine whether to render in HTML or XHTML mode.)

        and

        2.) Validate the document against the DTD from the copy stored at the URI (given that the URI is a URL... it does not necessarily have to be.)

        And, if the DTD isn't at the URL (fails on 2), it barfs from not being a
      • Indeed, I've always considered this a must for production applications - particularly intranet applications. The overhead of retrieving the DTD from the web is simply unacceptable in many situations.
      • Re: (Score:3, Interesting)

        by bytesex ( 112972 )
        Exactly. What always struck me is that certain applications do a DTD-conformant XSLT processing step _every_time_ a web page is checked. That means my web app is dependent on that location on the internet being reachable (proxies!! downtime!! all that yummy goodness!!), plus the unacceptable overhead. But they merrily keep on making XSLT processors that _will_not_run_ without access to the DTD (I'm looking at you, Java!).
    • The trick is to make centralized copies of important, or oft-used, files available. I'd not just do DTDs. I think as AJAX, Web 2.0, or whatever you wanna call it, grows more popular and demands that users download more and more Javascript, images, etc. that are often the same files between different websites, it could be very useful if we stored a copy of those shared files on one server, with caching properly configured, so that users need to only download and store one copy instead of dozens of cop
      • by Pieroxy ( 222434 )
        That's very good! So hackers only need to hack this one server and modify a tiny javascript file to screw up thousands of websites... It looks as if security was overlooked in your reasoning. Damn hackers!
        • by MikeFM ( 12491 )
          Security on one server, run by someone who knows what they're doing, will be better than what we have now, which is thousands of webservers run by people who have almost no idea what they are doing and no time or money to dedicate to doing it right.

          Take a look at what servers get hacked. It isn't often those that are well maintained by trained people with years of experience. It's usually people stupid enough to run a server that hasn't been updated in three years.
  • w3c (Score:5, Insightful)

    by partenon ( 749418 ) * on Thursday May 17, 2007 @06:09PM (#19170729) Homepage
    w3c.org [w3c.org] . There's no better place to keep the standards related to the web.
    • Re:w3c (Score:5, Funny)

      by JordanL ( 886154 ) <jordan,ledoux&gmail,com> on Thursday May 17, 2007 @06:16PM (#19170853) Homepage

      There's no better place to keep the standards related to the web.
      Some say that wistfully, others begrudgingly.
    • Re: (Score:3, Interesting)

      by inKubus ( 199753 )
      What about a distributed file system that works like DNS? Hierarchical servers, each responsible for a different level of the DTD. The "Root" is a trusted group of servers, which maintain a list of other servers where you can get a copy of the rest of the DTD. Then plugin builders and other sub-entities can have their own server for extensions to the base DTD.

      Unfortunately, the DNS method has proven to not necessarily be the best way, with poisoning and stuff that can occur. Of course, it was des
      • The TXT record is more than capable of doing this, just like your SPF statement for your approved mail exchangers.
    • by flooey ( 695860 )
      w3c.org . There's no better place to keep the standards related to the web.

      I'd expand on that and say: whatever organization is responsible for developing the format that the DTD is for. The W3C is responsible for things like XHTML, so they should be hosting the DTD for it. The IETF should have the DTD for Atom. RSS is currently maintained by Harvard and the DTD should be maintained by them.
      • by J'raxis ( 248192 )

        RSS 0.9x was developed by Netscape; having the originator host it, forever, is how we got into this problem in the first place.

    • What's wrong with this website [microsoft.com]?
  • DTD? (Score:4, Insightful)

    by mastershake_phd ( 1050150 ) on Thursday May 17, 2007 @06:10PM (#19170755) Homepage
    and DTD stands for? Distributed Technical Dependency?
  • by Kjella ( 173770 ) on Thursday May 17, 2007 @06:11PM (#19170785) Homepage
    ...keep a copy, host it on your own site and reference that instead. There was no problem except that some were using that file to download the definitions. Or just expand the definition to include a checksum and a list of mirrors. Is this even a problem worth solving? I mean, except for the Slashdot post, it seemed to me like this went by without anyone noticing.
    • Re: (Score:2, Interesting)

      by centinall ( 868713 )
      What if you're using a 3rd-party library that has references to the DTD, schema, or whatever? You don't really want to go through and change all of them.

      What if XML files, for instance, are being exchanged between your application and others, and they include a DTD that doesn't reside within your domain?

      I'm sure there are other scenarios as well.
  • by Anonymous Coward
    Such a system should also allow stable storage and management of ontology definitions, used within the semantic web.

    I would suggest someone like OSTG or the Mozilla foundation...
  • Sane? (Score:5, Insightful)

    by DogDude ( 805747 ) on Thursday May 17, 2007 @06:25PM (#19171009)
    Well, I wouldn't call it sane if anybody who is actively using XML and needs a DTD isn't hosting it right along with whatever web site they're using the XML for. Relying on somebody else to maintain a critical DTD that you use isn't sane. It's pretty dumb.
    • by sconeu ( 64226 )
      Who says you're using XML for a website?
      • Re: (Score:2, Insightful)

        by DogDude ( 805747 )
        Well, even if you're not, then you should absolutely, positively, and without any doubt, at least in my mind, have a copy of all of your DTDs.
    • Re: (Score:3, Insightful)

      by curunir ( 98273 ) *
      Exactly. If you write an application that requires a DTD (or XSD for that matter) to parse an XML document, include that file as part of the software. The XML processing code should intercept entity references and load them from the local copy. Not only does this make your application more reliable, it also makes it faster.

      Public hosting of schema documents should not be for application use where the application knows ahead of time what kind of document it will be parsing (like the RSS situation). In all li
  • No (Score:5, Insightful)

    by Bogtha ( 906264 ) on Thursday May 17, 2007 @06:25PM (#19171023)

    You shouldn't be using DTDs any more. Validation is better achieved with RelaxNG, and you shouldn't use them for entity references because then non-validating parsers won't be able to handle your code.

    For those document types that already use DTDs, either you ship the DTDs with your application, or you cache them the first time you parse a document of that type.

    The Netscape DTD issue was caused not by the DTD being unavailable, but by some client applications not being sufficiently robust. You shouldn't be looking at the hosting to solve the problem.
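
    For anyone curious what the RELAX NG route looks like, here is a sketch using the standard javax.xml.validation API. Two assumptions worth flagging: the stock JDK ships no RELAX NG engine, so a provider such as Jing has to be on the classpath for the factory lookup to succeed, and feed.rng / feed.xml are made-up file names.

    import java.io.File;
    import javax.xml.XMLConstants;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;

    public class RelaxNgCheck {
        public static void main(String[] args) throws Exception {
            // Requires a RELAX NG implementation (e.g. Jing) to be registered;
            // the JDK itself only provides a W3C XML Schema factory.
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.RELAXNG_NS_URI);
            Schema schema = factory.newSchema(new File("feed.rng"));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new File("feed.xml"))); // throws on invalid input
            System.out.println("feed.xml is valid against feed.rng");
        }
    }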

    • This is just not an issue worth solving...
    • Re: (Score:3, Insightful)

      by Anonymous Coward
      The Netscape DTD issue was caused not by the DTD being unavailable, but by some client applications not being sufficiently robust.

      Not sufficiently robust is an understatement. ****ing stupid is what I would call it. If every browser had to hit the W3C site for the HTML DTDs every time they loaded a web page, the web would collapse.

    • Okay, so what you are saying is that we ship the SAME DTD that is already defined with the application that we provide. ???? WHAT ????

      This does not follow OO design methodologies! REUSE!!!! The whole point behind OO design is that we reuse existing components. If we cannot do this, then what is the point of OO? If we have defined a DTD that can be used BY the community, then it should be made available FOR the community. The re-distribution of the DTD does not make sense, as it could be altered f

      • From what you write, it is clear that this is among the least of your problems... Anyways: please do not shout as much!
    • by Darkforge ( 28199 ) on Thursday May 17, 2007 @07:56PM (#19172339) Homepage
      Unfortunately, DTDs aren't just for validation... they're also the only good way to define "entities" (e.g. "&foo;") in XML. This comes up a lot when trying to put HTML in XML feeds, because HTML has a lot of entities that aren't in the XML spec. Specifically, you may notice that you can't type "&nbsp;" in ordinary XML.

      It's trivial to define "&nbsp;" yourself in a DTD (<!ENTITY nbsp "&#xa0;">), and many of the standard DTDs out there do define it, but by the XML 1.0 standard it's got to be defined somewhere or else the XML won't parse.
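
      For example, the declaration can live in the document's own internal subset, in which case any conforming parser expands the entity without fetching anything over the network. A small sketch with the stock Java DOM parser (the note element is made up for illustration):

      import java.io.StringReader;
      import javax.xml.parsers.DocumentBuilder;
      import javax.xml.parsers.DocumentBuilderFactory;
      import org.w3c.dom.Document;
      import org.xml.sax.InputSource;

      public class InternalSubsetExample {
          public static void main(String[] args) throws Exception {
              // The entity lives in the document's own internal subset, so no
              // external DTD (and no network access) is needed to expand &nbsp;.
              String xml = "<!DOCTYPE note [ <!ENTITY nbsp \"&#160;\"> ]>\n"
                         + "<note>non&nbsp;breaking</note>";

              DocumentBuilder builder =
                  DocumentBuilderFactory.newInstance().newDocumentBuilder();
              Document doc = builder.parse(new InputSource(new StringReader(xml)));

              // Prints the text with a U+00A0 no-break space between the words.
              System.out.println(doc.getDocumentElement().getTextContent());
          }
      }

      The same applies to any other HTML entity a feed wants to carry; it just has to be declared somewhere the parser will actually read.
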
      • You're better off using numeric or hexadecimal character references instead, or just encoding the file in UTF-8 and using whatever character you need directly. Although, it would have really helped if XML 1.0 had predefined the entire set of entity references defined in HTML4, instead of just amp, lt, gt, quot and apos. Then they all could have been used without a DTD.
    • The last time I checked, there is no mechanism by which an XML file can provide a link to the corresponding RelaxNG schema in the same way that it can provide a DTD.

      Thus, while an application which expects files conforming to a specific schema can validate against that schema, it is not possible for a program to validate an arbitrary XML file. For example, there is no way xmllint can automatically find the related RelaxNG schema, in the same way that it can find the DTD.

      If I am wrong, and there is a way to
  • by ryanisflyboy ( 202507 ) on Thursday May 17, 2007 @06:34PM (#19171167) Homepage Journal
  • The only other language I know of that even allows file sourcing over HTTP is PHP, and there it's a gaping security hole that defaults to off. In everything else, the dependencies *get installed to the local file system*.
  • by Zocalo ( 252965 ) on Thursday May 17, 2007 @06:39PM (#19171279) Homepage

    NTP.org" [ntp.org] maintains a pool of public NTP servers that are accessible via the hostname "pool.ntp.org", so perhaps something similar would work for a global DTD repository. An industry organization with a vested interest, the W3C seems like the most logical, could maintain the DNS zone and organizations could volunteer some server space and bandwidth to host a mirror of the collected pool of DTDs. Volunteering organizations might come and go, but when that happens it's just a matter of updating the DNS zone to reflect the change and everyone using DTDs just needs to know a single generic hostname will always provide a copy of the required DTD.

    Just a thought...

  • by tota ( 139982 ) on Thursday May 17, 2007 @06:41PM (#19171305) Homepage
    Most tools provide a way to refer to a DTD on a public URL, yet use the local copy instead (i.e. the taglib-location directive in Java).

    Doing anything else strikes me as fundamentally dangerous and insecure: it turns a remote DNS vulnerability into an easy application DoS (or worse).

  • Call me crazy... (Score:5, Interesting)

    by Nimey ( 114278 ) on Thursday May 17, 2007 @06:41PM (#19171311) Homepage Journal
    but just have your DTD as a W3C standard, distribute copies with your software, and don't bother a remote server until a new version of the DTD is released. Then distribute it with a new version of your software.

    Seriously, what the fuck were they thinking relying on a server to be always available?
    • Seriously, what the fuck were they thinking relying on a server to be always available?

      I've noticed the trend lately. Folks *want* some server to always be available. They want this so badly, they just go about their business as if the server in question would always be available. Even trained pros, who know better, sometimes think and/or act this way. Especially with regards to systems they can't see, and do not have to maintain. Thus, the Hard & Painful Lessons of Life(tm) still have their place in the world. ;(

    • Re: (Score:3, Interesting)

      by Megane ( 129182 )
      Even more stupid is that the URI had a freaking version number in the filename! It's not like someone would update it, and then give it the old version number. It's going to give you the same file even when there's a newer version!
  • URI vs URL (Score:5, Insightful)

    by Sparr0 ( 451780 ) <sparr0@gmail.com> on Thursday May 17, 2007 @06:41PM (#19171315) Homepage Journal
    A key mistake in your assumptions was brought up when the Netscape fiasco was news, and I will bring it up again...

    "http://my.netscape.com/publish/formats/rss-0.91.d td" is a URI. It uniquely identifies a file. It *HAPPENS* to also be the URL for that same file, for now, but that is just a fortunate intentional coincidence. Your software should not rely on or require the file to be located at that URL. /var/dtd/rss-0.91.dtd is a perfectly valid location for the file identified by the URI "[whatever]/rss-0.91.dtd". What we need is for XML-using-software authors to support and embrace local DTD caches, AND package DTDs along with their applications (with the possibility of updating them from the web if neccessary).

    It is silly that millions of RSS readers fetch a non-changing file from the same web site every day. It is only very slightly less silly that they fetch it from the web at all.
    • Don't usually do this, but the above comment is the first one in this conversation that explains why this problem doesn't really exist.
    • EXACTLY (Score:5, Insightful)

      by wowbagger ( 69688 ) on Thursday May 17, 2007 @08:03PM (#19172435) Homepage Journal
      Exactly right, but it is even worse than that:

      A DTD spec SHOULD have both a PUBLIC identifier and a SYSTEM identifier. The system identifier is strongly recommended to be a URL so that a validating parser can fetch the DTD if the DTD is not found in the system catalog.

      The system catalog is supposed to map from the PUBLIC identifier to a local file, so that the parser needn't go to the network.

      If you are running a recent vintage Linux, look in /etc/xml/ - there are all the catalog maps for all the various DTDs in use.

      So:
      1. The application writers SHOULD have added the DTDs to the local system's catalog.
      2. Failing that, the application SHOULD have cached the DTD locally the first time it was fetched, and never fetched it again.
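
      To make option 1 concrete, here is a sketch using the javax.xml.catalog package (standard since Java 9; the snippet as written assumes Java 11+ for Files.writeString). The file paths are invented, and the public identifier shown is the one RSS 0.91 feeds conventionally carry:

      import java.nio.file.Files;
      import java.nio.file.Path;
      import javax.xml.catalog.CatalogFeatures;
      import javax.xml.catalog.CatalogManager;
      import javax.xml.catalog.CatalogResolver;
      import javax.xml.parsers.SAXParserFactory;
      import org.xml.sax.XMLReader;
      import org.xml.sax.helpers.DefaultHandler;

      public class CatalogExample {
          public static void main(String[] args) throws Exception {
              // A tiny OASIS catalog mapping the feed's PUBLIC identifier to a local copy.
              Path catalog = Path.of("catalog.xml");
              Files.writeString(catalog,
                  "<catalog xmlns=\"urn:oasis:names:tc:entity:xmlns:xml:catalog\">\n"
                  + "  <public publicId=\"-//Netscape Communications//DTD RSS 0.91//EN\"\n"
                  + "          uri=\"file:///var/dtd/rss-0.91.dtd\"/>\n"
                  + "</catalog>\n");

              // The catalog resolver doubles as a SAX EntityResolver.
              CatalogResolver resolver =
                  CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalog.toUri());

              XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
              reader.setEntityResolver(resolver);
              reader.setContentHandler(new DefaultHandler());
              reader.parse("feed.xml"); // the DOCTYPE now resolves locally, not over HTTP
          }
      }

      On a distro that already maintains /etc/xml/catalog, you'd point the resolver at that file rather than writing your own.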


    • Actually, I'd go a step further. It might be useful to actually *not* host the DTD itself at that URI. As I recall, there was never a requirement that DTDs actually be located at the URI if it was treated as a URL.

      If instead the URL just returned a page that said: "You can find a copy of the appropriate DTD at the following locations..." and listed them, it would remove the temptation to introduce a programmatic dependency on that URL being live but still give people a way to find that resource, and force
    • by treeves ( 963993 )
      very slightly less silly

      This reminds me of something. . .

      Voice-over: Here at Luton it's a three-cornered fight between Alan Jones - Sensible Party; in the middle, Tarquin Fintimlimbimwhinbimlim Bus- stop F'tang F'tang Olè Biscuitbarrel, Silly Party, and Kevin Phillips-Bong, the Slightly Silly candidate.

  • I think there is an OASIS standard called XML Catalogs for redirecting offsite schema requests to a local copy...
    • by holloway ( 46404 )
      Yes, you're right, that's the standard way of caching them locally. I'm not sure that all RSS clients are XML processors though.

      HTML clients (browsers) don't go requesting the HTML DTD, and so it could be said that the RSS client shouldn't either. RSS clients, though, are more pure in that they take the DTD's definition of entities literally, so we do need to access the DTD.

      But you'd expect clients to cache them, using XML catalogs as you say. They should be packaged with the standard DTDs, a default DT
  • People are still using DTDs? I thought everybody switched to XML Schema a while back. God, I can't keep up with this constant flux!

    I need some Chinese food. Hmm...

    Szechuan!

  • Not again (Score:4, Informative)

    by dedazo ( 737510 ) on Thursday May 17, 2007 @07:14PM (#19171795) Journal
    This has been covered before here and elsewhere... anyone who is using a DTD as a URL rather than a URI needs to be taken out and shot. I say bring them all down and let all the apps that rely on them die or be fixed.
  • by Dragonshed ( 206590 ) on Thursday May 17, 2007 @07:23PM (#19171933)
    I recently (within the last year) deployed an application that end users use for downloading and viewing custom content; they are expected to install the app onto laptops, tablets, and other portable devices, allowing them to view said content both on- and offline.

    When prototyping our "offline mode", we ran into this exact same problem because the XML APIs we used wanted to validate XML against online DTDs. We amended the validator's resolver to use locally embedded or cached DTDs for all our doctypes, problem solved.

    In my app it was an obvious problem to solve because offline usage was a big scenario, but I could imagine that being "out of scope" for a less-than-robust website.
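
    Roughly what such an amendment can look like (a sketch of the same idea, not the poster's actual code): an EntityResolver that downloads an external DTD the first time it is referenced, keeps it on disk, and serves the cached copy from then on, which is what keeps offline mode working.

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import org.xml.sax.EntityResolver;
    import org.xml.sax.InputSource;

    /** Downloads an external DTD once, then serves the on-disk copy thereafter. */
    public class CachingDtdResolver implements EntityResolver {
        private final Path cacheDir;

        public CachingDtdResolver(Path cacheDir) {
            this.cacheDir = cacheDir;
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) throws IOException {
            if (systemId == null || !systemId.startsWith("http")) {
                return null; // let the parser handle non-HTTP identifiers itself
            }
            Path cached = cacheDir.resolve(Integer.toHexString(systemId.hashCode()) + ".dtd");
            if (!Files.exists(cached)) {          // first sighting: fetch and store it
                Files.createDirectories(cacheDir);
                try (InputStream in = new URL(systemId).openStream()) {
                    Files.copy(in, cached);
                }
            }
            InputSource source = new InputSource(Files.newInputStream(cached));
            source.setPublicId(publicId);
            source.setSystemId(systemId);
            return source;
        }
    }

    Wiring it in is a single call on the reader, e.g. reader.setEntityResolver(new CachingDtdResolver(Paths.get("dtd-cache"))).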
  • Wikipedia's Root nameserver [wikipedia.org] entry says that 4 of the 13 root nameservers are run by private companies.
  • by fyoder ( 857358 ) on Thursday May 17, 2007 @07:53PM (#19172307) Homepage Journal
    Linux box with an uptime of 153 days. It does have to go down now and again so I can clean the dust and cat fur out of it, but that doesn't take too long.
    • Re: (Score:3, Funny)

      by Skapare ( 16644 )

      I have an old Sun Sparc 5/70 that still works. Rock solid machine and has OpenBSD loaded on it. I even have a static IP address on my dialup service I could put it on.

  • I don't know why important DTDs aren't just turned into serializations. HTML 5 (and, in practice, HTML in general) has a text/html serialization because the major browsers don't care about DTDs. It seems like well-published specifications like RSS should just be serialized and DTDs ignored, even though they are presented, instead of breaking when the DTD can't be found. I guess that wouldn't work if a generic XML parser was used for RSS, but for RSS readers, the DTD shouldn't matter.
  • Quick, someone register http://all.your.dtds.are.belong.to.us/ [belong.to.us] :-)

    Seriously though, we don't need dedicated hosting for DTDs. We need XML language spec writers, authors and user agent vendors to realise that DTDs are useless. Web browser vendors realised this a long time ago. No browser ever read HTML's SGML DTDs, and they do not use validating parsers for XHTML either (although, they use a hack to parse a subset of the DTD to handle XHTML and MathML entity references).

    DTDs are bad for several reas [hsivonen.iki.fi]

    • by nagora ( 177841 )
      We need XML language spec writers, authors and user agent vendors to realise that DTDs are useless

      That would involve them not being idiots. Not going to happen.

      TWW

  • Isn't this what doctypes like this are for:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

    That whole PUBLIC thing means that the browser can have its own copy so that it doesn't have to fetch it off the website. Is there a reason that this is not the standard way of doing this?

    • by nevali ( 942731 )
      No.

      The URI there (http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd) doesn't have to be a URL. It could be a URN, or some other kind of URI.

      In other words, it's just an identifier--using a URL was just a nice easy way of making sure it was unique.
  • short answer: no (Score:4, Insightful)

    by coaxial ( 28297 ) on Thursday May 17, 2007 @08:03PM (#19172423) Homepage
    Validation is overrated, especially when it comes to RSS. There are so many competing "compatible" standards that really aren't. feedparser.org [feedparser.org] has a great write-up about the state of RSS. It's pathetic.

    If you're reading a doc, don't bother validating it. You're probably going to have to handle "invalid" XML anyway. When you're constructing XML, you should write it according to the DTD, but if you're relying on a remote site, then you're asking for trouble. Just cache the version locally, but seriously, your tool shouldn't really need it. Your engineers do, but not the tool.

    Finally, it's trivial to reconstruct a DTD from sample documents.
    • by JustNiz ( 692889 )
      >> Finally, it's trivial to reconstruct a dtd from sample documents.

      But it won't be the same DTD as the one used to create the documents, which is probably the 'standard' one.
    • Re: (Score:3, Interesting)

      by KermodeBear ( 738243 )
      Off-topic gripe, but:

      If you're reading a doc, don't bother validating it. You're probably going to have to handle "invalid" XML anyway.

      I did work developing a large XML-based integration with the mortgage lender AmeriQuest. Boy, did they have interesting ideas on what valid XML is! I had to deal with fun things like:

      <tag />data</tag> - An empty tag being used for an opening tag
      </tag>data</tag> - A closing tag being used for an opening tag
      <tag>data<tag> - The opposite proble

  • Now I may not have quite grasped the importance of DTDs, but I can think of only one scenario where retrieving a DTD from a to-be-determined location would be useful: validating XML against any DTD. (Solution: whoever wants to validate will also provide the DTD.)
    To my knowledge any other application could just depend on built-in DTDs for validating the formats it knows and not care about whatever it doesn't know, as it wouldn't be able to use them intelligently anyway.

    Did I forget to take in account o
  • Think about it.

    A URL has:

    • A hostname
    • A PORT number
    • A path on that machine

    The only one of those that the machine itself has any control over hiding from the user is the path, which can be virtualized. However, many aren't. DTDs certainly don't seem to be.

    A distributed system for this kind of mission-critical information is what we need. Think DNS for documents, rather than just hosts.
  • by KarmaRundi ( 880281 ) on Thursday May 17, 2007 @09:06PM (#19173057)
    You can map public and system identifiers to local resources. Use them for DTDs, schemas, stylesheets, etc. Here's the spec [oasis-open.org]. Google for more information.
  • by liothen ( 866548 )
    Why doesn't the content provider just provide the DTD? Why worry about caching it or random errors popping up in it, when the DTD can be stored on the very same server as the website, or stored with the application? Then it doesn't matter if another company screws up or if some malicious hacker decides to attack the DTD; it doesn't affect your product...
    Some might think: well, what if it changes?
    Well, it's obvious: download the new one and update your XHTML/XML or application to the specific changes.
  • Just flush XML and then it wouldn't be an issue...
  • by knorthern knight ( 513660 ) on Friday May 18, 2007 @12:58AM (#19174947)
    1) There are some sensitive environments (military, etc.) where you simply do *NOT* connect your internal network to "teh interweb". No ifs, ands, or buts. The result is a broken browser where the DTDs are required.

    2) Remember the incident where popular "safe" Superbowl sites were compromised and laced with malware-installing code? What happens to millions of Firefox-on-Windows users when a bunch of Russian mobsters or Chinese government agents hijack a DTD host and load it with a zero-day Windows exploit?

    3) Remember "pharming", where DNS servers are hijacked to redirect *CORRECTLY TYPED URLS* to malware-infested sites. Even if the bad-guys can't hijack the DTD host, they can still hijack Windows-based DNS servers (ptui!) and anybody who relies on them gets redirected to a malware-install site.

    That's the problem; here's my solution. It's composed of two parts.

    A) DTDs will be *LOCAL FILES ON YOUR WORKSTATION* (excepting "thin clients").

    B) Browsers (or possibly Operating Systems) will include new DTDs with updates. In POSIX OSes (*NIX, BSD) DTDs will be stored in /etc/dtd/ and users will be able to add their own DTDs in ~/.dtd
    Windows will have its own locations. When you get your regular update for your browser (or alternatively, your OS), part of the update will be any new DTDs. There will be a separate file for each DTD and version, so that your browser can properly handle multiple tabs opening to sites using different versions of the same base DTD.
  • Why, the same organization that should probably be responsible for *all* critical Internet infrastructure standards, just as it is responsible for the standards relating to telecommunications and radio communications.

    The ITU [wikipedia.org] (also here [itu.int]).

    Go ahead, laugh, but I think it's long past time for control of such functions as DNS, NTP, assigned numbers, et cetera, to be transferred out of the hands of primarily US-based corporations and loosely coupled organizations such as the IETF and IANA and into the hands of som
  • What organization would be the likely custodian of such data?


    Is it not obvious that it may as well be the W3C? XML is their standard; operating a registry for public-use DTDs would be a rather reasonable service to provide.
