The Internet

Using the DocBook DTD for Internal Documents?

Saqib Ali asks: "These days, most of the Linux Documentation is created using DocBook DTD. I was wondering if it will be useful for a large Enterprise to create Internal IT documents using DocBook DTD. Any success stories where a large enterprise converted all of its internal IT documentation to DocBook, with management's support? Any other things/issues to keep in mind before embarking on such a mission?"

  • by quinto2000 ( 211211 ) on Friday October 11, 2002 @10:42AM (#4431979) Homepage Journal
    I was looking into doing this for a while with a number of the formatted documents my school needs to deal with. It turned out that the DTD was much more complex than warranted for the kind of stuff we were doing, but of course YMMV.
    • I was recently looking heavily at DocBook/XML and comparing it to (La)TeX. I found all the tools for docbook completely lacking, and the XML format to be completely unfriendly to actually writing. LaTeX on the other hand, seems to kick ass for writing, since the markup is short, sweet and easy to learn/use. Not to mention that the algorithms used to perform the layout were designed by a damn genius instead of mere mortals. I've now used LaTeX in conjunction with pdflatex and latex2html to use a single set of source docs to generate both a web site and a PDF file (not to mention that you could also crank out postscript or just about anything you might need to do with documentation... TeX was designed so Knuth could write computer books after all).

      DocBook, on the other hand, has a lot of complicated markup-- I mean who enjoys using the PARA tag to open and close each and every paragraph? It would drive me insane. Then, after you finally find an editor that suits your needs, you still have to monkey around trying to convert the documents. I was able to get a DB file into HTML without too much pain, but PDF? Never managed it. I spent too many hours on what essentially would have translated the DB XML into TeX source anyway! Why not just write in TeX and be done with it.

      Finally, there is LyX for LaTeX which looks to be a WYSIaboutWYG editor, although I find it very convenient to just use emacs. I think the only problem I've had so far is getting figures to lay out within the text how I want, whereas TeX is pretty happy shoving them later, so that the body of the text can remain as fluid as possible. You can see the results on my site [ichimunki.com] (where I suppose I ought to include a tarball of the actual LaTeX source files and the simple shell script that drives all the processing).
      • Your site.. (Score:3, Interesting)

        by Fweeky ( 41046 )
        http://www.ichimunki.com/ [ichimunki.com]: (pretend I have <cite> around that ;)
        ``Deep'' linking discouraged because the page names are dynamic

        Ignoring the utterly braindead ``foo'' quotes, those filenames are ultra lame.

        DocBook lets you specify a section ID which ends up being mapped to a filename when generating HTML; doesn't LaTeX have something like that?
        • Re:Your site.. (Score:2, Interesting)

          by ichimunki ( 194887 )
          The quotes are an artifact of LaTeX, which I'm sure could be easily removed by tweaking the latex2html script (it may even be an option)... however, as you see them, they are being strictly translated from the correct inputs to get left/right quotes in TeX, which then ensures they look right in printed copy. I'll have to look into it. While it doesn't bother me much, obviously it bugs someone. :)

          And yes, there is an option to have the resulting .html files have better names, but latex2html does not have any provision to prevent name collision-- so I opted out of it in this case (not that I needed to worry about that, so your point is valid and I will change that). LaTeX (like DocBook) has a facility for both regular names for chapters/sections/whatevers and a place to put abbreviated names (for use in places like tables of contents, references, and headers). The filenaming in latex2html does not use this, but rather a set number of words from the title (IIRC).
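
The collision problem described above can be sketched in a few lines. This is a hypothetical illustration, not latex2html's actual algorithm: the function name `section_filenames` and the `max_words` parameter are made up for the example. It builds a filename from the first few words of each title and appends a counter when a name is already taken.

```python
import re


def section_filenames(titles, max_words=3):
    """Map section titles to unique .html filenames (illustrative only)."""
    taken = set()
    names = []
    for title in titles:
        # Keep the first few lowercase alphanumeric words of the title.
        words = re.findall(r"[a-z0-9]+", title.lower())[:max_words]
        base = "-".join(words) or "section"
        # Append a counter until the name no longer collides.
        name, n = f"{base}.html", 1
        while name in taken:
            n += 1
            name = f"{base}-{n}.html"
        taken.add(name)
        names.append(name)
    return names
```

Names produced this way stay stable only as long as titles and their order do not change; an explicit ID per section, as DocBook allows, avoids that dependency.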

          In a perfect world, I'd like to see a system that combined the best of Wiki, TeX, and DocBook (I have nothing against XML, I just don't know if I'm in love with DB's DTD yet), so that you could have the pages be fairly interactive for online references (especially useful in a corporate setting), but still generate standalone documents from the entire work. All with complete revision control, of course.

          I settled on what seemed to be the best compromise available so that I could have a single set of source files produce both printed matter and a website. Ultimately the possibilities with XML seem greater (via stylesheets and xsltproc and custom document parsers written in languages like Perl or Ruby), but getting from XML to PostScript or PDF is the part I had problems with. I like to think if I had problems with it, so would others. But then I limited myself to Free Software, whereas someone willing to use non-Free software might easily find an off-the-shelf package to get around the PS/PDF hurdle.
          • Re:Your site.. (Score:3, Informative)

            by ttfkam ( 37064 )
            But then I limited myself to Free Software, whereas someone willing to use non-Free software might easily find an off-the-shelf package to get around the PS/PDF hurdle.

            Check out Apache Cocoon [apache.org] and Norman Walsh's DocBook stylesheets [sourceforge.net] at Sourceforge. It sounds very much like what you are looking for both for batch processing of documents (using command-line mode) and for online dynamic presentation. There is even a serializer to PCL5 in case you ever wanted to send directly to HP-compatible printers.
            • If you don't really care about sticking with the arcane tags and XML syntax of DocBook, you might want to consider another option that will get the job done easier and quicker: HTMLDOC.

              This is a really clever program that allows you to take a regular web page and produce very nice PDFs (or PostScript) from it. It supports a few new tags that let you do things like page breaks, headers/footers and such that always should have been in HTML (even if only as a hint for printing) but weren't. It automatically builds tables of contents (fully clickable in the PDF), cover pages, and the like, too.

              I've started using this tool more and more often over the last few months. It's just too handy for words. You can find it at Easy Software [easysw.com]. (And yes, it's open source.)
              • As a matter of fact, I don't have a love affair with XML syntax per se. As for arcane tags, I mostly use Simplified DocBook which has such arcane tags as <article>, <author>, <section>, <title>, and <orgname>. It's different from HTML, yes, but no more arcane (and I think rather less) than <ol>, <li>, <dd>, <img>, and <hr>. If you want to talk about arcane, at least call a spade a spade.

                You have a solution and you seem to like it. My problem with it is the mixture of content and layout. Bold, italics, strikeout, and underline have no intrinsic meaning: they are visual cues for underlying themes. When they are the only model, you by definition lose the semantic background to the document. "What's the problem," you ask? For static HTML and PDF presentation, there is no problem for human readers. But it removes the possibility of automated, intelligent indexing and categorization.

                ...another option that will get the job done easier and quicker: HTMLDOC.

                I retort with Yoda. "Is the dark side stronger?" "No! Quicker. Easier. More seductive." :)
          • The quotes are an artifact of LaTeX, which I'm sure could be easily removed by tweaking the latex2html script (it may even be an option)... however, as you see them, they are being strictly translated from the correct inputs to get left/right quotes in TeX, which then ensures they look right in printed copy

            Well, you can specify left and right quotes in HTML:
            <!ENTITY ldquo CDATA "&#8220;" -- left double quotation mark, U+201C ISOnum -->
            <!ENTITY rdquo CDATA "&#8221;" -- right double quotation mark, U+201D ISOnum -->
            latex2html should either use them, <q> tags, or normal double quotes. Not abusing backticks (note they don't look anything like the mirror of ' in an awful lot of fonts, including ones very popular online, such as Verdana).
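
As a rough sketch of the substitution being asked for, assuming the converter sees plain text with TeX-style double quotes: the helper below (a hypothetical name, not part of latex2html) swaps the doubled backticks and doubled apostrophes for the two HTML entities declared above. It makes no attempt to skip verbatim or code passages.

```python
def tex_quotes_to_html(text):
    """Replace TeX-style `` and '' with HTML double-quote entities."""
    # Doubled backticks open a quotation; doubled apostrophes close it.
    # Single apostrophes (as in "don't") are left alone.
    return text.replace("``", "&ldquo;").replace("''", "&rdquo;")
```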

            Doing it TWICE to emulate double quotes means the author of latex2html is going to hell for sure (along with 1001 online newspaper editors) :)

            And yes, there is an option to have the resulting .html files have better names, but latex2html does not have any provision to prevent name collision-- so I opted out of it in this case (not that I needed to worry about that, so your point is valid and I will change that)

            Ah, yes, that's better. Better to be dependent on title and have some meaning than be dependent on order in the table of contents and have little meaning :)

            In a perfect world, I'd like to see a system that combined the best of Wiki, TeX, and DocBook (I have nothing against XML, I just don't know if I'm in love with DB's DTD yet), so that you could have the pages be fairly interactive for online references (especially useful in a corporate setting), but still generate standalone documents from the entire work. All with complete revision control, of course.

            Yes, that would be nice. WebDAV with a versioning backend like Subversion has some potential for document management - better imo than the million-and-one forms approach.

            Document formats are a little more hairy. Sometimes I feel like using something like AFT, which is pretty close to plain text. Other times I want to use XHTML, or DocBook, or my own schema. Some front-end which handles all of them would be nice :)

            I'm not really bothered by print, but I do want my documents to be stand-alone from the website. I want navigation elements to grow dynamically from the metadata in my documents, or from some external metadata file. I also want to be able to generate documents from databases etc and have them plug in nicely with the filesystem and keep nice abstract and stable URIs.

            Unfortunately I'm pretty sure I'm gonna have to write this myself. Being a professional slacker, this will likely take a while :)

            But then I limited myself to Free Software, whereas someone willing to use non-Free software might easily find an off-the-shelf package to get around the PS/PDF hurdle.

            DocBook to PS/PDF isn't too hard. If you can find a generalised XSL:FO engine you would be able to use an arbitrary XML document provided you have a stylesheet for it. Failing that, a browser, CSS print media rules and an option to print to a PS file would probably be ok; converting PS to PDF shouldn't be a problem, and CSS can style any XML document you like.
      • You've obviously used LaTeX quite a bit already. That's hardly a fair comparison. You compare something with which you are already comfortable with something you haven't used at all before.

        As far as markup goes, one of the reasons for using the open/close tag pair in XML was because so many people have written HTML and are used to that model.

        As for complicated markup, there is a Simplified DocBook [oasis-open.org] that reduces the amount of elements you have to know and keep track of while still remaining 100% DocBook compatible. Write a little now, and as your experience and comfort grows, so can your markup choice. Simplified DocBook now, full DocBook when the volume of documentation requires it later (By that time, more editors will have come out hopefully).

        DocBook to PDF is handled by converting to XSL:FO (not to be confused with XSLT) syntax and serializing with something like FOP. LaTeX is actually closer to XSL:FO than to DocBook. If you're trying to convert to PDF by hand, you're expending more effort than you needed to. You can find premade stylesheets for HTML and FO [sourceforge.net] and documentation about how to use them without reinventing the wheel. The advantage of going to XSL:FO instead of a direct DocBook-to-PDF is that there are serializers out there to output FO syntax to PDF, PostScript, PCL5, and RTF. It would be a shame to just make a one trick pony.
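
The two-step route described above might look like the sketch below. It assumes `xsltproc` and Apache FOP's `fop` launcher are installed and on the PATH, and that you have the FO stylesheet from the DocBook XSL distribution somewhere locally; the function names and all file paths are placeholders, not anything from the thread.

```python
import subprocess


def docbook_to_pdf_commands(docbook_xml, fo_stylesheet, fo_out, pdf_out):
    """Return the command lines for DocBook -> XSL-FO -> PDF."""
    return [
        # Step 1: XSLT turns the DocBook source into formatting objects.
        ["xsltproc", "-o", fo_out, fo_stylesheet, docbook_xml],
        # Step 2: FOP serializes the formatting objects to PDF.
        ["fop", "-fo", fo_out, "-pdf", pdf_out],
    ]


def run_pipeline(commands):
    """Run each step in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Keeping the intermediate `.fo` file around is deliberate: the same file can be handed to a different serializer if you need an output format other than PDF.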

        As for emacs, there are emacs extensions written for DocBook [oasis-open.org] that help you with tag choices and automatically close the tags for you. Isn't that one of the main complaints you had about the syntax? And you're comfortable with emacs, right?

        Note that you are using LaTeX to drive the layout. This is not how to use DocBook. In fact, DocBook goes out of its way to avoid any layout information in the file. Say you want to search for all documents with a section title that contains "apple". Anyone with a document parser can implement this no matter who wrote the DocBook file at any organization. With LaTeX you could do this as long as everyone agreed upon the element identifiers -- which doesn't happen at every company. DocBook is content, HTML and PDF are layout, and never the twain shall meet...except during the transformation step.
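
The "apple" search is easy to sketch with a stock XML parser. The example below uses Python's standard library and assumes the document marks sections with `<section>` elements that carry a `<title>` child (numbered `sect1` through `sect5` elements would need to be added to the search); the function name and the sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET


def sections_with_title(docbook_source, word):
    """Return the titles of all <section> elements whose title contains word."""
    root = ET.fromstring(docbook_source)
    hits = []
    for section in root.iter("section"):
        title = section.find("title")
        if title is None:
            continue
        # itertext() also collects text nested in inline markup.
        text = "".join(title.itertext()).strip()
        if word.lower() in text.lower():
            hits.append(text)
    return hits


SAMPLE = """<article>
  <title>Fruit notes</title>
  <section><title>Apple varieties</title><para>Text.</para></section>
  <section><title>Pears</title><para>Text.</para></section>
</article>"""
```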

        If you prefer LaTeX, peace be with you. But they cannot really be compared as LaTeX -- while possible in implementation -- does not enforce a distinction between semantic content and layout presentation. DocBook does. This adds some complexity for the initial startup sometimes, but it pays off when you actually have to organize and index those documents in an archive. You should talk to the folks at the Linux Documentation Project for more insight on this.
        • I had not used either tool at all before I did my investigation. What I found after extensive searching (which included both of your links) was that the tools out there for making PDFs from DocBook rely on LaTeX anyway (and to top it off, I was unable to decipher the proper usage thereof, whereas the tools to go straight from LaTeX were quite easy to find/use). It would be trivial to build a latex2docbook converter should there eventually be compelling reason to switch. FWIW, I don't think your characterizations of LaTeX are accurate. In actual practice, an author need know nothing about layout and style to produce useful LaTeX documents, and LaTeX documents are easily machine parsed for things like which sections have titles containing the word "apple".
          • by ttfkam ( 37064 )
            DocBook -> XSL:FO -> PDF

            XML processed with XSLT and serialized through FOP. Where is LaTeX used? XSLT doesn't have anything to do with LaTeX and FOP has nothing to do with LaTeX. Where do they rely on LaTeX?

            Oh! You were talking about the LaTeX converters that Norman Walsh made available. Sorry. There's the confusion. If you use the FO stylesheets and FOP or iText for the PDF serialization, things are much much simpler. LaTeX shouldn't come into play unless you really want to use LaTeX.

            And you are right that it is quite possible to make layout-free LaTeX. My statement was only that it does not enforce the separation of content and layout. This is the same as saying that there is nothing stopping a programming team from making clean, readable C with uniform indentation of code blocks, but Python doesn't allow the choice: clean, uniform indentation is an intrinsic piece.

            It was not my intention to say that LaTeX made it impossible or even unduly difficult. Sorry for the confusion.
            • I guess I will look a little more carefully for stylesheets using FO to get to PDF-- since that was my only real sticking point. I like TeX, but long-term it doesn't seem like the ideal solution to this problem.
    • by ttfkam ( 37064 ) on Friday October 11, 2002 @12:58PM (#4433102) Homepage Journal
      There is also a Simplified DocBook DTD [oasis-open.org]. We used it at my last job. It is a small but useful subset of DocBook that can get you started.

      All Simplified DocBook files are also completely valid DocBook documents. But there are far fewer elements and constructs to keep in your head. It's also geared toward smaller items such as articles instead of complete books. At my company, we made a couple of template documents and then just had people fill in the blanks. People ended up working faster once we got them to stop worrying about formatting and styling (non-trivial).
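
A fill-in-the-blanks template of the kind described above could be generated along these lines. This is only a sketch: the function name is hypothetical, the DOCTYPE declaration is left out, and only the `article`, `title`, `section` and `para` elements are used.

```python
from xml.sax.saxutils import escape


def article_skeleton(title, section_titles):
    """Build a bare article with one placeholder paragraph per section."""
    lines = ["<article>", f"  <title>{escape(title)}</title>"]
    for section_title in section_titles:
        lines += [
            "  <section>",
            f"    <title>{escape(section_title)}</title>",
            "    <para>TODO</para>",  # writers replace this with content
            "  </section>",
        ]
    lines.append("</article>")
    return "\n".join(lines)
```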

      Start writing in SD and as the collection of documents grows, you can look into combining them into a cohesive DocBook collection as time permits and your experience level grows.
  • by Boiotos ( 139179 ) on Friday October 11, 2002 @10:42AM (#4431982) Homepage
    uses Cocoon2 [apache.org] as a web-publication engine. The Norm Walsh xslt sheets [sourceforge.net] are your best general-purpose transformation, but they sometimes choke on Xalan. This Wiki Page [outerthought.net] should clear up that problem.
  • ...or not. YMMV to a very great extent. I have tried to do it, and I liked what was coming as a result (almost) except being the only one in the group doing that was not much of a help. The greatest problem was interchanging docs with others. RTF stylesheets are ok and can be used, but...

    Check out NTSGML pages (though they have not been updated for some time) if you end up doing this all under Windows. Also, I'd recommend sticking with generic SGML, not XML -- RTF converters for XSLT are not that good (I was not able to produce a single readable doc).
    • don't tell him to use the SGML version. New development around DocBook is definitely centered around the XML variant of DocBook. As for RTF, I recommend using stylesheets that convert to XSL:FO and serializing them to RTF with something like jfor.

      In my opinion, XSLT should not be used to generate something like RTF directly. XSLT was made to transform one XML schema to another. Period. Anything else is like trying to put the square peg in the round hole.
      • In my opinion, XSLT should not be used to generate something like RTF directly. XSLT was made to transform one XML schema to another. Period. Anything else is like trying to put the square peg in the round hole.

        That is what I used. Problem is, I guess, that I was trying to do it under WinNT and there may have been a few quirks that just would not let it work fully. For one, jfor would never produce anything resembling what was expected.


        Another annoying thing was that I actually had to run a web server on my laptop to be able to generate anything: all the tools (except, I think, xsltproc) were very insistent on going to the OASIS website to read the latest & greatest DTD! Maybe, again, I have missed something, but I could persuade neither saxon nor xerces/xalan to use a local copy of the DTD...

        • by ttfkam ( 37064 )
          Yes, this isn't documented well enough. (I'm not being sarcastic -- it actually took a bit of hunting to find this)

          From http://xslt-process.sourceforge.net/docbook.php

          A better solution is to create a local copy of the DocBook DTD files. To do this go to http://www.oasis-open.org/docbook/xml/4.1.2/ and download the ZIP file containing the DocBook DTD. Put it in an accessible place on your file system, for example in /usr/local/share/docbook-4.1.2. Then modify the DOCTYPE of your DocBook documents to be:
          <!DOCTYPE book
          PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
          "file:/usr/local/share/docbook-4.1.2/docbookx.dtd" >

          I also know that there's a way to specify it as a general resource and to have a catalog that keeps from having to hardcode each file to a path, but I don't remember the syntax or the steps offhand.
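
For reference, the catalog approach can be sketched as follows, assuming the OASIS XML Catalogs format: a `public` entry maps the DTD's public identifier to a local file, so documents keep their normal DOCTYPE. The function name is made up, and the path is the example location used above. libxml2-based tools such as xsltproc read catalogs named in the `XML_CATALOG_FILES` environment variable; Saxon and Xalan need a catalog resolver configured separately, which is not shown here.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def build_catalog(public_id, local_uri):
    """Return an XML catalog that maps one public ID to a local file."""
    # Emit the catalog namespace as the default namespace.
    ET.register_namespace("", CATALOG_NS)
    catalog = ET.Element(f"{{{CATALOG_NS}}}catalog")
    ET.SubElement(
        catalog,
        f"{{{CATALOG_NS}}}public",
        {"publicId": public_id, "uri": local_uri},
    )
    return ET.tostring(catalog, encoding="unicode")


CATALOG_XML = build_catalog(
    "-//OASIS//DTD DocBook XML V4.1.2//EN",
    "file:///usr/local/share/docbook-4.1.2/docbookx.dtd",
)
```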

          Hope this helps with your laptop problem.
          • This was what I logically concluded myself as well, and tried. I had a 'proper' URL, i.e. 'file:///path/to/DTD' -- maybe that's what was wrong, because still it did not work.
  • by scrytch ( 9198 ) <chuck@myrealbox.com> on Friday October 11, 2002 @10:53AM (#4432064)
    But the structure navigator in every single bloody XML editor I have ever tried, free or commercial, tends to look like this:


    book
    |
    +--chapter
    +--chapter
    |  |
    |  +--section
    |  +--section
    |
    +--chapter


    ad nauseam. Not chapter titles, not section titles, the literal words chapter and section. Multiply this by hundreds of sections.

    How. Completely. Useless.

    Until I can find an XML editor with some bloody sense to its structure navigator, I would rather use Word. And no, I don't really want to use a WYSIWYG editor, because I want to know what XML it generates for my custom xslt snippets (which I might add I also have similar problems navigating with these brain dead editors).

    • The problem is with the way XML works. Unless your XML editor only handles a limited set of document types (eg DocBook and HTML only), it doesn't know where to find chapter titles, section titles, etc. Is it <chapter title='foo'>? Or is it <chapter> <title>foo</title>? Or something completely different? Unless there's a standard way of marking up the titles, your editor has no way to extract the titles from the document for you.
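
Once a tool does know the document type, labelling nodes with their titles is not much work. The sketch below assumes the DocBook convention of a `<title>` child element and only walks `chapter` and `section` elements; the `STRUCTURAL` set and the sample document are invented for the example, and a schema-agnostic editor would need that knowledge supplied some other way.

```python
import xml.etree.ElementTree as ET

# Elements shown in the outline; a hypothetical, deliberately short list.
STRUCTURAL = {"chapter", "section"}


def outline(element, depth=0):
    """Return indented outline lines, labelling each node with its title."""
    title = element.find("title")
    label = element.tag
    if title is not None:
        label += ": " + "".join(title.itertext()).strip()
    lines = ["  " * depth + label]
    for child in element:
        if child.tag in STRUCTURAL:
            lines.extend(outline(child, depth + 1))
    return lines


SAMPLE = (
    "<book><title>Ops Guide</title>"
    "<chapter><title>Backups</title>"
    "<section><title>Tape rotation</title></section></chapter>"
    "<chapter><title>Restores</title></chapter></book>"
)
```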
    • Look at how the Linux Documentation Project [tldp.org] handles SGML/XML files. There are ways of handling this a lot better.
    • A title is a part of an article or chapter just as the content is.
      book
      |
      +--title
      +--chapter
      |  |
      |  +--title
      |  +--section
      |     |
      |     +--title
      |     +--para
      |     +--para
      |
      +--chapter
         |
         +--title
      etc. This is because the editor is strictly following the DocBook schema. Chances are that the editor authors wanted their editor to be schema-agnostic. If your comment is saying that editors should be better, I wholeheartedly agree.

      As for wanting to know what the underlying XML is, "why!?!" For something like Word, where only formatting information is saved, I could see your concern. This is like the HTML output of Frontpage and Dreamweaver. But DocBook is a semantic construct with no formatting information. What you see in a GUI should be far less variable in the output data below.

      With DocBook, you already know what code snippets it is generating without even looking at your editor; it's rigidly defined in the DTD. Your XSLT should be written to the DTD, not to a document.
  • We did it. (Score:5, Insightful)

    by Some guy named Chris ( 9720 ) on Friday October 11, 2002 @11:10AM (#4432175) Journal
    It was a nightmare.

    Anyone who was not a programmer balked at the idea of having to write documentation in a (Gasp!) markup language. "Just give me Word!" they would whine.

    There is a lot of overhead associated with DocBook that most non-technical people don't want to deal with. They want a WYSIWYG editor, and will cry, kick, scream, and intentionally be completely unproductive until they get it.
    • intentionally be completely unproductive until they get it.
      There is no place for someone who is deliberately not doing their job. Discipline them. They should use legitimate channels to handle their problems, not act like spoiled two-year-olds.
      • "There is no place for someone who is deliberately not doing their job."

        Except a union apparently.
        • Re:We did it. (Score:1, Offtopic)

          by aridhol ( 112307 )
          And this is why I believe that unions are too powerful.
          <RANT>
          Where I used to live (Victoria, BC), janitors in the hospital got paid more than the medical staff. Why? Because the medical staff were considered an essential service by the government, and therefore not allowed to strike. Because the janitors were not considered essential, they were allowed to strike, and therefore drove up their pay rates.
          Unions were useful in their day. They eliminated harsh working conditions. Now the government performs that task with laws, and unions have become superfluous.
          </RANT>
          • Re:We did it. (Score:1, Offtopic)

            by duffbeer703 ( 177751 )
            The problem isn't the Union -- it's the government.

            The government created the Canadian health service, which in turn made it impossible for medical workers to negotiate via collective bargaining.

            The "lazy union worker" image is just that -- an image pushed by business and the media. And while some things, particularly seniority systems and the grievance process, seem very strange and wasteful, they are there because employers like railroads, meat packers, health services and school boards screwed their employees in those areas.

            I expect anti-union attitude amongst IT staff and programmers will change as their jobs are rendered obsolete by automation and cheap competition.
          • Re:We did it. (Score:1, Offtopic)

            by jslag ( 21657 )
            Unions were useful in their day. They eliminated harsh working conditions.


            Read Fast Food Nation [amazon.com], and then either 1) reiterate your claim, explaining how a slaughterhouse as described by Schlosser doesn't constitute a harsh working condition, or 2) refute the factual evidence presented about slaughterhouses. Hint: no one in the meat industry has been able to find factual errors in Schlosser's account.


            There are plenty of other examples, of course, that's simply the first that comes to mind. Harsh working conditions exist, and industry has figured out how to work with government to prevent safety regulations from being implemented.

      • Re:We did it. (Score:1, Insightful)

        by Anonymous Coward
        Maybe instead of delivering broken, pie-in-the-sky applications, IT should deliver working applications to the user community.

        An accountant should not have to write in DocBook or any other markup language.

        Use a WYSIWYG editor and translate it to DocBook.
      • There is no place for someone who is deliberately not doing their job. Discipline them. They should use legitimate channels to handle their problems, not act like spoiled two-year-olds.

        You know, I normally find your posts pretty thoughtful, and I often agree with them. But this time I think you're way off the mark. "Discipline them?" If you treat people like children, you shouldn't really be surprised if they act like children in return, should you?

        General-purpose computers are great things because they allow people to use the tools they find most effective to get the job done. In this example, what's the job? Producing documentation. (The submitter was talking about internal documentation, but the OP was talking about docs in general, evidently.) To produce documentation, you should use the tool that's best suited for producing documentation, not the one that looks coolest on paper or that has the neatest feature set or whatever.

        Writing structured documents in something like LaTeX (with which I have some experience) or XML (with which I have less) works well up to a point... but only up to a point. If your document is going to be basically prose-- unformatted paragraphs organized into sections, chapters, and books-- then writing with a markup language will probably work well. The ratio of content to markup will be small, so you can just concentrate on your words.

        But if you want to create even something as simple as a bulleted list, suddenly you have to deal with markup. Creating a bulleted list in Word is trivial; you click the "bulleted list" button and go to town. Creating a bulleted list in LaTeX or XML is more work, and it scatters markup throughout your document in an unappealing and unpleasant way.

        So markup works in some situations, but in others it's not a good solution. This is what we should be talking about here. Not talking about disciplining coworkers who "act like spoiled two-year-olds."

        I just think you're forgetting what the purpose of computers and IT is: to give people the tools they need to do their jobs. Any system that requires its users to work in a way that they're not happy with is flawed, and could be improved somehow.

        (Sorry about the rant.)
        • I agree with you. However, if management had dictated that you must do something in a given way, you have the following options:
          • Talk to management about it. Follow your company's procedures for bringing up issues.
          • Propose another method (works well in conjunction with the above)
          • Deal with it.
          • Get a different job
          • Deliberately avoid doing your assigned tasks
          Do whichever of the above you want. However, if you chose not to work, don't be surprised when your employer chooses not to pay you.
          • I don't think the world is as black-and-white as you think it is. Within certain boundaries, I think the world does-- and should-- work just as you describe. But outside the envelope, things aren't as predictable. I think IT's requiring technical writers to use XML or LaTeX (which is a contrived example, of course) would be unreasonable. When you put unreasonable demands on people-- people who are just trying to do their jobs, by the way-- it's pretty likely that people are going to respond unreasonably.

            My opinion on the whole matter is that people should use whatever tools they like to do their jobs-- to the extent that it's practical for them to do so. XML might have some technical merits over Microsoft Word, but if the writer wants to use Word, that's his call.

            But that's just my opinion.
            • I think IT's requiring technical writers to use XML or LaTeX (which is a contrived example, of course) would be unreasonable.

              Erm. . . How would this be unreasonable? How is it any more unreasonable than expecting a programmer to use language 'foo' for an application the company is developing? If the technical writer wishes to get paid, then they need to do their job, and that means doing what their employer tells them to do. If that includes using XML or LaTeX, then they either do their job, or find a new job.

              When you put unreasonable demands on people-- people who are just trying to do their jobs, by the way-- it's pretty likely that people are going to respond unreasonably.

              I'll note again that I don't think mandating a specific way of writing things is at all unreasonable. If these people are trying to do their jobs, then they'll do it as they're told to. . . that *is* their job. Being told how to do your job is not an uncommon, nor unreasonable, thing (within reason of course, micro-managing is a Bad Thing (tm)).

              My opinion on the whole matter is that people should use whatever tools they like to do their jobs-- to the extent that it's practical for them to do so. XML might have some technical merits over Microsoft Word, but if the writer wants to use Word, that's his call.

              That's a great idea, but what do you do then, when you've got 10 different content authors using 11 (One of them got annoyed halfway through a project, and decided to try something new) different frameworks to develop their writing?

              And worse yet, what happens when you decide to combine two different authors' works into a single work, when they've both used different tools?

              While I'm all for people being allowed some individual choice in how they do their job, there is a limit that has to be considered. If they're working for a company, that company gets to decide both what they need to do for their job, and how they need to do it. If the company standardizes on a single format, such as XML/DocBook, or LaTeX, or HTML, or whatever it is, then everyone at that company should be using it. Regardless of whether they'd rather be using something else, they're being paid to do what their employer tells them to.
              • Erm. . . How would this be unreasonable?

                It's unreasonable like carving a roast beast-- er, sorry, too much Dr. Seuss-- carving a roast beef with a screwdriver is unreasonable. If the person doing the job finds the tool inappropriate, maybe the mandate should be reconsidered.

                I'll note again that I don't think mandating a specific way of writing things is at all unreasonable.

                Ah, but that's the thing. Mandating the use of XML for technical writing gets in the way of the job. If you're spending time tweaking document structure in an obscure language, you're not writing.

                All I'm saying is this: you will almost certainly gain more efficiency and productivity by letting your people do their jobs with the tools they prefer than by requiring the use of any one tool, no matter what its technical or political merits might be.
                • It's unreasonable like carving a roast beast-- er, sorry, too much Dr. Seuss-- carving a roast beef with a screwdriver is unreasonable. If the person doing the job finds the tool inappropriate, maybe the mandate should be reconsidered.

                  Or perhaps the person doing the job should realize that no job is perfect, and at some point they're going to have to accept some restrictions from their employer on how they do their job. At least, if they want to get paid. ;-)

                  Your analogy of carving a roast with a screwdriver doesn't really hold up, because most of the things we're discussing here, LaTeX, XML, etc, were specifically designed for authors. A better analogy would be that you are carving a roast, and need to pick a knife. LaTeX would be one type of knife, while XML/DocBook would be another type.

                  Just because someone doesn't like the knife they were given doesn't mean that it's the wrong knife. They may just be ignorant of it. Or it may be that the company is standardizing on a single type of knife so that it can more easily share the knives among employees.

                  Ah, but that's the thing. Mandating the use of XML for technical writing gets in the way of the job. If you're spending time tweaking document structure in an obscure language, you're not writing.

                  Have you ever used XML (Assuming that we're specifically talking about DocBook, as that was designed specifically for use by authors, particularly technical writers)?

                  DocBook/XML was specifically designed for creating documents and books. Additionally, XML is not an obscure language, nor very difficult to work with. Especially in this age of the Web, everyone is familiar with HTML, making DocBook fairly easy to pick up. As if that wasn't easy enough, there are numerous XML editors available that can make it even easier to work with.

                  Unless all writing is done in plain text, you will have to deal with some work to make it presentable. Whether that be in a word processor, in LaTeX, in DocBook, whatever, it will have to be done. The question that has to be asked is which format will provide the greatest benefits with the fewest detriments. Depending on the goals of the company, the individual authors may very well not be the best person to make these decisions.

                  All I'm saying is this: you will almost certainly gain more efficiency and productivity by letting your people do their jobs with the tools they prefer than by requiring the use of any one tool, no matter what its technical or political merits might be.

                  Ah, but you're looking at this in a very limited way. Yes, you may gain more efficiency in the short run, by individual authors, by letting each person use whatever they want. But in the long run, you could end up spending literally 10 times as long making the end product meet the company's needs.

                  It's easy for an individual person to look at the situation and say, "I could write this document in only three hours if I could do it in 'foo', but doing it in DocBook/XML will take me four hours", and think that it would be much more efficient to write it in 'foo'. But if this individual is writing a single article that will be combined with four other articles into a single work, and it will take six hours for someone to combine the five differently formatted articles into that single work, then collectively, you've just lost an hour's worth of work.
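
                  A rough sketch of that arithmetic in Python, using the figures from the example above. It assumes each of the five authors needs three hours in a preferred tool or four hours in DocBook/XML, and that a shared format needs no separate merge step; both are simplifying assumptions, not measurements.

```python
# Figures from the example above; the zero-cost shared merge is an assumption.
ARTICLES = 5
HOURS_PREFERRED_TOOL = 3   # per article, each author in a tool of their choice
HOURS_DOCBOOK = 4          # per article, everyone in DocBook/XML
MERGE_HOURS_MIXED = 6      # combining five differently formatted articles
MERGE_HOURS_SHARED = 0     # assumption: one shared format merges for free


def total_hours(hours_per_article: int, merge_hours: int, articles: int = ARTICLES) -> int:
    """Total team time: writing every article plus the final merge."""
    return hours_per_article * articles + merge_hours


mixed_total = total_hours(HOURS_PREFERRED_TOOL, MERGE_HOURS_MIXED)
docbook_total = total_hours(HOURS_DOCBOOK, MERGE_HOURS_SHARED)
print(mixed_total, docbook_total, mixed_total - docbook_total)
```

                  If the shared format still needs some merge time, the gap narrows or reverses, so the conclusion depends on how expensive the merge really is.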

                  And no, this isn't a purely theoretical example. At a previous employer, we had a situation like this occur. Eventually, we standardized on a single framework for all technical writing and documentation. At first, it did slow people down a little bit, as they were forced to learn the new system. Once everyone became used to it, though, it worked *much* better than before. Being able to easily share and merge documents allowed us to create a single, central, information repository, easily accessible and usable by everyone.

                  Lastly, while you throw out technical merits with a single statement, it's not something to be overlooked. Depending on what your end goals are, you may *need* to consider technical merits in order to get the job done. For example, if your end result needs to be available as a PDF file, then you better be using tools that support PDF generation. If you're not, then no matter how productive you might think you are, you're never going to get your job done. Sometimes it's more important to fit your tools to your job, than to fit them to a specific person.
                  • Listen, no offense, but your comment is pretty damn long, and I haven't spent any time thinking about this subject in days and days. So I'm just gonna skim, okay? Seriously, no offense intended.

                    ...most of the things we're discussing here, LaTeX, XML, etc, were specifically designed for authors...

                    That doesn't sound right to me. LaTeX is a typesetting system, not an authoring system. The distinction is subtle, but important. I've had many jobs in my life-- mostly 'cause I have a short attention span and I keep getting fired-- and along the way I've been a typesetter, a programmer, and most recently an author. Putting on my typesetter hat, LaTeX rocks. It's a fantastic typesetting system, all praise be to Knuth and Lamport. But as an author, it's definitely not optimal. If I want to italicize a word-- something authors do a lot-- I have to type {\it whatever}. That's not author-friendly. XML is far worse. XML, in my author opinion, isn't really meant to be human-readable. It gets in the way of the words, and as an author, words are all that count, you know?

                    So LaTeX and XML are really awful systems for authors. With all the tools at my disposal, I still find myself using Microsoft Word with a very narrow set of predefined styles for creating structured documents.

                    ...everyone is familiar with HTML...

                    See, the thing is, this simply isn't true. Technical writers-- that's what we're talking about here-- come from two basic camps. They're either technology people who become writers, or they're writers who write technology stuff to pay their bills while they work on the great American novel on weekends. Programmers and geeks-- I use the term reflexively and affectionately-- are familiar with HTML. Writers aren't, and don't particularly want to be. Asking people who just want to write to scatter XML markup through their documents is like trying to teach a pig to sing: it wastes time and it annoys the pig.

                    ...there are numerous XML editors available...

                    If you're going to have your writer or writers using a tool anyway, why not just let them use the one they're already familiar with? Why try to force a new one down their throats just because it produces XML?

                    And while we're on the subject, don't bother taking your XML documents to a printer to get typeset and published. The print world-- actual ink on paper stuff-- cares about traditional stylesheets, from Word or Quark or FrameMaker, not XML.

                    ...the individual authors may very well not be the best person to make these decisions.

                    Agreed. But guys from the IT department sure as hell aren't qualified to make the call, either. Compromises will have to be made, and that involves getting input and feedback from your people rather than simply dictating to them.

                    Look, let's talk about the real world here. In the vast majority of cases, technical writing goes from the writer to page layout to the printer. In many cases, the process of printing the documents might be supplemented-- or even completely replaced-- by the creation of PDFs, but the process is the same. Writer to page layout to printer.

                    Most page layout gets done in one of three pieces of software: QuarkXPress, Adobe FrameMaker, or Adobe InDesign. If the page layout is done with FrameMaker, then the best thing is for the documents to have been written with FrameMaker. This is fine, because FrameMaker is a good tool for writers. Most writers at least know it exists-- unlike LaTeX or XML-- and if they're not familiar with it, they can learn their way around it in minutes thanks to the familiar UI-- unlike LaTeX or XML. But FrameMaker is falling out of favor in many circles, replaced by either QuarkXPress or InDesign. In either of those pieces of software, the layout artist has to import text documents provided by writers and flow them through design templates. Layout artists like their jobs to be easy; the best possible scenario is if the documents provided by the writers can drop right in, and arrange and format themselves based on previously designed stylesheets. Can you do that with XML? No. Can you do it with LaTeX? No. Can you do it with Microsoft Word? Hell, yeah, easy as pie. So which tool is the best for the job in that situation?

                    Now, there are exceptions to this rule. Until recently, one major UNIX systems vendor I worked with still produced all their documentation using troff on UNIX workstations. I don't know what tools O'Reilly uses for layout, but I understand that theirs is an all-UNIX workflow as well. Of course, O'Reilly prefers that its authors submit Word or FrameMaker files, so that just goes to prove my point.

                    It's late, and I'm tired. Let me just wrap this up by saying this: go find a technical writer. Explain to her (most of the writers I know are women; this may or may not be typical) that you want her to do all of her writing using LaTeX or XML from now on. Explain to her what this means. After you get out of the hospital, come tell me how it went. I'll be interested to hear.
    • Re:We did it. (Score:3, Interesting)

      by pete-classic ( 75983 )
      How about WYSIWYM (What You See Is What You Mean)?

      Try LyX [lyx.org].

      Just click "title" and type the title. Click a button to turn italics on/off, etc.

      See http://bgu.chez.tiscali.fr/doc/db4lyx/ and http://www.lyx.org/help/xml/xml.php

      -Peter
  • No Suitable Editors (Score:4, Interesting)

    by GOD_ALMIGHTY ( 17678 ) <curt.johnson@gmail.NETBSDcom minus bsd> on Friday October 11, 2002 @11:33AM (#4432323) Homepage
    Essentially your choices are Adobe FrameMaker [adobe.com] (~$800), LyX [lyx.org] (Open Source) and XMLmind [xmlmind.com] (Freeware). There may be some others, but these are the ones I've looked at. These are the ones you can use like a WYSIWYG, but are more WYSIWYM (What you see is what you mean). For more info on WYSIWYM, look at LyX's site.

    DocBook is a great spec, but the editors suck for the most part. LyX can't import DocBook reliably, and your DocBook is stored as a LyX file (LaTeX, I think). LyX's DocBook stuff can be a bear to set up, even on a system like RedHat where most of the software comes installed. I only recommend LyX to people who have experience with LyX; for someone who just wants to write docs, it tends to be more trouble than it's worth.

    FrameMaker will probably do everything you want and be a godsend with lots of nice features, but you'll pay for it: $800 for Win/Mac and ~$1300 for Unix.

    XMLmind is pretty cool. It does DocBook well but is a little slow and has a bit of a learning curve; still, it's prolly the best free DocBook editor I've found. It's not Open Source though. It is written in Java, so you might have some speed issues, depending on the platform you run it on. I've been recommending XMLmind to everyone I know who asks about DocBook. It has a tree view of the DOM as well as a WYSIWYM view with stylesheets applied on the fly. It has property editors and a pretty smart insert tool that follows the DTD, only allowing you to insert allowed tags into other tags. It feels like more of a programmer's tool than FrameMaker, but it should be fairly easy for most WYSIWYG users to adjust.

    <rant>
    I don't understand why on God's green earth OpenOffice or Abiword or KOffice, or anyone else in the OpenSource world has neglected this area. It's been three years since the LDP went to DocBook, GNOME uses DocBook as their doc format. Why in the hell don't we have decent document writing tools when everyone is always screaming about the lack of documentation in the OpenSource world?

    If we want more docs written, it needs to be easier to write them and shouldn't involve learning all about SGML or XML engines as well as a markup language to do it. DocBook is too big to keep in my head and I shouldn't have to think hard about how to write docs when my focus is the content I want to write for. Organizing technical info on a difficult subject is hard enough, stopping every five minutes to look up a DocBook tag or trying to better understand the structure is a huge barrier to getting the work done.
    </rant>

    But that's just my $.02
    • by Anonymous Coward
      You may want to take a look at "Tagless Editor" from www.i4i.com [i4i.com].
      • Wow, I've been looking for a tool that does that for a long time.

        What is really unfortunate is that, even if you somehow convince people to use this tool, once they discover that <citation> produces essentially the same formatting as <image_caption> (or whatever two tags), then they'll either use the two interchangeably for whatever, or they'll use one or the other exclusively for things that are unrelated to citations or captions. Nobody except programmers cares at all about document structure, and you can't force them to. All people want, and all they'll think about, is pretty layout.

        (rant mode off)
        • What is really unfortunate is that, even if you somehow convince people to use this tool, once they discover that <citation> produces essentially the same formatting as <image_caption> (or whatever two tags), then they'll either use the two interchangeably for whatever, or they'll use one or the other exclusively for things that are unrelated to citations or captions.

          I think that most documentation people can understand such distinctions. To drive the point home better, use different styles for each -- at least while they are editing. You can do this with the WYSIWYG editors such as Morphon -- just use a different color for each. Or you could create preview stylesheets out of the standard Norm Walsh templates.

          • I think that most documentation people can understand such distinctions. To drive the point home better, use different styles for each -- at least while they are editing. You can do this with the WYSIWYG editors such as Morphon -- just use a different color for each.

            No doubt the best documentation people understand this, but in my experience, most either don't understand it or don't care. And if you enforce the difference between types like this, then what they see is ugly, and it'll be nothing like what they eventually get. This, reasonably, makes them resistant to using the software.

            Actually, in my experience, most people working on documentation were dragged there from something else they'd rather be working on, and often even have to be shown such advanced concepts as copy and paste. Therefore creating documentation should be really easy, but worrying about structure just isn't easy. Making this bold, and that italic is easy, though. This problem won't be solved until we can create heuristics that just figure out what you mean when you make a block of text such-and-such a style, or at the very least can separate the "styled text" part of a document from the "containing layout" part, and can reliably extract the important styles from the ones that change between presentations. Either that, or every company hires expensive professional documentors.

            Part of the problem, I think, is that many people who work on documentation were trained on typewriters or desktop publishing software. And though those have justifiably gone out of fashion, nobody except programmers is interested in learning what they see as the paradigm of the week.

      • This i4i thing looks pretty cool. I must say I like it (well, at least the demo looks promising). People hate change. If they find out that this all works within Word, they would be willing to at least give it a try....
    • There are some tools (proprietary, unfortunately) that will convert from RTF to an XML format, which can be easily docbookified (upCast, at infinity-loop [infinity-loop.de], is what we use, since we have a Tomcat environment).

      As for producing docbook natively, the NetBeans [htpt] java IDE has an XML module that is pretty slick. Good ol' X?Emacs in PSGML mode is what I use to create and edit Docbook on the fly; it works really well (although the indentation engine is pretty flaky). Those are both open source.

      Abiword supposedly can save in Docbook V 4.1.2 XML format, but its output filter left a lot to be desired the last time I checked. OpenOffice's native format is XML, so a set of XSL stylesheets is all that's needed to Docbookify it. We may be working on developing just such stuff over here.

    • Why in the hell don't we have decent document writing tools when everyone is always screaming about the lack of documentation in the OpenSource world?

      Because good editors are hard to write and a vast majority of the sufficiently talented coders who could do it still don't grasp the concept of content being separate from layout. You can't code what you don't understand.

      That coupled with -- what others have touched on -- users who can't accept that what they edit is not necessarily what it will ultimately look like.

      "I want to put this in italics."
      "Why?"
      "Because the image captions should all be in italics here."
      "So put the text in a <caption> tag."
      "But it's not in italics in my editor."
      "It will be in italics when it's published."
      "But it's not in italics in my editor."
      *sigh*

      You're right, we need better editors.
  • by Anonymous Coward
    help.unc.edu [unc.edu], UNC-Chapel Hill's technical support website, uses Docbook (and XML) extensively. The publication framework is Cocoon 2 under Tomcat, but I'm sure if you like Perl you could use Axkit too =)
  • Are there any good classes, schools, or online courses where document writers can learn to develop DocBook-based content? I have been writing for the Linux Documentation Project for a while now, and I learned by looking at XML/SGML content created by other people or machines. Is that the best way? Convert your existing non-XML document to XML and go through it? I found that very useful in the beginning. Any comments?
  • by stonebeat.org ( 562495 ) on Friday October 11, 2002 @01:53PM (#4433624) Homepage
    At one point in time I was very involved in OpenOffice.org. Now I have lost track of the development. There was some talk of including the DocBook DTD in the distribution. Does anyone know if any progress has been made on that?
  • So far we've completed converting 3 of our "books" from Script to DocBook. The largest book has over 175 chapters and about 600 pages. The most time-consuming problem was the project requirement that the DocBook version look very similar to the Script version. We used the XSL stylesheets from docbook.sf.net [sf.net] and FOP [apache.org].

    Script is a formatting language (think RTF) and DocBook is a markup language. There was a lot of inconsistent formatting in the Script versions, which decreased readability. The consistent formatting of correctly marked up DocBook is a very good thing.

    I spent a lot of time customizing the XSLT stylesheets. XSLT has a nice mechanism that allows you to import and then override parts of the imported stylesheets. This is really nice because we can upgrade the upstream stylesheets without modifying our customizations. This isn't completely true if there are big structural changes to the upstream stylesheets, but since our changes are in separate files it's rather easy to refit our customizations.

    We had two people working on this project. One customizing the stylesheets, me, and another who took the Script source and added DocBook tags. This worked quite well. We were committed to the project and were able to stick with it until completion.

    I encouraged another department to give DocBook a try and this didn't work so well. They currently only publish their internal docs to HTML and their documentation source was written in HTML. For them the overhead of DocBook and their lack of desire for paper output made it not worth it.

    Previously we could only print to paper. Now we have a single source to generate HTML, PDF, paper (from PDF), and Windows Compiled HTML Help files (basically HTML with extra meta info).

    Some people seem to just not understand the advantages of marking up the structure of the document instead of the formatting. If you want to use DocBook because of the hype then odds are you'll piss people off in the short term, maybe long term too, by forcing it on them. If you and management understand the long term advantages of structured documentation then I really recommend DocBook.


  • for all the reasons stated above and...

    i was unable to produce a simple Howto document (bulleted list) because the docbook.xsl file had error(s).

    when i reported these to the author (?) i was ignored.

    now over a year later i'm kicking myself for not finishing my version of what docbook should be: doc-this! [doc-this.com]

    i have been asked recently to finish this so i guess maybe it's worth the effort.
    • Norm Walsh has been very responsive in the past, at least to my requests. My guess is you were not using the stylesheets correctly or your xslt proc sucked.

      DocBook Rocks!
  • My father was hired by a publisher to translate some chapters of a physics book. They provided him with a few copies of the book along with these instructions:

    Don't use any formatting when writing your text, no bold, no italics, nothing. When there's a figure, place [FIGURE ##] where ## is the number of the figure. I repeat, do not do any formatting, we won't accept your document if it's formatted.

    I'm pretty sure that they were taking this unformatted text and transforming it into docbook.

    So you may want to do this: ask your non-technical people to write unformatted text, and hire a technical person (programmer) to do the markup.
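
    A minimal sketch of what that markup step could look like, assuming blank-line-separated paragraphs and the [FIGURE ##] placeholder described above. The function name and the figureNN.png file naming are hypothetical, and nothing here is known to be the publisher's actual process.

```python
import re
import xml.etree.ElementTree as ET

# A block consisting only of "[FIGURE 12]" marks where figure 12 belongs.
FIGURE_RE = re.compile(r"^\[FIGURE (\d+)\]$")


def text_to_docbook(text: str, title: str) -> str:
    """Turn blank-line-separated plain text into a DocBook <chapter>."""
    chapter = ET.Element("chapter")
    ET.SubElement(chapter, "title").text = title
    for raw in re.split(r"\n\s*\n", text.strip()):
        block = " ".join(raw.split())  # collapse line breaks inside a paragraph
        if not block:
            continue
        match = FIGURE_RE.match(block)
        if match:
            number = match.group(1)
            figure = ET.SubElement(chapter, "figure")
            ET.SubElement(figure, "title").text = f"Figure {number}"
            media = ET.SubElement(figure, "mediaobject")
            image = ET.SubElement(media, "imageobject")
            # Hypothetical naming scheme for the image files.
            ET.SubElement(image, "imagedata", fileref=f"figure{number}.png")
        else:
            ET.SubElement(chapter, "para").text = block
    return ET.tostring(chapter, encoding="unicode")


sample = "Light behaves as a wave.\n\n[FIGURE 12]\n\nIt also behaves\nas a particle."
print(text_to_docbook(sample, "Waves"))
```

    The writer never sees a tag; every structural decision lives in one script that the technical person maintains.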
  • A common mistake in the WYSIWYG paradigm is premature markup. People get slowed down making sure their masterpiece looks right (or worse, fighting with the fsking tool), when really writing isn't related to how it looks - it's communication. Talk to any real writer, and you will probably find they use a plain format (paper, typewritten, text files, plain Word docs).

    Markup should always happen /after/ the writing itself. My personal approach is to use a text editor, and then some simple custom scripts to convert its obvious format into pdf, html/css, xml, troff, etc. The biggest win is I never fight with my editor, and I can concentrate on writing. And, I can export to any format I choose - though I do have to write the filter.
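
    A minimal sketch of such a filter, for illustration only. The plain-text convention is an assumption (blocks separated by blank lines, a block starting with "= " is a heading), and the function names are made up. The text is parsed once and then rendered to HTML and to troff with the ms macros.

```python
import html
import re


def parse(text: str) -> list[tuple[str, str]]:
    """Split plain text into ("heading" | "para", content) blocks."""
    blocks = []
    for raw in re.split(r"\n\s*\n", text.strip()):
        content = " ".join(raw.split())
        if not content:
            continue
        if content.startswith("= "):  # assumed heading convention
            blocks.append(("heading", content[2:]))
        else:
            blocks.append(("para", content))
    return blocks


def to_html(blocks: list[tuple[str, str]]) -> str:
    """Render blocks as simple HTML, escaping markup characters."""
    tags = {"heading": "h2", "para": "p"}
    return "\n".join(
        f"<{tags[kind]}>{html.escape(content)}</{tags[kind]}>"
        for kind, content in blocks
    )


def to_troff_ms(blocks: list[tuple[str, str]]) -> str:
    """Render blocks with the ms macros .SH (heading) and .PP (paragraph)."""
    macros = {"heading": ".SH", "para": ".PP"}
    lines = []
    for kind, content in blocks:
        content = content.replace("\\", "\\e")  # keep literal backslashes
        if content.startswith((".", "'")):
            content = "\\&" + content  # stop troff reading text as a request
        lines += [macros[kind], content]
    return "\n".join(lines)


doc = parse("= Setup\n\nInstall the <tool>\nfirst.")
print(to_html(doc))
print(to_troff_ms(doc))
```

    Each new output format is one more small render function over the same parsed blocks, which is the "write the filter" cost mentioned above.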

    At work when doing professional documentation, our layout people extract the raw text and apply it to their own FrameMaker setups - so all the formatting our developers do is really in vain. The doc dept. has no trouble with my plain text stuff ;-) I've even extracted some of it using filters to simplify their life more.

    Docbook itself is fine - but make life simple for the writers, don't make them think about markup (as much as possible anyway). My vote is on the plain-text editors + filters ... but word docs and the same can work, though the tool tends to get in the way of thinking about communicating.

    My CDN$.02.

