Tools for Publishing in Multiple Formats?
Truist asks: "What are the best tools (windows or *nix) to use to publish a single source document in multiple formats, specifically plain text, multi-page HTML, and PDF? I'm trying to publish a (60-page+) NetBSD installation guide/documentary online, and I want plain text for easy download and 'less'-ability, HTML for easy browsing and search engine indexing, and PDF or Postscript for easy printing. It's currently a Word document (I know, I know - I'm happy to manually convert it to something else) with multiple styles, including regular text, lists, internal links, external (web url) links, code, and notes, and I'd like to preserve as much as possible of each in the final output. Some additional notes: there are no graphics, and I expect to update this document periodically, or to split it into parts and maintain the parts (think master document / subdocuments). It won't be updated too often, but if re-publishing could be scriptable, that would be fantastic."
Docbook docbook docbook (Score:1)
Since most of the docs are out of date and talk about stupid, near-impossible-to-configure tools, I also mention xmlto to do the actual conversion.
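For what it's worth, the xmlto route can be sketched as a small script. This is only a sketch under assumptions: the file name guide.xml and the out/ directory are made up for illustration, and it relies on xmlto's `txt`, `html` (chunked, multi-page) and `pdf` output formats plus its `-o` output-directory option. By default it only prints the commands it would run.

```python
import os
import shutil
import subprocess

# The three targets the question asks for: plain text, multi-page HTML, PDF.
FORMATS = ("txt", "html", "pdf")


def xmlto_commands(source, outdir="out"):
    """Build one xmlto command line per output format."""
    # "guide.xml" and "out" are hypothetical names, not from the thread.
    return [["xmlto", "-o", f"{outdir}/{fmt}", fmt, source] for fmt in FORMATS]


def publish(source, outdir="out", dry_run=True):
    """Print each command; run it only when dry_run is False and xmlto is on PATH."""
    for cmd in xmlto_commands(source, outdir):
        if dry_run or shutil.which("xmlto") is None:
            print(" ".join(cmd))
        else:
            os.makedirs(cmd[2], exist_ok=True)  # cmd[2] is the output directory
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    publish("guide.xml")
```

Calling publish("guide.xml", dry_run=False) from a cron job or a Makefile would cover the "scriptable re-publishing" part of the question, assuming the DocBook source is valid.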
Re:Docbook docbook docbook (Score:2, Informative)
The problem with DocBook might also be considered its strength - basically it was designed by a committee, and evolved several humps. Each influential party behind it pushed the features that they wanted to see into it. Each individual feature set is a pretty good coherent package which will let you create documents just like [insert-project-name-here]'s own documentation - pretty neat! However, the different feature sets clash _hor
Re:Docbook docbook docbook (Score:2)
But as for it being a compromise between different output formats... Well, ya. That's kinda the whole point.
And it's not Docbook that sucks, it's all the old documentation for it that points users to stupid, hard-to-configure, or downright broken and non-functional toolchains. The near volumes of (shitty) instructions on DSSLCrap this, XSLTblargh that, SGMLSuperKalaFragalisticBroken other thing can all be summarized with:
xmlto.. If you dont have xml
Re:Docbook docbook docbook (Score:1)
"But as for it being a compromise between different output formats... Well, ya. That's kinda the whole point."
You misunderstand - I mean it's a hotch-potch of different styles, and half of the list types clash with half of the other tags, as they were injected into the standard by some big OS project that wanted lists just so. Someone else wanted a family of admonitions, and they have a different look to them. Someone else wanted to have BNF grammars just so, and that style clashes with the other text.
A
Docbook.. (again) (Score:4, Informative)
I have seen a variation of this question posted here at least twice. The unanimous answer is usually docbook, and in this case it is even more relevant, since the document is technical in nature.
A good pick is DocBook: The Definitive Guide [amazon.com], written by Norman Walsh (who chairs the OASIS DocBook Technical Committee) and published by O'Reilly. Of course the book is also available in HTML, PDF and plain text [docbook.org].
Re:Docbook.. (again) (Score:2)
Re:Docbook.. (again) (Score:3, Informative)
The XML style sheets (XSLT) that invariably happen to come with 'a docbook distribution' are not a component of that standard. You're free to change them at will.
How you do that, I haven't the first clue beyond 'edit the .xsl'. I'm sure ORA has a book or 10 on the subject.
Re:Docbook.. (again) (Score:3, Informative)
Re:Docbook.. (again) (Score:2)
Latex? (Score:3, Informative)
You're in luck! (Score:3, Informative)
texi2html [cvshome.org]
Docbook+FOP (Score:3, Informative)
The subject says it all. Apparently, it's the standards-based, open-source-conforming way to do it. I've heard paeans sung to FOP [apache.org] but I haven't used it, yet.
5 illegal Indian Senior IT consultants (Score:1, Funny)
OpenOffice? (Score:3, Interesting)
OO and scripts: (Score:3, Informative)
Re:OO and scripts: (Score:1)
OpenOffice could be the answer (Score:3, Informative)
The other solutions presented so far suffer w.r.t # 4 - document maintenance. After all, if someone created their document in a visually rich editor like Word, it was probably because of ease of use an
Re:OpenOffice could be the answer (Score:1)
The XML format for MS Word is not very useful, because it is highly restricted in who will be able to use it, and it will cost too much m
Re:OpenOffice? (Score:1)
Not Open Source... (Score:1)
Aside from the scriptable part, InDesign seems to be able to do all that you are asking for.
My portfolio is held in an InDesign document, which I have routinely saved out to HTML and PDF, printed, etc.
It supports basic HTML code, CSS, and all the links you could want. It carries the links into the PDF as well, if that is what you choose in your output.
Just don't expect anything fancy from your HTML, but if it is only text then no problem.
Also, I am pretty sure it imports Word docs, but I am not sure as
Re:Not Open Source... (Score:1)
Re:Not Open Source... (Score:1)
DocBook, and some reasons... (Score:3, Interesting)
The conversion was a PITA, but once that was finished, we had about 40 source XML files which were independently version-controlled, some minor customizations to the standard DocBook XSLT stylesheets, and slick, easily-updated HTML, plain text, and PDF versions of the document being produced straight out of CVS by a cron job.
A nice benefit of the conversion was that we were actually able to add another few hundred pages of documentation that was automatically generated from grammar definitions and source code to the batch build, and they could be integrated into the style and distribution methods we worked out for the hand-generated docs.
XML (Score:2)
Might not be easy to set up at first, but it should work fantastically.
Re:XML (Score:1)
Re:XML (Score:1)
XML is to Docbook as SGML is to HTML. You wouldn't write web pages in SGML, so why write documentation in XML?
If you were to write your documentation in XML, then you would need to define a meaningful DTD/schema and all the tools that go with it to make it useful.
But why bother when someone's already done the hard work for you? E.g. Docbook [docbook.org].
Lyx (Score:3, Interesting)
Re:Lyx (Score:2)
You can use LyX without ever knowing anything about latex, but for conversion, you've got to deal with a few issues.
PDF output is nice. Postscript/PDF format is what latex is all about. But, HTML output via latex2html isn't very great. It's functional (for the most part), but is a pain to customize, and in my opinion, not professional enough.
LyX does have some nice features, and I
I like tex (Score:5, Informative)
To some extent the texinfo folks have solved this problem as well. The DocBook stuff mentioned elsewhere might be very nice but I have no experience with that.
I like Lyx (Score:1)
I think it can import Word.
The Near-Definitive Solution (Score:4, Insightful)
All documentation is edited using an ordinary plaintext editor.
The documents are marked up using reStructuredText (ReST) conventions [sourceforge.net]. This has satisfied 99% of my needs. I've decided the convenience of ReST outweighs the need for the remaining 1% of the frills I want.
I use CVS [cvshome.org] for revision control. There may be an RCS involved in the backend; I don't operate the server that hosts my repository.
The ReST documents are converted to XML using DocUtils [sourceforge.net]. The project coordinator, by the way, has proven himself a superlative programmer. DocUtils rocks, and will also transform ReST to HTML or LaTeX.
The XML is converted using XSL templates that I've created. Saxon [sourceforge.net] then transforms the DocUtils XML to XSL-FO, and FOP [apache.org] transforms that into PDF.
Pretty fucking spiffy, if I do say so myself.
I also currently use HT2HTML [sf.net] to transform ReST to HTML. I use it in preference to DocUtils' native HTML transformation because it allows me to do a few nice tricks. In the future I plan to migrate entirely to another set of custom XSL transformations.
This system has proven extremely productive. At any time I could pop a few bucks for a commercial XSL-FO->PDF engine and stomp the few gripes I've had with FOP (my number one issue is lack of keep-with-next functionality; however, FOP is under a complete refactoring, and will emerge with full functionality). Saxon has been superb, DocUtils has been wonderful (and I've been able to contribute to the overall design), and ReST is quite pleasant to read and write.
Overall, I highly recommend this workflow.
Your source material becomes extremely reusable, eminently accessible, and free from commercial encumbrances.
(footnote: if you do go this route, please don't flood the DocUtils developers with suggestions and ideas. Work out your idea in detail, consult the developers' mailing list archives, and make full consideration of side-effects. Only then suggest it. They've been at this so long, and had so many discussions, that they've become a little short of patience with loud-mouthed newbies. I suspect most popular open-source projects get that way...)
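For readers who want to see the shape of that ReST -> XML -> XSL-FO -> PDF chain, here is a rough sketch that only builds the three command lines. Nearly every name in it is an assumption rather than the poster's actual setup: `rst2xml.py` is the Docutils front-end script as some releases name it, `saxon.jar` and the Saxon 9-style `-s:`/`-xsl:`/`-o:` options depend on the Saxon version installed, `rst2fo.xsl` is a hypothetical stand-in for the custom stylesheet, and `fop` is assumed to be FOP's wrapper script on the PATH.

```python
from pathlib import Path


def pipeline_commands(rst_file, stylesheet="rst2fo.xsl", saxon_jar="saxon.jar"):
    """Return the command lines for ReST -> XML -> XSL-FO -> PDF, in order."""
    src = Path(rst_file)
    xml = str(src.with_suffix(".xml"))
    fo = str(src.with_suffix(".fo"))
    pdf = str(src.with_suffix(".pdf"))
    return [
        # Step 1: Docutils turns ReST into its own XML vocabulary.
        ["rst2xml.py", str(src), xml],
        # Step 2: Saxon applies the (hypothetical) stylesheet to get XSL-FO.
        ["java", "-jar", saxon_jar, f"-s:{xml}", f"-xsl:{stylesheet}", f"-o:{fo}"],
        # Step 3: FOP renders the XSL-FO file to PDF.
        ["fop", "-fo", fo, "-pdf", pdf],
    ]


if __name__ == "__main__":
    for cmd in pipeline_commands("guide.rst"):
        print(" ".join(cmd))
```

Keeping the steps as plain lists makes it easy to swap one stage, for example a commercial XSL-FO engine in place of FOP, without touching the others.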
Re:The Near-Definitive Solution (Score:2)
How's that going for you?
We've had no luck, and Apache's image examples are currently broken, which is not giving me a good feeling...
Re:The Near-Definitive Solution (Score:2)
It has been necessary to significantly increase the memory allocations for FOP. The current command is
(WinXP) java -Xms64m -Xmx256m -Xss64m -cp "%LOCALCLASSPATH%" org.apache.fop.apps.Fop -c "%LOCAL_FOP_HOME%conf\userconfig.xml" %1 %2 %3
(Bash) "$JAVACMD" -Xms64m -Xmx256m -Xss64m -classpath "$L
Check out the FreeBSD Documentation Project (Score:1)
What I tell you three times is true... (Score:2)
Docbook is a flexible, configurable way to do just what you're asking. You can change output formats - produce PDF or RTF or HTML or LaTeX or text. You can parameterize it, and script it pretty easily. There are already Docbook to filters available and you can adapt them to other uses with a bit of poking around.
texinfo 0wns docbook (Score:5, Interesting)
I've tried doing the same for docbook and it plain sucked. While the DocBook format itself is nice, the tools for transforming it are too complex (for me?), esp. if you want to customize conversion to HTML or PDF. This definitely goes for DocBook/SGML, and from what I've seen so far, for DocBook/XML too to some extent.
Thus I'd rather say "texinfo", at least unless someone comes up with a foolproof suite of tools for DocBook->PDF+HTML.
My $0.02.
- Hubert
Re:texinfo 0wns docbook (Score:2, Informative)
Re:texinfo 0wns docbook (Score:3, Insightful)
I couldn't agree more. Docbook is pretty slick, but turning your Docbook source into a usable format is ridiculously hard. And what's more, chances are fairly good that when you are done it won't look good. PostgreSQL, to cite an example, actually dumps their Docbook to RTF and then edits it in RTF before creating their PostScript and PDF files. What's more, despite the fact that they released 7.4 yesterday, they don't have 7.4 documentation ready to go because they can't get the docs to build.
Texinfo,
it's all about latex baby! (Score:1)
use latex.
after creating your document in latex, you can use a simple makefile to create all your formats.
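The makefile idea boils down to "rebuild an output only when the source is newer than it". A minimal sketch of that rule follows, with assumptions stated up front: guide.tex is a made-up file name, pdflatex and latex2html are just two common tools (the poster names none), latex2html is assumed to write guide/index.html, and no plain-text step is shown because nothing here says which tool would produce it.

```python
import os


def is_stale(target, source):
    """True when target is missing or older than source (make's basic rule)."""
    if not os.path.exists(target):
        return True
    return os.path.getmtime(target) < os.path.getmtime(source)


# Hypothetical rules: output file -> command that regenerates it.
RULES = {
    "guide.pdf": ["pdflatex", "guide.tex"],
    "guide/index.html": ["latex2html", "guide.tex"],
}


def plan(source="guide.tex", rules=RULES):
    """List the commands whose outputs are out of date with respect to source."""
    return [cmd for target, cmd in rules.items() if is_stale(target, source)]


if __name__ == "__main__":
    if os.path.exists("guide.tex"):
        for cmd in plan():
            print(" ".join(cmd))
```

A real makefile gives you the same check for free; the sketch is only meant to show what the makefile is doing for you.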
xml (Score:2, Interesting)
I've seen some posts here on XML, but most seem misleading. I've found that the most expressive and most flexible format is manual XML -- as in, your own dialect.
That is, you define your own tags, and define what they mean. Then you create stylesheets to convert them to other things. Because the original XML contains your intention, not the eventual formatting, it makes it easy to convert, or to make broad, sweeping changes to presentation (as presentation is detached from content).
The simplest examp
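To illustrate the home-grown dialect idea (this is a sketch of the general approach, not the poster's example): the tags <step>, <command> and <warning> below are invented, and plain Python dictionaries stand in for the stylesheets, where the post would presumably use XSLT. The point is that one source carrying intent can be rendered several ways.

```python
import html
import xml.etree.ElementTree as ET

# Source in a made-up dialect: tags say what the text is, not how it looks.
SOURCE = """<guide>
  <step>Boot from the install CD.</step>
  <command>disklabel wd0</command>
  <warning>Repartitioning destroys existing data.</warning>
</guide>"""

# One "stylesheet" per output format: tag name -> template.
HTML_RULES = {
    "step": "<p>{}</p>",
    "command": "<pre>{}</pre>",
    "warning": "<p><strong>Warning:</strong> {}</p>",
}
TEXT_RULES = {
    "step": "{}",
    "command": "    # {}",
    "warning": "WARNING: {}",
}


def render(xml_source, rules, escape=lambda s: s):
    """Render each child of the root element with the template for its tag."""
    root = ET.fromstring(xml_source)
    return "\n".join(rules[el.tag].format(escape(el.text or "")) for el in root)


html_out = render(SOURCE, HTML_RULES, escape=html.escape)
text_out = render(SOURCE, TEXT_RULES)

if __name__ == "__main__":
    print(html_out)
    print(text_out)
```

Changing how every warning looks then means editing one template, not every warning in the document.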
Straight XML, seriously? (Score:2)
Say you're writing a bulleted list with 100 items. In pure XML that's a minimum of 2 tags per item, plus tags for the list. Now put the first word of each item in bold and the rest of the line in italics. This is pretty basic formatting... but you'll have to edit that list while wad
OpenOffice. Really. (Score:2, Interesting)
However, if you are lacking lisp-fu and absolutely must have a GUI-based
WYSIWYG editor, OpenOffice may be a possible solution. You'll have to avoid
workflows that result in creating styles with meaningless names. (For example,
you can't just highlight some random text and start formatting it in various
arbitrary ways. Instead, define your styles properly with names using the
style catalog, and then apply your named styles to blocks of
Re:Straight XML, seriously? (Score:1)
The writer has already offered to manually convert it.
What we really need are better tools for editing XML. For me, I'm fine already with my 100+ wpm typing speed and a pretty XML color scheme for vim, which makes it very easy to read.
Have you ever used CuteHTML? It's proprietary, stupid, and yet has one genius feature -- as you start to type an html tag, it provides a drop-down list of all tags it knows of to match yours. What we need is something like that which can match evolving XML code -- learn as yo
A doc's end destination is formatted text (Score:2)
Sure, if he has to, he will. Any solution that doesn't require that work has at least one good thing going for it, though.
And I agree that better XML editing tools would help the pure-XML solution, but there are serious limits to what's possible especially when you're using your own invented tag set. The tool can't check to see if your DDL is well-designed. It can't represent the tag data in any other way (like showing the bolded text instead of th
On a Mac it's easy... (Score:4, Interesting)
If I was doing this on my Mac I would create a script to, in order: save my Word file as plain text, save it as HTML, print it as PDF (OS X can print to PDF from any and all applications), use the ColorSync Utility to regenerate the PDFs with the desired compression settings, then use an HTML cleaner such as HTML Tidy to eliminate all the crappy MS HTML markup. With AppleScript it's a point-and-click operation to create the script: just hit record and go through the motions described above, hit stop and save as a droplet. You can drag and drop any number of Word docs onto it whenever you need to 'publish'. You could add an FTP action or save to an iDisk as part of the workflow just as easily.
The only thing you have to worry about is some of Word's [table] markup as it seriously blows when you try to convert to normal html.
There are plenty of tools for XML/XSLT transforms that could be scripted as well but it could be overkill... or maybe not.
If you had a Mac it would be easy.
Re:On a Mac it's easy... (Score:2)
plain roff (Score:2, Informative)
Here is a concrete example. I create a roff file rwlock.man as the source. Say I want a postscript doc, then I add the following to a Makefile.
rwlock.ps: rwlock.man
	groff -man rwlock.man > rwlock.ps
This uses GNU troff; on other systems you might use the troff included with your system and pipe its output through dpost.
If I need a pdf fil