Alternatives To .DOC As Standard WP Format?
D. C. Sessions asks: "I'm on the Software Task Group of a standards body (JEDEC) which is, among other things, responsible for the DDR memory standard. You may have heard of it. Currently standards drafts must be submitted in an editable word processing format, which right now is interpreted as FrameMaker or MS Word. I find this not only offensive but dangerous, in that these standards -should- outlive the current MS software that can manipulate them. I've gotten some sympathy on 'bit rot' from the rest of the committee based on showing what current flavors of Word do to documents saved with older versions, but the problem is this: What do I propose as a replacement?" Two that come to mind right off the top of my head are LaTeX and, of course, HTML. Any other formats that can work just as well as .DOC in most situations and are cross-platform to boot?
"It should (obviously) be an open file format, preferably with an open source tool to access it. It absolutely must be usable on LoseBlows, should be usable on Mac, and (for my own sake) on Linux and Solaris. It must be capable of structured documentation, numbering, tables, and embedded vector graphics. I just don't know of such a beast at present."
TeX, RTF, XML/SGML about it (Score:1)
To put it bluntly: with anything other than ASCII (7-bit) you are going to lose some people, mangle documents, and almost certainly have someone complain about not being able to see stuff. Those, sadly, are the cold hard facts of life.
I have found that Microsoft worked out that the desktop killer app was word processing and killed off everyone else in it.
Happy holidays
Re:One word... (Score:1)
PDF is better. Smaller, better font embedding, better cross platform support, incremental download/display for web download etc.
But, much as we hackers love to edit PostScript by hand, normal people don't. The original message said editable.
No. (Score:4)
XML may be a way out, but there's no XML-based document format on the horizon. (I don't know about this Open E-Book stuff, though.) All in all, the OSS community has failed to provide an open, flexible document format that could compete with MS Word. I'm as unhappy with that as you are, but if you want to change it, all word processor developers must get together and formulate a standard. Is this ever going to happen? Note that most closed-source word processors want to bind their users to their product by using a proprietary, closed format.
We Want A Standard *File Format* (Score:1)
Re:Reverse engineer the thing (Score:1)
Too right! I used to try to keep up with .doc 2 .txt (or rtf or html) translators, but I've just given up. Now if I receive a .doc attachment, the sender gets an angry "This is a non-Microsoft site, please resend your mail in a readable format." To one persistent re-offender I sent as an attachment the output of cat /dev/urandom > urgent.doc.
Re:Postscript.... (Score:2)
LaTeX, the way to enlightenment! (Score:1)
Re:.RTF could have been it ... (Score:1)
All my Pentium motherboards have headers on them for USB, and that means motherboards from long before the imac came out. They're easy to plug a cable/connector onto, and voila! USB on all my PC systems.
The zeal of Mac users continues to amaze us.
Re:ASCII (Score:2)
Re:.RTF could have been it ... (Score:2)
I seem to remember that MS developed RTF as a way of exchanging documents between Macs & PCs. As the original poster stated, MS has changed RTF quite a bit over the years, usually to follow the changes that they've made to Word. But at least the changes have been documented and are available on the web. A quick search with Google will turn up several of the RTF specs. Most word processors that I know of support RTF, and there is at least one open source word processor (Ted [nllgg.nl]) that uses RTF exclusively. I've used it and it's pretty good.
what's wrong with PDF? (Score:2)
i've asked this so many times in threads like this, but i always seem to get in too late to get any responses. i'd like to ask once again what's wrong with PDF documents?
it's my understanding that PDF is an open format. in fact, i've even heard that part of the reason why Apple used DisplayPDF in MacOS X is because they would have had to license Postscript from Adobe while PDF was royalty-free. if this is the case, why is it that opensource advocates hail Postscript, but denounce PDF?
i know that PDF is the format when you want to ensure that pages are printed correctly. that being said, PDFs are still able to store text-content, they're compressed so they're a reasonable size, and they're cross-platform (lots of programs can read PDFs these days, not just Acrobat)
now for the topic at hand, i understand that standards definitions would be best presented in a format that doesn't waste so much space on presentation: content is what should matter, which is why a Framemaker file format or XML might be best. but for casual documents, why don't we use PDF? it's surely a lot better than DOC.
so i'm asking: what's wrong with PDF? why can't more programs write to PDF as an export option? why can't more programs read PDFs for editing? am i missing something here?
please, somebody knowledgeable: enlighten me.
- j
Re:I like RTF the best. (Score:2)
Latex is the right tool for another job ... sorta (Score:2)
There is however a Windows TeX word processor whose name escapes me -- but it reads / writes tex files as its native format and allows you to write latex files in an interactive format, which IMHO is a lot better than the "edit and compile" paradigm ...
This would really be the best of both worlds ... the unix heads can have their programming language latex, and the windows-bred secretaries can have a program that they can work with as well ...
But you still need to be able to _use_ Word! (Score:2)
All the posts arguing for TeX, DocBook, XML, Star Office or Pathetic Writer are forgetting that a group that demands submissions in .doc format is obviously receiving them from people using Word and turning them over to other people using Word. Forcing everyone to use LaTeX or XML (or to write LaTeX or XML in Word) is a guarantee that the whole thing will grind to a halt.
HTML is an option; XML is not until Microsoft adds it as a "Save As" option.
Re:XML and SVG (Score:2)
In my own little start-up project, I am in desperate need of an Excel-file format replacement. I am contemplating XML, but besides being a lousy programmer I am even worse at reading specs...
But anyhow, XML seems to be coming. My workplace will abandon development of Java GUIs (on top of C++ programs) in favor of XML GUI! That way any program can be called from within browsers, without the need for platform specific virtual machines!
My suggestion is: go for XML!
formats become obsolete (Score:1)
Perhaps it's time to actually have some standards body define a standard format for word processing that presents an acceptable minimum of functionality. The cries of XML! XML!, while partially missing the point, as XML itself isn't up to the task, might be a start, since, at least in theory, an XML-based format would be both extensible and maintain backwards compatibility, and have the added benefit of being relatively easy to write implementations of.
Of course, opinions ( ie (_)*(_)s ) on what exactly constitutes the minimum acceptable functionality may vary but, as you know, committees are good at making sure that nobody is any happier with the results than anybody else.
Why doesn't such a standard exist already? Simple, no company wants to write a stable, open spec. for a minimal document, and if somebody were to attempt to do so other companies would not likely give it acceptance. This is why some committee, be it an organization, such as ISO, or a group of 'interested parties' agreeing to work together would be the best situation. This is probably outside of JEDEC's charter, but y'all may be able to pass the suggestion onto the appropriate parties.
So, in closing, such a spec would need to be:
The IETF standard is ASCII (Score:2)
The Internet Engineering Task Force [ietf.org] (IETF) publishes all its standards (the RFCs [rfc-editor.org]) for the Internet in American Standard Code for Information Interchange (ASCII). You can also submit the document in PostScript, but the ASCII is the primary reference.
ASCII is searchable, printable, indexable, and forward compatible essentially forevermore. Everyone can display it correctly, anywhere. There is no better format for any kind of International standard. The IETF debates the question of superseding ASCII as the standard format about every other year, but we've yet to identify any other format that has ASCII's advantages.
HTML might one day replace ASCII in this capacity, but it needs to be stable for longer than it has been, and the HTML generators out there never generate correct HTML (ever looked at web pages in iCab [www.icab.de]? It has a built-in syntax checker, and even slashdot comes up with errors, all the time). Until those problems are fixed...
LaTeX of course, and maybe XML (Score:1)
Well LaTeX files can compile into nice looking .pdf files which are viewable on any platform, plus they look exactly the same on every platform. Postscript also prints out very nicely and can be handled by just about any printer and platform. LaTeX is all I use for all my papers and documents I need to write.
There's also XML. I'm not sure how portable and consistent documents look using XML but it's supposed to be a portable document format.
Re:Hmm what about the obvious choice? (Score:1)
Unless today's word processors can load & save PDF as if it were their native format, I don't see PDF as a solution here.
OpenOffice is the answer (Score:2)
I think this would be a good place to start. To make it even more buzzword-compliant, the OpenOffice formats will be XML-based. I'd be very happy to see the OpenOffice formats adopted not only by Star/Sun, but also by Abi, Gnumeric, K-office... who knows, maybe someone could even write a plug-in for MS Office to load and save documents in OpenOffice format. (If it's successful enough, MS will eventually have to do it themselves.)
Re:Staroffice 5.2 (Score:2)
Microsoft makes this claim with each new generation of its office products. It has always turned out to be a lie [slashdot.org].
Re:Staroffice 5.2 (Score:2)
Something for the Non-Coders out there (Score:1)
Re:TeX is what you want. (Score:2)
TeX is a good idea. XML is probably better, and far more likely to actually happen. Of course, there's a zillion different ways that a paper could be stored in XML, so XML alone isn't the magic bullet. But it's a good start.
Re:SGML/XML/DocBook (Score:1)
What's wrong with HTML? I think it should be used as a standard for document interchange. In fact, guess what! It already is! The prejudice you have against HTML seems to be based on some sort of beautiful idea of stylistic perfection. Well, I don't give a shit about that - HTML is here, it's now, it can be read by loads of apps, and it's an open standard. HTML is the solution to your portable document problems. You're reading this alright, aren't you?
KTB:Lover, Poet, Artiste, Aesthete, Programmer.
The answer is obvious ... (Score:2)
Re:Something for the Non-Coders out there (Score:2)
People working on large documents need tools and formats that focus on document structure. A bunch of very smart people looked deeply into the problem years ago and came up with the idea of markup languages.
Tom Swiss | the infamous tms | http://www.infamous.net/
MS has this cornered for a reason (Score:2)
There's a window to do something about this right now, because Microsoft is tightening the screws on Office 2000 pricing. The amount of money companies have to spend on Microsoft Office is about to increase substantially. Technically, documenting the StarOffice format very well and encouraging other efforts to use it would be a good way to get started on the problem. From a business perspective, VA Linux or Red Hat ought to try pushing a desktop distribution that takes one install and provides the tools needed by, say, 70% of office workers. (Hint: support the first few companies that try this in a big way, to find out what's needed.)
openoffice is working on this (Score:2)
OpenOffice is seeking to address this. This may be of no consequence for someone needing a solution today, but I thought I'd mention it.
The link is xml.openoffice.org [openoffice.org]. Draft formats are available for download, and you can follow the development on the mailing list there as well.
Staroffice (Score:3)
.RTF could have been it ... (Score:3)
-Martin
Re:Just a bit reactionary? (Score:2)
Nope, just trying to clear up some issues.
I think it's safe to assume that defining a DTD was implied. It's simply easier to say "Use XML" than to say "Write a good DTD to use with XML"
I don't think it was implied. It was mentioned casually. But that wasn't my point. Choosing to use XML is like choosing either a binary or a text document format. Just saying "use XML" doesn't mean a whole lot. The format itself is really the DTD that's used. Whether or not writing a good DTD was implied, it is certainly a whole lot more complicated than the poster was making it out to be. XML is no magic wand.
How could it possibly be device-dependent? This is just text, we're talking about.
It's waaay too easy to make things device-dependent. For instance, think about printing a modern, full-featured HTML page. HTML is a device-dependent language; it's meant to work within a browser, of a certain size range, with a certain colour depth, etc. It will look great in your browser, but it doesn't lend itself well to printing. So you have to choose your language/DTD carefully.
Easy rendering has nothing to do with the XML DTD or document, that's the responsibility of the XSL that would accompany it, or the application that parses the document.
Okay, sorry. So, if you want to use XML, you'll need a good DTD, *and* a good XSL(or a good application).
Easy editing is pretty straightforward. Just edit it. This goes along with comprehensive. A good DTD can be comprehensive, but it can also leave room for extension without breaking that document. It is, after all, the extensible markup language.
Now, I'm only going to argue semantics on this one. "Easy" is subjective. You're right, it's easy to look into the document and edit it, but that doesn't make editing easy. I can easily look into a MS Word document and edit it. That doesn't mean I'll do anything useful, nor does it mean it'll be fast.
I wouldn't say XML without a DTD is useless, but I will say XML without a DTD is silly. It's a simple, logical assumption that if you're writing XML documents, they should have a DTD, so you know what's allowed. Like I said before, it seems like this would be implied.
Well, you obviously know what you're talking about
Thanks for the reply, though
Dave
Barclay family motto:
Aut agere aut mori.
(Either action or death.)
SGML/XML/DocBook (Score:5)
Use a nice SGML/XML application like DocBook. Tools for manipulation are free, anyone can write DocBook, with or without specialist tools (it looks a lot like HTML to the layman).
Don't use HTML; at least use XHTML, making sure that you segregate style from content. If you must use HTML, use stylesheets so that formatting is consistent.
But, my recommendation would be to use DocBook (SGML) and use stylesheets and nice free parsers to output TeX, ASCII, RTF, HTML and whatever else people want.
Re:SGML/XML/DocBook (Score:2)
TeX doesn't meet that second requirement as much as I love it.
Re:It may seem incredibly redundant... (Score:2)
Re:Staroffice 5.2 (Score:2)
Try that with Word (97, 2k, 2.001k, etc. etc.).
Re:It may seem incredibly redundant... (Score:2)
The poster didn't answer the question that had been asked very well. They talked about XML as a good thing, but they didn't talk about the bad things (which you must know about when trying to make an informed decision). I was just trying to clear the issue up a bit. The bad things about XML are that you've got to write a good DTD, good XSLs, etc.
Dave
documentation isn't the problem (Score:2)
The problem with .DOC is the typical Microsoft problem: Microsoft beats other people to market by "just getting the job done". They hack up what needs to be done, if it works 90% it's OK, and maybe they document it later. They are even proud of that and seem to think it's the right way to go because they actually beat everybody else to market; let's hope the customers will wake up to this and stop buying.
The latest .DOC format is supposedly XML (with embedded binary). That will help somewhat, in that it will at least make the text and other important information accessible without a complex OLE infrastructure. But taking full advantage of the information found in .DOC will still not be possible. The .DOC format often contains scripting and all sorts of other extensions, and the actual semantics of those can depend heavily on the environment or be buried deep inside some MS Word module.
Note that inside Microsoft, there now seems to be another approach, NetDocs [cnet.com]. If it delivers what it claims to, a fully XML-based standards-compliant, end-user document and information management system, I have my doubts that that will catch on--it is way out of character for Microsoft.
tex (Score:2)
BTW, I wouldn't think of it as a replacement for the DOC format, I would think of it as doing things right from the start. Doc is good for what it does, but what I think you are describing is MUCH more suited to tex.
Re:It may seem incredibly redundant... (Score:2)
XML can easily (dare I suggest it, trivially) be transformed into XML documents - in fact, this is the approach my current employers take for a number of types of business documents - XML is the format for representing the data, and LaTeX or HTML or whatever can become the format used for making this available to the user - XSL transformations allow us to take a language-independent set of data and translate it into a document in a suitable format.
If you want true independence from proprietary data formats (and open source applications can have data formats that are just as restrictive as closed source applications to most users) then XML is the only real choice right now - a well defined XML document should be readable even *without* a parser, and with a well-defined DTD and a series of appropriate XSL files, you can select your own viewer application. What could possibly be better? Certainly not Word, StarOffice, LaTeX or any of the other competitors in this arena.
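For anyone who wants to see the "one source, many output formats" idea in miniature, here is a rough sketch. It is not XSL: Python's standard library has no XSLT engine, so two plain functions stand in for the stylesheets. The element names (doc, section, para) are made up for illustration and are not taken from any real DTD.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical vocabulary: <doc>, <section title="...">, <para>. Not a real DTD.
SOURCE = """<doc>
  <section title="Scope">
    <para>This standard covers DDR memory.</para>
  </section>
  <section title="Timing">
    <para>Timing parameters are listed here.</para>
  </section>
</doc>"""


def to_html(root):
    """Render each section as an <h2> heading followed by <p> paragraphs."""
    out = []
    for section in root.findall("section"):
        out.append("<h2>%s</h2>" % escape(section.get("title", "")))
        for para in section.findall("para"):
            out.append("<p>%s</p>" % escape(para.text or ""))
    return "\n".join(out)


def to_latex(root):
    """Render the same tree as LaTeX. TeX special characters are NOT escaped."""
    out = []
    for section in root.findall("section"):
        out.append("\\section{%s}" % section.get("title", ""))
        for para in section.findall("para"):
            out.append((para.text or "") + "\n")
    return "\n".join(out)


root = ET.fromstring(SOURCE)
html_text = to_html(root)
latex_text = to_latex(root)
print(html_text)
print(latex_text)
```

The source document never mentions fonts or layout; each output function decides that for its own target, which is roughly the job an XSL file would do.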
LaTeX and HTML (Score:2)
I think the best idea is something that is extra simple, and unlikely to change in the future; that is ASCII.
Let TeX die. (Score:2)
Re:Just a bit reactionary? (Score:2)
Granted. I imagine things will get easier as editors become more widespread that are geared towards editing XML documents. The editor could make sure you stay within the DTD, speed up the writing time involved.. Until then it's all being done by hand.
Well, you obviously know what you're talking about
Yeah, looking over the original post again, he probably should've been more clear. It sounds like he's been using XML for awhile, and forgot about the issues involved in actually learning it.
Been fun.
Re:SGML/XML/DocBook (Score:2)
I agree with you about TeX's stability. After using several different incompatible tools through the 80's for my resume, I finally put it in TeX and stayed with TeX for a decade. I'm considering HTML or XML now, but I haven't made the switch.
Re:It may seem incredibly redundant... (Score:2)
I think you are confused. LaTeX *is* designed with generalized structural markup in mind. (OTOH TeX focuses on specific markup.) In LaTeX you use commands like \section and \chapter and \emph, and (generally) not layout markup commands like ``italics'' or font sizes.
``LaTeX is, to a large extent, an example of a `generic markup language' (GML). Thanks to the class file mechanism, the visual style of the various document elements are described in a single place outside of the source document itself'' (The LaTeX Companion, 7).
I hope that clarifies things.
--Ben
Some Suggestions (Score:4)
Basically, what you want is a format that fits the following criteria:
1) The original text can be easily gotten out of the format. This way even if the programs that read the file go the way of the dodo, future programs could still recover the data.
2) The specification is fully open and documented, and preferably stable and mature.
3) At least one open-source program handles displaying/converting the format. I would recommend storing a copy of this program in the same place as the standards themselves- including shipping source with standards CDs.
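As a quick illustration of criterion 1, here is a sketch of how little code it takes to pull the text back out of a tag-based format. HTML is only the example here, and the sample string is invented; the point is that recovery needs nothing beyond a generic parser.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect character data and ignore every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def recover_text(markup):
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    # Join chunks with spaces, then collapse runs of whitespace.
    # Caveat: a tag in the middle of a word would split that word.
    return " ".join(" ".join(parser.chunks).split())


sample = "<h1>DDR Timing</h1><p>Refresh interval is <b>not</b> optional &amp; must be met.</p>"
recovered = recover_text(sample)
print(recovered)
```

Try doing the same against a binary word processor file without the original program and the difference becomes obvious.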
You've gotten over the hardest part already- you've realized you have a problem.
Brian
Re:postscript isn't editable (Score:2)
Re:Something for the Non-Coders out there (Score:2)
Re:No. (Score:2)
Re:SGML/XML/DocBook (Score:3)
HTML is great, XHTML (or at least HTML >= 4) is better.
The problem with HTML is that it was designed to be a markup language for simple documents, so it has heading, subheadings, titles, paragraphs etc. However, as people wanted to do more and more stylistic things with it, the language was extended by the w3c. But, most people kept just bastardising it by using heading tags to make things big and bold tags to emphasize things.
HTML is a big, nasty mix of structured document and stylistic tags. What HTML 4 strict does is to say that HTML is just a structure language with no formatting info. Then you use CSS or XSL to do the style work, which is a much more sane and portable approach.
Re:.RTF could have been it ... (Score:2)
Re:TeX is what you want. (Score:2)
There are a few problems with using TeX/LaTeX. The first is that TeX tries to do paragraph-by-paragraph layout, and often winds up in tight spots that it doesn't need to. The average user wouldn't have a clue about what to do with overfull hboxes.
Another problem is that it's not really possible to do WYSIWYG, and those people who use lots of spaces instead of tabs (even with variable width fonts, heh) will have a rough time adjusting to that. People will complain about things like "well in Word the line wraps this way..." BTW this is a problem with Word itself; its figure placement is really screwy.
Finally, those of us in academia who write papers in LaTeX can no longer look down on those whose use of Word is obvious by the terrible aesthetics of their papers.
Obviously, there are lots of advantages, and for Microsoft, possibly the nicest thing about TeX is that there are no known bugs. (not that Microsoft will have any problem adding some...)
Re:Dangerous and Offensive??? What is standard? (Score:2)
My experience is the opposite. Where I work, Office 97 is the standard, along with Windows 95 and Windows NT 4.0. The company (Fortune 50 corporation) is conservative about upgrading to new versions of software. They don't want to spend money on new software unless there is a compelling reason to do so. I don't know anyone who has Windows 2000 or Office 2000 on their work PC.
Re:RTF could have. . . (I think it is!) (Score:3)
For straight wordprocessing where no layouts are required, it's great. It's ascii with the expressive power of italics and extended symbol recognition. For straight word smithing, that's all anybody needs.
Here's what I do:
1. Do all wordprocessing using a very basic text editor which saves very basic RTFs.
2. Import those files to whatever layout program is needed. (Quark, Pagemaker, whatever.)
The stability of RTFs lies in two areas: first, there will ALWAYS be a selection of homemade editors available on which to do your writing, and second, there is no financial incentive for big software manufacturers to make RTFs un-importable to their suites and layout packages.
This means that doing all your basic work in RTF will make files readable for a long time to come.
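To show how thin "very basic RTF" really is, here is a sketch of a writer that emits one font, plain text, and italics, and nothing else. The function names are my own, and the output uses only the handful of control words I'm sure of (\rtf1, \ansi, \fonttbl, \f0, \fs24, \i, \par, \uN); a real exporter would need far more of the spec.

```python
def rtf_escape(text):
    """Escape RTF special characters; encode non-ASCII as \\uN? escapes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ch == "\n":
            out.append("\\par\n")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit decimal per UTF-16 code unit;
            # the trailing '?' is the fallback for readers without Unicode.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                code = int.from_bytes(units[i:i + 2], "little", signed=True)
                out.append("\\u%d?" % code)
    return "".join(out)


def make_rtf(runs):
    """Build a document from (text, italic) pairs."""
    body = "".join(
        "{\\i %s}" % rtf_escape(text) if italic else rtf_escape(text)
        for text, italic in runs
    )
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\\f0\\fs24 "
    return header + body + "}"


document = make_rtf([
    ("Plain text with ", False),
    ("italics", True),
    (" and a caf\u00e9 {brace}.", False),
])
print(document)
```

The result is pure 7-bit ASCII, so it survives mail gateways and can still be read in a text editor if every RTF importer someday vanishes.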
In any case, particularly in the print publishing field, today's software is finally about as good as it needs to get. There's no real reason to switch tools. Unlike with graphics technology, Times Roman simply doesn't need to motion blur and bump map for a writer to work his or her craft. As long as we all keep our old copies of Wordperfect and Pagemaker, everything is fine.
Fantastic Lad
Re:Staroffice (Score:2)
Re:Dangerous and Offensive??? (Score:2)
Till the next version of Word is released, then...it changes!
Re: (Score:2)
Re:Reverse engineer the thing (Score:2)
HTML has problems (Score:2)
2. A document can be rendered quite differently by different browsers.
3. You can't even get things like page numbers in HTML documents.
Re:It may seem incredibly redundant... (Score:5)
XML is nothing more than a concept - you store data and text within "tags". The tags can be of pretty much any name. The data can be anything. This isn't a standard, it's not even a format.
Basically, XML boils down to: store it in a text file, delimit data, fields, and content by tags. Sorry, that doesn't cut it. You have to do more.
No, if you want to think about using XML for this, you need to talk about the DTD, not XML itself.
So, the question becomes, which DTD? In order to compete with the competition (LaTeX, HTML, PostScript), it has to be: device-independent, easily rendered, easily edited, and extremely comprehensive.
Don't shout "XML!!". XML, without a DTD, is almost useless, especially for this application. The DTD has to be all those things I mentioned, plus (for this application) it needs to be standard.
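A small sketch of what that means in practice. Both strings below are perfectly well-formed XML and any parser will accept them, but a tool written for one vocabulary finds nothing in the other. The tag names are invented for the example; neither is a real standard.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies for the same heading.
DOC_A = "<document><heading level='1'>Scope</heading></document>"
DOC_B = "<spec><h1>Scope</h1></spec>"


def headings(xml_text):
    """Find headings, assuming vocabulary A (<heading> elements)."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("heading")]


found_a = headings(DOC_A)
found_b = headings(DOC_B)
print(found_a)
print(found_b)
```

The parser is happy with both; only an agreed vocabulary (the DTD) tells a program where the headings actually are.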
Dave
No no no (Score:2)
Just a slight correction, the DoD standard for documents is the 2 latest revisions of MS's .DOC format.
Intro/Tutorial on DocBook? (Score:3)
I hadn't heard of DocBook, so I went fishing on docbook.org [docbook.org] for some basic info.
The state of the documentation for this product is fairly lacking. (Hey, it's a DOCUMENT application!) There's no "getting started with DocBook" stuff. There's no official tutorial.
The closest thing to a tutorial I found is this page: DocBook intro [lanl.gov]. I'll excerpt the front page.
Here is my tutorial on DocBook. I never completed it, but it is still useful, since others don't focus on a complete beginner tutorial.
Last modified: Mon Jul 27 11:19:57 1998
Frankly, this sums up my issue with many Open Source projects: making a technically superior tool is not enough to generate wide user acceptance. There has to be an easy migration path from what the user's already got.
DocBook needs at least ONE of the following to get people going:
RTF/DOC/FrameMaker/TeX to DocBook converters, supporting at least a good 75% of basic features,
A usable migration tutorial that assumes the user already makes RTF/DOC/FrameMaker/TeX documents,
A usable editor that shows the results, even if it has to be two-paned to show both source and results.
I'm not flaming Open Source in general, but this is not the first time I have heard of a tool that would fit my needs exactly, except they put very large barriers to entry in my path.
Re:It may seem incredibly redundant... (Score:3)
Re:Something for the Non-Coders out there (Score:2)
No. First, we start with unlearning past mistakes. It is often handy to have a nice, solid piece of wood in your hands at this point, as we teach "You do not want to change fonts and sizes. You want to think about your document's structure and mark it up accordingly."
Yes, we don't have to beat that into "the average John and Jane Doe" or "the average secretary" who just wants to type up a one page letter, but when people are creating real documents structure should be in the front of their minds. Otherwise they're fscked from the start, regardless of technology choices.
Tom Swiss | the infamous tms | http://www.infamous.net/
Re:HTML...Niagra falls (Score:2)
Tom Swiss | the infamous tms | http://www.infamous.net/
Re-inventing the Wheel: A good idea?! (Score:2)
. . . . right.
I can see M$ going for this one right now. (HA HA HA!)
This means that the file format would have to be made a part of the public domain.
IANAL, but I think this would take a prodigious amount of legal wrangling.
I personally prefer a format like XML or HTML where you can see the tags, etc. and figure out what is going on, if someone made a mistake. Mind you, this is just me, just a personal preference.
I also wonder about designing a file format for the future, given the various changes in technology. As an example, there is a new technology that has been demonstrated providing 3d displays in shocking detail, no special glasses needed. Not a Moving Picture yet, but you get the idea. How to incorporate this? The file format has to be scalable and adaptable.
MS Word does really horribly at things like books, where it is better to use a page layout program like Pagemaker.
so it looks like we have to re-invent the wheel here, and include all of those features that make the best sense. Yet another Open Source project for the masses.
Don't look so enthuthiastic now!
;-)
LoseBlows (Score:5)
Hey linsux users - grow up.
Re:HTML...Niagra falls (Score:2)
I apologize in advance... (Score:2)
Certainly, you have to assign meaning to tags in order for data to be formatted correctly. The whole point of XML is that data carry traits and structure (which of course, can be inherited).
This is where the concept of a template would come in. I had mentioned this but you must have overlooked it.
You have a set of rules defined that determine what certain tags do. Very similar to HTML now (table, p, b, div, etc. are all assigned functionality). With XML, these templates can even be a part of the document with tags that flag them as such. The trick is to put as little of this in the hands of the word processor itself.
I never said "XML! XML!" all by itself. XML is fairly abstract. Obviously we need everything that works alongside it, and I'm talking about all supporting technologies if I'm talking about XML. If you read the article again, you'll notice the question was about document formats, not whether or not we'll need templates to go along with our XML formatted data.
Re:No. (Score:2)
\raisebox{-12.8mm}{%
  \setlength{\unitlength}{1truemm}
  \includegraphics[width=50 pt,height=50 pt,keepaspectratio=true]{logo2.bmp}
}
you're right, but I don't. Positioning, scaling and using graphics within LaTeX is far from easy. And we don't have to discuss in which ways MS Word sucks -- it will never find its way onto my harddisk. (I personally use TXT, LaTeX, HTML and StarOffice, depending on the task at hand.)
The question is not whether something is possible in LaTeX, the question is how long it takes the average user to do it.
You miss my point; this doesn't *need* gingerbread (Score:2)
Ok, I'll bite.
Firstly, you seem to be missing the main point of the question. This isn't about finding a generic format for page layout - it's about how to best transfer specification documents so that they can be written anywhere and read anywhere. HTML works wonderfully for this.
Secondly, _yes_, you can do all of the above, when it makes _sense_ to do so.
Page breaks? Easy. Have a set of linked documents instead of one big page. This is useful under some circumstances (like dividing a large document into sections).
Chapters? Um, you _have_ the tools to emphasize chapter headings, and you _have_ page breaks if you really feel you have to use them. Where's the problem?
Graphics? If I need a figure, that's what the image tag is for. If I want to have anything fancier than an image in a box... then I should have someone else write the standards document. Again, we aren't making magazine articles here - the goal is to find a format suitable for a *technical description*. Visual gingerbread is _counterproductive_; it distracts the reader.
Headers/footers? Frames work fine for that, if you have a real reason to use them. I personally can't think of any, for this application. For my own documents, if I'm writing something that must be pretty, I use a script to prettify things consistently.
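For what it's worth, the "set of linked documents" idea for page breaks is easy to script. A minimal sketch in Python, assuming made-up section titles and file names; build_pages is a hypothetical helper, not part of any tool mentioned here:

```python
from html import escape

def build_pages(sections):
    """Return {filename: html}, one page per section, with prev/next links.

    `sections` is a list of (title, body_text) pairs. The file names
    (section1.html, section2.html, ...) are invented for this sketch.
    """
    names = [f"section{i + 1}.html" for i in range(len(sections))]
    pages = {}
    for i, (title, body) in enumerate(sections):
        nav = []
        if i > 0:
            nav.append(f'<a href="{names[i - 1]}">Previous</a>')
        if i < len(sections) - 1:
            nav.append(f'<a href="{names[i + 1]}">Next</a>')
        pages[names[i]] = (
            "<html><head><title>" + escape(title) + "</title></head><body>"
            + "<h1>" + escape(title) + "</h1>"
            + "<p>" + escape(body) + "</p>"
            + "<p>" + " | ".join(nav) + "</p>"
            + "</body></html>"
        )
    return pages

# Placeholder content, only to exercise the helper.
pages = build_pages([
    ("Scope", "What this specification covers."),
    ("Signals", "Pin and timing definitions."),
    ("Timing", "Setup and hold requirements."),
])
```

Each section becomes its own file, so a "page break" is just a link to the next file.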
Re:No. (Score:2)
Show me an out-of-the-box installation of Emacs for Windows that not only installs and configures the program decently without much user interaction, but also gives you all the info you need to write letters, including an easy interface to select templates for common tasks.
No, you can't expect the average user to acquire this info by themselves. Emacs is even too much for a geek like myself.
--
Problem with Word? (Score:2)
Re:It may seem incredibly redundant... (Score:2)
I'm not sure why you include LaTeX in this list. I'm not sure which, LaTeX or XML, would be best for the proposed use, but LaTeX most certainly *is* readable even without a `parser.' The other aspects of XML and LaTeX are where the two formats differ but both are designed as structured markup saved in ASCII.
--Ben
Just a bit reactionary? (Score:2)
No, if you want to think about using XML for this, you need to talk about the DTD, not XML itself.
I think it's safe to assume that defining a DTD was implied. It's simply easier to say "Use XML" than to say "Write a good DTD to use with XML".
So, the question becomes, which DTD? In order to compete with the competition (LaTeX, HTML, PostScript)
That's just the point. It doesn't need to compete with other formats. The process goes something like this: Write a good DTD that fulfills all your needs, and allows for easy extension and specialization later on. Then, write XSL for exporting the format to whatever other formats are useful. HTML obviously for web display, PostScript for printing, maybe one for PDF, even. (Though encouraging the use of PDF probably isn't any better than encouraging the use of Word's DOC)
it has to be: device-independent, easily rendered, easily edited, and extremely comprehensive.
How could it possibly be device-dependent? This is just text we're talking about.
Easy rendering has nothing to do with the XML DTD or document, that's the responsibility of the XSL that would accompany it, or the application that parses the document.
Easy editing is pretty straightforward. Just edit it. This goes along with comprehensive. A good DTD can be comprehensive, but it can also leave room for extension without breaking that document. It is, after all, the extensible markup language.
Don't shout "XML!!". XML, without a DTD, is almost useless, especially for this application. The DTD has to be all those things I mentioned, plus (for this application) it needs to be standard.
I wouldn't say XML without a DTD is useless, but I will say XML without a DTD is silly. It's a simple, logical assumption that if you're writing XML documents, they should have a DTD, so you know what's allowed. Like I said before, it seems like this would be implied.
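To make "so you know what's allowed" concrete, here is a rough sketch in Python. It is not real DTD validation: Python's standard xml.etree.ElementTree does not validate against a DTD, so the ALLOWED table below is a hypothetical stand-in for a content model, and the element names (spec, section, title, para) are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: which child elements each element may contain.
ALLOWED = {
    "spec": {"title", "section"},
    "section": {"title", "para", "section"},
    "title": set(),
    "para": set(),
}

def violations(root):
    """Return a list of problems: unknown elements and forbidden parent/child pairs."""
    problems = []
    for parent in root.iter():
        allowed = ALLOWED.get(parent.tag)
        if allowed is None:
            problems.append(f"unknown element: {parent.tag}")
            continue
        for child in parent:
            if child.tag not in allowed:
                problems.append(f"{parent.tag}/{child.tag}")
    return problems

good = ET.fromstring(
    "<spec><title>DDR</title>"
    "<section><title>Scope</title><para>Text.</para></section></spec>"
)
bad = ET.fromstring("<spec><para>Stray paragraph.</para></spec>")
```

Here violations(good) comes back empty, while violations(bad) reports the para sitting directly under spec. A real DTD plus a validating parser would do the same job with far more rigor.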
Re:20 year-old problem (Score:2)
Microsoft
Matt Barnson
Re:Intro/Tutorial on DocBook? (Score:3)
WP formats (Score:2)
My two favorites: Rich Text Format (.rtf) and HTML. HTML has obvious advantages, but the disadvantage that it really wasn't designed for word processing as such. RTF was a format that, I believe, Microsoft came up with as a software-independent storage format for word-processor documents. I've found it does most everything needed to keep formatting and such intact, and it's readable and writeable by most WP software (MS Word, WordPerfect and StarOffice are the ones I've confirmed by use). It's also a plain-ASCII format: if you've no word processor, you can pull it up in a text editor and get at the actual text if you really have to. And it hasn't had changes made to it in many years; stability is a definite plus for a long-term storage format.
XML and SVG (Score:2)
Tool support for this combination may not be so good or inexpensive, but you can be fairly sure the content will survive and be usable in many different environments.
Re:SGML/XML/DocBook (Score:2)
First, there's the aforementioned kluge of the HTML standards, but if one is writing documentation, he should stick to pure structuring (at least at first) anyway. If I write an entire document using <p> and <hX> tags, sure it'll be portable, 100% compatible with the W3C guidelines and so on, but there's more than that.
HTML, unlike many other more complicated mark-up languages, has poor support for "book" features. Headers, footers, generation of table of contents, page numbering, margins, cross-platform printing support. The list goes on and on, but if you're doing anything but looking at it in a browser, HTML is not a good choice for documentation.
So that's why HTML is not the best choice for documentation, not because of any grandiose "stylistic perfection" ideas. Furthermore, HTML is no more or less open than many other mark-up standards (e.g. SGML, XML, TeX), and they're all roughly on the same line in terms of portability (if you get the right tools, that is).
Basically, HTML makes a good "quick and dirty" documentation tool, but if you want your options open (wide open), SGML (or maybe XML in a few months) is the way to go.
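To illustrate the table-of-contents point: HTML itself won't build one, so it has to come from an outside tool. A minimal sketch in Python using the standard html.parser module; HeadingCollector and table_of_contents are hypothetical names, and the input is a made-up fragment:

```python
from html.parser import HTMLParser

class HeadingCollector(HTMLParser):
    """Collect (level, text) for every h1..h6 element in a document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # heading level currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None

def table_of_contents(html_text):
    """Return an indented plain-text outline built from the headings."""
    collector = HeadingCollector()
    collector.feed(html_text)
    return "\n".join("  " * (level - 1) + text
                     for level, text in collector.headings)

toc = table_of_contents("<h1>Scope</h1><p>x</p><h2>Terms</h2><h1>Timing</h1>")
```

It works, but it is exactly the kind of extra tooling that the book-oriented formats give you for free.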
LaTeX (Score:2)
The point is that you must use LaTeX if you want your work to ever appear respectable in print, so the question is: does your publisher want to TeX it themselves from your draft, or do they want you to TeX it? In other words, it's a question of money. If you're an autonomous institution which does its own publishing and does not have ass loads of money, then you really need to make people TeX it.
Now, there are SGML systems which produce TeX and HTML, but they may not handle pictures properly, so you should be very careful. Actually, there are ways to include hyperlinks in LaTeX. The resulting dvi file can be compiled to an HTML file. This is almost surely the very best way to typeset your documents since you can write a TeX macro to treat images properly for conversion to PostScript OR HTML. It would work something like this: your images would be compiled to both
Jeff
XML is verbose and ugly, but it IS useful... (Score:2)
XML is pretty verbose and ugly. It's not the most convenient format to type in. But, in some sense, it finally extends the traditional UNIX approach to more complex data types. UNIX used to give you scanf, printf, and plain text files with fields. XML now extends that to parsing, generating, and transforming tree-structured types. That's really great, and it is really useful.
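A small sketch of the parse / transform / generate cycle, in Python with the standard xml.etree.ElementTree module. The parts list and its attribute names are invented for illustration:

```python
import xml.etree.ElementTree as ET

source = "<parts><part id='U1' pins='66'/><part id='U2' pins='54'/></parts>"

# Parse: text in, tree out (roughly the scanf side).
root = ET.fromstring(source)
pin_counts = {p.get("id"): int(p.get("pins")) for p in root.iter("part")}

# Transform: add a derived element to the tree.
total = ET.SubElement(root, "total")
total.text = str(sum(pin_counts.values()))

# Generate: tree in, text out (roughly the printf side).
output = ET.tostring(root, encoding="unicode")
```

The point is that nobody had to write a tokenizer or invent a field separator; the tree structure comes with the format.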
TeX is not what you want (Score:2)
Where TeX falls way short is in the way it is programmed and extended. The TeX processor is more like a machine language, with registers, lots of side effects, hooks, and global variables. Doing non-trivial transformations in TeX is incredibly hard, and even the best macro packages often don't get it quite right.
XML's approach is both more modern and much simpler: you describe transformations on the parse tree. XSLT and XSL-FO are what correspond to the programmable guts of TeX.
Most likely, what is going to happen is that many documents will be authored in XML, many document styles will be described in XSLT and XML Schema, and TeX will be used not for defining macro packages, but merely for performing the last stages of physical layout.
Re: SO, DocBook, RTF, and DOC (Score:2)
I'm a maintainer/lead coder on a couple of OS Office Projects: AbiWord (http://www.abisource.com) and wvWare (www.wvware.com). I've written quite a few import/export filters for AbiWord.
AbiWord is an excellent OS word processor which already handles lots of the existing formats that you speak of: DOC, XHTML, DBK, RTF, et al. They each have their own merits, advantages, and disadvantages.
XHTML is not a good layout language. It has all of the same problems that HTML and thus web pages have: i.e. WYSIWYG formatting is next to impossible to achieve.
DBK is nice, except that DBK wasn't really meant to be a WP file format. Its tags carry with them lots of semantic information that WPs generally just don't care about. Its layout tags leave much to be desired for a WP. There just isn't a clean mapping of DBK->WP tags.
RTF is really slick (even though it is kinda old). Basically, anything that the AbiWord format can represent, RTF can do. This is a really good choice for your format.
ABW (or your WP's favorite native format) is always nice because it maps neatly onto your feature-set.
DOC support isn't really all that bad anymore. If you know what wvWare is (if you don't, see www.wvware.com), you know that it can convert DOC files into just about any format that you'd like. It can do this through either the command-line version or through the 50KB associated library. AbiWord uses wv to import DOCs. The importer is about 1100 LOC. I'm currently also writing DOC export support into wvWare (and thus AbiWord). Our DOC importer is *significantly* better than the one that OO has released. That will probably change soon, since Sun hired wvware's ex-maintainer to work on OO.
Anyway, hope that this helps,
Dom
20 year-old problem (Score:2)
FrameMaker speaks it, WordPerfect speaks it, I dunno about MS Word, and of course it can be pumped out into lots of other formats (e.g. HTML, XML, etc.)
It's not a perfect solution but it's widely available and fairly future-proof. Your specs should be about content anyway; let the reader concern themselves with presentation.
XML, DTD & XSL can be edited with some neat tools. (Score:2)
1) XML separates your content (XML) from your structure (DTD) from your presentation (XSL) leading to far more concise and rational documents.
2) They are open standards, unlike Word, FrameMaker or other proprietary formats.
3) The tools for document creation are open (and closed,) cross-platform and not dependent on the largesse of any single source.
"Save as HTML" is Your Friend. (Score:2)
The advantages: HTML is readable on any platform under the sun (and quite a few in caves), and most word processors can export using it.
If the documents have figures, they can be saved as one of
This is the only way I've found to get MS Word-users to give me readable documents, among other things.
Use DocBook (Score:2)
DocBook Resources (Score:3)
DocBook is your friend
DocBook is a lot to digest at one time, but it is well worth the effort. Personally I prefer DocBook XML and use Norm Walsh's XSLT stylesheets to transform the XML to anything I want... HTML, PDF, whatever.
Here are some resources for your reading pleasure.
DocBook is Open Source, freely available on all platforms of interest, can be used for simple documents to complex books, separates presentation from content, and is extensible. What more could you want from a document format?
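To give a feel for what a DocBook source looks like and how mechanical the conversion is, here is a toy sketch in Python. It is not a substitute for the real XSLT stylesheets: it only understands article, section, title, and para, it ignores any inline markup inside a para, and the sample text is invented.

```python
import xml.etree.ElementTree as ET
from html import escape

DOCBOOK = """<article>
  <title>DDR Timing Notes</title>
  <section>
    <title>Scope</title>
    <para>This note covers read timing only.</para>
  </section>
</article>"""

def to_html(elem, depth=1):
    """Render the children of an article or section as HTML.

    Titles become headings at the current depth; nested sections go one
    level deeper. Anything else is silently skipped in this sketch.
    """
    parts = []
    for child in elem:
        if child.tag == "title":
            parts.append(f"<h{depth}>{escape(child.text or '')}</h{depth}>")
        elif child.tag == "para":
            parts.append(f"<p>{escape(child.text or '')}</p>")
        elif child.tag == "section":
            parts.append(to_html(child, depth + 1))
    return "\n".join(parts)

html_out = to_html(ET.fromstring(DOCBOOK))
```

Note that the source says nothing about fonts or heading sizes; that all lives in whatever does the transformation.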
One word... (Score:2)
Re:Reverse engineer the thing (Score:2)
Re:It may seem incredibly redundant... (Score:2)
I've actually been through this process at work - we shifted from using a proprietary file format for our invoices to using an XML representation which can then be used to generate a range of views, from HTML for viewing on the intranet to LaTeX for printing and sending to customers. It's a great solution, and it means that we're not tied down to using LaTeX - at a later stage we can change to another document format, and all that needs to be changed is the XSL document for all of our invoices to be available in the new format. If the documents were originally stored in LaTeX format, we would not be able to do this - a change in the output format would require all the invoices to be re-entered (as was the case when we switched *from* the first proprietary format) or a large amount of custom code to be written.
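A stripped-down sketch of that setup, in Python rather than XSL so it runs with nothing but the standard library. The invoice layout, element names, and both renderers are assumptions made up for illustration, not the actual system described above, and no escaping of HTML or LaTeX special characters is attempted:

```python
import xml.etree.ElementTree as ET

INVOICE = """<invoice number="1001">
  <customer>Example Corp</customer>
  <item qty="2" price="9.50">Widget</item>
  <item qty="1" price="20.00">Gadget</item>
</invoice>"""

def line_items(root):
    """Return (name, quantity, unit_price) for each item element."""
    return [(i.text, int(i.get("qty")), float(i.get("price")))
            for i in root.findall("item")]

def to_html(root):
    """Intranet view: a heading plus one table row per item."""
    rows = "".join(f"<tr><td>{n}</td><td>{q}</td><td>{p:.2f}</td></tr>"
                   for n, q, p in line_items(root))
    return f"<h1>Invoice {root.get('number')}</h1><table>{rows}</table>"

def to_latex(root):
    """Print view: the same items as a LaTeX tabular."""
    rows = "\n".join(f"{n} & {q} & {p:.2f} \\\\" for n, q, p in line_items(root))
    return "\\begin{tabular}{lrr}\n" + rows + "\n\\end{tabular}"

invoice = ET.fromstring(INVOICE)
html_view = to_html(invoice)
latex_view = to_latex(invoice)
```

Switching output formats means writing one more function like to_latex; the stored invoices are never touched.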
It may seem incredibly redundant... (Score:4)
You can even embed binary data in an XML document (with a tiny bit of creativity) for all those people who like to populate their files with custom fonts, clipart, graphs, etc. (This is accomplished through something, say... <BINARY CLIPART><DATA>[image data]</DATA></BINARY CLIPART>. You get the idea.)
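One common way to do this is base64: XML element names can't contain spaces, so in practice the tag would be a single name with attributes. A minimal sketch in Python, where the element and attribute names (document, clipart, encoding, name) are made up for illustration:

```python
import base64
import xml.etree.ElementTree as ET

image_bytes = bytes(range(256))  # stand-in for real image data

# Embed: binary -> base64 text -> element content.
doc = ET.Element("document")
clip = ET.SubElement(doc, "clipart", {"encoding": "base64", "name": "logo"})
clip.text = base64.b64encode(image_bytes).decode("ascii")
xml_text = ET.tostring(doc, encoding="unicode")

# Extract: element content -> base64 text -> binary.
restored = base64.b64decode(ET.fromstring(xml_text).find("clipart").text)
```

The round trip is lossless, at the cost of the data growing by about a third.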
How about special configuration parameters? You could incorporate tags that would handle the way a document is viewed by different people ("are you a techie, marketing drone, webbie, etc" -> certain data becomes visible).
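Something like the following sketch, in Python. The audience attribute and the view_for helper are hypothetical, and the sample paragraphs are placeholders:

```python
import xml.etree.ElementTree as ET

DOC = """<spec>
  <para>Text everyone should see.</para>
  <para audience="techie">Implementation details.</para>
  <para audience="marketing">Customer-facing summary.</para>
</spec>"""

def view_for(xml_text, audience):
    """Return the paragraph texts visible to `audience`.

    Paragraphs without an audience attribute are shown to everyone.
    """
    root = ET.fromstring(xml_text)
    return [p.text for p in root.findall("para")
            if p.get("audience") in (None, audience)]

techie_view = view_for(DOC, "techie")
marketing_view = view_for(DOC, "marketing")
```

One source file, and each reader gets only the parts tagged for them.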
The biggest advantages here are obviously the standards provided by XML (thank you W3C). Its uses are broad. It's got high quality interpreters on ALL platforms (especially JAXP for Java - it's a joy to work with *g*).
The only standards we'd really have to focus on would be which tags would be considered "key" tags.
What else do you need? Doesn't OpenOffice already use XML as its standard document type?
Sure I could be wrong on this, so don't berate me too much. I've just had a lot of positive experience working with XML for sooo many different applications.
Re:Dangerous and Offensive??? What is standard? (Score:2)
Often I try to convince the English department to teach their students to use something more compatible, like HTML or RTF. There is little demand for images and tables, and when there is it is really part of a Powerpoint sedative or spreadsheet. But they always say, we have to use Word because it is the standard. Meaning that it is the universally compatible format that everyone can read. Now am I just crazy here? Don't answer that. If Word were in fact any kind of standard, why do we have the Tower of Babel with all these incompatible Word documents? Word may be the standard word processor, but there is no standard Word format. There are a dozen different Word formats.
Everyone might as well use whatever weird word processor they like, and pass along a second copy in plain text every time they try to move the document to another machine. The net effect would be the same.
Try Paper (Score:2)
It's all about PDF (Score:2)
Re:XML, DTD & XSL can be edited with some neat too (Score:2)
XML might be great for text but keeping binary data in it isn't the best of plans. Size being the other issue, but if always saved / restored with gzip or something...
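A quick way to see the size issue, sketched in Python. It assumes the binary data is stored as base64 text (one common approach) and uses random bytes as a stand-in for an already-compressed image:

```python
import base64
import gzip
import os

raw = os.urandom(3000)               # stand-in for incompressible image data
encoded = base64.b64encode(raw)      # what would sit inside the XML element
compressed = gzip.compress(encoded)  # what would sit on disk if always gzipped

sizes = {
    "raw": len(raw),
    "base64": len(encoded),
    "base64+gzip": len(compressed),
}
```

Base64 turns every 3 bytes into 4 characters, so the encoded form is a third larger than the raw data. Gzip claws most of that back, since each base64 character only carries 6 bits of information, but it can't get below the size of the original data.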
TeX is what you want. (Score:3)
TeX (and the LaTeX frontend) runs on about as many platforms as Linux.
the output of a tech document is quite frankly spectacular. you just don't get this kind of quality with the word processing programs that are out there today.
many people think the learning curve is high, but this isn't necessarily so. my advisor says that LaTeX has a one-paper activation energy, i.e. it takes you about one document to learn most of what you need to know to get things done... and once you use it you will find it hard to use anything else in the future.
use LaTeX? want an online reference manager that
Re:SGML/XML/DocBook (Score:2)
Another positive benefit of using SGML: All Department of Defense (IIRC) documentation must be SGML. So if you're ever going to have to maintain government documents, SGML is a great choice!
Matt Barnson
Re:Reverse engineer the thing (Score:2)