HTML Tags For Academic Printing?
meketrefi writes "It's been quite a while since I got interested in the idea of using HTML (instead of .doc or .odf) as a standard for saving documents, including the more official ones like academic papers. The problem is using HTML to create pages with a stable size that would deal with bibliographical references, page breaks, different printers, etc. Does anyone think it is possible to develop a decent tag like 'div,' but called 'page,' specially for this? Something that would make no use of CSS? Maybe something with attributes as follows: {page size="A4" borders="2.5cm,2.5cm,2cm,2cm" page_numbering="bottomleft,startfrom0"} You get the idea... {/page} I guess you would not be able to tell when the page would be full, so the browser would have to be in charge of breaking the content into multiple pages when needed. Bibliographical references would probably need a special tag as well, positioned inside the tag ..." Is this such a crazy idea? What would you advise?
Re:LaTeX (Score:5, Insightful)
*sigh* it's a slow news day
Congratulations! (Score:5, Insightful)
Congratulations, you're the 5,134,978th person to suggest a change to HTML which will prevent it from being reflowable!
Please step up to the spiked door in front of the acid pit to claim your prize.
ODF (Score:3, Insightful)
LaTeX already got mentioned, and probably makes more sense.
If you really want an unreadable super-general XML-based format, use ODF.
Wrong, in many ways (Score:5, Insightful)
What you want (being able to define pages) is wrong in many many ways.
As an author, you should never define a page or its dimensions in your authoring tool, especially for academic works, which will be printed in different formats and on different paper (A4, Letter, trade paperback, etc.).
At most, whatever markup you use may define things like page breaks, but even those are more of a typesetting issue.
What you want is either LaTeX or DocBook.
hey, why don't we... (Score:2, Insightful)
Re:Static Page Feeds are available (Score:1, Insightful)
1995 called, they want their old HTML back.
Seriously, no browser has needed the HTML comment stuff inside of style tags in many years. And don't even get me started on the uppercase tag names...
You don't actually want HTML (Score:2, Insightful)
Seriously. It's pretty bad. You can, however, use DocBook (or your own schema, or DocBook extended with your own stuff) and XSLT it into XHTML (or something entirely different) at the end.
Most likely you just want to use LaTeX, though.
In my day (Score:3, Insightful)
I used Netscape Communicator to write all my papers for uni, mainly because it was available under Windows and Unix (IRIX in our case) and its output could be read by anyone on any platform.
It was a reasonably easy-to-use editor, without all the useless crap most others have.
A few lecturers were quite impressed with the idea; the portability and cost were big factors.
Re:Don't use HTML (Score:4, Insightful)
HTML was never supposed to do those things in the first place. The tags you are referring to were hacks invented because CSS did not exist yet.
Unfortunately there is a whole generation of "web developers" who don't understand the concepts of semantic markup and output device-independent layouts.
Re:wondering if we should let go of standard tags (Score:3, Insightful)
Re:LaTeX (Score:5, Insightful)
Although I agree with you in that LaTeX is widely used in the scientific community, and unambiguously offers the best typesetting facilities you'll find outside of a publishing house, is it still appropriate today?
The internet as we know it was created at CERN to facilitate the sharing of scientific information. Why are we still publishing in a format designed to be presented on dead trees?
Like it or not, a properly-formatted print article looks horrible on a screen. An article formatted for printing on A4 or Letter-sized paper will use the whole width of the page, be set in 10-point type, and use columns. Unfortunately, modern computer screens don't have nearly enough resolution to display the full width of the page alongside much else. Obviously, PDF files also don't have the ability to flow to fit the width of the screen.
LaTeX also doesn't give you the benefit of hypertext. Yes, there are various hacks you can use to add anchors and links to PDFs, but these are mere hacks on top of a broken format. Things such as high-resolution figures and hyperlinked references would be particularly beneficial for academic uses. It'd also be great to be able to see all articles linking back to what you happen to be reading. (This brings up all sorts of questions about the very nature of scientific publishing, although this is another debate entirely.)
Wikipedia (more specifically, MediaWiki) actually offers a promising solution to these (and the original poster's) requirements. It provides a convenient and simple markup for multi-sectioned articles, flows to fit the width of the page, and also provides LaTeX's fantastic mathematical typesetting facilities. Hyperlinking to other parts of the Wiki (and to external sites) is exceedingly easy. I'm sure the DOI [doi.org] system could be integrated to allow linking back to other articles within the constraints of the existing academic publishing regime.
Google could very easily provide the "glue" to hold such a system together, although it would ultimately be better to put a public, non-profit entity in charge. It's absurd and hypocritical that so much of academic research (particularly the publishing part of it) is profit-driven.
Re:LaTeX (Score:4, Insightful)
LaTeX describes the document, just like HTML describes it, but with more structure. What you see in a web browser is not HTML, it is a *rendering* of HTML. Different browsers render the same HTML differently; for example, a mobile browser will skip some things and reformat others. In the same way, there are different LaTeX renderers: some output PDF, some output HTML, some output ODF. It would be much easier to use LaTeX for the source document and then compile a pretty HTML file and a PDF file from it than to craft it in HTML or some other XML variant directly. You can have links in LaTeX that render as normal links in the HTML output. Why stress?
Re:LaTeX (Score:4, Insightful)
Re:LaTeX (Score:3, Insightful)
Unfortunately, any program able to handle everyone's different styles for document printing is probably going to be too specialized for everyone to have. LaTeX shows that print layouts are a difficult problem. Even on webpages (screen display), to get really good layouts we rely on scripts, styles and templates from other sources; in most cases these are too numerous to make distributing the document via e-mail trivial. Plus, we use specialized software (e.g. Dreamweaver).
Unfortunately, there's no good solution that I know of for this. Simply throwing text and images into a document does not make it readable, and there's no software that can simply take the jumble and make it readable; it takes a human touch to produce a good layout.
Re:wondering if we should let go of standard tags (Score:4, Insightful)
There's no complexity problem that cannot be solved by adding a layer of abstraction, nor performance problem that cannot be solved by removing a layer of abstraction.
Though I must note that you can already define your own tags in HTML+CSS and, while the W3C validator will (rightfully) complain loudly about them, most browsers deal with them just fine.
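To make that concrete, here is a rough sketch of a made-up 'page' tag styled with plain CSS. The tag name comes from the original question; the stylesheet rules and sample text are my own assumptions, not anything standardized, and the validator will reject the document.

```html
<!DOCTYPE html>
<html>
<head>
<title>Custom tag sketch</title>
<style type="text/css">
/* "page" is not a real HTML element. Browsers treat unknown
   elements as inline boxes, so turn it into a block first. */
page { display: block; }
@media print {
  /* Assumption: each page element should end a printed sheet. */
  page { page-break-after: always; }
}
</style>
</head>
<body>
<page><p>Contents of the first sheet.</p></page>
<page><p>Contents of the second sheet.</p></page>
</body>
</html>
```

Note that the browser still decides where to break if a single 'page' holds more text than fits on one sheet, so this only gives you forced breaks, not fixed pages.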
Re:LaTeX (Score:2, Insightful)
LaTeX is really the solution. There is no reason to reinvent the wheel. In fact, reinventing the wheel might cause problems when submitting papers. From what I have seen, many academic journals prefer .tex and .eps files. I can't imagine what they would do with HTML.
Actually, that may be true of some academic journals, but most deal primarily with MS Word documents. Some publishers might grudgingly deal with LaTeX documents (I know John Benjamins mentions it in their style requirements), but the people who run conferences and therefore are in charge of submitting the proceedings tend not to be computer savvy enough to work with anything other than MS Word files (god save whoever has to deal with the millions of random fonts people use, use/non-use of styles, etc).
This of course depends on your field - in Comp Sci, I'd wager there are many more journals that regularly accept LaTeX files. In linguistics, it's somewhat rarer, and as you get further into the humanities, it becomes increasingly difficult to find anyone who's heard of LaTeX at all.
Let CSS work for you! (Score:3, Insightful)
<html>
<head>
<title>Abstract of a usable design</title>
<style type="text/css">
@media print {
body { margin: 2.5cm; }
}
@media screen {
body { margin: 50px; width: 50%; }
}
body { font-family: sans-serif; font-size: 12pt; }
</style>
</head>
<body>
<h1>It's so crazy it just might work</h1>
<h2>and other html inspired musings</h2>
<p>Why not just use css?</p>
<p>Also, don't worry about page numbering. That's the browser's job.</p>
</body>
</html>
I am curious to know... (Score:3, Insightful)
CSS has a method of creating pages, for printing and more. It's no more difficult to learn than HTML is. You could use XML, create all the custom tags you want, and use XSL (oh look stylesheets again) to style the XML however you want.
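For the curious, a minimal sketch of what that looks like with the CSS paged media rules. The class names and sample text are invented for illustration, and I am assuming the submitter's "2.5cm,2.5cm,2cm,2cm" means top, right, bottom, left. Support for the margin-box page counter varies a lot between browsers and print formatters, so treat that part as optional.

```html
<!DOCTYPE html>
<html>
<head>
<title>Paged media sketch</title>
<style type="text/css">
@page {
  /* Paper size and margins (top, right, bottom, left). */
  size: A4;
  margin: 2.5cm 2.5cm 2cm 2cm;
  /* Page number in the bottom-left margin box; not every
     renderer implements margin boxes. */
  @bottom-left { content: counter(page); }
}
@media print {
  /* Hypothetical class: start each chapter on a new page. */
  h1.chapter { page-break-before: always; }
  /* Hypothetical class: keep a bibliography entry on one page. */
  .reference { page-break-inside: avoid; }
}
</style>
</head>
<body>
<h1 class="chapter">Introduction</h1>
<p>Body text flows across as many pages as it needs.</p>
<h1 class="chapter">References</h1>
<p class="reference">[1] A placeholder bibliography entry.</p>
</body>
</html>
```

The markup stays reflowable on screen; the page geometry only applies when the document is printed or run through a paged renderer.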
HTML5 is coming out in the near or distant future. If you have suggestions for tags and functions, you might want to try to get involved with the W3C.
Re:LaTeX (Score:1, Insightful)
Sometimes on lab equipment you don't have X or similar. I run LaTeX and send the output to print. I don't need to see it before printing. I don't get distracted by layout while doing the relevant stuff, and if you don't get a weird distro mod of LaTeX it is really portable. HTML is for things where you don't know the output size. LaTeX is for generating documents.
Re:Static Page Feeds are available (Score:3, Insightful)
The HTML comments inside of style tags are still a good idea. Although no modern browser requires it, not everything that parses HTML is a full-blown web browser. Those extra seven bytes don't hurt anything, and they pretty much guarantee that any code with anything resembling a proper HTML parser won't interpret the styles, JavaScript, etc. as content even if the tool doesn't understand or care about specific tags.
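For anyone who hasn't seen the idiom, this is roughly what it looks like; the rule inside is just a placeholder. The opening and closing comment markers are the seven bytes in question, and CSS parsers skip them at the top level of a stylesheet.

```html
<style type="text/css">
<!--
body { margin: 2.5cm; }
-->
</style>
```

This only holds for documents parsed as HTML. Under XML parsing rules (XHTML served as XML) the same markers are a real comment, and the rules inside are hidden from the stylesheet.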
Perhaps more importantly, from a purely philosophical point of view, leaving out the comments in style tags is wrong. That line noise is not part of the content, and therefore should be fundamentally separated from the presentation. Other stuff like that (link URLs, image URLs, inline styles, etc.) are all in HTML attributes or otherwise sequestered from the text content. Putting CSS or JavaScript bare inside a tag without surrounding it with comment markers violates the fundamental philosophy of HTML. Yes, this means the XHTML spec is fundamentally defective by design.
I'll leave the uppercase/lowercase flame war to people who care.
Re:LaTeX (Score:3, Insightful)
99.99% of LaTeX is output straight to PDF, Postscript, or (in special cases such as Wikipedia's math renderer) a rasterized image. The documentation, plugins, and user community of LaTeX all reflect this.
I haven't come across any serious usage of LaTeX in the manner that you describe it.
ODF and HTML do not support the full set of typographic features that LaTeX does. Something will almost certainly be lost in translation.
Although I suppose it's possible to craft a source document that would look good both in print and as free-flowing hypertext, you'd need a zen-like command of the language. LaTeX has enough quirks as it is. I have a very difficult time accepting this as a practical solution.
Also, if you're crafting a hypertext document, why not start with a language specifically designed for the task?
Re:LaTeX (Score:5, Insightful)
Portability:
Name me an OS that doesn't have a PDF or PS reader installed by default.
Windows
How portable is HTML? (Score:3, Insightful)
These different browsers render HTML differently.
Re:LaTeX (Score:2, Insightful)
Please also remember that the "compiled" output of C is PDP-11 machine code, not x86 code or CLR code.
Re:LaTeX (Score:3, Insightful)
I edit and lay out books, and the worst problems are when the author DOES think he has typographic literacy. If I let them have their way they print books set in Arial 10 point with vertical quotemarks, and I could go on.
It took me months of study and years of practice to get the degree of typographic literacy I have now.
They will just waste their time on details that will actually impede the publishing process if not stripped out.
It's like cutting your own hair -- yes, you can do it. But you're less likely to make a fool of yourself if you pay someone who does it all day long a couple of dollars to do it right.
Authors ideally should not be concerned with visual layout. They just need to make sure that the logical structure (headings, notes, location of diagrams) is clear. It doesn't matter if they use Courier or Comic Sans.