Pretty Printing From An XML File?
Omega1045 writes "Where I work we are developing a new product that receives an XML document (on a W2k workstation), and we need to format and print said document. We are currently using XSLT + CSS to build a cool little HTML page out of the XML, then use a browser to print out the HTML. However, while HTML is a nice format for display, it is not a nice format for printing. We have messed around with the idea of spitting out Rich Text with XSLT. However, Rich Text is confusing and quite frankly sucks. We are looking for a (free if possible) format that we can translate our XML document into via XSLT, and print. The best idea we have at this point is to translate into a Word or OpenOffice XML schema document, and use one of those applications to print. Other ideas?"
Postscript? (Score:4, Interesting)
FOP (Score:5, Informative)
Very powerful, if you ask me. I used it on a project back in 2000-2001 and was pleased with how it turned out at the time. I'm sure the current product is much, much better than it was back then.
Re:FOP (Score:3, Informative)
I agree. XSL-FO is supposed to be the W3C's "definitive" answer for an XML-based, page-oriented formatting language. It achieves basically what PDF does, but in XML: a structured representation of print-formatted data.
I've used it quite a lot and it's great. It's very easy to transform your XML data into XSL-FO, from which you can use Apache FOP (or one of the commercial tools) to turn it into a format other tools can use (PDF, PostScript, RTF, etc.).
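For anyone who hasn't seen XSL-FO, here is a minimal sketch of the kind of document an XSLT stylesheet would be expected to produce, built directly in Python instead of via XSLT. The function names, the "page" master name, and the A4 geometry are illustrative assumptions, not anything from the posts above; only the fo: element and attribute names come from the XSL-FO vocabulary.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name for an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def build_fo_document(paragraphs):
    """Build an XSL-FO tree with one page master and one fo:block per paragraph."""
    root = ET.Element(fo("root"))

    # Page geometry: a single A4 page master (the name "page" is arbitrary).
    layout = ET.SubElement(root, fo("layout-master-set"))
    master = ET.SubElement(layout, fo("simple-page-master"), {
        "master-name": "page",
        "page-height": "297mm",
        "page-width": "210mm",
        "margin-top": "20mm",
        "margin-bottom": "20mm",
        "margin-left": "20mm",
        "margin-right": "20mm",
    })
    ET.SubElement(master, fo("region-body"))

    # Content flows into the body region of that page master.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for text in paragraphs:
        block = ET.SubElement(flow, fo("block"), {"space-after": "4mm"})
        block.text = text
    return root


def to_string(root):
    """Serialize the tree to a Unicode string."""
    return ET.tostring(root, encoding="unicode")
```

Calling to_string(build_fo_document(["First paragraph", "Second paragraph"])) gives an FO document that a formatter such as FOP should be able to render; a real stylesheet would generate the same structure from your own XML.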
Re:FOP (Score:3, Informative)
BTW, from http://www.w3schools.com/xslfo/xslfo_intro.asp [w3schools.com]:
XSL-FO is Formally Named XSL Why this confusion? Is XSL-FO and XSL the same thing? Yes it is, but we will give you an explanation: Styling is both about transforming and formatting information. When the World Wide Web Consortium (W3C) made their first XSL Working Draft, it contained the language syntax for both transforming and formatting XML documents. Later the XSL Working Group at W3C split the o
Re:FOP (Score:2)
And for history buffs, this all came from James Clark's DSSSL (both a transformation system and a formatting language) for SGML.
Re:FOP (Score:2)
Re:FOP (Score:2, Interesting)
I was using FOP to create 'slides', and it did an ok job. Nice that it supports links in the PDF file.
I also looked at ReportLab for Python, which seemed slightly better to me than FOP, with one exception. I think the link support was not as nice as Ap
Re:FOP (Score:1)
-Peter
Try PDF (Score:5, Informative)
I have a similar problem, which I solve with XSLT and XSL-FO. Use XSLT to transform the XML into XSL-FO. Then, use Apache FOP [apache.org] to render the XSL-FO into PDF.
Another variation is to transform your XML into an HTML subset, then use a standard XSLT to transform the HTML into XSL-FO. A similar technique is used by Aurigadoc [sf.net] to create all sorts of output formats using an XML source.
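The XML to XSL-FO to PDF route above can be driven from a script. This is a rough sketch that assumes the fop launcher script is on the PATH and that your FOP version accepts the -xml, -xsl and -pdf command-line options; the function names and file names are made up for illustration.

```python
import subprocess


def fop_command(xml_path, xsl_path, pdf_path, fop_executable="fop"):
    """Build the argument list for FOP's XML + XSLT -> PDF mode."""
    return [fop_executable, "-xml", xml_path, "-xsl", xsl_path, "-pdf", pdf_path]


def render_pdf(xml_path, xsl_path, pdf_path, fop_executable="fop"):
    """Run FOP and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(fop_command(xml_path, xsl_path, pdf_path, fop_executable), check=True)
```

Something like render_pdf("order.xml", "order-to-fo.xsl", "order.pdf") would then apply the stylesheet and write the PDF in one step. Check the options against the FOP release you actually install.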
Re:Try PDF (Score:3, Informative)
Apache FOP Supports Postscript (Score:3, Informative)
XSLT-FO (Score:5, Informative)
http://www.cranesoftw
There are also payware implementations of XSLT-FO. Basically, it is the extension to XSLT for print.
Mod parent up! (Score:2)
Everyone seems to know XSLT, but not its sister spec, XSL-FO. XSL-FO is designed for exactly this!
No T in XSL-FO (Score:1)
Consider YesLogic Prince (Score:4, Informative)
LaTeX (Score:5, Insightful)
Re:LaTeX (Score:2)
However, LaTeX's output looks better than FOP's, if only slightly.
Re:LaTeX (Score:1)
PCL, depending on how complex your layout is. (Score:2, Interesting)
Re:PCL, depending on how complex your layout is. (Score:2)
FWIW, I've found PDF to be pretty accurate/flexible/comprehensive in most areas.
Try Docbook (Score:3, Interesting)
I'm not sure how the DocBook LaTeX filters work, but you may want to avoid LaTeX for several reasons, starting with special characters. LaTeX doesn't do Unicode, so you'll have to translate those characters. That's not a huge problem, merely an annoyance.
But quotes can be annoying. LaTeX wants directional quotes. This is fine only if you have full control of your source and are willing to deal with it.
I tried to go direct to LaTeX on one of my projects, and it's not straightforward, unless I'm missing something obvious. If someone does know a solution, please inform me 8-}
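One possible approach to the special-character and quote problems, sketched in Python as a preprocessing step before the text reaches LaTeX. The function names are hypothetical and the quote handling is only a heuristic (a straight double quote at the start of the text, or after whitespace or an opening bracket, is treated as an opening quote). It does not attempt the Unicode translation mentioned above.

```python
import re

# Characters LaTeX treats specially, mapped to their usual escaped forms.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}

_SPECIAL_RE = re.compile("|".join(re.escape(ch) for ch in LATEX_SPECIALS))


def escape_latex(text):
    """Escape LaTeX special characters in a single pass over the text."""
    return _SPECIAL_RE.sub(lambda m: LATEX_SPECIALS[m.group(0)], text)


def directional_quotes(text):
    """Replace straight double quotes with `` (opening) or '' (closing)."""
    out = []
    for i, ch in enumerate(text):
        if ch == '"':
            opening = i == 0 or text[i - 1].isspace() or text[i - 1] in "([{"
            out.append("``" if opening else "''")
        else:
            out.append(ch)
    return "".join(out)


def to_latex(text):
    """Escape special characters, then convert straight double quotes."""
    return directional_quotes(escape_latex(text))
```

The single regex pass matters: replacing characters one at a time would re-escape the braces and backslashes introduced by earlier replacements.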
Re:Try Docbook (Score:2)
LaTeX is doable, but it is too much fuss in the end.
Comment removed (Score:4, Informative)
Docbook (Score:4, Informative)
You could also hack one of the docbook XSL stylesheets (using XSLT? would be pretty!) to make it parse your own format.
SVG? (Score:2, Interesting)
Re:SVG? (Score:2, Interesting)
XMLPDF (Score:4, Informative)
http://www.xmlpdf.com/
Cheers,
Dave
maybe...? (Score:1, Informative)
Re: (Score:1)
You're almost there... (Score:5, Insightful)
Take the XML and the XSL and transform it into 100% valid XHTML. HTML 4 is deprecated, the standard will not be updated. XHTML 1.0 is 5 years old already - start to use it.
Use CSS - pay attention to
@media screen,print
{
  /* rules shared by screen and print */
}
@media screen
{
  /* screen-only rules */
}
@media print
{
  /* print-only rules */
}
If it doesn't print well, you probably need to refresh your CSS here: http://www.w3.org/style
Good luck.
Re:You're almost there... (Score:2)
So? Lots of things have specifications... In fact, as I read it, it says:
then
Re:You're almost there... (Score:2)
It is a recommendation, not a standard. There is a difference.
It is not deprecated.
And, to the best of my knowledge, no-one has ever formally stated that it will not be updated.
There may be reasons to use XML and CSS in the application. The original poster was 100% incorrect about HTML 4, though.
Re:You're almost there... (Score:1, Informative)
The W3C seems to disagree with you.
No, they don't. As other people have pointed out, you have thoroughly confused the terms "specification" and "standard". If you want to hear it straight from the horse's mouth [w3.org]:
The W3C has been an active voice in industry technology debates for a little over two years now. Today, we represent over one hundred and seventy developers, research organizations, government agencies, and users. We have a technical staff of three dozen folks around the world working in three
Re:You're almost there... (Score:2)
While W3C [w3.org] has not made HTML a standard, the ISO [iso.org] and IEC [iec.org] apparently have standardised "a refinement of the World Wide Web Consortium's (W3C's) Recommendation for HTML 4.0
Re:You're almost there... (Score:2)
Re:You're almost there... (Score:3, Interesting)
As the poster said, they've tried HTML, and didn't like it. I very much doubt that the print quality of XHTML would be any better than HTML. (I don't quite understand either why you're including screen styles for a page that is intended only for printing.)
As for HTML 4 being a dead end, the WHAT WG [whatwg.org], a collaboration among developers from most browsers, are defining a set of specificati
HTML is good enough (Score:2)
Re:HTML is good enough (Score:2)
> placing of text and graphics
No, goddammit, it's not. It's designed for structurally marking up hypertext, hence the name. The visual presentation is dealt with by CSS. Separating style and content through HTML4/CSS1 has been best practice since when, 1998?
And while we're at it, how do you e.g. position an image with say 4mm full run-around at (75mm/150mm) on a printout with plain HTML? To make it short, you don't because you can't.
Mind you, I d
Re:HTML is good enough (Score:2, Funny)
Re:HTML is good enough (Score:1)
Re:HTML is good enough (Score:1)
Re:HTML is good enough (Score:1)
Actually the friends list can serve as a nice non-idiot database, and I regularly included people with whom I disagreed but who obviously know what they're talking about. But now I am reshaping it to represent a social network, and the comments I get to see degrade in quality as I remove more and more entries from the NIDB times. Slashdot definitely should offer a finer granulation of comment moderation. If
Re:HTML is good enough (Score:1)
Yes, I still do that. Nice to see that other people also do/did it.
Also, I give freaks more negative points than I do foes.
Re:HTML is good enough (Score:2)
Slashcode (Score:1)
Re:Slashcode (Score:1)
Listen to me ;-) (Score:4, Interesting)
First, we had a bunch of product data in an MS SQL Server database. We had a Java (I think) task that nightly dumped XML files (one per product) based on the DB.
Then, we applied an XSLT transformation to each XML file to produce the static HTML page for that day (static both to reduce server load and to optimize Google's indexing of it, since Google didn't/doesn't like dynamic content).
Then we wanted to produce a printed catalogue, so rather than printing pages, I made an XSLT that transformed the XML not into HTML but into XSL-FO. FOP is some Java shit from Apache that takes XSL-FO files and spits out a PDF.
Obviously I don't remember details, but it worked.
I had the idea to generate the PDFs not just for the printed catalogues but also as a "printable version" for each HTML page. So both PDFs and HTMLs were generated nightly. Yeah, it took a while, but it was cool.
It also served to improve our pagerank because (1) the PDFs made it look like we had twice as much content and (2) Google gave higher weightings to PDFs (at the time, anyway).
And, it was easy.
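The first step of that pipeline, one XML file per product, might look roughly like this in Python. This is only a sketch: the original task was (probably) Java reading from SQL Server, whereas here the rows are plain dicts, and the element names, the "id" key, and the product-N.xml file naming are all assumptions for illustration. Field names must be valid XML element names.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def product_to_xml(product):
    """Turn one product record (a dict with an "id" key) into a <product> element."""
    root = ET.Element("product", {"id": str(product["id"])})
    for field, value in product.items():
        if field == "id":
            continue
        ET.SubElement(root, field).text = str(value)
    return root


def dump_products(products, out_dir):
    """Write one XML file per product and return the list of paths written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for product in products:
        path = out_dir / f"product-{product['id']}.xml"
        tree = ET.ElementTree(product_to_xml(product))
        tree.write(str(path), encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths
```

Each file written this way can then be fed to two stylesheets, one producing HTML and one producing XSL-FO for the PDF.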
ASCII (Score:3, Insightful)
A project that does what you mention... (Score:2, Redundant)
One of their XSLT stylesheets transforms the XML to a PDF file; maybe that would be a good place to look. I only found this today, and since I'm only just learning XML, I don't know how well this applies... Enjoy!
Re:A project that does what you mention... (Score:1)
Me Too (Score:3, Insightful)
When I started with my current employer, we had a very complicated PDFing process. Every night a transfer workstation would copy data files locally from a backup of the production server. A Pervasive driver was loaded to read the dat files. Access would import the data from Pervasive and run a report that was saved as an RTF file. It was then opened in Word, where a macro would PDF the document and close. The PDF was then copied to the web server for the users to download.
What a mess, and a nightmare to debug. It would work for a few months and then, at seemingly random times, it would crash horribly for several days in a row.
When it did break, I felt like I wasted a lot of time tracking down ghost problems. In my slow days I rewrote it.
It now pulls read-only data from the production server with that Pervasive driver into an XML file, applies an XSL transform, passes the result to the FOP processor, and places the output directly on the web server.
A process that took an hour to run now finishes in 2 minutes. It is quick enough that we run it every 20 minutes. FOP was quick to set up, and the examples are like a blueprint and easy to figure out.
I have never had a problem with the new implementation, and the end users saw no impact and were unaware of the change.
I would recommend using a FOP processor to my friends.
XML -> RTF via XSL... (Score:2, Interesting)
... Why are you asking slashdot? (Score:3, Funny)
Seriously, buddy. It's not hard. XML -> Perl/PHP/Python/whatever P you choose -> PostScript -> pretty printing
Google is your friend. [justfuckinggoogleit.com]
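To make the XML -> script -> PostScript route concrete, here is a very small sketch in Python. The function names and the "tag: text" line layout are assumptions for illustration; it prints one line per leaf element in a fixed font, with no page breaks, line wrapping, or non-ASCII handling.

```python
import xml.etree.ElementTree as ET


def ps_escape(text):
    """Escape backslashes and parentheses for a PostScript string literal."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def xml_to_postscript(xml_text, font="Courier", size=12, left=72, top=720):
    """Emit one line of PostScript text per leaf element, as 'tag: text'."""
    root = ET.fromstring(xml_text)
    lines = ["%!PS", f"/{font} findfont {size} scalefont setfont"]
    y = top
    for element in root.iter():
        text = (element.text or "").strip()
        # Only leaf elements with text are printed.
        if len(element) == 0 and text:
            label = ps_escape(f"{element.tag}: {text}")
            lines.append(f"{left} {y} moveto ({label}) show")
            y -= size + 2  # move down one line; no page-break handling
    lines.append("showpage")
    return "\n".join(lines) + "\n"
```

The output can be sent to a PostScript printer or viewed with a PostScript interpreter. Anything beyond plain lines of text (tables, fonts, pagination) is where the hand-rolled approach gets painful and XSL-FO starts to look attractive.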
DocBook-XSL + XSL-FO + FOP (Score:2, Insightful)
Alternatively, skip the DocBook step and transform straight to XSL-FO.
Handling XML files...and other stuff (Score:2)
I ran across this a little while ago.... DeliveryWare [esker.com]
It will handle XML documents and convert to various formats and can fax, e-mail, print or do whatever with the file.
gvim will do it (Score:2)
Haven't tried it, but should be a breeze. And a portable solution too.