Software

Pretty Printing From An XML File? 65

Omega1045 writes "Where I work we are developing a new product that receives an XML document (on a W2k workstation), and we need to format and print said document. We are currently using XSLT + CSS to build a cool little HTML page out of the XML, then use a browser to print out the HTML. However, while HTML is a nice format for display, it is not a nice format for printing. We have messed around with the idea of spitting out Rich Text with XSLT. However, Rich Text is confusing and quite frankly sucks. We are looking for a (free if possible) format that we can translate our XML document into via XSLT, and print. The best idea we have at this point is to translate into a Word or OpenOffice XML schema document, and use one of those applications to print. Other ideas?"
This discussion has been archived. No new comments can be posted.

  • Postscript? (Score:4, Interesting)

    by Hanji ( 626246 ) on Thursday October 14, 2004 @06:48PM (#10529901)
    I'm not actually familiar with the details of PostScript at all, but it certainly seems a logical format to consider if printing things is your concern.
  • FOP (Score:5, Informative)

    by pi_rules ( 123171 ) * on Thursday October 14, 2004 @06:50PM (#10529917)
    Apache FOP Homepage [apache.org]

    Very powerful if you ask me. I used it on a project back in 2000-2001 and was pleased with how it turned out at the time. I'm sure the current product is much, much, better than it was back then.
    • Re:FOP (Score:3, Informative)

      by danpat ( 119101 )

      I agree. XSL-FO is supposed to be the W3C's "definitive" answer to an XML, page-oriented formatting language. XSL-FO achieves basically exactly what PDF does, but in XML. It's supposed to be a structured representation of print-formatted data.

      I've used it quite a lot and it's great. It's very easy to transform your XML data into the XSL-FO schema, from which you can use Apache FOP (or one of the commercial tools) to turn it into a format other tools can use (PDF, PS, RTF, etc).

      • Re:FOP (Score:3, Informative)

        by malachid69 ( 306291 )
        I agree. XML->XSL->FOP is the way to go.

        BTW, from http://www.w3schools.com/xslfo/xslfo_intro.asp [w3schools.com]:

        XSL-FO is Formally Named XSL Why this confusion? Is XSL-FO and XSL the same thing? Yes it is, but we will give you an explanation: Styling is both about transforming and formatting information. When the World Wide Web Consortium (W3C) made their first XSL Working Draft, it contained the language syntax for both transforming and formatting XML documents. Later the XSL Working Group at W3C split the o

        • And for history buffs, this all came from James Clark's DSSSL (both a transformation system and a formatting language) for SGML.

    • If you intend to print and you are not using XSL-FO, you're doing something wrong.
    • Re:FOP (Score:2, Interesting)

      by pmuellr ( 213665 )
      One problem with Apache's FOP is that it doesn't support keep, orphan, widow type stuff, so it's difficult to get nice-looking paginated stuff, broken at natural places. XSL-FO supports it in the spec; Apache FOP doesn't support it in the implementation.

      I was using FOP to create 'slides', and it did an ok job. Nice that it supports links in the PDF file.

      I also looked at ReportLab for Python, which seemed slightly better to me than FOP, with one exception. I think the link support was not as nice as Ap

    • "Well, I don't want FOP, goddamn it! I'm a Dapper Dan man!"
      - Ulysses Everett McGill

      -Peter
  • Try PDF (Score:5, Informative)

    by Mastos ( 448544 ) on Thursday October 14, 2004 @06:50PM (#10529921)

    I have a similar problem that I solve through the use of XSLT and XSL-FO. Use XSLT to transform the XML into XSL-FO. Then, use Apache FOP [apache.org] to render the XSL-FO into PDF.

    Another variation is to transform your XML into an HTML subset, then use a standard XSLT to transform the HTML into XSL-FO. A similar technique is used by Aurigadoc [sf.net] to create all sorts of output formats using an XML source.
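
For readers who have never seen XSL-FO, here is a rough sketch of the kind of document the XSLT step has to emit. It is not XSLT: it uses Python's standard-library ElementTree as a stand-in for the transform, and the `<order>`/`<item>` input vocabulary, the `xml_to_fo` helper and the A4 page master are all made up for illustration.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name for an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def xml_to_fo(source_xml):
    """Build an XSL-FO tree with one fo:block per <item> (hypothetical input)."""
    src = ET.fromstring(source_xml)

    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(
        masters,
        fo("simple-page-master"),
        {
            "master-name": "A4",
            "page-height": "297mm",
            "page-width": "210mm",
            "margin-top": "20mm",
            "margin-bottom": "20mm",
            "margin-left": "20mm",
            "margin-right": "20mm",
        },
    )
    ET.SubElement(page, fo("region-body"))

    # One page sequence whose body flow holds all the text blocks.
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})

    for item in src.iter("item"):
        block = ET.SubElement(flow, fo("block"), {"font-size": "11pt"})
        block.text = (item.text or "").strip()
    return root


SAMPLE = "<order><item>Widget</item><item>Gadget</item></order>"
fo_tree = xml_to_fo(SAMPLE)
fo_text = ET.tostring(fo_tree, encoding="unicode")
```

Writing `fo_text` to a `.fo` file gives something a formatter such as FOP should be able to render. In a real setup the same structure would come out of an XSLT stylesheet rather than Python.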

    • Re:Try PDF (Score:3, Informative)

      by Apreche ( 239272 )
      That's a good idea. You can also translate into PostScript, then send the PostScript right to the printer. You can also use ps2pdf to make PDFs from the PostScript. People don't realize it, but PostScript is actually a programming language you can write in. Turning an XML document into PostScript should not be outside the realm of possibility with XSL.
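
To give a feel for how little PostScript is needed for plain text, here is a minimal sketch in Python (not XSL) that emits a one-page PostScript program. The function names, the font and the page coordinates are assumptions picked for illustration, and there is no line wrapping or pagination.

```python
def ps_escape(text):
    """Escape the characters that are special inside a PostScript string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def lines_to_postscript(lines, font="Helvetica", size=12, left=72, top=720):
    """Return a one-page PostScript program that shows each line of text."""
    leading = size + 2  # distance between baselines, in points
    out = ["%!PS", f"/{font} findfont {size} scalefont setfont"]
    y = top
    for line in lines:
        # Move to the start of the line, then paint the string.
        out.append(f"{left} {y} moveto ({ps_escape(line)}) show")
        y -= leading
    out.append("showpage")
    return "\n".join(out) + "\n"


ps_program = lines_to_postscript(["Order (rush)", "2 x Widget"])
```

The resulting text can be sent to a PostScript printer or run through ps2pdf. Anything beyond simple lines (fonts with non-ASCII glyphs, wrapping, multiple pages) is where a real formatter starts to pay off.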
  • XSL-FO (Score:5, Informative)

    by JumpSuit Boy ( 29166 ) on Thursday October 14, 2004 @06:54PM (#10529958) Homepage
    http://xml.apache.org/fop/
    http://www.cranesoftwrights.com/training/ has a book about how to do this that was created using XSL-FO.

    There are also payware implementations of XSL-FO. Basically, it is the part of XSL meant for print.
  • by PornMaster ( 749461 ) on Thursday October 14, 2004 @06:54PM (#10529962) Homepage
    Prince [yeslogic.com] is a batch formatter for converting XML into PDF and PostScript by applying Cascading Style Sheets (CSS). Unlike other formatters, Prince prints any XML vocabulary without relying on proprietary markup
  • LaTeX (Score:5, Insightful)

    by Asgard ( 60200 ) <jhmartin-s-5f7bbb@toger.us> on Thursday October 14, 2004 @06:59PM (#10530012) Homepage
    Generate a LaTeX [latex-project.org] document file, compile it using PDFLatex [tug.org] and print. Or, use normal LaTeX and print directly from it, depending if .dvi files offend you.
    • Been there done that. All the horrors with metacharacters and difficulties with page layout made me switch to FO

      However, Latex's output looks better than FOP's. Only just slightly.
  • The company I work for dynamically fills out complicated forms and fills in their data. We use PDF, sure, but if you've got any complicated stuff where you need things to be very exact, or need to support things like mixed page sizes, you want to look into Printer Control Language (PCL), originally created by HP and supported on most printers.
    • We use PDF, sure, but if you've got any complicated stuff where you need things to be very exact, or need to support things like mixed page sizes, ...
      Care to share any examples of situations where you've opted for PCL to work around PDF shortcomings?

      FWIW, I've found PDF to be pretty accurate/flexible/comprehensive in most areas.

  • Try Docbook (Score:3, Interesting)

    by Pyromage ( 19360 ) on Thursday October 14, 2004 @07:09PM (#10530115) Homepage
    Docbook is an XML based document format, with support to output to many different formats, including HTML and LaTeX, as I recall.

    I'm not sure how the docbook LaTeX filters work, but you may want to avoid LaTeX, for several reasons: special characters. LaTeX doesn't do Unicode, you'll have to translate those characters. That's not a huge problem, merely an annoyance.

    But quotes can be annoying. Latex wants directional quotes. This is fine only if you have full control of your source and are willing to deal with it.

    I tried to go direct to latex on one of my projects, it's not straightforward. Unless I'm missing something obvious; if someone does know a solution, please inform me 8-}
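
One possible way to tame the special-character and quote problems, sketched in Python under some loud assumptions: the input is plain text pulled out of the XML, double quotes are balanced, and the names `latex_escape` and `directional_quotes` are invented here. It does nothing about Unicode.

```python
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(text):
    """Escape LaTeX metacharacters one character at a time.

    Working per character avoids re-escaping the braces and backslashes
    that earlier replacements introduce.
    """
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def directional_quotes(text):
    """Turn straight double quotes into alternating `` and '' pairs.

    Naive: assumes quotes are balanced and never nested.
    """
    parts = text.split('"')
    out = []
    for i, part in enumerate(parts):
        out.append(part)
        if i < len(parts) - 1:
            out.append("``" if i % 2 == 0 else "''")
    return "".join(out)


escaped = directional_quotes(latex_escape('He said "50% of A&B_c" twice'))
```

Escaping runs before the quote pass so that the inserted backticks and apostrophes are left alone.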
  • Docbook (Score:4, Informative)

    by ptaff ( 165113 ) on Thursday October 14, 2004 @07:13PM (#10530142) Homepage
    Another XML-based format is DocBook [docbook.org], which originally was SGML based but now has an XML [docbook.org] DTD too. From this format you can output to ps, pdf, rtf and plenty of other formats.

    You could also hack one of the docbook XSL stylesheets (using XSLT? would be pretty!) to make it parse your own format.

    Feel ready to own one or many Tux Stickers [ptaff.ca]?
  • SVG? (Score:2, Interesting)

    If I understand what you're asking, SVG would be a good choice. Bullet-sharp text that prints excellently. It can be automatically generated and is based on XML, so it shouldn't be too hard to integrate.
    • Re:SVG? (Score:2, Interesting)

      by hsoft ( 742011 )
      Yup. I'd personally say "try harder with {X}HTML", but in case it fails utterly, SVG will definitely be the way to go! It will be much harder for you, though, to transform XML into printable SVG than into XHTML.
  • XMLPDF (Score:4, Informative)

    by WasterDave ( 20047 ) <davep@z e d k e p.com> on Thursday October 14, 2004 @07:32PM (#10530275)
    Never quite sure what the hell it does myself, but a few people here swear by it:

    http://www.xmlpdf.com/

    Cheers,
    Dave

  • maybe...? (Score:1, Informative)

    Send it to a MySQL database, then use PHP to create a PDF, and print it from the PDF... although I can't vouch for the quality of printing from PDF. Just my 2 cents.
  • by BladeMelbourne ( 518866 ) on Thursday October 14, 2004 @07:35PM (#10530297)
    Having been in the same situation before, this is what I suggest...

    Take the XML and the XSL and transform it into 100% valid XHTML. HTML 4 is deprecated, the standard will not be updated. XHTML 1.0 is 5 years old already - start to use it.

    Use CSS - pay attention to
    @media screen,print
    { /*Styles for browser and printer*/
    }
    @media screen
    { /*Styles for browser only*/
    }
    @media print
    { /*Styles for printer only*/
    }

    If it doesn't print well, you probably need to refresh your CSS here: http://www.w3.org/style

    Good luck.
    • Take the XML and the XSL and transform it into 100% valid XHTML. HTML 4 is deprecated, the standard will not be updated.

      As the poster said, they've tried HTML, and didn't like it. I very much doubt that the print quality of XHTML would be any better than HTML. (I don't quite understand either why you're including screen styles for a page that is intended only for printing.)

      As for HTML 4 being a dead end, the WHAT WG [whatwg.org], a collaboration among developers from most browsers, are defining a set of specificati

  • I'm not sure why HTML isn't sufficient for you. We built an app that had to print box labels, and HTML fit the bill nicely. It's designed for visual presentation of data, placing of text and graphics, and that's what's happening during printing or viewing for proofing. Ours is a bad example because we didn't have XML source data, but I know HTML would work fine.
    • > Its designed for visual presentation of data,
      > placing of text and graphics

      No, goddammit, it's not. It's designed for structurally marking up hypertext, thus the name. The visual presentation is being dealt with by CSS. The separation of style and content through HTML4/CSS1 is best practice since when, 1998?

      And while we're at it, how do you e.g. position an image with say 4mm full run-around at (75mm/150mm) on a printout with plain HTML? To make it short, you don't because you can't.

      Mind you, I d
      • Welcome to my friends list.
        • I know you're being modded as funny (obviously by someone who didn't check to see that you indeed added me to your friends list), but if my comment puts me onto your list, I guess you belong onto mine, too, as I tend to befriend people with similar views on some topics.
          • I tend to do that too. It makes slashdot biased though. Biased toward my side, but still...
            • True, true. But any medium is biased, and if it is only because you filter the content in your mind.

              Actually the friends list can serve as a nice non-idiot database, and I regularly included people with whom I disagreed but who obviously know what they're talking about. But now I am reshaping it to represent a social network, and the comments I get to see degrade in quality as I remove more and more entries from the NIDB times. Slashdot definitely should offer a finer granulation of comment moderation. If

              • Yes, I still do that. Nice to see that other people also do/did it.
                Also, I give freaks more negative points than I do foes.
                • Actually I give freaks a +2 modifier so I can see and pick on them if they say something stupid. *g* It's all just on personal preference, I guess, but to me it feels as if all those features in slashcode aren't exactly made in a way that encourage creative use, but that might be because of server load issues. I don't know perl, so I can't look it up. ;)
                  • Some parts of the slashcode [slashcode.com] are pretty easy to understand. For example, I patched it so that Underrated and Overrated would get caught in moderation. (But that's a feature that they want to keep. Despite new users filing it as a bug every three months.)
                    • I actually doubt that they would check in patches I make to be used on /. itself. :) But then again, I don't think they are too interested in community building, otherwise they'd have a meta section and better community features like distinguishing the friends from the mere non-idiots and fine-tuning moderation. -1 to +5 just isn't enough. It would be *so* easy to hide the scores and let users choose (I want the n highest rated comments) while maintaining a moderation limit per comment to prevent over-moder
  • Listen to me ;-) (Score:4, Interesting)

    by cookiepus ( 154655 ) on Thursday October 14, 2004 @08:45PM (#10530807) Homepage
    I've had to do just this, actually... here's the setup. Don't ask me why certain things were the way they were; certainly you can improve on it. I inherited some of this. But it worked...

    First, we had a bunch of product data in a MS SQL server db. We had a Java (I think) task that nightly dumped XML file (one per product) based on the DB.

    Then, we applied an XSLT transformation to each XML to produce the static HTML page for that day (static both to reduce server load and optimize google's searching of it, since Google didn't/doesn't like dynamic content)

    Then we wanted to produce a printed catalogue, so rather than printing pages, I made an XSLT that transformed the XML not into HTML but into XSL-FO. FOP is some Java shit from Apache that takes XSL-FO files and spits out a PDF.

    Obviously I don't remember details, but it worked.

    I had the idea to generate the PDFs not just for the printed catalogues but also as "printable version" for each HTML page. So both PDFs and HTMLs were generated nightly. Yeah it took a while but it was cool.

    It also served to improve our pagerank because (1) the PDFs made it look like we've got twice as much content and because (2) google gave higher weightings to PDFs (at the time, anyway)

    And, it was easy.
  • ASCII (Score:3, Insightful)

    by Rie Beam ( 632299 ) on Thursday October 14, 2004 @09:05PM (#10530917) Journal
    You'd be surprised what a little coloring and some ASCII artwork can do.
  • I just happened to be updating my resume again, decided to make it XML based and found the xml resume library on sourceforge (xmlresume.sourceforge.net)

    One of their XSLT transforms the XML to a PDF file, maybe that would be a good place to look. I only found this today, and since I'm only just learning XML, I don't know how well this applies... Enjoy!
  • Me Too (Score:3, Insightful)

    by KevMar ( 471257 ) on Thursday October 14, 2004 @11:08PM (#10531820) Homepage Journal
    We had this problem once, but worse.

    When I started with my current employer, we had a very complicated PDFing process. Every night a transfer workstation would copy data files locally from a backup of the production server. A Pervasive driver was loaded to read the dat files. Access would import the data from Pervasive and run a report that was saved as an RTF file. It was then opened in Word, where a macro would PDF the document and close. The PDF was then copied to the webserver for the users to download.

    What a mess, and a nightmare to debug. It would work for a few months and then, at seemingly random times, it would crash horribly for several days in a row.

    When it did break, I felt like I wasted a lot of time tracking down ghost problems. In my slow days I rewrote it.

    It now pulls read-only data from the production server with that Pervasive driver into an XML file, then applies an XSL transform, passes the result to the FOP processor, and places the output directly on the webserver.

    A process that took an hour to run now finishes in 2 minutes. It is quick enough that we run it every 20 minutes. FOP was quick to set up, and the examples are like a blueprint and easy to figure out.

    I have never had a problem with the new implementation and the end user had no impact and was unaware of the change.

    I would recommend using a FOP processor to my friends.
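
The transform-and-render step of a pipeline like this can be sketched in a few lines of Python. This is only an illustration, not the poster's actual setup: it assumes the Apache FOP command-line launcher is on the PATH as `fop`, uses its `-xml`, `-xsl` and `-pdf` options, and the file names and the `render` helper are hypothetical.

```python
import subprocess
from pathlib import Path


def fop_command(xml_file, xsl_file, pdf_file, fop_executable="fop"):
    """Build the argument list for one XML + XSLT -> PDF run of Apache FOP."""
    return [
        fop_executable,
        "-xml", str(xml_file),
        "-xsl", str(xsl_file),
        "-pdf", str(pdf_file),
    ]


def render(xml_file, xsl_file, out_dir, run=subprocess.run):
    """Render xml_file to <out_dir>/<stem>.pdf and return the output path.

    The `run` parameter exists only so the call can be swapped out
    (for a dry run, say); by default it shells out to FOP.
    """
    pdf_file = Path(out_dir) / (Path(xml_file).stem + ".pdf")
    run(fop_command(xml_file, xsl_file, pdf_file), check=True)
    return pdf_file
```

Something like `render("report.xml", "report.xsl", "/var/www/reports")` from a scheduled job would cover the "every 20 minutes" part; the paths here are placeholders.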
  • by timjones ( 78467 )
    I have two production applications at work that convert a subset of HTML to RTF via an rtf.xsl stylesheet, with results good enough, that my users actually call it (RTF) the "print" format, whilst the HTML is (naturally) the "view" format. The subset it supports is TABLE, TBODY/THEAD, TR/TD, FONT COLOR/SIZE, @BGCOLOR attributes, and rudimentary PNG/JPEG image embedding (but only as part of the text stream & in a table cell, not as independently positioned images). I had to add a few things to plain HTM
  • by op00to ( 219949 ) on Friday October 15, 2004 @02:59PM (#10538326)
    Is /. now the official R&D contractor for every shitty, piss-ant "company" with a "product" out there?

    Seriously, buddy. It's not hard. XML -> Perl/PHP/Python/Whatever P you choose -> PostScript -> pretty printing

    Google is your friend. [justfuckinggoogleit.com]
  • Use XSLT to transform your XML to DocBook [docbook.org], then use DocBook XSL [sourceforge.net] to convert to XSL-FO, then Apache FOP [apache.org] to generate a PDF.

    Alternatively, skip the DocBook step and transform straight to XSL-FO.
  • I know you are looking for a free solution, but in the future if you need to do some other heavy lifting with your documents, check out DeliveryWare from Esker Software.

    I ran across this a little while ago.... DeliveryWare [esker.com]

    It will handle XML documents and convert to various formats and can fax, e-mail, print or do whatever with the file.

  • gvim will do it, and from the command line. Just run it with a couple of initial commands (use -c to specify commands): ":syntax on" to turn syntax highlighting on and ":hardcopy" to print.

    Haven't tried it, but should be a breeze. And a portable solution too.