Data Storage Media Software

Multi-page PDF To Multi-page TIFF and Archiving?

GeorgeMonroy writes "One of my clients has aperture cards that they have been scanning into multi-page PDF files — but now they want them in multi-page TIFFs instead. One of the reasons they gave for this is that TIFF files require less storage space. While that is true, I wonder if TIFF is the best format going into the future. Are TIFFs better than PDFs for future use? I wonder what format you think would last longer. Are there any other formats that you think would be better or more future-proof? To me, storage is not a good enough reason to go to TIFF, because storage prices are always dropping anyway. Also, since they already have many of these files in PDF format and they want to convert them into multipage TIFFs, are there any programs that you can recommend that will perform batch processing of files so that we do not have to convert each PDF one by one? If another file format is better than TIFF, then are there any programs for batch processing that you can recommend?"
This discussion has been archived. No new comments can be posted.

  • If they're images, then you should use TIFF (or perhaps PNG). However, it doesn't make sense for them to be "multi-page." If they're documents, then PDF is appropriate.

    I would suggest that your client doesn't know WTF they want.

    • by pclminion ( 145572 ) on Tuesday June 24, 2008 @12:35PM (#23919321)

      If they're images, then you should use TIFF (or perhaps PNG). However, it doesn't make sense for them to be "multi-page." If they're documents, then PDF is appropriate.

      Multi-page TIFF is well supported in the industry. There is nothing "weird" about it. It even supports embedded, searchable text (a Microsoft addition, but something that actually adds value). PDF archival can be difficult to do correctly. At the very least you want to use a product which supports PDF/A, followed up with some serious validation to make sure the results are actually compliant. Otherwise you may get bitten decades down the road. Searchable TIFF, on the other hand, will be around for freaking ever.

      • by MBGMorden ( 803437 ) on Tuesday June 24, 2008 @01:32PM (#23920685)

        Multi-page TIFF is well supported in the industry.
        Better supported than PDF in some cases. Our records management (in addition to keeping electronic scanned copies) still insists on having a microfilm copy of all of our retained documents. We can send digital copies to a processing company to have them processed, but they don't accept PDF documents - only TIFFs (multi-page is acceptable). Given that our internal document management is all in PDF, I ended up having to find a program to convert all of that information about a year ago (though the name of the program we ended up using escapes me - I wouldn't recommend it anyways, since it crashed for me very frequently).
        • by DimmO ( 1179765 )
          I can just imagine the OP's rollercoaster of emotions reading the parent: "ooh awesome someone knows a program to convert pdf to tiff. Exactly what I want! oh wait, he forgets what it's called. dammit. but, it crashed all the time. meh."
        • Couldn't Ghostscript do this? Tiff is an output device option. I don't think it'd take too much to jimmy up a script to process an assload of PDFs. (Unless, of course, that's the program that was crashing frequently...)
      • I assume that he means that it doesn't make sense for an image to be multi-page. In other words, it should be a single page and let the printer driver/program work out the paging. Of course, if they are scanning into PDF in the first place I would assume that they aren't images.

        • Oops, I actually followed the link to find out what an aperture card is and I see that I'm wrong- they are basically engineering drawings so they are images (with text annotations). But that doesn't explain why they are multi-page when a PDF page can be any size you want.

          • by tepples ( 727027 )

            they are basically engineering drawings so they are images (with text annotations). But that doesn't explain why they are multi-page when a PDF page can be any size you want.
            Different pages might cover different views or different parts of a product.
      • Re: (Score:1, Interesting)

        by Anonymous Coward

        ... PDF archival can be difficult to do correctly. At the very least you want to use a product which supports PDF/A, followed up with some serious validation to make sure the results are actually compliant...

        Every day I convert large numbers of PDFs to multi-page or single-page TIFFs, and every day I come across PDF errors. Unless you do extensive and exhaustive error checking and trapping, I would archive in TIFF format. The TIFF format is considerably less prone to errors within the document. TIFF is widely used and accepted, especially in United States courts of law.

    • by cduffy ( 652 )

      Yes, it makes sense for them to be multi-page. TIFF is a multi-page format -- it actually supports a huge number of subformats (and different compression algorithms) in addition.

      • by BadMrMojo ( 767184 ) on Tuesday June 24, 2008 @12:51PM (#23919741)

        ...it actually supports a huge number of subformats (and different compression algorithms) in addition.

        TIFF: Thousands of Image File Formats

        If you do wind up converting to TIFF, then remember to document everything in excruciating detail. With thousands of possible combinations - each of which is a perfectly valid TIFF image - you may encounter some issues if someone's using a less robust reader and assuming the wrong compression algorithm, byte order, data striping or photometric interpretation.
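As one concrete instance of the byte-order point above: a TIFF file declares its byte order in its first two bytes, `II` for little-endian or `MM` for big-endian. A minimal sketch that reads them, assuming a POSIX shell with `dd`; `tiff_byte_order` is a made-up helper name, not part of any TIFF toolkit.

```shell
#!/bin/sh
# Report the byte order declared in a TIFF header.
# tiff_byte_order is a hypothetical helper, not a libtiff tool.
tiff_byte_order() {
    # The first two bytes of a TIFF are "II" (Intel, little-endian)
    # or "MM" (Motorola, big-endian).
    magic=$(dd if="$1" bs=1 count=2 2>/dev/null)
    case $magic in
        II) echo little-endian ;;
        MM) echo big-endian ;;
        *)  echo not-a-tiff; return 1 ;;
    esac
}
```

Running `tiff_byte_order scan0001.tiff` (a hypothetical file name) prints one of the three words. Recording that per file, next to the compression and photometric interpretation, is the kind of documentation being suggested here.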

      • by mrchaotica ( 681592 ) * on Tuesday June 24, 2008 @03:03PM (#23922485)

        Yes, it makes sense for them to be multi-page. TIFF is a multi-page format...

        Having the format support a thing, and having that thing make sense are two different things. For example, Excel supports being used as a database... but does it make sense?

        My point: use image formats for images, and document formats for documents. If the things you're trying to store are images, don't put them in a document format, and if they're documents, then don't put them in an image format.

        Also, if TIFF is designed to store both images and documents, then I question whether it is too general to do either of them well. And your mention of "subformats" makes me think my concern is well-founded!

        • by cduffy ( 652 )

          Sure, it makes sense. Look at faxes -- they're multiple pages of bitmapped images. Does that compose a "document"? Maybe. Is it something TIFF is good for? Absolutely! There are standardized tags used for storing extra information about fax transmissions in TIFF documents, and there's a great deal of software which makes use of that metadata. The same thing is true of archiving documents composed of scanned images -- there are a great many tag types associated with metadata such as scanner model and configu

          • Ah, so TIFF is really general. In that case, saying you're storing something "in TIFF" doesn't really say much, does it? It's kind of like saying you're storing it "in XML" -- what does that mean, if you don't specify a schema too?

            The distinction between an "image" and a "document" is a hazy one at times, and I don't think we have a good enough description of the problem space the parent is working in to make the kind of judgment you're arguing for. Certainly, whether something is composed of more than one

            • by cduffy ( 652 )

              Is it a sequence of related but not nearly identical images? Then it's probably a slide show, which is a kind of document, and ought to be stored in a document format like PDF or .odp (OpenDocument Presentation).

              If I get a multi-page fax, I don't want it delivered as a slideshow -- I want a PDF or a multipage TIFF. Yes, PDF is a sensible choice in this situation -- but what basis do you have for saying that multipage TIFF isn't? It's widely used in that case, and there's software support for the tagging for

    • by algae ( 2196 )
      You're wrong - multi-page TIFF is a real image format, as about 2 seconds with Google would have told you if you'd bothered to check. We used it extensively at a copy shop I used to work at that served law firms. What the OP wants is to use Ghostscript as a converter from PDF to TIFF. If I recall correctly, you can specify multi-page TIFF as an output format.
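As far as I know, the trick with Ghostscript is simply to leave `%d` out of the output file name, in which case its TIFF devices write every page into one file. A minimal sketch, assuming Ghostscript is installed as `gs` and that the scans are pure black and white (hence `tiffg4`); `pdf_to_multipage_tiff` is a hypothetical helper name.

```shell
#!/bin/sh
# Convert one PDF into a single multi-page TIFF.
# Assumes `gs` is on PATH; pdf_to_multipage_tiff is a hypothetical name.
pdf_to_multipage_tiff() {
    src=$1
    dst=${src%.*}.tiff    # strip only the last extension
    # No %d in -sOutputFile, so all pages land in one TIFF.
    # tiffg4 is 1-bit output with CCITT Group 4 fax compression.
    gs -q -dNOPAUSE -dBATCH -sDEVICE=tiffg4 -r300 \
       -sOutputFile="$dst" "$src"
}
```

Calling `pdf_to_multipage_tiff drawings.pdf` would leave `drawings.tiff` next to the PDF. For greyscale or colour scans a different device would be needed, since `tiffg4` throws away everything but black and white.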
    • Re: (Score:3, Informative)

      Both PDF and TIFF handle multiple pages, and have done so for years.

      Either would be suitable for this application.

      If you really want to convert these, Imagemagick would be the best tool to use.

      However- it seems a little daft for storage space to be the main reason for changing: you're simply exchanging one compressed image format for another. You may save 10% e.g. if you move from JPEG in the PDF to PNG/similar in the TIFF but is that really worth the effort?!

      If they really want shiny TIFFs, it would be eas

      • Re: (Score:3, Interesting)

        Imagemagick does its conversion entirely in-memory, so if a document is more than a hundred pages or so you are going to have some problems.

      • Re: (Score:3, Informative)

        by mrcaseyj ( 902945 )

        I agree that unless the files are extremely huge or extremely numerous then storage space probably shouldn't be a concern because it's cheap compared to your time and getting cheaper. But if storage space is a concern then you might look into the TIFF format used by the patent office. Apparently it uses a form of lossless compression taken from fax machines and gets much better compression than many other common formats on black and white (no greyscale) documents. If it's the patent office's standard archive

    • I used to work at a company in the imaging/scanning/OCR market for *cough-large-printer-companies-cough*. The best format to save the images, IMO, is LZW TIFF format. PDF is fine and all for storing images, and provides the compact file size. However, you can do LZW TIFF images, and you can convert TIFF images into other formats more easily than you can PDFs. On that same note TIFF still retains all the image layers just the same. I've been out of the loop on processing techniques, but if you wan

  • Apple was kind enough to build this functionality into Mac OS X in the form of Preview and Automator (or Apple Script).

  • Ghostscript (Score:2, Informative)

    Ghostscript [wisc.edu] can do the conversion from the console.

    You can write a simple shell script to convert all files.

    • Solaris 10, (and I would presume Linux as well) has a package of TIFF utilities.

      We didn't do too much pdf2tiff (most stuff came in as TIFFs), but I don't remember any major issues the other way (tiff2pdf).

      I imagine a fully featured converter application would do a better job though.

  • We use a program called ImageSite [equorum.com] that handles that. It uses TIF files. Why reinvent the wheel?
    • Let me clarify: ImageSite manages the files. It doesn't, to my knowledge, convert PDF to TIF. I think you can use a program that prints the PDF to a TIF file.
  • DjVu (Score:4, Informative)

    by cduffy ( 652 ) <charles+slashdot@dyfis.net> on Tuesday June 24, 2008 @12:34PM (#23919295)

    The most effective compressors are commercial, but DjVu [djvu.org] is a very effective image archival format; see DjVuLibre [sourceforge.net] for the non-commercial tree.

    Moving back towards the question in the article, I don't think there's much worry about either TIFF or PDF in terms of future proofing; they're both very widely used, have multiple implementations and third parties with substantial interest in keeping those implementations maintained, etc. The quality of TIFF implementations varies wildly, but the good ones are only going to get better, and I'd be shocked if libtiff ended up terminally bitrotted without a successor implementing a superset of its functionality inside my lifetime.

  • You haven't specified what is on the microfilm chip in the card. If it's largely text, I can't see why you'd want to lose the embedded text (searchable etc. - TIFF would require OCR at some point..?) in PDF.
    • by BobMcD ( 601576 )

      You haven't specified what is on the microfilm chip in the card.
      Relax. I'm pretty confident it isn't pr0n...
  • But size does not have anything to do with it. TIFF is far simpler in structure than PDF and has therefore better compatibility. TIFF is also well documented. Of course, they would have to use raw tiff to get the advantages. The storage-space argument is secondary and matters only insofar as larger data sets have a higher risk of corruption.

    • Re:Tiff is better (Score:5, Interesting)

      by pclminion ( 145572 ) on Tuesday June 24, 2008 @12:39PM (#23919423)

      But size does not have anything to do with it. TIFF is far simpler in structure than PDF and has therefore better compatibility. TIFF is also well documented. Of course, they would have to use raw tiff to get the advantages. The storage-space argument is secondary and matters only insofar as larger data sets have a higher risk of corruption.

      I dispute the "well documented" claim. The TIFF standard is quite clear. Unfortunately, almost nobody adheres precisely to the standard. I work extensively with TIFF and PDF, and I have to say that the consistency I see in PDF is about 100 times more than what I see in TIFF. Your typical TIFF reader will contain thousands of hacks and workarounds for oddities that are produced by major players in the industry. While there is slightly non-compliant PDF, I have never seen things that even begin approaching the strangeness I see in TIFF on a daily basis. Having said that, I recommend TIFF plus search text metadata for archival, not PDF.

      • Re: (Score:1, Interesting)

        by Anonymous Coward

        I'd have to agree with that - I keep bumping into nasty combinations of old style JPEG in TIFF images combined with Wang annotations and highlighting - the choice of viewers that will cope with both of those at once is pretty limited

      • > Having said that, I recommend TIFF plus search text metadata for archival, not PDF.

        Can I ask why? Your whole post seemed to slam TIFF - that it's too varied from the actual spec. How does that translate to being a good archival format? Or are you just saying, use TIFF if you follow the spec?

        • Or are you just saying, use TIFF if you follow the spec?

          Yes, I stopped my explanation too soon. If you stick to what's actually written (not implied by the existence of viewers that are already out there) in the Baseline TIFF specification, your file will be viewable everywhere. As far as text metadata, the curse is also the beauty, because TIFF easily lets you place tagged data in a file and it will be ignored by any reader which doesn't understand that tag.

          MS's searchable TIFF is your typical MS cre

  • by Cheesey ( 70139 ) on Tuesday June 24, 2008 @12:38PM (#23919393)

    Are TIFFs better than PDFs for future use? I wonder what format you think would last longer. Are there any other formats that you think would be better or more future-proof? To me, storage is not a good enough reason to go to TIFF, because storage prices are always dropping anyway.

    Don't use TIFF. Stay with PDF. PDF is what all the big digital libraries are using. It's a proper standard, it's readable and writable by lots of free open source software, so even if Adobe disappears in a puff of intellectual property, you'll still be able to read your documents.

    TIFF, on the other hand, is a container format (like AVI, but worse). It isn't fully supported by every program - what sort of TIFF do you want, anyway? Compressed with LZW? With RLE? Not compressed at all? There's free software that will read and write the most common types of TIFF, so you can certainly do it, but why give up the convenience of using PDF?

    Also, since they already have many of these files in PDF format and they want to convert them into multipage TIFFs, are there any programs that you can recommend that will perform batch processing of files so that we do not have to convert each PDF one by one?

    Use ghostscript. Use something like the following command line:

    gs -dNOPAUSE -sDEVICE=tiffgray -sOutputFile=output%02d.tiff -dBATCH -r300 input.pdf
    This turns input.pdf into a series of 300 dpi tiff files, one for each page, called output01.tiff, output02.tiff, etc. Change the DEVICE to get a different sort of tiff file, and use gs --help to get a list of options. You can easily wrap this command in a script of almost any sort to make the process fully automatic.
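To make the "change the DEVICE" step a bit more concrete, here is a small sketch that picks a Ghostscript TIFF device from a colour mode. The mode names (`bw`, `gray`, `color`) and the function name are made up for illustration; the device names are standard Ghostscript ones as far as I know, but check the device list printed by `gs --help` for what your build actually includes.

```shell
#!/bin/sh
# Map a colour mode to a Ghostscript TIFF output device.
# tiff_device_for and its mode names are hypothetical.
tiff_device_for() {
    case $1 in
        bw)    echo tiffg4 ;;     # 1-bit, CCITT Group 4 fax compression
        gray)  echo tiffgray ;;   # 8-bit greyscale
        color) echo tiff24nc ;;   # 24-bit RGB
        *)     echo "unknown mode: $1" >&2; return 1 ;;
    esac
}
```

It would slot into the command above as `-sDEVICE="$(tiff_device_for bw)"`, leaving the other options untouched.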
    • Yeah, not to mention that PDF supports several compression formats for images including zip, jpeg, and jpeg2000 at various quality levels if storage is such a concern. Just playing around with the image from the wiki article, file sizes range from 4795kb for uncompressed, to 285kb for maximum quality (but lossy) jpeg2k. A jpeg compressed tiff is about 660kb.

      Now, I've never had to deal with aperture cards IRL, so I'm not sure about the following: a lot of space seems to be wasted to preserve a few pieces of

  • Comment removed (Score:3, Informative)

    by account_deleted ( 4530225 ) on Tuesday June 24, 2008 @12:43PM (#23919525)
    Comment removed based on user account deletion
    • Or under windows just use ImageMagick
      http://www.imagemagick.org/script/binary-releases.php#windows [imagemagick.org]
      • Re: (Score:2, Informative)

        by ZERO1ZERO ( 948669 )
        ImageMagick is slow, slow, slow for multipage TIFFs. On Windows, creating and splitting multipage G4 TIFFs is 20 (TWENTY) times faster with tifftools.
    • I had to create a bash script a while ago to convert color PostScript to black and white TIFF. I used netpbm in conjunction with Ghostscript to do it. Those 2 programs together can do just about anything you want for batch file graphics. Not sure if it will work for .pdf, but .pdf is pretty close to PostScript and probably easy enough to convert to eps or straight ps. Gimp and, as stated above, ImageMagick also have a lot of useful batch processing tools, but you have to learn their script language (in the
  • You seem to be confusing the media with the data format. Whether it is TIFF or PDF is irrelevant. It's all ones and zeroes in the end, whether it is stored on punch cards, floppies, CDs or Flash RAM.

    In any case, the PDF and TIFF file formats are well-documented, and if they ever do go extinct despite their widespread use (bloody unlikely), it would always be possible to write a program to convert them into the format-du-jour, provided, of course, you are able to read the media...

  • PDFs... (Score:3, Informative)

    by pdboddy ( 620164 ) <pdboddy AT gmail DOT com> on Tuesday June 24, 2008 @12:54PM (#23919799) Journal
    Stick with PDF. Chances are, neither PDF nor TIFF will vanish overnight. I'd say PDF is easier to work with, even with minimalist free tools. Since either one is technically "good" for archiving, why do more work than you really need to do? Even with batch processing it'd be a pain.

    Acrobat has batch processing, and can convert pdfs to TIFF, JPEG, PNG and more. That would be my suggestion if you are really going to convert to TIFFs.
    • I disagree, as the storage difference between a 5 MB to 18 GB PDF and a 250 KB to 1 MB TIFF is huge! Where I work we do PDF to TIFF conversion just for that reason, mainly because SharePoint has an 8 MB limit per document you can upload.
      • by clodney ( 778910 )

        Don't forget that TIFF has a 4GB limit, due to all the file offsets being encoded as 32 bit values. If a TIFF tag can not store the data directly in the tag body, it stores the location of the data as an offset from the start of the file.
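The arithmetic behind that limit, as a quick sketch: a 32-bit unsigned offset can address 2^32 bytes, which is 4 GiB. `fits_classic_tiff` is a hypothetical helper name, and the script assumes a shell with 64-bit arithmetic (true of bash and dash on current 64-bit systems).

```shell
#!/bin/sh
# Largest byte offset a 32-bit unsigned field can hold.
MAX_OFFSET=$(( (1 << 32) - 1 ))

# Succeeds if a file of the given size in bytes stays within 32-bit offsets.
# fits_classic_tiff is a hypothetical helper name.
fits_classic_tiff() {
    [ "$1" -le $(( MAX_OFFSET + 1 )) ]
}

echo "addressable bytes: $(( MAX_OFFSET + 1 ))"
echo "in GiB: $(( (MAX_OFFSET + 1) / 1024 / 1024 / 1024 ))"
```

So the 18 GB PDF mentioned upthread (19327352832 bytes if that means 18 GiB) could not be stored as a single TIFF of the same size, while anything in the megabyte range is nowhere near the ceiling.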

      • Re: (Score:3, Informative)

        by Eric Smith ( 4379 ) *
        If you have 5 MB PDF files that convert to 1 MB TIFF files, that means that the PDF files were encoded badly. There's no fundamental reason for PDF files to be significantly larger than TIFF files.
      • by pdboddy ( 620164 )
        I could argue that you've got a poorly made PDF. Especially one that's 18gigs in size, I've never seen one that large in ten years of working in the industry.

        Or I could argue that storage costs will go down over time, not up, so your argument over file size is moot. A properly made PDF will have a reasonable file size.
  • pdf2tiff.sh (Score:4, Informative)

    by Anonymous Coward on Tuesday June 24, 2008 @12:58PM (#23919915)

    let's not reinvent the wheel -- I did this about 9 months ago //wolfmann -- and this code is Public domain (done on federal gov't time):

    # cat pdf2tiff.sh
    #!/bin/bash

    for file in */*.pdf # for each PDF one directory down
    do
        # Strip everything after the first dot to get the base name
        filename=`echo "$file" | cut -d'.' -f1`
        if [ ! -e "$filename".tiff ]
        then
            echo "gs -q -dNOPAUSE -dBATCH -sDEVICE=tiffg4 -sOutputFile=$filename.tiff $file"
            gs -q -dNOPAUSE -dBATCH -sDEVICE=tiffg4 -sOutputFile="$filename".tiff "$file" 2> /dev/null
        else
            echo "$filename.tiff exists! skipping..."
        fi
    done

    • Re: (Score:2, Informative)

      by Anonymous Coward
      Better method:
      #!/bin/bash
      for file in `find . -name '*.pdf'`
      do
      filename=${file%%.*}
      ...
      This will recurse into all subdirectories and it will work on files with multiple dots in the name (e.g. applicant.12345.cv.pdf)
      • Re: (Score:1, Informative)

        by Anonymous Coward
        Well, if we want to correct others, -iname is the better option, but who cares, as long as we get modded informative for the basics...
      • Re: (Score:3, Interesting)

        by Khopesh ( 112447 )

        This code will not work on files with spaces in them, because the back-ticks will expand spaces without escapes while the shell globbing used in the original code will escape them. Since most PDF titles I've seen use spaces in their names, this is important. The rest of those modifications will help, though.

        Were I coding this script, I'd write two (since xargs can't use a function). One for the inner loop, and one for the outer loop (or if only running it once, I'd do the outer loop on the command line

        • by Khopesh ( 112447 )

          Oops, there should be only one percent sign in the target="${file%.*}.tiff" line. This enables your use of "User Manual 3.7.pdf" -> "User Manual 3.7.tiff" instead of "User Manual 3.tiff" as the above AC posted (and I copied). Also, the second script should have quotes around the first argument to find, which should use @ instead of 1, which respects multiple arguments and the spaces within them while still defaulting to the current working directory when there are no arguments.

          I've incorporated this
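Pulling the corrections in this subthread together, a sketch under stated assumptions: one `%` to strip only the last extension, `find -exec` instead of back-ticks so that spaces survive, `-iname` for case-insensitive matching, and the current directory as the default. `tiff_name` and `convert_tree` are hypothetical names, `-iname` is a GNU/BSD find extension rather than POSIX, and `tiffg4` is simply borrowed from the script upthread.

```shell
#!/bin/sh
# Map a PDF path to its TIFF path, stripping only the final extension.
tiff_name() {
    printf '%s\n' "${1%.*}.tiff"
}

# Convert every PDF under the given directory (default: current directory).
# find hands the names straight to the inner shell, so spaces are safe.
convert_tree() {
    find "${1:-.}" -type f -iname '*.pdf' -exec sh -c '
        for f do
            out=${f%.*}.tiff
            if [ -e "$out" ]; then
                echo "$out exists! skipping..."
                continue
            fi
            gs -q -dNOPAUSE -dBATCH -sDEVICE=tiffg4 \
               -sOutputFile="$out" "$f"
        done
    ' sh {} +
}
```

`tiff_name 'User Manual 3.7.pdf'` gives `User Manual 3.7.tiff`, where the double-percent form would have produced `User Manual 3.tiff`. `convert_tree` with no argument walks the current directory.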

  • by moosesocks ( 264553 ) on Tuesday June 24, 2008 @12:59PM (#23919945) Homepage

    How *much* smaller are these TIFFs anyway? TIFF is actually a container format, and can support all sorts of compression, some of them proprietary, some of them common. Not all of them are lossless either (TIFF-Jpeg is a perfectly valid combination, and was used before the days of Exif to add metadata to jpegs). TIFFs can also include vectorized data. It's not all that much less complicated than a PDF.

    PDFs are also a container format to an extent. You could very well have a TIFF embedded in a PDF. Fortunately for us, the PDF specification is a bit more stringent on what is supported and what isn't, and PDFs tend to work just about everywhere (especially if all that you've got is an image). You can also apply all sorts of compression to PDFs to reduce their file size... these might not be quite as well supported.

    Both formats are extremely common, and it's extremely unlikely that you'll ever have to do any sort of conversion to display them. If I had to place money on it, I'd wager that PDF will be in widespread use for longer than TIFF, though neither format seems to be going anywhere anytime soon. You're more likely to have to worry about the storage devices you're using and the longevity of the media.

    If you just need to store lossless raster images, PNG might be a good bet. It's a "Free" format, and is officially endorsed as an ISO/IEC standard. TIFF is copyrighted by Adobe. It also has the advantage of being a complete image format, rather than just a container, which means that any software that can open a PNG image should be able to open *any* PNG. Because of its open-sourceness and widespread adoption, PNG will be around for a long time to come as well. Once again, the storage medium and filesystem that you use to store the images is very likely going to become obsolete before the file format itself.

    Granted, PNG's compression algorithm isn't optimized for photographic data, though the image formats that *are* optimized for this purpose are neither common nor free.

    In summary, there's no reason that a PDF needs to be terribly larger than a TIFF (the overhead should be especially negligible if you've got lots of images at a high-resolution). Neither format is going away anytime soon, but both have quirks that can hurt you in the future (Multi-page TIFFs are even somewhat of an oddity today). If you really want small files and future-proofing, go with PNG. Otherwise, it's more or less a non-issue.

  • PDF/A (Score:5, Insightful)

    by SpaghettiPattern ( 609814 ) on Tuesday June 24, 2008 @01:07PM (#23920133)
    Although the TIFF format is open and it is widely used in archiving systems, it is not particularly suited for an archive you set up new. The main reason is that many applications that generate TIFF may throw in their own proprietary stuff and lock you into a specific viewer. Also, you cannot do a text search of content in TIFF.

    When you discuss archives you think about looong times. Typically 10 to 50 years of retention with the odd exception where eternity is desired.

    Hence "plain" PDF is probably even worse than TIFF. One problem is that included resources (fonts) and references (http links) are mostly left out in order to save disk space. The other problem is that there are so many "plain" PDF versions to choose from and none of them will last 10 to 50 years.

    However, PDF is a good technology and therefore the PDF/A standard was developed. It is designed especially to deal with loooong term issues, is currently readable through almost any PDF reader and will be maintained by most sensible PDF readers for the years to come. There is NO vendor lock-in, you can put text in a PDF/A document and run searches against it. But most importantly, NO proprietary stuff can be shoved in as it would result in an invalid document (a PDF document maybe but not a PDF/A document.)

    With the price of current disk space you should NOT make file size a defining criterion in your archiving policy. Only on z/OS disk space comes at absurd and ridiculous prices. If you can, try aiming for an archiving solution on Unix, Linux or even Windows.

    I am in the archiving business. At the moment PDF/A is the only format suitable for archiving.
  • by spyrochaete ( 707033 ) on Tuesday June 24, 2008 @01:12PM (#23920267) Homepage Journal
    I would recommend OCRing these documents and storing them in some kind of text-based format (in addition to the graphical format of your choice). If you have particularly voluminous back-catalogues of these documents you'll be very thankful in the future if you have the option to search-enable this textual content.

    A graphic image of text is like a wax apple - it looks and tastes like a replica.
    • You can OCR stuff, store the text in the PDF.

      • Since the alternative format being discussed is a graphical image I made the assumption that they're currently scanning flat images into multi-page PDFs. If they're OCRing into PDF then that format would indeed be indexable by most enterprise search engines (as well as free desktop search engines).
    • Well ... while I doubt it's doable, it would be really cool with a PDF->LaTeX program. Transform everything in the pdf into LaTeX files (except pictures). Would give you free text search and an easy way to compress the stuff really well.

      • You doubt that what is doable? OCRing to PDF is a very common procedure, and many enterprise and desktop search products have no trouble reading and indexing PDFs. I've never seen LaTeX used in any corporate environment or by any individual except for malicious university professors.
  • PDF makes sense for document signing, security, and damage detection. TIFF does not have any of this important security and data integrity protection by itself.

    PDF also allows for the same compression on the scanned image that TIFF does, as well as much better compression methods available to it.

    TIFF, while well-understood in the archival industry, has rather fledgling support in the free *NIX world--especially multi-page TIFF.

    Finally, with PDF, you can preserve both the image and the OCR data all in the s

  • I have a similar issue, but have chosen PDF because they meet the digital signature requirements of most professional licensure boards (architects and engineers worry about this stuff). It's not a large hurdle, just that the documents can be externally verified against a publicly available key. Adobe lets you do that for free (well, assuming you have their s/w; I can post a key on my website for a 3rd party to install and verify the signature).

    This isn't a high-crypto requirement area - you can easily fa

    • Any file can be digitally signed with GPG or PGP. If your customers are used to looking for public keys on your site anyway, you might as well make them PGP pubkeys.

      • Even the pirate groups do that.

        They'll have the "goods" and a md5sum.txt with the md5sum of the "goods" and a GPG signature around the md5sum AND the gpg signature of the file. Then they zip it.
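The checksum-list half of that arrangement can be sketched as follows, assuming the `md5sum` tool from GNU coreutils (or a compatible one) is installed. `make_manifest` and `verify_manifest` are hypothetical helper names.

```shell
#!/bin/sh
# make_manifest and verify_manifest are hypothetical helper names.

# Write an md5sum.txt covering every file under the given directory.
make_manifest() {
    ( cd "$1" &&
      find . -type f ! -name md5sum.txt -exec md5sum {} + |
      sort -k 2 > md5sum.txt )
}

# Re-check every file against the manifest; non-zero exit on any mismatch.
verify_manifest() {
    ( cd "$1" && md5sum -c md5sum.txt > /dev/null )
}
```

The signature half needs a key, so it is not part of the sketch: `gpg --armor --detach-sign md5sum.txt` writes `md5sum.txt.asc`, and `gpg --verify md5sum.txt.asc md5sum.txt` checks it. MD5 is used only because the comment mentions it; `sha256sum` takes the same arguments and is the stronger choice for anything meant to last.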

  • I cannot find an analogy for how fundamentally incorrect the submitter's mental model of PDFs and TIFFs is. Think of PDF as a container format, you can compress the images inside the PDF to your heart's content, much smaller than TIFF will do since it can use JPEG or PNG or whatever format you want. Tools->Print Production->PDF Optimizer. It even has OCR and some scanned image auto cleaning. The easiest thing is just to have them change their scan resolution, down to say, 150 DPI and B+W instead o
    • by pdboddy ( 620164 )
      What's the point of archiving something if you're going to potentially lose information by dropping the resolution? Especially something like the aperture cards the OP is having archived. The Wikipedia image is horrible, and I can't imagine what that would look like in B&W... if that's a typical aperture card, it's going to look horrible.

      And I'd be careful trusting the PDF optimizer for file size reduction. Not always the best results. But I'd still put my money on PDF.
  • We have a document management system. In it we had to make the decision for PDF or tiff. We opted for tiff. It had nothing to do with the file size. The deciding factor was because we could find FAST tiff viewers all day and night. It's probably not that PDF as a format is that much more bloated, but the readers, especially acrobat reader take a LOT longer to start up.

    We use an ActiveX control called AlternaTIFF to view them in the browser (and yes, it does multipage). The control loads in the browser

  • Send them a quotation. If the money looks good, do it and don't bitch about it on slashdot. If it does not look good, decline the job and don't bitch about it on slashdot either. Either way, don't bitch about it on slashdot.

  • Legal service firms work with all of these PDF and TIFF variants all of the time. They should be able to kick out whatever you need at x cents per page (which will usually be cheaper than your time/money)

    The weird TIFF formats are used for various document management products, so it really depends mostly on your workflow.

  • The gold-standard tool for this is PDF2IMG [datalogics.com] which uses Adobe's own PDF rendering library but it'll set you back a few thousand dollars.

    Ghostscript is good but it isn't perfect: it does choke on some PDFs, misrenders some and won't pick up non-embedded TTF fonts, only external PS fonts. It also doesn't do any anti-aliasing so you probably want to render large and sample down and (IIRC) there's a max image size it can render. But by and large it does just work.

  • Shell script (Score:3, Informative)

    by debatem1 ( 1087307 ) on Tuesday June 24, 2008 @02:01PM (#23921377)
    Honest to God, what you're talking about is a trivial task. Use ghostscript, or, if you don't have the time or interest, contact me with your requirements and I'll write it for you gratis, provided it remains F/OSS.
  • are there any programs that you can recommend that will perform batch processing of files so that we do not have to convert each PDF one by one?
    Sing with me! That's what loops are for.
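A rough sketch of such a loop, assuming Python 3.9+ and a Ghostscript binary named `gs` on the PATH. The helper names `gs_command` and `convert_all` are hypothetical. The `tiffg4` device writes bilevel CCITT Group 4 pages; for grayscale or color scans a device such as `tiffgray` or `tiff24nc` would be the one to try instead. Because the output file name contains no `%d`, all pages of a PDF land in one multi-page TIFF.

```python
import subprocess
from pathlib import Path


def gs_command(pdf: Path, tiff: Path, dpi: int = 300, device: str = "tiffg4") -> list[str]:
    """Build the Ghostscript call that renders one PDF into one multi-page TIFF."""
    return [
        "gs", "-q", "-dNOPAUSE", "-dBATCH", "-dSAFER",
        f"-sDEVICE={device}",      # output device, bilevel G4 by default
        f"-r{dpi}",                # render resolution in DPI
        f"-sOutputFile={tiff}",    # no %d in the name, so pages share one file
        str(pdf),
    ]


def convert_all(src_dir: Path, dst_dir: Path, dpi: int = 300) -> list[Path]:
    """Convert every *.pdf in src_dir and return the TIFF paths written."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(src_dir.glob("*.pdf")):
        tiff = dst_dir / (pdf.stem + ".tif")
        # check=True stops the batch on the first PDF Ghostscript cannot handle.
        subprocess.run(gs_command(pdf, tiff, dpi), check=True)
        written.append(tiff)
    return written
```

Calling `convert_all(Path("scans"), Path("tiffs"))` would walk a directory of PDFs (the directory names here are placeholders). Note that this route re-renders each page rather than copying the scanned pixels, so the resolution should match the original scans.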
  • Converting (Score:3, Insightful)

    by ZorbaTHut ( 126196 ) on Tuesday June 24, 2008 @03:29PM (#23922967) Homepage

    I hear a lot of talk about how to convert back and forth, but nobody's mentioning the thing that I would consider the most important:

    When you convert from .pdf to .tif, are you losing data?

    Most of these convert scripts seem to work by starting Ghostview and rendering a .tif out of your PDF. This is a *terrible terrible idea*. What you'd really want to do is reach into the PDF itself, and extract the lossless images perfectly. Anything else is like printing the .PDF and scanning the printout - you might lose pixels, you might gain extra pixels, and you almost certainly won't be perfectly aligned with the "pixel grain" of the original image.

    Unless you can guarantee that you're pulling out, pixel-by-pixel, the exact original data, I would stick with PDFs.
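One way to reach into the PDF rather than re-render it is Poppler's `pdfimages` tool, which writes out the embedded image objects. The sketch below assumes Python 3.9+ and that `pdfimages` is installed and on the PATH; `pdfimages_command` and `extract_images` are hypothetical helper names. Whether the output is pixel-identical to the scanner's data depends on how the images were embedded, so running with `-list` first to inspect sizes and encodings is a reasonable check.

```python
import subprocess
from pathlib import Path


def pdfimages_command(pdf: Path, prefix: Path, list_only: bool = False) -> list[str]:
    """Build a pdfimages call: list embedded images, or write them out as TIFF."""
    if list_only:
        # -list prints one row per embedded image (page, size, encoding, ...).
        return ["pdfimages", "-list", str(pdf)]
    # -tiff writes each embedded image as <prefix>-NNN.tif without re-rendering the page.
    return ["pdfimages", "-tiff", str(pdf), str(prefix)]


def extract_images(pdf: Path, out_dir: Path) -> list[Path]:
    """Extract the embedded images of one PDF and return the files written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    prefix = out_dir / pdf.stem
    subprocess.run(pdfimages_command(pdf, prefix), check=True)
    return sorted(out_dir.glob(pdf.stem + "-*.tif"))
```

The single-page TIFFs produced this way would still need to be joined into a multi-page file, for example with libtiff's `tiffcp`.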

  • Why bother? (Score:2, Informative)

    As someone who writes software to view PDFs [icesoft.com], I can tell you this is completely pointless, since anything that saves scanned documents into PDF is really storing it as a TIFF image inside of the PDF anyway. The PDF container adds useful features for metadata, and is well documented [adobe.com], so it shouldn't add any future-proofing issues. And the overhead is probably a few kilobytes.

  • by klossner ( 733867 ) on Tuesday June 24, 2008 @05:08PM (#23924441)

    TIFF files have a maximum size of 4GB. (The "value offset" field of an IFD entry is a 32-bit value.) You can exceed this with 50 noisy pages. PDF files have a maximum size of 10 to the tenth power bytes. (The byte offset in a cross-reference table entry is a ten-digit decimal number.) That's 2.3 times the maximum TIFF file size.

    I have written software to create both TIFF and PDF files. I would use PDF for archiving. Even today, it's tricky to find a TIFF reader that will run on all the important platforms and handle the variety of compression flavors (e.g., JBIG2.)
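The size limits quoted above can be checked with a few lines of arithmetic. This is only a sanity check of the figures in the comment, taking the TIFF limit as 2^32 bytes and the PDF limit as 10^10 bytes; the variable names are illustrative.

```python
TIFF_MAX_BYTES = 2 ** 32    # 32-bit offsets in an IFD entry, about 4 GiB
PDF_MAX_BYTES = 10 ** 10    # ten decimal digits in a cross-reference offset

# How much larger the PDF ceiling is than the TIFF ceiling.
ratio = PDF_MAX_BYTES / TIFF_MAX_BYTES

# Average page size at which 50 pages would fill a TIFF to its limit.
per_page_mib = TIFF_MAX_BYTES / 50 / 2 ** 20

print(f"PDF limit / TIFF limit = {ratio:.2f}")
print(f"Page size that fills a TIFF in 50 pages: {per_page_mib:.1f} MiB")
```

This gives a ratio of about 2.33, consistent with the "2.3 times" figure, and shows that the "50 noisy pages" case corresponds to pages averaging roughly 82 MiB each.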

  • Implementations in minutes. Converts to most anything. Not the most efficient though

  • I won't speak to why or whether you should do it, but here are a few options for how.

    Doculex has an app called MPTiffIt [doculex.com] that will do single to multi or multi to single page tiff conversion. You'd need to convert the PDFs to single page tiff via Save As (or perhaps a Batch Process), then recombine them with MPTiffIt.

    Or, you could use a Tiff printer driver along with a batch printing software.

    Personally, I'd use L.A.W. [imagecap.com] (Legal Access Ware, created by Image Capture Engineering and now owned by Lexis Nexis), wh

  • While you're at it, why not decode the data punched on each card and then just store the microfilm image and the decoded data, discarding the image of the rest of the card? That'd make things a lot more efficient.
