Graphics Software

Choosing Better-Quality JPEG Images With Software?

Posted by timothy
from the on-the-tip-of-my-script dept.
kpoole55 writes "I've been googling for an answer to a question and I'm not making much progress. The problem is image collections, and finding the better of near-duplicate images. There are many programs, free and costly, CLI or GUI oriented, for finding visually similar images — but I'm looking for a next step in the process. It's known that saving the same source image in JPEG format at different quality levels produces different images, the one at the lower quality having more JPEG artifacts. I've been trying to find a method to compare two visually similar JPEG images and select the one with the fewest JPEG artifacts (or the one with the most JPEG artifacts, either will serve.) I also suspect that this is going to be one of those 'Well, of course, how else would you do it? It's so simple.' moments."
This discussion has been archived. No new comments can be posted.

  • by bcrowell (177657) on Thursday July 16 2009, @06:12PM (#28723623) Homepage

    The ImageMagick package includes a command called identify, which can read the EXIF data in the JPEG file. You can use it like this:

    identify -verbose creek.jpg | grep Quality

    In my example, it gave " Quality: 94".

    This will not work on very old cameras (from ca. 2002 or earlier?), because they don't have EXIF data. This is different info than you'd get by just comparing file sizes. The JPEG quality setting is not the only factor that can influence file size. File size can depend on resolution, JPEG quality, and other manipulations such as blurring or sharpening, adjusting brightness levels, etc.

  • Re:File size (Score:5, Informative)

    by Robotbeat (461248) on Thursday July 16 2009, @06:14PM (#28723651) Journal

    File size doesn't tell you everything about quality.

    For instance, if you save an image as a JPEG vs. first saving as a dithered GIF and _then_ saving as JPEG, then the second one will have much worse actual quality, even if it has the same filesize (it may well have worse quality AND have a larger file size).

  • DCT (Score:5, Informative)

    by tomz16 (992375) on Thursday July 16 2009, @06:18PM (#28723695)

    Just look at the manner in which JPEGs are encoded for your answer!

    Take the DCT (discrete cosine transform) of blocks of pixels throughout the image. Examine the frequency content of each of these blocks and determine the amount of spatial frequency suppression. This will correlate with the quality factor used during compression!

  • by Anonymous Coward on Thursday July 16 2009, @06:20PM (#28723719)

    load up both images in adobe after effects or some other image compositing program and apply a "difference matte"

    Any differences in pixel values between the two images will show up as black on a white background or vice versa...

    adam
    BOXXlabs

  • Try ThumbsPlus (Score:3, Informative)

    by Anonymous Coward on Thursday July 16 2009, @06:21PM (#28723729)

    ThumbsPlus is an image management tool. It has a feature called "find similar" that should do what you want as far as identifying two pictures that are the same except for the compression level. Once the similar picture is found you can use ThumbsPlus to look at the file sizes and see which one is bigger.

  • Found it a while ago (Score:5, Informative)

    by sco08y (615665) on Thursday July 16 2009, @06:22PM (#28723749) Homepage

    I mean, you don't want second rate pictures in your pr0n stash?

    I had problems building it back then, let alone writing the scripts for it and the hassle of figuring out which images were duplicates, but this utility [schmorp.de] seems to fit the bill.

  • by trb (8509) on Thursday July 16 2009, @06:28PM (#28723813)
    google (or scholar-google) for Hosaka plots, or image quality measures. Ref:

    HOSAKA K., A new picture quality evaluation method.
    Proc. International Picture Coding Symposium, Tokyo, Japan, 1986, 17-18.

  • Re:File size (Score:5, Informative)

    by Shikaku (1129753) on Thursday July 16 2009, @06:29PM (#28723825)

    http://linux.maruhn.com/sec/jpegoptim.html [maruhn.com]

    No. You can compress JPEG lossless.

  • by thethibs (882667) on Thursday July 16 2009, @06:34PM (#28723887) Homepage
    More Noise = Less Compression
  • Blur Detection? (Score:2, Informative)

    by HashDefine (590370) on Thursday July 16 2009, @06:37PM (#28723921) Homepage

    I wonder if out-of-focus or blur detection methods would give you a metric that varies with the level of JPEG artifacts. After all, JPEG artifacts should make things like edge detection more difficult, and those are the same things that are made more difficult by blurry and out-of-focus images.

    A Google search for blur detection should bring up things that you can try. Here [kerrywong.com] is a series of posts that do a good job of explaining some of the work involved.
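
One blur measure that is commonly used is the variance of the Laplacian: sharp images have strong second derivatives, blurred ones do not. The sketch below is an assumption about what "blur detection" could look like here, not necessarily the method in the linked posts. It works on a grayscale image given as a list of rows, and `box_blur` and the checkerboard test image are made up for illustration.

```python
def laplacian_variance(img):
    """Variance of the 4-neighbour Laplacian over the interior pixels."""
    h, w = len(img), len(img[0])
    vals = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            vals.append(img[y - 1][x] + img[y + 1][x]
                        + img[y][x - 1] + img[y][x + 1]
                        - 4 * img[y][x])
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def box_blur(img):
    """3x3 mean filter with edge clamping, used here only to fake a softer copy."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            row.append(total / 9)
        out.append(row)
    return out


# Hypothetical test image: an 8x8 checkerboard and a blurred copy of it.
sharp = [[255 if (x + y) % 2 == 0 else 0 for x in range(8)] for y in range(8)]
blurred = box_blur(sharp)

print(laplacian_variance(sharp), laplacian_variance(blurred))
```

The blurred copy scores lower. Whether the score tracks JPEG quality on real photos is an open question: blocking artifacts add sharp edges of their own, so the measure could move in either direction.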

  • Re:File size (Score:1, Informative)

    by Anonymous Coward on Thursday July 16 2009, @06:52PM (#28724111)

    actually one of the meta values that is stored is a quality indicator.

  • Re:File size (Score:5, Informative)

    by Score Whore (32328) on Thursday July 16 2009, @06:52PM (#28724123)

    ...THERE IS NO LOSSLESS JPEG. PERIOD.

    Except for Lossless JPEG [wikipedia.org] standardized in 1993. But other than that, no there is no lossless jpeg.

  • by DotDotSlasher (675502) on Thursday July 16 2009, @06:56PM (#28724167)
    imagemagick can also compare two images, and tell you how different they are. That is -- quantify the differences by returning a floating point number or two (PSNR, RMSE) in a way that a more-compressed JPEG image will return a correspondingly different floating point value. I know the question concerns two JPEG-compressed images, but if you do have an original image -- and you want to test which is closest to the original, ImageMagick can do that. Use the ImageMagick compare function.
    See http://www.imagemagick.org/script/compare.php [imagemagick.org]
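
For reference, RMSE and PSNR are simple to compute by hand. This is a minimal sketch of the two metrics on flat lists of 8-bit samples, not ImageMagick's implementation; the sample values are invented, and in practice you would first decode both images to the same size and color space.

```python
import math


def rmse(a, b):
    """Root-mean-square error between two equal-length sequences of samples."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of samples")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def psnr(a, b, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the inputs are identical."""
    err = rmse(a, b)
    return math.inf if err == 0 else 20 * math.log10(peak / err)


# Toy data standing in for decoded pixels (hypothetical values).
original = [10, 50, 90, 130, 170, 210]
light = [11, 49, 90, 131, 170, 209]   # mild compression error
heavy = [14, 44, 96, 124, 176, 204]   # stronger compression error

print(psnr(original, light), psnr(original, heavy))
```

A higher PSNR (or lower RMSE) against the original means the copy is closer to it. Without the original, comparing the two JPEGs to each other only tells you how much they differ, not which one is better.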

    Also, GIMP [www.gimp.org] is able to look at an image and approximate what JPEG compression quality setting was used, and use that same quality setting to save an output JPEG copy of the image. So -- they have some algorithm inside of their application which takes an image and returns (a good guess of) the corresponding jpeg quality value.
    Of course, this does not help you if the image was saved with a lousy JPEG quality value, like 10/100, and later saved at a much higher value, like 98/100. Since the algorithm only sees the last image, it would tell you the quality value is 98/100, even though the contents of the image would indicate the results of 10/100 compression, because of multi-generational lossy compression.
  • by uhmmmm (512629) <(uhmmmm) (at) (gmail.com)> on Thursday July 16 2009, @07:09PM (#28724301) Homepage

    JPEG works by breaking the image into 8x8 blocks and doing a two dimensional discrete cosine transform on each of the color planes for each block. At this point, no information is lost (except possibly by some slight inaccuracies converting from RGB to YUV as is used in JPEG). The step where the artifacts are introduced is in quantizing the coefficients. High frequency coefficients are considered less important and are quantized more than low frequency coefficients. The level of quantization is raised across the board to increase the level of compression.

    Now, how is this useful? The reason heavily quantizing results in higher compression is because the coefficients get smaller. In fact, many become zero, which is particularly good for compression - and the high frequency coefficients in particular tend towards zero. So partially decode the images and look at the DCT coefficients. The image with more high frequency coefficients which are zero is likely the lower quality one.
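
A rough sketch of that idea in plain Python: take the 2-D DCT of an 8x8 block, quantize it, and count how many high-frequency coefficients end up zero. Assumptions: a single uniform step stands in for the per-frequency quantization table a real encoder uses, "high frequency" is arbitrarily defined as `u + v >= cutoff`, and the block contents are made up. A real tool would read the already-quantized coefficients from the file instead of recomputing them from pixels.

```python
import math

N = 8


def dct2(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (list of 8 rows of 8 numbers)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for y in range(N):
                for x in range(N):
                    s += (block[y][x]
                          * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out


def quantize(coeffs, step):
    """Uniform quantization: divide by the step and round to the nearest integer."""
    return [[round(value / step) for value in row] for row in coeffs]


def zero_high_freq(q, cutoff=4):
    """Count zero coefficients whose frequency index u + v is at least `cutoff`."""
    return sum(1 for u in range(N) for v in range(N)
               if u + v >= cutoff and q[u][v] == 0)


# A toy textured block (hypothetical pixel values).
block = [[(x * 37 + y * 91 + x * y * 13) % 256 for x in range(N)] for y in range(N)]
coeffs = dct2(block)
fine = zero_high_freq(quantize(coeffs, 2))     # small step: "high quality"
coarse = zero_high_freq(quantize(coeffs, 64))  # large step: "low quality"

print(fine, coarse)
```

Summed over every block of two near-duplicate images, the image with the larger count is the likelier candidate for the lower-quality copy. The count also depends on image content, so it is only meaningful when comparing versions of the same picture.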

  • Re:AI problem? (Score:2, Informative)

    by kpoole55 (1102793) <Ken.Poole@shaPOLLOCKw.ca minus painter> on Thursday July 16 2009, @07:48PM (#28724633)
    I've been lax, in a way, in my pruning of late so the findimagedupes program found about 28000 groups of near duplicate images. Finding that many was a surprise and that's why I started looking to see if a program had been written yet for the next step, finding the better image. I wrote a little script that prunes the identical files but now run into the problem of non-identical files that contain the same or nearly the same image.
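
For anyone who has not done the identical-file step yet, it can be sketched as below. This is not the script mentioned above, just one plausible way to do it: group files by a SHA-256 hash of their bytes. The function names and the demo files are made up.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_identical(root):
    """Group files under `root` by content hash; return groups of two or more."""
    groups = defaultdict(list)
    for p in sorted(Path(root).rglob("*")):
        if p.is_file():
            groups[file_digest(p)].append(p)
    return [g for g in groups.values() if len(g) > 1]


# Demo on throwaway files (the contents are placeholders, not real JPEG data).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.jpg").write_bytes(b"same bytes")
    (root / "b.jpg").write_bytes(b"same bytes")
    (root / "c.jpg").write_bytes(b"different bytes")
    dupe_names = [[p.name for p in group] for group in find_identical(root)]

print(dupe_names)
```

This only catches byte-for-byte copies. Two files that differ in a single metadata field will hash differently, which is where the near-duplicate tools come in.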
  • Re:AI problem? (Score:5, Informative)

    by arose (644256) on Thursday July 16 2009, @08:38PM (#28724977)
    AI or small utility [schmorp.de]... You never know with computers ;)
  • by Anonymous Coward on Thursday July 16 2009, @08:38PM (#28724979)

    jpgQ - JPEG Quality Estimator
    http://www.mediachance.com/digicam/jpgq.htm

  • Re:AI problem? (Score:3, Informative)

    by fractoid (1076465) on Thursday July 16 2009, @09:26PM (#28725313) Homepage
    Thou shalt not make a machine in the likeness of a human mind.
  • Re:AI problem? (Score:2, Informative)

    by Anonymous Coward on Thursday July 16 2009, @09:30PM (#28725339)

    Or you could just measure the amount of data in the DCT space. Duh.

    That'd be a Discrete Cosine Transform [wikipedia.org]
    (for the confused like me. Crazy what they can do with math these days)

  • Re:AI problem? (Score:5, Informative)

    by bendodge (998616) <bendodge AT bsgprogrammers DOT com> on Thursday July 16 2009, @09:41PM (#28725395) Homepage Journal
    Since the mods haven't noticed, and I don't have mod points, let me point out that THIS POST HAS THE ANSWER. A real program that will do what the asker wants. The source is available, but I can't seem to find its license (it includes some of the Independent JPEG Group's code). Also, doesn't a jpeg's EXIF data or some other tag in the file tell you what quality it was saved at?
  • Re:File size (Score:5, Informative)

    by Binary Boy (2407) on Thursday July 16 2009, @11:12PM (#28725743)

    Lossless JPEG and lossless JPEG2000 are both exactly that - lossless. Not perceptually lossless, which is what people often use to refer to high-quality, lossy JPEG/JPEG2000, or JPEG-LS. Lossless JPEG uses a PCM-like encoder, not DCT, AFAIR. Lossless JPEG and lossless JPEG2000 are, in fact, lossless, at least with regards to image data in supported color spaces. This is in part a result of *not* converting to YCrCb, since that conversion is lossy, of course. Not all Lossless JPEGs are 8bit YCrCb.

    Accusoft, for one, has a toolkit for building lossless JPEG applications which supports 16bit RGB and greyscale lossless JPEG modes.

    The near-lossless JPEG you're thinking of is JPEG-LS, which is perceptually lossless, and guarantees a maximum error rate that is generally negligible for almost all applications. This format gets better compression ratios than Lossless JPEG, of course.

    Neither the lossless nor the near-lossless JPEG modes are common though, outside of niche apps. Lossless JPEG2000 is, however, since almost all JPEG2000 libraries support it alongside the lossy modes.

  • Re:AI problem? (Score:4, Informative)

    by bh_doc (930270) <blhiggins.gmail@com> on Friday July 17 2009, @02:29AM (#28726599) Homepage

    http://www.jhnc.org/findimagedupes/

    There's a bunch, but I know you can construct command line operations with this one. I imagine you could construct a system from this and the parent program that will find dupes, then nuke the poorer quality of each, or whatever.

  • by Anonymous Coward on Friday July 17 2009, @07:11AM (#28727685)

    ImageMagick does not need EXIF data. It estimates the quality by looking at the JPEG quantization table.

    $ convert logo: jpeg:- | identify -verbose - | grep Quality
        Quality: 92
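
If you would rather read the quantization tables yourself, they sit in the DQT segments (marker `0xFFDB`) of the file header. The sketch below pulls them out with the standard library only. It does not reproduce ImageMagick's estimate: it uses the mean table entry as a cruder proxy (smaller steps mean finer quantization), `fake_jpeg` builds a header-only byte string purely for the demo, and headers containing length-less markers before the scan data are not handled.

```python
import struct


def read_quant_tables(data):
    """Return {table_id: [64 ints]} from the DQT segments of a JPEG byte string."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    tables = {}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:            # fill byte before a marker
            pos += 1
            continue
        if marker in (0xDA, 0xD9):    # start of scan or end of image: stop
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xDB:            # DQT: one or more tables in this segment
            seg = data[pos + 4:pos + 2 + length]
            i = 0
            while i < len(seg):
                precision, table_id = seg[i] >> 4, seg[i] & 0x0F
                i += 1
                if precision == 0:    # 8-bit entries
                    values = list(seg[i:i + 64])
                    i += 64
                else:                 # 16-bit big-endian entries
                    values = list(struct.unpack(">64H", seg[i:i + 128]))
                    i += 128
                tables[table_id] = values
        pos += 2 + length
    return tables


def mean_step(tables):
    """Average quantization step over all tables; lower suggests higher quality."""
    values = [v for table in tables.values() for v in table]
    return sum(values) / len(values)


def fake_jpeg(step):
    """SOI + one 8-bit DQT segment + EOI. Not a decodable image, demo only."""
    payload = bytes([0x00]) + bytes([step] * 64)
    return (b"\xff\xd8" + b"\xff\xdb" + struct.pack(">H", len(payload) + 2)
            + payload + b"\xff\xd9")


print(mean_step(read_quant_tables(fake_jpeg(2))),
      mean_step(read_quant_tables(fake_jpeg(40))))
```

On real files you would pass `open(path, "rb").read()` to `read_quant_tables`. As noted elsewhere in the thread, the tables only describe the last save, so a low-quality image re-saved at a high setting will still look good by this number.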

  • by rwa2 (4391) * on Friday July 17 2009, @11:43AM (#28730659) Homepage Journal

    You probably don't necessarily want to find the "best quality" image, but rather the image that was closest to the original.

    I take it you're either trying to eliminate the low-quality duplicates or thumbnails from a really large collection of pr0n, or trying to write an image search engine that tries to present the "best" rendition of a particular image first.

    1. As a quick first pass (after you've run through to collect all the similar images into separate groups), you'd obviously want to find the version of the image with the highest resolution. This might let you easily throw out thumbnails or scaled down versions you might come across. Of course, some dorks will upscale images and post them somewhere, so you might still want to hang on to some of them for the second stage.

    2. For the second pass, you'd likely want to scan through the metadata first, especially stuff exposed by EXIF. So you'd want to give higher scores to EXIF data that makes it sound like it came directly off a digital camera or scanner, and bump down the desirability of pictures that appeared to have been edited by any sort of photo editing software.

    3. Then maybe you want to look at something that would rank down watermarks or other modifications.

    4. Another step would be to compare compression quality, but I think that's what most of the other posts are concentrating on. But this is a difficult step because it can be easily fooled, since idiots can re-save a low quality image with the compression quality cranked all the way up so the file size becomes high even though the actual image quality is worse than the original. You probably need to run it through one of those "photoshop detectors" that could tell you whether the image has been through smoothing or other filters in a photo editor. The originals (especially in raw format and maybe high quality JPEG) will have a certain type of CCD noise signature that your software might be able to detect. In the same vein, a poorly-compressed JPEG will have lots of JPEG quantization artifacts that your software might be able to detect as well. Otherwise, you're kinda left with zooming in on pics and eyeballing it.

    5. Finally you might be left with a group of images that are exactly the same but have different file names... you probably want some way to store some of the more useful bits of descriptive text as search/tag metadata, but then choose the most consistent file naming convention or slap on your own based on your own metadata.

    Hopefully this gives you a start to important parts of the process that you might have overlooked...
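
The ordering of those passes can be sketched as a sort key. Everything here is hypothetical: the `Candidate` fields, how `edited` and `est_quality` get filled in (for example from EXIF tags and from the number `identify -verbose` reports), and the sample group. The watermark check in step 3 is left out because there is no simple stand-in for it.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    width: int
    height: int
    est_quality: int   # estimated JPEG quality, 0-100
    edited: bool       # True if metadata suggests a photo editor touched it


def rank_key(c):
    """Prefer more pixels first, then unedited files, then higher estimated quality."""
    return (c.width * c.height, not c.edited, c.est_quality)


def pick_best(group):
    """Return the preferred candidate from a group of near-duplicates."""
    return max(group, key=rank_key)


# Hypothetical group of near-duplicate images.
group = [
    Candidate("a.jpg", 800, 600, est_quality=95, edited=False),    # thumbnail-sized
    Candidate("b.jpg", 1600, 1200, est_quality=85, edited=False),
    Candidate("c.jpg", 1600, 1200, est_quality=98, edited=True),   # re-saved in an editor
]

print(pick_best(group).name)
```

Putting resolution first means an upscaled copy would win, which is the weakness already pointed out in step 1. A stricter version would keep the top few per group for a manual look rather than deleting everything but the winner.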
