Choosing Better-Quality JPEG Images With Software?
kpoole55 writes "I've been googling for an answer to a question and I'm not making much progress. The problem is image collections, and finding the better of near-duplicate images. There are many programs, free and costly, CLI or GUI oriented, for finding visually similar images — but I'm looking for a next step in the process. It's known that saving the same source image in JPEG format at different quality levels produces different images, the one at the lower quality having more JPEG artifacts. I've been trying to find a method to compare two visually similar JPEG images and select the one with the fewest JPEG artifacts (or the one with the most JPEG artifacts, either will serve.) I also suspect that this is going to be one of those 'Well, of course, how else would you do it? It's so simple.' moments."
Easy (Score:3, Interesting)
Paste both images into your image editor of choice, one layer on top of the other, and apply a difference/subtraction filter.
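For those who'd rather script it, here's a minimal sketch of the same difference-blend idea, assuming NumPy and two same-sized 8-bit grayscale arrays (the function name is mine):

```python
import numpy as np

def difference_image(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Per-pixel absolute difference, like an editor's 'difference' blend mode."""
    # Widen to int16 first so the subtraction can't wrap around in uint8.
    return np.abs(a.astype(np.int16) - b.astype(np.int16)).astype(np.uint8)
```

Bright areas in the result are where the two images disagree; a pure black result means the images are identical.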
Re:AI problem? (Score:4, Interesting)
How about Amazon's Mechanical Turk service?
https://www.mturk.com/ [mturk.com]
Measure sharpness? (Score:4, Interesting)
Compute the root-mean-square difference between the original image and a gaussian-blurred version?
JPEG tends to soften details and reduce areas of sharp contrast, so the sharper result will probably
be better quality. This is similar to the PSNR metric for image quality.
Bonus: very fast, and can be done by convolution, which optimizes very efficiently.
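A sketch of that metric, assuming SciPy's `gaussian_filter`, a 2-D grayscale array, and an arbitrary choice of sigma:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def sharpness_score(gray: np.ndarray, sigma: float = 1.0) -> float:
    """RMS difference between an image and its Gaussian-blurred copy.
    Higher = more high-frequency detail, i.e. a sharper image."""
    g = gray.astype(np.float64)
    blurred = gaussian_filter(g, sigma=sigma)
    return float(np.sqrt(np.mean((g - blurred) ** 2)))
```

Of the two near-duplicates, the one with the higher score has kept more fine detail, which (all else being equal) suggests lighter JPEG compression.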
Fourier transform (Score:2, Interesting)
Assuming the only quality loss is due to JPEG compression, I guess a fourier transform should give you a hint: I think the worse quality image should have lower amplitude of high frequencies.
Of course, that criterion may be misleading if the image was otherwise modified. For example, noise filters will typically reduce high frequencies as well, but you'd generally consider the result superior (otherwise you wouldn't have applied the filter).
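The idea above can be sketched as follows, assuming NumPy and a 2-D grayscale array; the 0.25 cycles-per-pixel cutoff is an arbitrary choice:

```python
import numpy as np

def high_freq_energy(gray: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy above `cutoff` cycles/pixel.
    A more heavily compressed image should score lower."""
    f = np.fft.fftshift(np.fft.fft2(gray.astype(np.float64)))
    power = np.abs(f) ** 2
    h, w = gray.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    r = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)  # radial frequency per bin
    return float(power[r > cutoff].sum() / power.sum())
```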
Re:File size (Score:3, Interesting)
Also, stuff like Photoshop will insert a bunch of meta/EXIF bullshit, but something like Paint doesn't... it's usually only about 2 to 3 KB, but it still taints your results if you're going by size alone.
Re:AI problem? (Score:5, Interesting)
I don't know about "quality", but frankly it shouldn't be too hard to compare similar images just by doing simple mathematical analysis on the results. I'm only vaguely familiar with image compression, but if a "worse" JPEG image is more blocky, would it be possible to run edge detection to find clearly defined blocks, indicating that a particular picture is producing "worse" results? That's just one idea; I'm sure people who know the compression better can name many other properties that could easily be measured automatically.
What a computer can't do is tell you whether the image is subjectively worse, unless the metric the human uses to judge a picture happens to match the algorithm the computer is using, and even then it could vary from picture to picture. For example, a highly colorful picture might hide the artifacting much better than a picture that features lots of text. While the "blockiness" would be mathematically the same, a human viewer will notice the artifacts in the text much more.
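A cheap spatial version of that blockiness idea, assuming NumPy, a grayscale array, and an 8x8 grid aligned to the image origin (the alignment assumption matters: a cropped JPEG may have its block grid offset):

```python
import numpy as np

def blockiness(gray: np.ndarray) -> float:
    """Ratio of the mean absolute horizontal jump across 8-pixel column
    boundaries to the mean jump everywhere else. Values well above 1
    suggest visible JPEG block edges."""
    g = gray.astype(np.float64)
    jumps = np.abs(np.diff(g, axis=1))   # jump between columns j and j+1
    cols = np.arange(jumps.shape[1])
    at_boundary = (cols % 8) == 7        # jumps that cross an 8x8 block edge
    return float(jumps[:, at_boundary].mean() /
                 (jumps[:, ~at_boundary].mean() + 1e-12))
```

A smooth natural image scores near 1; a heavily compressed one scores noticeably higher, because the discontinuities pile up on the block grid.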
Automatic JPEG Artifact Removal (Score:4, Interesting)
I am not sure if there is a tool that automatically calculates the difference between two images, but this is a task simple enough to be coded in a few lines (given the right libraries are at hand). For each color channel (RGB) of each pixel, you basically just calculate the square of the difference between the two images. Then you add all these numbers up (all pixels, all color channels). The bigger this number is, the bigger the difference between the images.
Maybe not your push-one-button solution, but should be doable. Just my $0.02.
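The calculation described above really is only a few lines with NumPy at hand (assuming two same-sized arrays):

```python
import numpy as np

def sum_squared_error(a: np.ndarray, b: np.ndarray) -> float:
    """Sum of squared per-pixel, per-channel differences.
    Bigger = more different."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float((diff ** 2).sum())
```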
Re:File size (Score:5, Interesting)
There was an old story my AI teacher used to share back in college about a military contractor that was developing an AI-based IFF (identification, friend or foe) system for aircraft.
They trained it using what was, at the time, a vast picture database of every known aircraft. In the lab, they got it to 99% accuracy, with errors favoring 'unknown' as the third option.
So they took it out for a test run. The first night out the system tried firing on anything and everything it could lock on, including ground targets.
This was bad. Horribly bad. But they were certain there was some sort of equipment failure going on. After all, their AI was damn near perfect at ID'ing targets in the lab, so the issue must be somewhere up the line.
So they did a once-over of the equipment and couldn't find a problem. Not sure what to do next, the team took the system out for another dry run the next day. This time, the system refused to see any ground targets, and anything it saw in the air was friendly.
Now this was getting ridiculous, and the team was extremely confused. So they did what they should have done the first time around: a third test run, looking at what the AI was actually 'thinking'.
And promptly discovered the problem. While they had a huge database of images, all their 'friendly' craft had pictures taken during the day, while in flight. All their 'hostile' craft, however, were pictured at night during spy runs or from overhead satellite shots.
The AI wasn't keying off the planes, it was keying off whether it was daytime or night time.
I don't know if the above actually happened, but my point is: it doesn't matter how many images you seed your database with. Unless you are there to tell it what is an artifact and what is just part of the picture, you are going to end up with horribly comical results.
Re:File size (Score:2, Interesting)
a. Source (color) digital images use the RGB colorspace (typically captured as raw sensor data with a Bayer layout). JPEG compresses three planes in the YCbCr colorspace.
Due to colorspace conversion and quantization error, you lose information. That's what "lossy" means.
b. Even at maximum quality, each 8x8 (64-pixel) block is DCT-transformed and quantized. Again, lossy.
c. No free lunch.
Typically, even a maximum-quality JPEG loses 1-2% of the total information (measured via image entropy). Things are slightly better with JPEG2000, whose lossless mode is truly lossless; maximum-quality JPEG is merely *perceptually* lossless.
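Point (a) is easy to demonstrate: a single RGB -> YCbCr -> RGB round trip through 8-bit integers already fails to reproduce some pixels, before any DCT or quantization table gets involved. A sketch using the full-range BT.601 equations (the JFIF convention); the function names are mine:

```python
import numpy as np

# Full-range BT.601 conversion as used by JFIF, rounded to 8-bit integers.
def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    r, g, b = [rgb[..., i].astype(np.float64) for i in range(3)]
    y  =         0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return np.clip(np.round(np.stack([y, cb, cr], axis=-1)), 0, 255).astype(np.uint8)

def ycbcr_to_rgb(ycc: np.ndarray) -> np.ndarray:
    y, cb, cr = [ycc[..., i].astype(np.float64) for i in range(3)]
    r = y + 1.402 * (cr - 128.0)
    g = y - 0.344136 * (cb - 128.0) - 0.714136 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    return np.clip(np.round(np.stack([r, g, b], axis=-1)), 0, 255).astype(np.uint8)
```

Neutral grays survive the round trip, but saturated colors like pure red come back slightly changed; that rounding error alone is information you can never get back.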
Expert's answer (Score:2, Interesting)
JPEG encodes pixels using a cosine transform on 8x8-pixel blocks. The most perceptually visible artifacts (and the ones most likely to cause trouble for machine-vision algorithms) appear at block boundaries.
Short answer:
a. 2D-FFT your image
b. Use the magnitude of the 8-pixel-period response in the X and Y directions as your quality metric. The higher, the worse the quality.
This is a crude first approximation, but it works.
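A sketch of steps (a) and (b), assuming NumPy, a grayscale array, and dimensions that are multiples of 8 (so the 1/8 cycles-per-pixel component lands on an exact FFT bin):

```python
import numpy as np

def block_artifact_energy(gray: np.ndarray) -> float:
    """Spectral energy along the FFT row/column at 1/8 cycles per pixel,
    i.e. the 8-pixel-period response in Y and X. Higher = blockier."""
    g = gray.astype(np.float64)
    h, w = g.shape
    power = np.abs(np.fft.fft2(g)) ** 2
    ky, kx = h // 8, w // 8   # bin index of the 8-pixel period on each axis
    return float(power[ky, :].sum() + power[:, kx].sum())
```

Since the absolute number depends on image content, it's best used to rank two near-duplicates of the *same* picture rather than as a standalone score.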
thanks for the serious consideration here (Score:2, Interesting)
Thanks to the many who took this as a serious question and didn't turn it into "It's just pr0n, so who cares." Some of it is pr0n, some isn't; the most consistent thing is humor.
Many ideas needed the original image in order to judge the quality of a copy, and some people asked where I get these images. The two are linked: I get the images from Usenet, from forums, and from artists' galleries. That means there's only a small set, from the artists' galleries, that I know to be original. Others may be original, but the original may not be the copy that reaches me first. On occasion an artist may even publish the same image in different forms, depending on the limitations of the different forums he frequents.
Some ideas were nicely different from the directions I had been following, and they'll give me more to think about.
I'll also acknowledge those who said that how the image is represented is less important than what the image represents. That's quite true, but if I have a machine that can find the best representation of something I enjoy, why not use it?
Re:AI problem? (Score:3, Interesting)
It almost does what he wants. He doesn't spell it out, but it seems strongly implied that he also wants a system capable of automatically finding these duplicates by itself, and then automatically determining which image is "best."
Which seems obvious to me: if he's got enough photos, in sufficient disorganization, that he can't tell which duplicate is best, then there probably isn't any straightforward way (with filenames or directory trees or whatever) to find out which ones are dupes to begin with.
Judge, the afore-linked program, only does the job of finding the best image out of a set of duplicates.
What tool can be used to find the (near) duplicates to begin with?
Structural Similarity Index Method (SSIM) (Score:2, Interesting)
In fact, most image compression techniques (including JPEG) take the human visual system into account; however, conventional ways of measuring noise in images (minimum mean squared error, peak signal-to-noise ratio, root mean square) don't.
Your best bet is to use something like the structural similarity method (SSIM) by Prof. Al Bovik of UT Austin and his student Prof. Zhou Wang (now at the University of Waterloo).
You can read all about SSIM and get example code here: http://www.ece.uwaterloo.ca/~z70wang/research/ssim/ [uwaterloo.ca]
Or read more about image quality assessment at Prof. Bovik's website: http://live.ece.utexas.edu/research/Quality/index.htm [utexas.edu]
If you don't care about how it works and just want to use it, you can get example code for SSIM in Matlab at that website, and there's C code floating around the net. The method is easy to use: essentially, the SSIM function takes two images and returns a number between 0 and 1 describing how similar they are. Given two compressed images and the original image, take the SSIM of each against the original. The compressed image with the higher SSIM value is the "best".
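If Matlab isn't an option, the core formula is small enough to sketch in NumPy. Note this computes SSIM over the whole image at once, rather than with the sliding 11x11 window the published method uses, so treat it only as a rough approximation; the c1/c2 constants are the paper's defaults for 8-bit images:

```python
import numpy as np

def global_ssim(x: np.ndarray, y: np.ndarray) -> float:
    """Single-window SSIM over the whole image (rough sketch of the
    real, windowed method). 1.0 means identical images."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1, c2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2  # stabilizing constants
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return float(((2 * mx * my + c1) * (2 * cov + c2)) /
                 ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))
```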
It sounds like for your problem you might NOT have the original uncompressed image. In that case you might try checking for minimal entropy or maximum contrast in your images.
Essentially, entropy would be calculated as:
h = histogram(Image);
p = h ./ (number of pixels in image);
entropy = -sum(p .* log2(p));
You will need to make sure you scale the image appropriately and skip empty histogram bins (log2 of zero is -Inf)! Or better yet, you should be able to find code for image entropy and contrast on the web. Just try searching for entropy.m for a Matlab version.
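In case Matlab isn't handy, the same calculation in Python/NumPy, assuming an 8-bit grayscale array, with the empty bins skipped as noted:

```python
import numpy as np

def image_entropy(gray: np.ndarray) -> float:
    """Shannon entropy (bits/pixel) of an 8-bit grayscale histogram."""
    counts = np.bincount(gray.astype(np.uint8).ravel(), minlength=256)
    p = counts / counts.sum()
    p = p[p > 0]                       # skip empty bins: 0 * log2(0) -> 0
    return float(-(p * np.log2(p)).sum())
```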
Good luck!
Re:File size (Score:3, Interesting)
I always wondered if that one wasn't an urban legend too, but apparently it was mostly true [hooksprogress.org]: