Comparing Images and Measuring their Similarity in PHP
I’m very proud of myself tonight. You see, I’m a programmer at heart. Yet it has been a long time since I just programmed something for fun. Not a contest, not a class. Just out of pure usefulness and interest. Here’s the story.
I set up a webcam to watch my room for the day, and some software that uploads the webcam’s image to my server via FTP, 6 times per minute. The software is Active Webcam, and I just used the free evaluation version. Over the course of the day, it generated 6,522 images – way too many to view at once. So I decided to script something to make looking at the images more interesting.
The first obvious problem is that most of the images look exactly the same. Nothing happened during the day, so the images have no easily visible distinction. Yet they are not the same in terms of bits: the brightness has slight variations, the JPEG compression differed, etc. So an md5 comparison, which only matches when two files are byte-for-byte identical, doesn’t cut it. I need to actually look at the image data.
Fortunately, PHP has a great built-in image library known as GD. PHP has to be compiled with it, but my host’s copy is, so I suspect most others are, too. After much trial and error, I managed to do the following:
- Get a directory listing of all the images in the directory.
- Sort these images so that the oldest ones appear first, the newest last.
- Remove duplicates by comparing md5 hashes.
- Create a custom PHP function that compares two images and determines how similar (or different) they are.
- Loop through all the images, displaying only the ones that are substantially different from the one previously displayed.
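The first three steps can be sketched roughly as follows. This is not the original script, just a minimal illustration: the function name listUniqueImages, the *.jpg pattern, and the use of file modification time for "oldest first" are all my assumptions.

```php
<?php
// Hypothetical helper (not from the original script): returns the image
// paths in $dir, oldest first, with byte-identical duplicates removed.
function listUniqueImages(string $dir): array
{
    // Assumption: the webcam images are saved with a .jpg extension.
    $files = glob(rtrim($dir, '/') . '/*.jpg') ?: [];

    // Oldest first by modification time; ties fall back to the filename.
    usort($files, static function (string $a, string $b): int {
        return [filemtime($a), $a] <=> [filemtime($b), $b];
    });

    $seen = [];
    $unique = [];
    foreach ($files as $file) {
        $hash = md5_file($file);
        if ($hash === false || isset($seen[$hash])) {
            continue; // unreadable, or an exact copy of an earlier file
        }
        $seen[$hash] = true;
        $unique[] = $file;
    }
    return $unique;
}
```

Keeping the hashes as array keys means each file is hashed once and each duplicate check is a single lookup, which matters with several thousand files.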
In the end, this brought the total number of images from 6,522 down to 247. Take a look at the finished output here.
Here’s how the image comparison works:
First, I consider the image as a grid of 10×10 squares and check only one pixel per square: this allows me to check fewer pixels, so the function takes much less time. I’ll call this the sampling rate. Then, I look at the color of that pixel in each image. By separating the Red, Green, and Blue values, I can find the difference between the two pixels by using a simple 3-D distance formula; namely, sqrt( pow(r1-r2,2) + pow(b1-b2,2) + pow(g1-g2,2) ) / 255 (each channel ranges from 0 to 255, the 256 values that 8 bits, or 2^8, provide). This gives me a decimal number for each sampled pixel. I add the values of all these comparisons, round the final result to the nearest integer, and use this number.
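Here is a minimal sketch of that comparison using GD. It is a reconstruction from the description above, not the original function. The names colorDistance and imageDifference are hypothetical, and I am assuming the grid means sampling one pixel every 10 pixels in each direction and that both files are JPEGs.

```php
<?php
// Distance between two colors, each given as [r, g, b] with channels 0-255.
// Identical colors give 0.0; black vs. white gives sqrt(3), about 1.73.
function colorDistance(array $c1, array $c2): float
{
    return sqrt(
        pow($c1[0] - $c2[0], 2) +
        pow($c1[1] - $c2[1], 2) +
        pow($c1[2] - $c2[2], 2)
    ) / 255;
}

// Hypothetical comparison function: samples one pixel every $step pixels
// in both images and sums the color distances. Requires the GD extension.
function imageDifference(string $fileA, string $fileB, int $step = 10): int
{
    $a = imagecreatefromjpeg($fileA);
    $b = imagecreatefromjpeg($fileB);
    if ($a === false || $b === false) {
        throw new RuntimeException('Could not load one of the images');
    }

    // Only compare the area that both images share.
    $width  = min(imagesx($a), imagesx($b));
    $height = min(imagesy($a), imagesy($b));

    $total = 0.0;
    for ($x = 0; $x < $width; $x += $step) {
        for ($y = 0; $y < $height; $y += $step) {
            $pa = imagecolorsforindex($a, imagecolorat($a, $x, $y));
            $pb = imagecolorsforindex($b, imagecolorat($b, $x, $y));
            $total += colorDistance(
                [$pa['red'], $pa['green'], $pa['blue']],
                [$pb['red'], $pb['green'], $pb['blue']]
            );
        }
    }
    return (int) round($total);
}
```

Because the result is a sum over all sampled pixels, it grows with image size and shrinks with a coarser step, so any threshold only makes sense for a fixed image size and sampling rate.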
I did not do any extensive mathematical proof to see what my final range of values should be. Experimentally, I could see that a value of 98 (for my test images, sizes, and sampling rate) was a huge difference; and fortunately, 7 was a very small difference, too small for me to discern. Thus there is a wide range returned by my function, and it is very accurate. Just look at the final output and see for yourself (warning! the page does contain 247 images!).
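The filtering loop itself can be sketched like this. Again, this is an illustration rather than the original code: filterByDifference is a hypothetical name, and the threshold of 20 in the usage example is a placeholder somewhere between the 7 and 98 mentioned above, since the cutoff actually used is not stated here.

```php
<?php
// Keeps the first item, then only those items that differ from the most
// recently kept item by at least $threshold. $difference is any callable
// that takes two items and returns a number (for example, an image
// comparison function that takes two file paths).
function filterByDifference(array $items, callable $difference, int $threshold): array
{
    $kept = [];
    foreach ($items as $item) {
        if ($kept === [] || $difference($kept[count($kept) - 1], $item) >= $threshold) {
            $kept[] = $item;
        }
    }
    return $kept;
}

// Usage with plain numbers standing in for images:
$shown = filterByDifference(
    [1, 2, 3, 30, 31, 80],
    static function ($a, $b) { return abs($a - $b); },
    20 // placeholder threshold, not the value from the original script
);
// $shown is [1, 30, 80]
```

Comparing against the last displayed image, rather than the immediately preceding one, keeps slow gradual changes (such as daylight fading) from slipping through unnoticed.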
Here are the resources I used:
- php sort array
- php image compare
- imagecopyresized and imagecopyresampled
- php file does not exist (file_exists)
- php substr
- simple image comparison in php / compare images
- php image similarity
- imagesy, Image, imagecolorexact, imagecolorclosest, imagecolorat, imagecreate
- 3d distance formula php
- php imagecolorsforindex
- php math, square [^ is actually the bitwise XOR operator in PHP, so ^2 does not square a number; you have to use pow(base,power)]
- php round
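As a quick illustration of the squaring note in the list above, here is what the caret does in PHP compared with pow(). The variable names are just for this example; the ** operator is a later addition to the language (PHP 5.6 and up).

```php
<?php
$xor         = 3 ^ 2;     // bitwise XOR: binary 11 XOR 10 = 01, so 1
$viaPow      = pow(3, 2); // 9
$viaOperator = 3 ** 2;    // 9, exponentiation operator (PHP 5.6+)
```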
If you have any questions, or ideas for applications of my dynamic image comparison function for PHP, leave a comment below!
P.S. I created from scratch the “code on a slate” image above using Fireworks, a web graphics editor. Awesome, isn’t it? Only took a few minutes, and I got to learn about masks in the process. Inside of it is real code from the PHP script I created to process my webcam images.