Jeremy Wagner

Faster bulk image optimization in bash

20 April, 2017

In an earlier post, I talked about how you could use the find command in bash to find all files of a specific extension and pass them along to the image optimizer of your choosing. In instances where I don't have time to automate this task with a tool such as gulp, this has proved incredibly valuable.

Lately I've had to convert large batches of images for various projects. The find command, while serviceable in its own right with the -exec flag, only allows for serial processing of the files it finds. This is where xargs comes in handy: with it, the same work can be done in parallel. I recently optimized a batch of about 500 JPEGs using jpeg-recompress. Below is the non-xargs way of accomplishing this task:

find ./ -type f -name '*.jpg' -exec jpeg-recompress --min 30 --max 70 --method smallfry --accurate --strip {} {} \;

If you're not sure of all the parameters in this command, run jpeg-recompress with the --help flag and read my earlier post for some context. All I'm doing is passing files found by find one by one to jpeg-recompress with the -exec flag. The {} placeholders are file references. In my testing, the above command took roughly 2 minutes and 10 seconds to complete. Now what about xargs? Let's first see how it's used in conjunction with find:
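If you'd like to reproduce timings like these on your own machine, you can prefix the command with `time`. Here's a minimal sketch using `echo` as a stand-in so it runs even without jpeg-recompress installed; swap in the real encoder and its flags when benchmarking for real:

```shell
# `time` reports real (wall-clock), user, and sys durations for the command.
# Replace `echo` with jpeg-recompress and its flags for an actual benchmark.
time find . -type f -name '*.jpg' -exec echo {} \;
```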

find ./ -type f -name '*.jpg' | xargs -P 32 -I {} jpeg-recompress --min 30 --max 70 --method smallfry --accurate --strip {} {}

This command is identical to the first up until the point where we pipe the output from find to xargs (in lieu of find's -exec flag). The -P 32 argument is the important bit here: it sets the number of simultaneous processes. The -I {} argument tells xargs to substitute each filename read from standard input wherever {} appears, one filename per invocation. The rest of the command is much the same as before, but it runs quite a bit faster: using xargs cut the total processing time down to 1 minute and 10 seconds. Not too bad. Of course, your mileage may vary depending on your hardware.
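One caveat the pipeline above doesn't handle: filenames containing spaces or newlines get split by xargs into multiple arguments. Pairing find's -print0 with xargs -0 null-delimits the names instead. Here's a runnable sketch, with echo standing in for jpeg-recompress:

```shell
# Set up a throwaway directory with a tricky filename (illustration only).
demo=$(mktemp -d)
touch "$demo/plain.jpg" "$demo/has space.jpg"

# -print0 emits NUL-terminated names and -0 tells xargs to split on NUL,
# so "has space.jpg" arrives as one argument instead of two.
find "$demo" -type f -name '*.jpg' -print0 \
  | xargs -0 -P 4 -I {} echo "optimizing {}"

rm -rf "$demo"
```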

Keep in mind: Don't set the number of concurrent processes too high, as you'll likely see diminishing returns. Furthermore, you might consume too many resources and potentially make your system unresponsive. Using xargs may not prove much more useful than serialized processing for small batches, but it really shines when you're optimizing a large batch of images with a CPU-intensive encoder like Guetzli. On a batch of approximately 500 images, I was able to reduce processing time with Guetzli from 150 minutes down to about 80. Definitely worth the trouble.
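Rather than hard-coding a value like -P 32, you can size the process pool to the machine. `nproc` (from GNU coreutils; `sysctl -n hw.ncpu` on macOS) reports the core count, which is a sensible starting point for CPU-bound encoders. A sketch, again with echo in place of the encoder:

```shell
# Query the number of available CPU cores (GNU coreutils `nproc`).
cores=$(nproc)

# Use the core count as the parallelism level; going far beyond it mostly
# adds scheduling overhead for a CPU-bound encoder.
find . -type f -name '*.jpg' | xargs -P "$cores" -I {} echo "optimizing {}"
```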

Here's hoping you can find some use for xargs, be it for image optimization or something else altogether. I hope this article helps!
