In case you have missed it, there is a flaw in the code that writes bzip2 archives in both Ant and Commons Compress. There are new releases for both of them, so go grab them: Ant, Commons Compress.

As part of the process of creating bzip2 compressed blocks the input data (usually in chunks of 900kb) is sorted (during the Burrows-Wheeler transformation, if you want to know). The only sorting algorithm present in the bzip2 classes prior to the security release is very efficient for the average case but shows extraordinarily bad performance for very repetitive inputs. For certain inputs the bzip2 task took two hours on my really fast work notebook (at 100% CPU for a single core) while it finishes in less than two seconds with Ant 1.8.4.

These inputs have to be specially crafted, it is very unlikely you will face them in the wild. The flaw turns into a security issue if you are providing a public service that compresses input created by arbitrary users - maybe a public build server or an archiving solution.

The bzip2 code in Ant (and all forks that stem from it, like Commons Compress) was derived from an early version of Julian Seward's libbzip2. Starting with 0.9.5 libbzip2 detects if sorting is taking too long because of bad inputs and switches to a different sorting strategy in such cases. The fix in the two releases now consists of porting this fallback sorting algorithm from C to Java.

While porting this I learned a lot. I read several academic papers in order to understand what was actually going on. It felt like I was back in University again and it felt good.

Many thanks to David Jorm of the Redhat Security Team who uncovered the issue.

path: /en/Apache/Ant | #