Reposting what I just sent to the xmlunit-general list for wider feedback; don't hesitate to drop me a mail if you want to stop me.

Hi all

if you are following the commits, you may have seen I've started to work
on XMLUnit 2.x again, nothing big, but I'm trying to get it done this
time.

Apart from re-thinking the abstractions used in the difference engine
the biggest piece missing for a release - even a beta - is
documentation.  Here I realized more and more that I really do not feel
like writing a big DocBook document again; I need something more
lightweight with more immediate feedback.  A wiki would be fine, but
we've also always had a PDF version and I'd like to keep it that way.

Also I realized I prefer git over subversion by now, so I'd like to move
2.x to git.  I know I could do so on Sourceforge, but right now I feel
it would be best to move active development over to github.  A quick
search showed several forks over there, so we may be able to reconnect
and make this a community effort rather than mostly a one-man-show.

Here is what I intend to do, please let me know if anything looks
completely wrong:

* create an xmlunit organization at github
* create three repositories (for now)
  * XMLUnit Java 2.x - including the matchers and xmlunit-legacy
  * XMLUnit .NET 2.x - including constraints
  * a pure Wiki repository for the user guide.  github markdown works
    well enough for me and pandoc can create PDF or even epub from it
    (thanks to Stefan Tilkov for pointing me at it)
* start using github's issue tracker
* create a github.io site for XMLUnit 2.x
* keep using Sourceforge for XMLUnit 1.x
  * move XMLUnit 1.x back to svn trunk and keep it at sourceforge
  * keep using sourceforge's issue tracker for 1.x
  * keep the xmlunit.sf.net site for 1.x
* keep using the xmlunit-general@lists.sourceforge as primary
  communication medium for all versions
* keep the sourceforge forums even though I never liked them (or any web
  forum)

path: /en/oss/XMLUnit | # | Writebacks

Last week the company I work for published a podcast in which a colleague and I talk about the ASF: what it is, how it works and what I do there. German language only.

path: /en/personal/publications | # | Writebacks

In a project at innoQ we're using QMQP to quickly queue mail to an MTA for delivery.

Even though - or maybe because - the protocol looks rather simple, we didn't find any open source library for it. We've decided to open source our own implementation, QMQP Java. Version 0.1 is available from Maven Central (com.innoq.qmqp:qmqp-client:0.1) under the Apache License 2.0.

This initial release is strongly tailored to our project's needs. If you want to use it and find it lacking anything, don't hesitate to send a pull request or open an issue at GitHub.
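
To give an idea of how simple the protocol is on the wire, here is a minimal sketch of the QMQP framing as I understand it from the protocol description: the client sends one netstring that wraps the message, the envelope sender and each recipient, each as a netstring of its own, and the server answers with a netstring whose first byte is K (accepted), Z (temporary failure) or D (permanent failure). The class and method names below are made up for illustration and are not the API of qmqp-client.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, NOT the qmqp-client API: only illustrates the framing.
final class QmqpSketch {

    /** Encodes payload as a netstring: "<length>:<payload>,". */
    static byte[] netstring(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] prefix = (payload.length + ":").getBytes(StandardCharsets.US_ASCII);
        out.write(prefix, 0, prefix.length);
        out.write(payload, 0, payload.length);
        out.write(',');
        return out.toByteArray();
    }

    /** Builds a request: netstring(netstring(message) + netstring(sender) + netstring(recipient)...). */
    static byte[] request(byte[] message, String sender, String... recipients) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        append(body, netstring(message));
        append(body, netstring(sender.getBytes(StandardCharsets.US_ASCII)));
        for (String recipient : recipients) {
            append(body, netstring(recipient.getBytes(StandardCharsets.US_ASCII)));
        }
        return netstring(body.toByteArray());
    }

    /** Returns the status byte (K, Z or D) of a response netstring. */
    static char status(byte[] response) {
        int colon = 0;
        while (colon < response.length && response[colon] != ':') {
            colon++;
        }
        // Need at least one payload byte and the trailing comma after the colon.
        if (colon + 2 >= response.length) {
            throw new IllegalArgumentException("empty or malformed response");
        }
        return (char) response[colon + 1];
    }

    private static void append(ByteArrayOutputStream out, byte[] bytes) {
        out.write(bytes, 0, bytes.length);
    }
}
```

A real client would write the bytes returned by `request` to a TCP connection to the QMQP server and pass what it reads back to `status`. The sketch assumes ASCII-only addresses and does no validation beyond the bare minimum.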

path: /en/oss/QMQP | # | Writebacks

In case you have missed it, there is a flaw in the code that writes bzip2 archives in both Ant and Commons Compress. There are new releases for both of them, so go grab them: Ant, Commons Compress.

As part of the process of creating bzip2 compressed blocks the input data (usually in chunks of 900 kB) is sorted (during the Burrows-Wheeler transformation, if you want to know). The only sorting algorithm present in the bzip2 classes prior to the security release is very efficient for the average case but shows extraordinarily bad performance for very repetitive inputs. For certain inputs the bzip2 task took two hours on my really fast work notebook (at 100% CPU for a single core), while it finishes in less than two seconds with Ant 1.8.4.

These inputs have to be specially crafted, so it is very unlikely you will face them in the wild. The flaw turns into a security issue if you are providing a public service that compresses input created by arbitrary users - maybe a public build server or an archiving solution.

The bzip2 code in Ant (and all forks that stem from it, like Commons Compress) was derived from an early version of Julian Seward's libbzip2. Starting with 0.9.5 libbzip2 detects if sorting is taking too long because of bad inputs and switches to a different sorting strategy in such cases. The fix in the two releases now consists of porting this fallback sorting algorithm from C to Java.
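
The general shape of such a fallback can be sketched in a few lines. This is not the code from libbzip2, Ant or Commons Compress: the "main" sort here is a naive comparison sort over the rotations of a block, "taking too long" is modelled as a simple budget of byte comparisons, and the fallback is plain prefix doubling, which I picked as a stand-in because its running time does not depend on how repetitive the input is. All names and the budget parameter are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Comparator;

// Illustrative sketch only; names, budget and algorithms are assumptions.
final class FallbackSortSketch {

    /** Signals that the main sort has used up its work budget. */
    static final class BudgetExceeded extends RuntimeException {}

    /** Sorts all rotations of block, switching strategy if the main sort gets too expensive. */
    static Integer[] sortRotations(byte[] block, long budget) {
        try {
            return mainSort(block, budget);
        } catch (BudgetExceeded e) {
            return fallbackSort(block);
        }
    }

    /** Naive sort: fast on typical data, very slow when rotations share long prefixes. */
    static Integer[] mainSort(byte[] block, long budget) {
        int n = block.length;
        Integer[] idx = indices(n);
        long[] work = {0};
        Arrays.sort(idx, (a, b) -> {
            for (int k = 0; k < n; k++) {
                if (++work[0] > budget) {
                    throw new BudgetExceeded();
                }
                int x = block[(a + k) % n] & 0xff;
                int y = block[(b + k) % n] & 0xff;
                if (x != y) {
                    return x - y;
                }
            }
            return 0;
        });
        return idx;
    }

    /** Prefix doubling: each round doubles the number of bytes the ranks account for. */
    static Integer[] fallbackSort(byte[] block) {
        int n = block.length;
        Integer[] idx = indices(n);
        int[] rank = new int[n];
        for (int i = 0; i < n; i++) {
            rank[i] = block[i] & 0xff;
        }
        for (int h = 1; h < n; h <<= 1) {
            final int step = h;
            final int[] r = rank;
            Comparator<Integer> byPair = Comparator
                .<Integer>comparingInt(i -> r[i])
                .thenComparingInt(i -> r[(i + step) % n]);
            Arrays.sort(idx, byPair);
            // Re-rank: rotations with equal key pairs share a rank.
            int[] next = new int[n];
            for (int i = 1; i < n; i++) {
                next[idx[i]] = next[idx[i - 1]]
                    + (byPair.compare(idx[i - 1], idx[i]) < 0 ? 1 : 0);
            }
            rank = next;
        }
        return idx;
    }

    private static Integer[] indices(int n) {
        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) {
            idx[i] = i;
        }
        return idx;
    }
}
```

Calling `sortRotations` with a block of almost identical bytes and a small budget makes the main sort give up early and hands the block to the fallback, which produces the same order of rotations.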

While porting this I learned a lot. I read several academic papers in order to understand what was actually going on. It felt like I was back at university again and it felt good.

Many thanks to David Jorm of the Red Hat Security Team who uncovered the issue.

path: /en/Apache/Ant | # | Writebacks

Some of my colleagues at innoQ have put together a bunch of rules about what makes up a web application that actually uses the web rather than hides it.

There is more on Stefan Tilkov's blog and the ROCA website. Discussion (there, not here) is more than welcome.

path: /en/unsorted | # | Writebacks