RNGzip — type-based XML compression

This is the code described in our research paper presented at the Data Compression Conference 2007. It should be considered alpha quality and, for now, is intended mainly for other researchers rather than for production use. In particular, I cannot yet guarantee that the compressed format will remain compatible with future versions of the program.

The simplest way to get started with RNGzip is to download the platform-independent binary (jar) file. Then, run it like this:

 $ java -jar rngzip-VERSION.jar --help
      

License

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

Abstract

The extensible markup language XML has become indispensable in many areas, but a significant disadvantage is its size: tagging a set of data increases the space needed to store it, the bandwidth needed to transmit it, and the time needed to parse it. We present a new compression technique based on the document type, expressed as a Relax NG schema. Assuming the sender and receiver agree in advance on the document type, conforming documents can be transmitted extremely compactly. On several data sets with high tag density this technique compresses better than other known XML-aware compressors, including those that consider the document type.
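
To make the core idea concrete, here is a minimal sketch in Java. It is a toy illustration only, not RNGzip's actual encoder or compressed format. It assumes a hypothetical schema in which a document is a list of child elements, each drawn from four agreed element names. Because sender and receiver both know the schema, the sender transmits only the decisions the schema leaves open (whether another element follows, and which of the four it is) instead of the tag text. The class name, element names, and bit layout are all invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration of schema-driven encoding; not RNGzip's real format.
final class SchemaChoiceSketch {
    // Hypothetical schema shared by both sides: each child element is one of these names.
    static final String[] CHOICES = {"add", "remove", "update", "noop"};

    // Bits needed to index one choice (2 bits for 4 names).
    static final int BITS_PER_CHOICE = 32 - Integer.numberOfLeadingZeros(CHOICES.length - 1);

    // Emit '1' plus a choice index for each element, then a final '0' to end the list.
    static String encode(List<String> elements) {
        StringBuilder bits = new StringBuilder();
        for (String name : elements) {
            int index = indexOf(name);
            bits.append('1');
            for (int b = BITS_PER_CHOICE - 1; b >= 0; b--) {
                bits.append((index >> b) & 1);
            }
        }
        bits.append('0');
        return bits.toString();
    }

    // Rebuild the element names from the bit string using the same schema.
    static List<String> decode(String bits) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (bits.charAt(pos++) == '1') {
            int index = Integer.parseInt(bits.substring(pos, pos + BITS_PER_CHOICE), 2);
            pos += BITS_PER_CHOICE;
            out.add(CHOICES[index]);
        }
        return out;
    }

    // Look up an element name in the schema; a missing name means the
    // document does not conform and cannot be encoded.
    static int indexOf(String name) {
        for (int i = 0; i < CHOICES.length; i++) {
            if (CHOICES[i].equals(name)) {
                return i;
            }
        }
        throw new IllegalArgumentException("element not allowed by schema: " + name);
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("add", "add", "update", "remove");
        String bits = encode(doc);
        System.out.println(bits + " (" + bits.length() + " bits)");
        System.out.println(decode(bits));
    }
}
```

In this sketch each element costs three bits, whereas the textual form spends several bytes on every start and end tag. A document that does not conform to the agreed schema cannot be encoded at all, which is why the abstract's assumption that both parties agree on the document type in advance matters.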

© 2002–2015 Christopher League