My code is entirely written in OCaml and is 500 lines long (including some additional utility programs). The code is documented with a literate programming tool for Objective Caml, ocamlweb (see the PostScript file below).
Spaces are compressed by the lexer (outside of TT environments). Thus the document kept as the ``original'' one is already smaller (in most cases) than the given one. If we are forced to return this one, at least spaces have been compressed.
Then the document is interpreted as a stream of characters decorated with contexts, following the specification given in the task. Originally, I chose to proceed by successive transformations of the document, but it appeared to be really easier (and probably faster) to work on the interpretation.
Finally, a new document is re-built from this stream, using this simple algorithm:
Regarding robustness, the program keeps the original document. When the program is launched, an alarm is set 5 seconds before the end of the timeout, with Unix.alarm; if triggered, it will output the original document. Similarly, if the optimization produces some too big document, the original one is given back.
Regarding Correctness, I setup an automatic benchmark of my program. For this purpose, I first wrote a validator (using the jury's one would have necessitated too many dialup connections :-), a program to dump the interpretation of a document and a program to generate random documents of arbitrary size. Using these programs, I could test my program heavily, and I did find bugs (which I could fix easily using the dumping program).
I also insisted on the running time, although it is not the first criterion which will be used by the jury. In particular, I tried to write only tail-recursive functions (and also because I was afraid begin eliminated for a Stack Overflow uncaught exception :-) For better performances, files manipulating streams are pre-processed with Camlp4. It really improved the running time. Thank you, Daniel!
You can also read the sources in a nicer way, looking at the PostScript document generated from the sources (with ocamlweb).