Twitter image encoding challenge [closed]

image files and python source (version 1 and 2)

Version 1
Here is my first attempt. I will update as I go.

I have got the SO logo down to 300 characters, almost lossless. My technique converts the image to SVG vector art, so it works best on line art. Strictly speaking it is an SVG compressor: the original art still has to go through a vectorisation stage first.

For my first attempt I used an online service for the PNG trace; however, there are MANY free and non-free tools that can handle this part, including potrace (open-source).

Here are the results

Original SO Logo http://www.warriorhut.org/graphics/svg_to_unicode/so-logo.png Original
Decoded SO Logo http://www.warriorhut.org/graphics/svg_to_unicode/so-logo-decoded.png After encoding and decoding

Characters: 300

Time: Not measured but practically instant (not including vectorisation/rasterisation steps)

The next stage will be to embed 4 symbols (SVG path points and commands) per Unicode character. At the moment my Python build does not have wide character (UCS4) support, which limits my resolution per character. I’ve also limited the maximum code point to just below 0xD800, the lower end of the Unicode reserved range. However, once I build a list of allowed characters and a filter to skip the rest, I can theoretically push the required number of characters as low as 70-100 for the logo above.
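To make the packing idea concrete, here is a minimal sketch in Python 3 (not my actual source). It assumes every symbol has already been reduced to an integer in 0..127 and packs two of them per character. The names `OFFSET`, `pack_pair`, `encode` and `decode` are made up for illustration, and the offset of 0x100 is only an assumption to stay clear of control characters; a real encoder would still need the allowed-character filter mentioned above.

```python
# Hypothetical layout: two 7-bit values (0..127) per character.
OFFSET = 0x100           # assumed offset, skips ASCII/Latin-1 control characters
SURROGATE_START = 0xD800  # code points from here up are reserved for surrogates


def pack_pair(a, b):
    """Pack two integers in 0..127 into one character below 0xD800."""
    if not (0 <= a <= 127 and 0 <= b <= 127):
        raise ValueError("values must be in 0..127")
    code = OFFSET + ((a << 7) | b)  # at most 0x100 + 0x3FFF = 0x40FF
    assert code < SURROGATE_START
    return chr(code)


def unpack_pair(ch):
    """Inverse of pack_pair."""
    code = ord(ch) - OFFSET
    return code >> 7, code & 0x7F


def encode(values):
    """Encode a list of 7-bit values; an odd-length list is padded with 0."""
    values = list(values)
    if len(values) % 2:
        values.append(0)
    return "".join(
        pack_pair(values[i], values[i + 1]) for i in range(0, len(values), 2)
    )


def decode(text):
    """Decode back to a flat list (including any padding value)."""
    out = []
    for ch in text:
        out.extend(unpack_pair(ch))
    return out
```

With this layout four values cost two characters, so halving the character count again would need four values per character and therefore code points above 0xFFFF.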

A limitation of this method at present is that the output size is not fixed: it depends on the number of vector nodes/points after vectorisation. Automating this limit will require either pixelating the image (which removes the main benefit of vectors) or repeatedly running the paths through a simplification stage until the desired node count is reached (which I’m currently doing manually in Inkscape).
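The "simplify until it fits" loop could be automated roughly like this. Note that this sketch swaps in the Ramer-Douglas-Peucker algorithm on plain polylines, which is not what Inkscape's Simplify does and ignores Bezier handles; `simplify_to_budget`, `epsilon` and `growth` are hypothetical names and defaults.

```python
import math


def _point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # a and b coincide (e.g. a closed path)
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def rdp(points, epsilon):
    """Ramer-Douglas-Peucker simplification of a polyline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_line_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [first, last]
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right  # drop the duplicated split point


def simplify_to_budget(points, max_points, epsilon=0.5, growth=1.5):
    """Raise the tolerance until the polyline fits in max_points nodes."""
    result = list(points)
    while len(result) > max_points and len(result) > 2:
        result = rdp(points, epsilon)
        epsilon *= growth
    return result
```

The endpoints are always kept, so a path can never drop below two nodes regardless of the budget.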

Version 2

UPDATE: v2 is now qualified to compete. Changes:

  • Command-line control input/output and debugging
  • Uses XML parser (lxml) to handle SVG instead of regex
  • Packs 2 path segments per unicode symbol
  • Documentation and cleanup
  • Support style="fill:color" and fill="color"
  • Document width/height packed into single character
  • Path color packed into single character
  • Color compression is achieved by throwing away 4 bits of color data per color then packing it into a character via hex conversion.
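As an illustration of the color step, here is a small sketch (not the actual v2 code). It reads "4 bits per color" as dropping the low 4 bits of each of the R, G and B channels, which leaves a 12-bit value that fits in one character. `OFFSET`, `pack_color` and `unpack_color` are hypothetical names, and the offset is an assumption to avoid control characters.

```python
OFFSET = 0x100  # assumed offset to stay clear of control characters


def pack_color(hex_color):
    """Turn '#RRGGBB' (or '#RGB') into one character holding a 12-bit color."""
    h = hex_color.lstrip("#")
    if len(h) == 3:  # expand the short form, e.g. 'f80' -> 'ff8800'
        h = "".join(c * 2 for c in h)
    if len(h) != 6:
        raise ValueError("expected #RRGGBB or #RGB")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    # Keep only the high 4 bits of each channel.
    value = ((r >> 4) << 8) | ((g >> 4) << 4) | (b >> 4)
    return chr(OFFSET + value)


def unpack_color(ch):
    """Recover an approximate '#rrggbb' string from a packed character."""
    value = ord(ch) - OFFSET
    r, g, b = (value >> 8) & 0xF, (value >> 4) & 0xF, value & 0xF
    # Multiplying by 17 maps 0x0..0xF back onto 0x00..0xFF.
    return "#{:02x}{:02x}{:02x}".format(r * 17, g * 17, b * 17)
```

The round trip is lossy by design: `#123456` comes back as `#113355`.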

Characters: 133

Time: A few seconds

v2 decoded http://www.warriorhut.org/graphics/svg_to_unicode/so-logo-decoded-v2.png After encoding and decoding (version 2)

As you can see, there are some artifacts this time. They are not a limitation of the method but a mistake somewhere in my conversions. The artifacts appear when points fall outside the range 0.0 to 127.0, and my attempts to constrain them have had mixed success. The solution is simply to scale the image down; however, I had trouble scaling the actual points rather than the artboard or group matrix, and I’m too tired now to care. In short, if your points are in the supported range it generally works.

I believe the kink in the middle is due to a handle moving to the other side of a handle it’s linked to. Basically, the points are too close together in the first place. Running a simplify filter over the source image before compressing it should fix this and shave off some unnecessary characters.
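Scaling the actual points, rather than the artboard, might look something like the sketch below. It assumes the path has already been flattened to a list of absolute (x, y) pairs; relative commands would need the same scale factor but no translation. `fit_points` is a hypothetical helper, not part of my script.

```python
def fit_points(points, limit=127.0):
    """Translate and uniformly scale absolute points into 0.0..limit."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    # One scale factor for both axes so the aspect ratio is preserved.
    span = max(max(xs) - min_x, max(ys) - min_y)
    scale = limit / span if span > 0 else 1.0

    def clamp(v):
        # Guard against floating point rounding just outside the range.
        return min(limit, max(0.0, v))

    return [
        (clamp((x - min_x) * scale), clamp((y - min_y) * scale))
        for x, y in points
    ]
```

Because the same factor is applied to every point, including curve handles, the shape is preserved and only the resolution is reduced.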

UPDATE:
This method is fine for simple objects, so I needed a way to simplify complex paths and reduce noise. I used Inkscape for this task. I’ve had some luck grooming out unnecessary paths using Inkscape but have not had time to try automating it. I’ve made some sample SVGs using the Inkscape ‘Simplify’ function to reduce the number of paths.

Simplify works ok but it can be slow with this many paths.

autotrace example http://www.warriorhut.org/graphics/svg_to_unicode/autotrace_16_color_manual_reduction.png cornell box http://www.warriorhut.com/graphics/svg_to_unicode/cornell_box_simplified.png lena http://www.warriorhut.com/graphics/svg_to_unicode/lena_std_washed_autotrace.png

thumbnails traced http://www.warriorhut.org/graphics/svg_to_unicode/competition_thumbnails_autotrace.png

Here are some ultra low-res shots. These would be closer to the 140 character limit, though some clever path compression may be needed as well.

groomed http://www.warriorhut.org/graphics/svg_to_unicode/competition_thumbnails_groomed.png
Simplified and despeckled.

triangulated http://www.warriorhut.org/graphics/svg_to_unicode/competition_thumbnails_triangulated.png
Simplified, despeckled and triangulated.

autotrace --output-format svg --output-file cornell_box.svg --despeckle-level 20 --color-count 64 cornell_box.png

ABOVE: Simplified paths using autotrace.

Unfortunately my parser doesn’t handle the autotrace output, so I don’t know how many points are in use or how far to simplify. Sadly, there’s little time for writing it before the deadline. It’s much easier to parse than the Inkscape output, though.
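A rough way to estimate how much data a traced file contains, without a full path parser, is simply to count the numbers in every path's `d` attribute. This is only a sketch using the standard library's ElementTree rather than lxml; `count_path_numbers` is a hypothetical helper, and the regex would miscount compact arc flags, which I am assuming the tracer does not emit.

```python
import re
import xml.etree.ElementTree as ET

# Matches integers, decimals and exponent notation, with optional sign.
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def count_path_numbers(svg_text):
    """Count the coordinate values across all <path d="..."> elements."""
    root = ET.fromstring(svg_text)
    total = 0
    for el in root.iter():
        # Strip any XML namespace, e.g. '{http://www.w3.org/2000/svg}path'.
        if el.tag.rsplit("}", 1)[-1] == "path":
            total += len(NUMBER.findall(el.get("d", "")))
    return total
```

Dividing the result by two gives an approximate point count, which is enough to judge how far a file is from the character budget.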
