How to find a good/optimal dictionary for zlib ‘setDictionary’ when processing a given set of data?
John Reiser explained on comp.compression: For the dictionary: make a histogram of short substrings, sort by payoff (number of occurrences times number of bits saved when compressed) and put the highest-payoff substrings into the dictionary. For example, if k is the length of the shortest substring that can be compressed (usually 3==k or 2==k), then …
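The histogram idea can be tried out with a small Python sketch using the standard `zlib` module, which accepts a preset dictionary through the `zdict` argument. This is only an illustration under stated assumptions, not Reiser's exact procedure: the quoted explanation is cut off, so the payoff here is approximated as occurrences times substring length in bytes rather than true bits saved. The function names `build_dictionary`, `compress_with`, and `decompress_with`, and the `min_len`/`max_len` bounds, are hypothetical. The highest-payoff strings are placed at the end of the dictionary, since the zlib manual recommends putting the most commonly used strings there.

```python
import zlib
from collections import Counter


def build_dictionary(samples, min_len=3, max_len=8, max_size=32768):
    """Build a preset dictionary from sample byte strings (rough sketch)."""
    # Histogram of all short substrings across the samples.
    counts = Counter()
    for sample in samples:
        for n in range(min_len, max_len + 1):
            for i in range(len(sample) - n + 1):
                counts[sample[i:i + n]] += 1

    # Assumed payoff proxy: occurrences * length in bytes.
    # Substrings seen only once are ignored.
    ranked = sorted(
        ((count * len(sub), sub) for sub, count in counts.items() if count > 1),
        reverse=True,
    )

    chosen, size = [], 0
    for _payoff, sub in ranked:
        # Skip substrings already contained in a selected one.
        if any(sub in kept for kept in chosen):
            continue
        if size + len(sub) > max_size:
            break
        chosen.append(sub)
        size += len(sub)

    # Highest-payoff substrings go last, closest to the data being compressed.
    return b"".join(reversed(chosen))


def compress_with(data, zdict):
    """Compress data using a preset dictionary."""
    comp = zlib.compressobj(zdict=zdict)
    return comp.compress(data) + comp.flush()


def decompress_with(blob, zdict):
    """Decompress data that was compressed with the same preset dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


# Made-up sample records, purely for illustration.
samples = [
    b'{"user": "alice", "status": "active"}',
    b'{"user": "bob", "status": "active"}',
    b'{"user": "dave", "status": "inactive"}',
]
zdict = build_dictionary(samples)
message = b'{"user": "carol", "status": "active"}'
blob = compress_with(message, zdict)
print(len(message), len(blob), len(zlib.compress(message)))
```

The same dictionary bytes must be supplied on both the compressing and decompressing side. Comparing the printed sizes on representative data is a quick way to judge whether a candidate dictionary is worth shipping.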