Run length encoding in Python

I see many great solutions here but none that feels very pythonic to my eyes. So I’m contributing with a implementation I wrote myself today for this problem.

def run_length_encode(data: str) -> Iterator[Tuple[str, int]]:
    """Returns run length encoded Tuples for string"""
    # A memory efficient (lazy) and pythonic solution using generators
    return ((x, sum(1 for _ in y)) for x, y in groupby(data))

This will return a generator of Tuples with the character and number of instances, but can easily be modified to return a string as well. A benefit of doing it this way is that it’s all lazy evaluated and won’t consume more memory or cpu than needed if you don’t need to exhaust the entire search space.

If you still want string encoding the code can quite easily be modified for that use case like this:

def run_length_encode(data: str) -> str:
    """Returns run length encoded string for data"""
    # A memory efficient (lazy) and pythonic solution using generators
    return "".join(f"{x}{sum(1 for _ in y)}" for x, y in groupby(data))

This is a more generic run length encoding for all lengths, and not just for those of over 4 characters. But this could also quite easily be adapted with a conditional for the string if wanted.

Leave a Comment