Count consecutive characters

Consecutive counts:

You can use itertools.groupby:

s = "111000222334455555"

from itertools import groupby

groups = groupby(s)
result = [(label, sum(1 for _ in group)) for label, group in groups]

After which, result looks like:

[("1", 3), ("0", 3), ("2", 3), ("3", 2), ("4", 2), ("5", 5)]

And you could format with something like:

", ".join("{}x{}".format(label, count) for label, count in result)
# "1x3, 0x3, 2x3, 3x2, 4x2, 5x5"
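One thing worth spelling out: groupby only merges adjacent equal items, so a character that shows up again later starts a new group. A minimal sketch of that, wrapping the comprehension above in a helper (the name run_lengths is just for illustration):

```python
from itertools import groupby

def run_lengths(s):
    """Return (character, run length) pairs for each run of equal adjacent characters."""
    return [(label, sum(1 for _ in group)) for label, group in groupby(s)]

# "1" appears twice in the result because its two runs are not adjacent.
print(run_lengths("1101"))  # [('1', 2), ('0', 1), ('1', 1)]
print(run_lengths(""))      # []
```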

Total counts:

A commenter suggests you may instead want the total count of each character, so that "11100111" -> {"1":6, "0":2}. In that case, use a collections.Counter:

from collections import Counter

s = "11100111"
result = Counter(s)
# {"1":6, "0":2}
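A Counter behaves like a dict with a few extras, which is handy once you have the totals. A short sketch (the variable name totals is only for illustration):

```python
from collections import Counter

totals = Counter("11100111")

print(totals["1"])            # 6
print(totals["0"])            # 2
print(totals["9"])            # 0: a missing key counts as zero instead of raising KeyError
print(totals.most_common(1))  # [('1', 6)]
```

Note that the position of each character is lost here, so this only helps if you don't care about runs.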

Your method:

As many have pointed out, your method fails because you’re looping through range(len(s)) but addressing s[i+1]. This is an off-by-one error: when i is the last index of s, s[i+1] raises an IndexError. One way to fix this would be to loop through range(len(s)-1), but it’s more Pythonic to generate something to iterate over.
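For reference, the range(len(s)-1) fix might look roughly like this. It is a hypothetical reconstruction, not your exact code, and the function name run_lengths_by_index is made up for the example:

```python
def run_lengths_by_index(s):
    """Index-based run counting; stops one short so s[i + 1] is always valid."""
    counts = []
    if not s:
        return counts
    count = 1
    for i in range(len(s) - 1):
        if s[i] == s[i + 1]:
            count += 1
        else:
            counts.append((s[i], count))
            count = 1
    # The loop never reaches the last index, so the final run is added here.
    counts.append((s[-1], count))
    return counts

print(run_lengths_by_index("111000222334455555"))
# [('1', 3), ('0', 3), ('2', 3), ('3', 2), ('4', 2), ('5', 5)]
```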

For a string that’s not absolutely huge, zip(s, s[1:]) isn’t a performance issue, so you could do:

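counts = []
count = 1  # length of the run currently being counted
for a, b in zip(s, s[1:]):  # each character paired with its successor
    if a == b:
        count += 1
    else:
        # the run of `a` just ended, so record it and start a new run
        counts.append((a, count))
        count = 1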

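The only problem is that the final run is never appended, because the loop ends before anything follows it, so you’d have to special-case it after the loop. That can be avoided with itertools.zip_longest, which pairs the last character with None: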

import itertools

counts = []
count = 1
for a, b in itertools.zip_longest(s, s[1:], fillvalue=None):
    # The final pair is (last character, None), which flushes the last run.
    if a == b:
        count += 1
    else:
        counts.append((a, count))
        count = 1
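To see why the padding helps, compare the pairs the two functions produce for a short string (the variable names here are only for illustration):

```python
from itertools import zip_longest

s = "aab"
plain_pairs = list(zip(s, s[1:]))
padded_pairs = list(zip_longest(s, s[1:]))

print(plain_pairs)   # [('a', 'a'), ('a', 'b')]
print(padded_pairs)  # [('a', 'a'), ('a', 'b'), ('b', None)]
```

The extra ('b', None) pair never compares equal, so the else branch runs one last time and records the final run. This relies on None never appearing in the data, which is always true for a string.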

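If you do have a truly huge string and can’t stand to hold two copies of it in memory at a time, you can adapt the itertools recipe pairwise, using zip_longest in place of the recipe’s zip so the last character is still paired with None: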

def pairwise(iterable):
    """iterates pairwise without holding an extra copy of iterable in memory"""
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)

counts = []
count = 1
for a, b in pairwise(s):
    ...
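The loop body is presumably the same as in the zip_longest version. Assuming so, a complete sketch that also works on a one-shot iterator could look like this (pairwise_padded and run_lengths_streaming are illustrative names, not part of itertools):

```python
import itertools

def pairwise_padded(iterable):
    """Yield (item, next item) pairs; the last item is paired with None."""
    a, b = itertools.tee(iterable)
    next(b, None)  # advance the second iterator by one position
    return itertools.zip_longest(a, b, fillvalue=None)

def run_lengths_streaming(iterable):
    """Count runs of equal adjacent items without slicing or copying the input."""
    counts = []
    count = 1
    for a, b in pairwise_padded(iterable):
        if a == b:
            count += 1
        else:
            counts.append((a, count))
            count = 1
    return counts

chars = iter("111000222334455555")  # a one-shot iterator, not a sliceable string
print(run_lengths_streaming(chars))
# [('1', 3), ('0', 3), ('2', 3), ('3', 2), ('4', 2), ('5', 5)]
```

Python 3.10+ also ships itertools.pairwise, but it does not pad the final pair, so with it the last run would need special-casing again.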
