Remove some duplicates from a list in Python

This is where itertools.groupby may come in handy — it groups consecutive equal items, so only adjacent duplicates are collapsed (note the second 'a' in the output below):

from itertools import groupby

a = ["a", "a", "b", "b", "a", "a", "c", "c"]

res = [key for key, _group in groupby(a)]
print(res)  # ['a', 'b', 'a', 'c']

This is a version where you can 'scale' down the group sizes (but each group is guaranteed to contribute at least one item to the result):

from itertools import groupby, repeat, chain

a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'a', 'a',  
     'd', 'd', 'a', 'a']
scale = 0.4

key_count = tuple((key, sum(1 for _item in group)) for key, group in groupby(a))
# (('a', 4), ('b', 2), ('c', 5), ('a', 2), ('d', 2), ('a', 2))

res = tuple(
    chain.from_iterable(
        (repeat(key, round(scale * count) or 1)) for key, count in key_count
    )
)
# ('a', 'a', 'b', 'c', 'c', 'a', 'd', 'a')

There may be smarter ways to determine the scale, probably based on the length of the input list a and the average group length.
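One such heuristic — a sketch of my own, not a definitive choice — is to use the reciprocal of the average group length, so that a run of 'average' size collapses to roughly one element. The helper names (compress, auto_scale) are made up for illustration:

```python
from itertools import chain, groupby, repeat


def compress(a, scale):
    """Shrink each run of equal items by `scale`, keeping at least one per run."""
    key_count = ((key, sum(1 for _ in group)) for key, group in groupby(a))
    return tuple(
        chain.from_iterable(
            repeat(key, round(scale * count) or 1) for key, count in key_count
        )
    )


def auto_scale(a):
    """Heuristic scale: number of groups divided by total length,
    i.e. 1 / (average group length)."""
    n_groups = sum(1 for _ in groupby(a))
    return n_groups / len(a)


a = ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'a', 'a',
     'd', 'd', 'a', 'a']
scale = auto_scale(a)  # 6 groups / 17 items ≈ 0.35
print(compress(a, scale))  # ('a', 'b', 'c', 'c', 'a', 'd', 'a')
```

With this choice, runs shorter than the average shrink to a single item while longer runs (like the five 'c's) keep proportionally more.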
