How to identify and label similar rows in a pandas data frame

Your question really needs clarification, but since none has been provided, I will go ahead with a few assumptions.

ASSUMPTIONS

  • The column on the left comes from a list of lists, which I will call “alist_oflists”.
  • Every time a new, previously unseen inner list is found, it is assigned a new integer identifier.
  • The output can simply be a list again, holding the ID found earlier for each inner list. The order of the two lists must match.

alist_oflists = [[1, 1000], [2, 10], [2, 100], [2, 10], [3, 1000], [2, 100], [2, 10]]

# we need tuples instead of lists because lists are not hashable (the rows will be used as dict keys)
alist_oftuples = [tuple(x) for x in alist_oflists]

print(alist_oftuples)  # prints: [(1, 1000), (2, 10), (2, 100), (2, 10), (3, 1000), (2, 100), (2, 10)]

# map each distinct tuple to the next free ID, in order of first appearance
a_dict = {}
i = 1
for items in alist_oftuples:
    if items not in a_dict:
        a_dict[items] = i
        i += 1

i_wanna_see_results = []
for item in alist_oftuples:
    i_wanna_see_results.append(a_dict[item])

print(i_wanna_see_results) # prints: [1, 2, 3, 2, 4, 3, 2]
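
The two loops above can also be folded into one small helper. The following is only a sketch under the same assumptions; the function name `label_rows` and its `start` parameter are made up for illustration and do not come from the question.

```python
def label_rows(rows, start=1):
    """Return one integer ID per row; identical rows share the same ID."""
    ids = {}      # maps each distinct row (as a tuple) to its ID
    labels = []
    for row in rows:
        key = tuple(row)  # lists are not hashable, tuples are
        # setdefault stores the next free ID only the first time a key is seen,
        # otherwise it returns the ID that is already stored
        labels.append(ids.setdefault(key, len(ids) + start))
    return labels


alist_oflists = [[1, 1000], [2, 10], [2, 100], [2, 10], [3, 1000], [2, 100], [2, 10]]
labels = label_rows(alist_oflists)
print(labels)  # prints: [1, 2, 3, 2, 4, 3, 2]
```

If the data already sits in a pandas data frame, the rows could presumably be fed to the same helper with `df.itertuples(index=False)`, assuming every cell value is hashable.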

Is this what you wanted to have?
