This looks like a graph problem: each row is an edge between two words, and the clusters you want are the graph's connected components.
You could use networkx:
import networkx as nx
G = nx.from_pandas_edgelist(df, 'v1', 'v2')
# connected_components returns a generator, so wrap it in list() to see the sets
clusters = list(nx.connected_components(G))
Output:
[{'be', 'belong'}, {'delay', 'increase', 'decrease'}, {'analyze', 'assay'},
{'report', 'bespeak', 'circulate'}, {'induce', 'generate'}, {'trip', 'cause'},
{'distinguish', 'isolate'}, {'infect', 'give'}, {'prove', 'result'},
{'intercede', 'describe', 'explain'}, {'affect', 'expose'}, {'restrict', 'suppress'}]
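For intuition about what connected components means here, the same grouping can be sketched in plain Python with a union-find structure. This is only an illustration, not how networkx implements it; the function name `connected_groups` and the small `pairs` list are hypothetical stand-ins for the rows of df[['v1', 'v2']].

```python
from collections import defaultdict


def connected_groups(pairs):
    """Group items that are linked by any chain of pairs (union-find sketch)."""
    parent = {}

    def find(x):
        # Register unseen items as their own root.
        parent.setdefault(x, x)
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two groups that each pair connects.
    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Collect every item under its final root.
    groups = defaultdict(set)
    for item in list(parent):
        groups[find(item)].add(item)
    return list(groups.values())


# Hypothetical edge list standing in for the (v1, v2) rows of the DataFrame.
pairs = [("delay", "increase"), ("increase", "decrease"), ("be", "belong")]
groups = connected_groups(pairs)
```

With these three pairs, 'delay', 'increase' and 'decrease' end up in one set because 'increase' links the other two, while 'be' and 'belong' form a second set. In practice the networkx call above is the simpler choice.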
As a graph:
A small function to plot the graph in Jupyter (to_agraph requires pygraphviz):
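def nxplot(G):
    # Imports are kept local so the function can be pasted into a notebook cell as-is
    from networkx.drawing.nx_agraph import to_agraph
    from IPython.display import Image

    A = to_agraph(G)          # convert to a pygraphviz AGraph
    A.layout('dot')           # lay out the nodes with Graphviz's 'dot' engine
    A.draw('/tmp/graph.png')  # render to a PNG file
    return Image(filename='/tmp/graph.png')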