Since your data seem to guarantee 3 unique crops per county (“I am compiling a table of top-3 crops by county.”), it suffices to sort the values within each row and assign them back.
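For reference, a minimal df1 that reproduces the examples below; only the sorted result was shown, so the within-row ordering of the crops here is invented:

import pandas as pd

df1 = pd.DataFrame({
    'County': ['Harney', 'Baker', 'Wheeler', 'Hood River',
               'Wasco', 'Morrow', 'Union', 'Lake'],
    'Crop1': ['grain', 'apples', 'melons', 'grain',
              'raddish', 'carrots', 'pears', 'carrots'],
    'Crop2': ['apples', 'melons', 'grain', 'apples',
              'carrots', 'raddish', 'carrots', 'pears'],
    'Crop3': ['melons', 'grain', 'apples', 'melons',
              'pears', 'pears', 'raddish', 'raddish'],
    'Total_pop': [2000, 1500, 3000, 1500, 2000, 2500, 2700, 2000],
})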
import numpy as np

cols = ['Crop1', 'Crop2', 'Crop3']
# sort each row's three crops alphabetically so identical crop sets line up
df1[cols] = np.sort(df1[cols].to_numpy(), axis=1)
       County    Crop1  Crop2    Crop3  Total_pop
0      Harney   apples  grain   melons       2000
1       Baker   apples  grain   melons       1500
2     Wheeler   apples  grain   melons       3000
3  Hood River   apples  grain   melons       1500
4       Wasco  carrots  pears  raddish       2000
5      Morrow  carrots  pears  raddish       2500
6       Union  carrots  pears  raddish       2700
7        Lake  carrots  pears  raddish       2000
Then to summarize:
df1.groupby(cols).sum(numeric_only=True)
#                        Total_pop
# Crop1   Crop2 Crop3
# apples  grain melons        8000
# carrots pears raddish       9200
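If you'd rather have the crops back as regular columns instead of a MultiIndex, the same groupby works with as_index=False:

df1.groupby(cols, as_index=False)['Total_pop'].sum()
#      Crop1  Crop2    Crop3  Total_pop
# 0   apples  grain   melons       8000
# 1  carrots  pears  raddish       9200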
The benefit is that you avoid Series.apply or .apply(axis=1): np.sort works on the underlying 2-D array at C speed, while apply calls back into Python once per row. For larger DataFrames, the performance difference is noticeable:
# scale the example up to 80,000 rows for timing
df1 = pd.concat([df1]*10000, ignore_index=True)
cols = ['Crop1', 'Crop2', 'Crop3']
%timeit df1[cols] = np.sort(df1[cols].to_numpy(), axis=1)
#36.1 ms ± 399 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
to_sum = ['Crop1', 'Crop2', 'Crop3']
%timeit df1[to_sum] = pd.DataFrame(df1.loc[:, to_sum].apply(set, axis=1).apply(list).values.tolist(), columns=to_sum)
#1.41 s ± 51.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
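For comparison, the row-wise spelling of the same sort looks like this; it yields the same frame as the np.sort one-liner but makes one Python call per row, which is exactly the overhead the timings above show (a sketch for comparison only, not a recommendation):

df1[cols] = df1[cols].apply(sorted, axis=1, result_type='expand').to_numpy()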