How to parallelize many (fuzzy) string comparisons using apply in Pandas?
You can parallelize this with Dask.dataframe:

    >>> import dask.dataframe as dd
    >>> dmaster = dd.from_pandas(master, npartitions=4)
    >>> dmaster['my_value'] = dmaster.original.apply(lambda x: helper(x, slave), meta=('my_value', 'int64'))
    >>> dmaster.compute()
                      original  my_value
    0  this is a nice sentence         2
    1      this is another one         3
    2    stackoverflow is nice         1

(Older Dask releases spelled the output description as `name="my_value"`; current releases expect `meta`, and any other keyword is forwarded to the applied function.)

Additionally, you should think about the tradeoffs between using threads vs processes here. Your …
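The threads-vs-processes tradeoff matters because a fuzzy comparison written in pure Python holds the GIL, so it gains little from a thread pool. Below is a minimal, stdlib-only sketch of the same fan-out using `concurrent.futures`, assuming the comparison is CPU-bound. `count_fuzzy_matches` and `parallel_fuzzy_counts` are hypothetical names: `count_fuzzy_matches` only stands in for the question's `helper`, with `difflib.SequenceMatcher` swapped in as the similarity measure and an assumed threshold of 0.6, and the strings in the demo are made-up sample data.

```python
from concurrent.futures import Executor, ProcessPoolExecutor
from difflib import SequenceMatcher
from functools import partial


def count_fuzzy_matches(sentence, candidates, threshold=0.6):
    """Hypothetical stand-in for `helper`: count the candidates whose
    difflib similarity ratio to `sentence` is at least `threshold`."""
    return sum(
        SequenceMatcher(None, sentence, cand).ratio() >= threshold
        for cand in candidates
    )


def parallel_fuzzy_counts(sentences, candidates, executor: Executor, chunksize=1):
    """Run count_fuzzy_matches over every sentence on `executor`.

    A partial of a module-level function can be pickled, so this works with
    a ProcessPoolExecutor as well as a ThreadPoolExecutor.
    """
    work = partial(count_fuzzy_matches, candidates=candidates)
    # Executor.map yields results in input order, so they line up with `sentences`.
    return list(executor.map(work, sentences, chunksize=chunksize))


if __name__ == "__main__":
    # Made-up sample data shaped like the question's `master` and `slave`.
    master = ["this is a nice sentence", "this is another one", "stackoverflow is nice"]
    slave = ["this is a nice one", "another sentence", "stackoverflow"]
    # Separate processes sidestep the GIL for this CPU-bound, pure-Python work.
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(parallel_fuzzy_counts(master, slave, pool))
```

The same function also accepts a `ThreadPoolExecutor`, which is convenient for quick tests; a `ProcessPoolExecutor` needs the `if __name__ == "__main__":` guard on platforms that start workers by spawning a fresh interpreter. In the Dask version, `dmaster.compute(scheduler="processes")` similarly switches from the default threaded scheduler to a process-based one.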