Techniques for finding near duplicate records

If you’re just doing small batches that are relatively well-formed, then the compare.linkage() or compare.dedup() functions in the RecordLinkage package should be a great starting point. But if you have big batches, then you might have to do some more tinkering. I use the functions jarowinkler(), levenshteinSim(), and soundex() in RecordLinkage to write my own … Read more