Techniques for finding near duplicate records

If you are only working with small batches that are relatively well-formed, the compare.linkage() or compare.dedup() functions in the RecordLinkage package should be a good starting point. With big batches, you may have to do some more tinkering. I use the functions jarowinkler(), levenshteinSim(), and soundex() in RecordLinkage to write my own …
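To make the idea of scoring record pairs with a string-similarity function concrete, here is a minimal Python sketch. It is not the author's code and does not use RecordLinkage: the helper names `levenshtein`, `similarity`, and `near_duplicates`, the 0.85 threshold, and the sample names are all assumptions made for illustration. The score is an edit distance normalised by the longer string's length, in the same spirit as a Levenshtein similarity.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalised similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def near_duplicates(records, threshold=0.85):
    """Return pairs of records whose cleaned forms score at or above threshold.

    Hypothetical helper: compares every pair, so it only suits small batches.
    """
    cleaned = [r.strip().lower() for r in records]
    return [
        (records[i], records[j])
        for i, j in combinations(range(len(records)), 2)
        if similarity(cleaned[i], cleaned[j]) >= threshold
    ]


if __name__ == "__main__":
    print(near_duplicates(["Jon Smith", "John Smith", "Jane Doe"]))
```

Comparing every pair grows quadratically with the number of records, which is one reason larger batches tend to need extra work, for example comparing only records that share a phonetic code or some other blocking key.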

Get the distinct sum of a joined table column

To get the result without a subquery, you have to resort to advanced window function trickery:

    SELECT sum(count(*))       OVER () AS tickets_count
         , sum(min(a.revenue)) OVER () AS atendees_revenue
    FROM   tickets   t
    JOIN   attendees a ON a.id = t.attendee_id
    GROUP  BY t.attendee_id
    LIMIT  1;

How does it work? The key to understanding this is the sequence …
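As a rough way to try the query on sample data, the sketch below runs it against an in-memory SQLite database from Python. The table layout, the sample rows, and the function name `distinct_sums` are assumptions made for illustration, and it relies on the bundled SQLite being version 3.25 or newer (the first release with window functions). It contrasts a naive sum over the join, which counts an attendee's revenue once per ticket, with the window-function form, where aggregates are computed per group first and the window sum then runs over those per-attendee rows.

```python
import sqlite3


def distinct_sums():
    """Return (naive, windowed) result rows for a small made-up data set."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE attendees (id INTEGER PRIMARY KEY, revenue INTEGER);
        CREATE TABLE tickets   (id INTEGER PRIMARY KEY, attendee_id INTEGER);
        -- Assumed sample data: attendee 1 holds 2 tickets, attendee 2 holds 3.
        INSERT INTO attendees VALUES (1, 100), (2, 50);
        INSERT INTO tickets   VALUES (1, 1), (2, 1), (3, 2), (4, 2), (5, 2);
    """)

    # Naive version: the join repeats each attendee row once per ticket,
    # so revenue is summed once per ticket rather than once per attendee.
    naive = con.execute("""
        SELECT count(*), sum(a.revenue)
        FROM   tickets t
        JOIN   attendees a ON a.id = t.attendee_id
    """).fetchone()

    # Window version: GROUP BY collapses the join to one row per attendee,
    # count(*) and min(a.revenue) are evaluated per group, the window sums
    # run over all grouped rows, and LIMIT 1 keeps one of the identical rows.
    windowed = con.execute("""
        SELECT sum(count(*))       OVER () AS tickets_count
             , sum(min(a.revenue)) OVER () AS atendees_revenue
        FROM   tickets t
        JOIN   attendees a ON a.id = t.attendee_id
        GROUP  BY t.attendee_id
        LIMIT  1
    """).fetchone()

    con.close()
    return naive, windowed


naive, windowed = distinct_sums()
print("naive:   ", naive)
print("windowed:", windowed)
```

With this sample data the naive join reports a revenue of 350 (100 counted twice plus 50 counted three times), while the window-function query reports 150, one contribution per attendee. Both report 5 tickets.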