String similarity with Python + SQLite (Levenshtein distance / edit distance)

Here is a ready-to-use example, test.py, which loads SQLite's spellfix extension:

    import sqlite3

    db = sqlite3.connect(':memory:')
    db.enable_load_extension(True)
    db.load_extension('./spellfix')        # for Linux
    # db.load_extension('./spellfix.dll')  # <- UNCOMMENT HERE FOR WINDOWS
    db.enable_load_extension(False)

    c = db.cursor()
    c.execute('CREATE TABLE mytable (id integer, description text)')
    c.execute("INSERT INTO mytable VALUES (1, 'hello world, guys')")
    c.execute("INSERT INTO mytable VALUES (2, 'hello there everybody')")
    c.execute('SELECT * FROM mytable WHERE …
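You may not have the spellfix extension available as a loadable library. In that case, a similar query can be sketched by registering a pure-Python edit-distance function with sqlite3's `create_function`.

This is a swapped-in technique, not spellfix itself. The SQL function name `levenshtein` is our own (hypothetical) choice. Because the function runs in Python once per row, expect it to be much slower than a native extension on large tables.

```python
import sqlite3


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    if a is None or b is None:
        return None  # propagate SQL NULL
    # prev[j] = distance between the previous prefix of a and the first j chars of b
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free when chars match)
            ))
        prev = curr
    return prev[-1]


db = sqlite3.connect(':memory:')
# Hypothetical SQL function name, registered to take exactly two arguments.
db.create_function('levenshtein', 2, levenshtein)

c = db.cursor()
c.execute('CREATE TABLE mytable (id integer, description text)')
c.execute("INSERT INTO mytable VALUES (1, 'hello world, guys')")
c.execute("INSERT INTO mytable VALUES (2, 'hello there everybody')")

# Rank rows by edit distance to a (misspelled) search string.
rows = c.execute(
    'SELECT id, description, levenshtein(description, ?) AS dist '
    'FROM mytable ORDER BY dist',
    ('hello wrold, guys',),
).fetchall()
for row in rows:
    print(row)
```

With this data, row 1 comes first with a distance of 2, because swapping "or" to "ro" costs two substitutions.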

PostgreSQL: Case insensitive string comparison

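    select * from users where email ilike 'user@example.com'

(Here users and the addresses are placeholders.) ILIKE is similar to LIKE but case-insensitive. Because % and _ are still wildcards, escape them (and the escape character itself) with replace() when you want to match a literal value:

    where email ilike replace(replace(replace($1, '~', '~~'), '%', '~%'), '_', '~_') escape '~'

Alternatively, you could create a function to escape the text. For an array of values, use:

    where email ilike any(array['user1@example.com', 'user2@example.com'])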
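To see what the escaping protects against, here is a rough Python model of ILIKE matching, with the same ~-escaping done client-side.

This is an approximation for illustration only. `escape_like`, `like_to_regex`, `ilike` and `ilike_any` are hypothetical helper names. Python's case-insensitive regex matching does not follow PostgreSQL's locale and collation rules exactly. In real code, pass the escaped value as a query parameter rather than building SQL strings.

```python
import re


def escape_like(value, esc='~'):
    """Mirror the nested replace() calls: escape the escape char first, then % and _."""
    return (value.replace(esc, esc + esc)
                 .replace('%', esc + '%')
                 .replace('_', esc + '_'))


def like_to_regex(pattern, esc='~'):
    """Translate a LIKE pattern into an equivalent Python regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == esc:
            if i + 1 == len(pattern):
                # PostgreSQL also rejects a pattern ending in the escape character.
                raise ValueError('LIKE pattern must not end with escape character')
            parts.append(re.escape(pattern[i + 1]))  # escaped char is literal
            i += 2
            continue
        if ch == '%':
            parts.append('.*')  # any sequence of characters, including none
        elif ch == '_':
            parts.append('.')   # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return ''.join(parts)


def ilike(value, pattern, esc='~'):
    """Rough model of `value ILIKE pattern ESCAPE esc`: whole-string, case-insensitive."""
    regex = like_to_regex(pattern, esc)
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None


def ilike_any(value, patterns, esc='~'):
    """Model of `value ILIKE ANY(ARRAY[...])`: true if any pattern matches."""
    return any(ilike(value, p, esc) for p in patterns)


print(ilike('User@Example.com', 'user@example.com'))  # True: case is ignored
print(ilike('user1example.com', 'user_example.com'))  # True: _ acts as a wildcard
safe = escape_like('user_example.com')
print(safe)                                           # user~_example.com
print(ilike('user1example.com', safe))                # False: _ is now literal
```

The underscore case is the one that bites with e-mail addresses. Without escaping, `user_example.com` also matches `user1example.com`.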

Compare string similarity

    static class LevenshteinDistance
    {
        public static int Compute(string s, string t)
        {
            if (string.IsNullOrEmpty(s))
            {
                if (string.IsNullOrEmpty(t))
                    return 0;
                return t.Length;
            }

            if (string.IsNullOrEmpty(t))
            {
                return s.Length;
            }

            int n = s.Length;
            int m = t.Length;
            int[,] d = new int[n + 1, m + 1];

            // initialize the top row and left column of the table
            …
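The C# excerpt stops right after allocating the (n + 1) x (m + 1) table. The rest of the standard Wagner-Fischer dynamic program can be sketched in Python (the language used for the other examples here).

This is the textbook algorithm. It is not necessarily the exact code that the truncated answer continues with. The function name `compute` simply mirrors `Compute`.

```python
def compute(s, t):
    """Levenshtein distance with a full (n + 1) x (m + 1) table, like the C# int[,] d."""
    # Same early exits as the C# version (None behaves like an empty string).
    if not s:
        return len(t) if t else 0
    if not t:
        return len(s)

    n, m = len(s), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]

    # Left column: turning s[:i] into "" takes i deletions.
    for i in range(n + 1):
        d[i][0] = i
    # Top row: turning "" into t[:j] takes j insertions.
    for j in range(m + 1):
        d[0][j] = j

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # delete s[i-1]
                d[i][j - 1] + 1,         # insert t[j-1]
                d[i - 1][j - 1] + cost,  # substitute, or keep a matching char
            )
    return d[n][m]


print(compute('kitten', 'sitting'))  # 3
print(compute('', 'abc'))            # 3
```

Keeping the full table costs O(n * m) memory. If you need only the distance and not the alignment, two rows are enough.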

Similarity scores based on string comparison in R (edit distance)

The function adist computes the Levenshtein edit distance between two strings. This can be transformed into a similarity metric as 1 - (Levenshtein edit distance / length of the longer string). The levenshteinSim function in the RecordLinkage package also does this directly and might be faster than adist:

    > library(RecordLinkage)
    > levenshteinSim("apple", "apple")
    [1] 1
    > levenshteinSim("apple", "aaple")
    …
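For comparison, here is the same similarity formula, 1 - distance / max(len(a), len(b)), transcribed into Python. It is a sketch of the formula only; it does not claim to reproduce RecordLinkage's exact edge-case handling.

Returning 1.0 for two empty strings is our own convention. The helper names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance with a two-row dynamic-programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb))) # substitution
        prev = curr
    return prev[-1]


def levenshtein_sim(a, b):
    """1 - (edit distance / length of the longer string); 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings: treated as identical (our convention)
    return 1 - levenshtein(a, b) / longest


print(levenshtein_sim('apple', 'apple'))  # 1.0
print(levenshtein_sim('apple', 'aaple'))  # 0.8 (one substitution out of five characters)
```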

If “a == b” is false when comparing two NSString objects

You're assuming that the C == operator does string equality. It doesn't. When applied to pointers, it tests pointer equality, that is, whether both variables point to the same object. If you want a real string equality test, use the -isEqual: method (or its specialization -isEqualToString: when you know both objects are strings):

    if ([mySecondString isEqualToString:myString]) {
        i = 9;
    …
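The same identity-versus-value distinction exists in Python, the language used for the other sketches here:

- `is` compares object identity, roughly the analogue of `==` on two NSString pointers.
- `==` compares contents, the analogue of -isEqualToString:.

This is an analogy, not Objective-C semantics. `same_object` and `same_text` are hypothetical helper names. Whether two equal Python strings happen to be the same object is an implementation detail, so only the value comparison is reliable.

```python
def same_object(a, b):
    """Identity test: analogous to comparing two NSString pointers with ==."""
    return a is b


def same_text(a, b):
    """Value test: analogous to -isEqualToString:."""
    return a == b


first = "hello"
# Build an equal string at runtime. Whether it is a separate object from
# `first` is an implementation detail, so don't rely on `is` here.
second = "".join(["hel", "lo"])
print(same_text(first, second))    # True: same characters
print(same_object(first, second))  # implementation-dependent

# Two separately created lists are always distinct objects,
# which makes the difference deterministic.
xs, ys = list("hello"), list("hello")
print(same_text(xs, ys), same_object(xs, ys))  # True False
```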

Version number comparison in Python

How about using Python's distutils.version.StrictVersion?

    >>> from distutils.version import StrictVersion
    >>> StrictVersion('10.4.10') > StrictVersion('10.4.9')
    True

So for your cmp function (Python 2 only, since Python 3 has neither the built-in cmp nor the __cmp__ method):

    >>> cmp = lambda x, y: StrictVersion(x).__cmp__(y)
    >>> cmp("10.4.10", "10.4.11")
    -1

If you want to compare version numbers that are more complex, distutils.version.LooseVersion will be more useful; however, be sure to only compare the same …
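Note that distutils was deprecated in Python 3.10 (PEP 632) and removed in 3.12. For full PEP 440 versions, the third-party packaging.version.Version is the usual replacement.

For plain dotted-numeric strings like the ones above, a stdlib-only sketch compares tuples of integers. It handles only digits separated by dots (no suffixes such as "a1" or "rc2"). Trailing zero components are dropped, so "1.0" equals "1.0.0". The helper names are our own.

```python
from functools import cmp_to_key


def parse_version(version):
    """'10.4.10' -> (10, 4, 10). Trailing zeros are dropped; non-numeric parts raise ValueError."""
    parts = [int(p) for p in version.split('.')]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def version_cmp(x, y):
    """cmp-style result: -1 if x < y, 0 if equal, 1 if x > y."""
    a, b = parse_version(x), parse_version(y)
    return (a > b) - (a < b)


print(parse_version('10.4.10') > parse_version('10.4.9'))  # True
print(version_cmp('10.4.10', '10.4.11'))                   # -1
# cmp_to_key adapts a cmp-style function for sorted() in Python 3.
print(sorted(['10.4.9', '10.4.10', '1.0'], key=cmp_to_key(version_cmp)))
```

For sorting alone, `key=parse_version` is simpler. `cmp_to_key` is shown here only because the original answer is built around a cmp function.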