Levenshtein: MySQL + PHP
You need a levenshtein function in MySQL and query like $word = mysql_real_escape_string($word); mysql_qery(“SELECT `term` FROM `words` WHERE levenshtein(‘$word’, `term`) BETWEEN 0 AND 4”);
You need a levenshtein function in MySQL and query like $word = mysql_real_escape_string($word); mysql_qery(“SELECT `term` FROM `words` WHERE levenshtein(‘$word’, `term`) BETWEEN 0 AND 4”);
I have connected to my MySQL server and simply executed this statement in MySQL Workbench, and it simply worked – I now have new function levenshtein(). For example, this works as expected: SELECT levenshtein(‘abcde’, ‘abced’) 2
In order to efficiently search using levenshtein distance, you need an efficient, specialised index, such as a bk-tree. Unfortunately, no database system I know of, including MySQL, implements bk-tree indexes. This is further complicated if you’re looking for full-text search, instead of just a single term per row. Off-hand, I can’t think of any way … Read more
I just addressed this exact same issue a few weeks ago. Since someone is asking now, I’ll share the code. In my exhaustive tests my code is about 10x faster than the C# example on Wikipedia even when no maximum distance is supplied. When a maximum distance is supplied, this performance gain increases to 30x … Read more
Translated from Wikipedia : Option Explicit Public Function Levenshtein(s1 As String, s2 As String) Dim i As Integer Dim j As Integer Dim l1 As Integer Dim l2 As Integer Dim d() As Integer Dim min1 As Integer Dim min2 As Integer l1 = Len(s1) l2 = Len(s2) ReDim d(l1, l2) For i = 0 … Read more
I was presented with this problem about a year ago when it came to looking up user entered information about a oil rig in a database of miscellaneous information. The goal was to do some sort of fuzzy string search that could identify the database entry with the most common elements. Part of the research … Read more
I implemented the standard Levenshtein edit distance function in TSQL with several optimizations that improves the speed over the other versions I’m aware of. In cases where the two strings have characters in common at their start (shared prefix), characters in common at their end (shared suffix), and when the strings are large and a … Read more