Edit Distance—a.k.a “Lavenshtein Distance”–is the minimum number of edit operations required to transform one word into another. The allowable edit operations are letter insertion, letter deletion and letter substitution.
Using Dynamic Programming, we can compute the edit distance between two string sequences. But for that, we need to derive a recursive definition of Edit Distance. We denote the distance between two strings as D, which can be defined using a recurrence as follows.
Case 1 : Both and are empty strings, denoted as :
Case 2 : Either or is :
Case 3 : Both and are not :
Here , where is the last character of and contains rest of the characters. Same goes for . We define edit distance between and using a recurrence and in term of and .
can be defined as the minimum, or the least expensive one of the following three alternatives stated in the above equation.
- Substitution: If , then the overall distance is simply . Otherwise, we need a substitution operation that replaces with , and thus, the overall distance will be .
- Insertion: Second possibility is to convert to by inserting in . In this case, the distance will be . Here, +1 is the cost of the insert operation.
- Deletion: Last alternative is to convert to by deleting from that costs +1. Then the distance become .
As this is a ternary recurrence, it would result in an exponential run-time, which is quite impractical. However, using the dynamic programming with memoization, this recurrence can be solved using a 2D array. The code to solve this problem is outline below.
As shown in line 14, the distance
grid.[h,w] can be computed locally by taking the min of the three alternatives stated in the recurrence (computed in line 11,12, 13). By obtaining the locally optimum solutions, we eventually get the edit distance from
Complexity: Run-time complexity: . Lets denote the lengths of both strings as . Then, the complexity become . Space complexity is also same.
Complete source code is outlined in the next page.