This problem can be solved using dynamic programming with memoization technique. In essence, it is about computing the Edit Distance, also known as, Levenshtein Distance between two given strings.
Definition
Edit Distance—a.k.a “Lavenshtein Distance”–is the minimum number of edit operations required to transform one word into another. The allowable edit operations are letter insertion, letter deletion and letter substitution.
Implementation
Using Dynamic Programming, we can compute the edit distance between two string sequences. But for that, we need to derive a recursive definition of Edit Distance. We denote the distance between two strings as D, which can be defined using a recurrence as follows.
Case 1 : Both and
are empty strings, denoted as
:
Case 2 : Either or
is
:
Case 3 : Both and
are not
:
Here , where
is the last character of
and
contains rest of the characters. Same goes for
. We define edit distance between
and
using a recurrence and in term of
and
.
can be defined as the minimum, or the least expensive one of the following three alternatives stated in the above equation.
- Substitution: If
, then the overall distance is simply
. Otherwise, we need a substitution operation that replaces
with
, and thus, the overall distance will be
.
- Insertion: Second possibility is to convert
to
by inserting
in
. In this case, the distance will be
. Here, +1 is the cost of the insert operation.
- Deletion: Last alternative is to convert
to
by deleting
from
that costs +1. Then the distance become
.
As this is a ternary recurrence, it would result in an exponential run-time, which is quite impractical. However, using the dynamic programming with memoization, this recurrence can be solved using a 2D array. The code to solve this problem is outline below.
let computeEditDistance (source:string,target:string) = | |
let height,width = (source.Length, target.Length) | |
let grid: int [,] = Array2D.zeroCreate<int> (height+1) (width+1) // 2D Array for memoization | |
for h = 0 to height do | |
for w = 0 to width do | |
grid.[h,w] <- | |
match h,w with | |
| h,0 -> h // case 1 and 2 | |
| 0, w -> w | |
| h, w -> | |
let s,t = source.[h-1],target.[w-1] | |
let substitution = grid.[h-1,w-1]+(if s = t then 0 else 1) | |
let insertion = grid.[h,w-1] + 1 | |
let deletion = grid.[h-1,w] + 1 | |
min (insertion, deletion, substitution) // case 3 | |
grid.[height,width] |
As shown in line 14, the distance grid.[h,w]
can be computed locally by taking the min of the three alternatives stated in the recurrence (computed in line 11,12, 13). By obtaining the locally optimum solutions, we eventually get the edit distance from grid.[s.length, t.length]
.
Complexity: Run-time complexity: . Lets denote the lengths of both strings as
. Then, the complexity become
. Space complexity is also same.
Complete source code is outlined in the next page.
2 thoughts on “SPOJ 6219. Edit Distance (EDIST) with F#”