Contents
Introduction
The Levenshtein distance between two strings is the minimum number of edits (insertions, deletions, or single-character substitutions) needed to transform one string to the other. We derive a non-looping calculation of this distance.
A Looping Solution
First, a translation of the pseudo-code in the Wikipedia article into a looping function in J:
LevenshteinMatrix=: 4 : 0
d=. (1+(#x),#y)$0
for_i. i.1+#x do. d=. i (<i,0)}d end. NB. deletion
for_j. i.1+#y do. d=. j (<0,j)}d end. NB. insertion
for_j. i.#y do.
for_i. i.#x do.
if. (i{x) = j{y do.
d=. d (<1+i,j)}~ (<i,j){d
else.
m=. 1 + (<i,1+j){d NB. deletion
m=. m <. 1 + (<(1+i),j){d NB. insertion
m=. m <. 1 + (<i,j){d NB. substitution
d=. m (<1+i,j)}d
end. end. end.
)The function returns the entire matrix rather than just the Levenshtein distance; the actual distance is the number in the lower right corner. It reproduces the two test cases in the Wikipedia article:
'sitting' LevenshteinMatrix 'kitten' 0 1 2 3 4 5 6 1 1 2 3 4 5 6 2 2 1 2 3 4 5 3 3 2 1 2 3 4 4 4 3 2 1 2 3 5 5 4 3 2 2 3 6 6 5 4 3 3 2 7 7 6 5 4 4 3 'Sunday' LevenshteinMatrix 'Saturday' 0 1 2 3 4 5 6 7 8 1 0 1 2 3 4 5 6 7 2 1 1 2 2 3 4 5 6 3 2 2 2 3 3 4 5 6 4 3 3 3 3 4 3 4 5 5 4 3 4 4 4 4 3 4 6 5 4 4 5 5 5 4 3
A Non-Looping Solution
A non-looping version obtains through a series of algebraic manipulations, replacing the inner loop with an insert (f/) and the outer loop with an application of the power operator (^:). The result of this function is the transpose of the matrix produced by LevenshteinMatrix.
lmy=: 4 : 'y,(0{x){(1+(1{x)<.{:y),2{x'
lmx=: 4 : 0
b=. (_1+#y){x
d=. {:y
m=. (}.<.}:) d
y , > lmy&.>/ (|.<"1 b,.m,.}:d),<#y
)
LM=: =/~ lmx^:(#@[) ,:@i.@>:@#@[Testing against the looping version:
'sitting' (|:@LM -: LevenshteinMatrix) 'kitten' 1 'Sunday' (|:@LM -: LevenshteinMatrix) 'Saturday' 1
Version Producing Only The Scalar Distance
By HenryRich.
levdist=: 4 : 0 " 1
'a b'=. (/: #&>)x;y
z=. >: iz =. i.#b
for_j. a do.
z=. <./\&.(-&iz) (>: <. (j ~: b) + |.!.j_index) z
end.
{:z
)
Collected Definitions
LevenshteinMatrix=: 4 : 0
d=. (1+(#x),#y)$0
for_i. i.1+#x do. d=. i (<i,0)}d end. NB. deletion
for_j. i.1+#y do. d=. j (<0,j)}d end. NB. insertion
for_j. i.#y do.
for_i. i.#x do.
if. (i{x) = j{y do.
d=. d (<1+i,j)}~ (<i,j){d
else.
m=. 1 + (<i,1+j){d NB. deletion
m=. m <. 1 + (<(1+i),j){d NB. insertion
m=. m <. 1 + (<i,j){d NB. substitution
d=. m (<1+i,j)}d
end. end. end.
)
lmy=: 4 : 'y,(0{x){(1+(1{x)<.{:y),2{x'
lmx=: 4 : 0
b=. (_1+#y){x
d=. {:y
m=. (}.<.}:) d
y , > lmy&.>/ (|.<"1 b,.m,.}:d),<#y
)
LM=: =/~ lmx^:(#@[) ,:@i.@>:@#@[
levdist=: 4 : 0 " 1
'a b'=. (/: #&>)x;y
z=. >: iz =. i.#b
for_j. a do.
z=. <./\&.(-&iz) (>: <. (j ~: b) + |.!.j_index) z
end.
{:z
)
Contributed by RogerHui.
