Is this how auto-correct works?

Suppose we are given two strings a a and b b . We want to devise a function f ( a , b ) f(a,b) that tells us how close the two strings are. A reasonable measurement would be to measure how many "edits" operations have to be made to one string in order to change it to the other. There are three possible "edit" operations:

Substitution: Replacing a single character from a a so that it matches b b costs 1 1 . If a = rot a=\text{rot} and b = dot b=\text{dot} . Then f ( a , b ) = 1 f(a,b)=1 .

Insertion: Inserting a single character also costs 1 1 . Ie, If a = girl a = \text{girl} and b = girls b=\text{girls} , then f ( a , b ) = 1 f(a,b)=1 .

Deletion: Deleting a single character also costs 1 1 . Ie. If a = hour a=\text{hour} and b = our b=\text{our} then f ( a , b ) = 1 f(a,b)=1 .

Given a a and b b , compute f ( a , b ) f(a,b) .


The answer is 125.

This section requires Javascript.
You are seeing this because something didn't load right. We suggest you, (a) try refreshing the page, (b) enabling javascript if it is disabled on your browser and, finally, (c) loading the non-javascript version of this page . We're sorry about the hassle.

1 solution

This is called the edit distance or the Levenshtein Distance .

Algorithms for solving this problem are explained here .

Python code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
def levenshteinDistance(s1,s2):
    if len(s1) > len(s2):
        s1,s2 = s2,s1
    distances = range(len(s1) + 1)
    for index2,char2 in enumerate(s2):
        newDistances = [index2+1]
        for index1,char1 in enumerate(s1):
            if char1 == char2:
                newDistances.append(distances[index1])
            else:
                newDistances.append(1 + min((distances[index1],
                                             distances[index1+1],
                                             newDistances[-1])))
        distances = newDistances
    return distances[-1]



a='xmzjzohczxwssymjnlvbmogxvgtttcvkuajoamcaeaxwpoxlsayginhnpskufymnomfwtnnznrkndovjjdigahhpzihnmgvznnbx'
b='psakmqrhorcwpaocjnwppepnnhjuhqbyzcmvyyaoatbxgunkznwrwzkdhqfsxxlbqsbwpifsaplueyvbjcxbxddjnneioaybxcwcfbdzdlizxgzjnyervyhzsafmiiecabzxxtbqrmoqiifcxjjesx'

print(levenshteinDistance(a,b))

0 pending reports

×

Problem Loading...

Note Loading...

Set Loading...