# Levenshtein distance

#### similarity measurement edit distance levenshtein distance Software Engineering Algorithms Get FREE domain for 1st year and build your brand new site

Levenshtein distance is a type of Edit distance which is a large class of distance metric of measuring the dissimilarity between two strings by computing a minimum number of operations (from a set of operations) used to convert one string to another string. It can be seen as a way of pairwise string alignment.

Levenshtein distance has a vast range of applications in computational linguistics, computer science, natural language processing, bioinformatics and many others.

### Computation

Given two strings S1 and S2, edit distance is the minimum cost associated with operations to convert string S1 to S2. Operations available in Levenshtein distance are:

• Insertion: Insert a character at a position (cost: 1)
• Deletion: Delete a character at any position (cost: 1)
• Replace: Replace a character at any position with another character (cost: 1)

Example:

Consider the following strings:


S2 = aabdec


Operations used to convert S1 to S2:

Operation 1: Delete d at position 3
Operation 2: Replace g at position 6 by c

Edit distance = 1 + 1 = 2

### Properties

Levenshtein distance with non-negative cost satisfies the axioms of a metric giving rise to a metric space of strings, when the following conditions are met:

• Every edit operation has positive cost
• For every operation, there is an inverse operation with equal cost

Properties of unit-cost Levenshtein distances include:

• LCS distance is bounded above by the sum of lengths of a pair of strings
• LCS distance is an upper bound on Levenshtein distance
• For strings of the same length, Hamming distance is an upper bound on Levenshtein distance

Regardless of cost/weights, the following property holds of all edit distances:

• When a and b share a common prefix, this prefix has no effect on the distance

### Algorithms

Algorithms that can be used to compute Levenshtein distance are:

• Wagner Fischer algorithm
• Ukkonen algorithm
• Landau Myers and Schmidt algorithm

### Applications

Computational Biology

• Quantify the similarity of DNA sequences, which can be viewed as strings of the letters A, C, G and T

Natural Language Processing

• Correction of spelling mistakes
• Correct OCR errors

General computation

• Measure the similarity between two data points #### OpenGenus Foundation

The official account of OpenGenus IQ backed by GitHub, DigitalOcean and Discourse