×

Search anything:

Levenshtein distance

Internship at OpenGenus

Get this book -> Problems on Array: For Interviews and Competitive Programming

Reading time: 15 minutes

Levenshtein distance is a type of Edit distance which is a large class of distance metric of measuring the dissimilarity between two strings by computing a minimum number of operations (from a set of operations) used to convert one string to another string. It can be seen as a way of pairwise string alignment.

Levenshtein distance has a vast range of applications in computational linguistics, computer science, natural language processing, bioinformatics and many others.

Computation

Given two strings S1 and S2, edit distance is the minimum cost associated with operations to convert string S1 to S2. Operations available in Levenshtein distance are:

  • Insertion: Insert a character at a position (cost: 1)
  • Deletion: Delete a character at any position (cost: 1)
  • Replace: Replace a character at any position with another character (cost: 1)

Example:

Consider the following strings:


S1 = aadbdeg
S2 = aabdec

Operations used to convert S1 to S2:

Operation 1: Delete d at position 3
Operation 2: Replace g at position 6 by c

Edit distance = 1 + 1 = 2

Properties

Levenshtein distance with non-negative cost satisfies the axioms of a metric giving rise to a metric space of strings, when the following conditions are met:

  • Every edit operation has positive cost
  • For every operation, there is an inverse operation with equal cost

Properties of unit-cost Levenshtein distances include:

  • LCS distance is bounded above by the sum of lengths of a pair of strings
  • LCS distance is an upper bound on Levenshtein distance
  • For strings of the same length, Hamming distance is an upper bound on Levenshtein distance

Regardless of cost/weights, the following property holds of all edit distances:

  • When a and b share a common prefix, this prefix has no effect on the distance

Algorithms

Algorithms that can be used to compute Levenshtein distance are:

  • Wagner Fischer algorithm
  • Ukkonen algorithm
  • Landau Myers and Schmidt algorithm

Applications

Computational Biology

  • Quantify the similarity of DNA sequences, which can be viewed as strings of the letters A, C, G and T

Natural Language Processing

  • Correction of spelling mistakes
  • Correct OCR errors

General computation

  • Measure the similarity between two data points
OpenGenus Tech Review Team

OpenGenus Tech Review Team

The official account of OpenGenus's Technical Review Team. This team review all technical articles and incorporates peer feedback. The team consist of experts in the leading domains of Computing.

Read More

Improved & Reviewed by:


Aditya Chatterjee Aditya Chatterjee
Levenshtein distance
Share this