2

I am looking for an algorithm that checks whether the Levenshtein distance between two strings $s_1$ and $s_2$ is less than a given upper bound $B$. I know there are plenty of algorithms for calculating the Levenshtein distance, but I expect an efficiency gain in scenarios where $B \ll \mathrm{Levenshtein}(s_1, s_2)$: an algorithm that only has to decide whether the distance is below $B$, rather than determine the actual distance, can terminate as soon as it becomes clear that the distance must be at least $B$.

For such an algorithm, my idea is a recursive function that takes the parameters $s_1$, $s_2$ and $B$, checks whether the first characters of $s_1$ and $s_2$ are equal, and recursively calls itself (with a potentially decremented $B$ and correspondingly shortened $s_1$ and $s_2$). The function would discard every branch in which $B$ falls below 0. (And of course, the function would make use of the trivial lower bounds on $\mathrm{Levenshtein}(s_1, s_2)$.) If no recursion branch is left, the algorithm terminates, asserting $\mathrm{Levenshtein}(s_1, s_2) \geq B$.
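To make the idea concrete, here is a minimal Python sketch of such a recursive check. The function name `levenshtein_below`, the helper `within`, and the use of memoization are my own illustrative choices, not an existing library API. It works on suffix indices instead of slicing the strings, and it prunes with the trivial bounds: the length difference is a lower bound on the distance, and the longer remaining length is an upper bound.

```python
from functools import lru_cache


def levenshtein_below(s1: str, s2: str, bound: int) -> bool:
    """Return True iff Levenshtein(s1, s2) < bound (hypothetical helper)."""

    @lru_cache(maxsize=None)
    def within(i: int, j: int, budget: int) -> bool:
        # True iff the suffixes s1[i:] and s2[j:] are at most `budget` edits apart.
        rest1, rest2 = len(s1) - i, len(s2) - j
        # Trivial lower bound: the length difference has to be paid for with
        # insertions or deletions. This also rejects any negative budget.
        if abs(rest1 - rest2) > budget:
            return False
        # Trivial upper bound: the distance never exceeds the longer length.
        if max(rest1, rest2) <= budget:
            return True
        # Both suffixes are non-empty here, otherwise a check above returned.
        if s1[i] == s2[j]:
            # Equal first characters cost nothing.
            return within(i + 1, j + 1, budget)
        # Substitution, deletion from s1, insertion into s1: each costs one edit.
        return (
            within(i + 1, j + 1, budget - 1)
            or within(i + 1, j, budget - 1)
            or within(i, j + 1, budget - 1)
        )

    # "distance < bound" is the same as "distance <= bound - 1".
    return within(0, 0, bound - 1)
```

For example, `levenshtein_below("kitten", "sitting", 4)` is true and `levenshtein_below("kitten", "sitting", 3)` is false, since the distance is 3. Note that the recursion depth grows with the length of the common parts, so very long strings would need an iterative formulation.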

But before reinventing the wheel, I wanted to ask whether there are existing solutions to my problem. I googled this and didn't find any, but maybe my Google results were just polluted with Levenshtein distance algorithms. If there is no such algorithm yet, is my approach a good idea, or are there more efficient ways that I am overlooking?

D.W.
  • (The insight that exceeding a distance limit should be detectable early on is very valid.) "I googled this and didn't find any": try with Levenshtein|Левенште́йн - you should find, amongst others, Ukkonen. – greybeard Sep 28 '20 at 09:58
  • @greybeard Thanks! I think, Ukkonen was the keyword I was missing. – Jonathan Scholbach Sep 28 '20 at 10:14
  • (My suggestion isn't to add Ukkonen as a keyword, but to look for his papers on the subject (or implementations) appearing in the results, as a check of whether the search was formulated in a promising way.) – greybeard Sep 28 '20 at 10:32
  • @greybeard Yes, I got it. Without your comment, I wouldn't even have known of the existence of Ukkonen's algorithm. – Jonathan Scholbach Sep 28 '20 at 10:33
  • https://cs.stackexchange.com/q/27539/755 – D.W. Oct 01 '20 at 23:58
  • See this: https://stackoverflow.com/q/59686989/21508463 –  May 25 '23 at 08:08

1 Answer

1

You may be interested in this implementation: https://github.com/fujimotos/polyleven
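For readers who want the underlying idea without a dependency, here is a minimal Python sketch of a bounded check. It is not taken from polyleven; it is my own illustration of the cutoff idea usually associated with Ukkonen, and the name `levenshtein_at_most` is hypothetical. It assumes the bound is given as a maximum allowed distance `k` (so "distance < B" corresponds to `k = B - 1`). Only cells with $|i - j| \le k$ of the dynamic-programming table are computed, and the loop stops as soon as a whole row exceeds `k`.

```python
def levenshtein_at_most(s1: str, s2: str, k: int) -> bool:
    """Return True iff Levenshtein(s1, s2) <= k (hypothetical helper)."""
    n, m = len(s1), len(s2)
    # The length difference is a lower bound on the distance.
    if k < 0 or abs(n - m) > k:
        return False
    inf = k + 1  # every value above k is treated as "too far"
    # prev[j] holds the distance between s1[:i-1] and s2[:j], capped at inf.
    prev = [j if j <= k else inf for j in range(m + 1)]
    for i in range(1, n + 1):
        # Cells outside the band |i - j| <= k cannot be within the bound.
        lo, hi = max(1, i - k), min(m, i + k)
        # Full-length rows keep the sketch simple; a tuned version would
        # store only the 2k + 1 cells of the band.
        cur = [inf] * (m + 1)
        if i <= k:
            cur[0] = i
        for j in range(lo, hi + 1):
            cost = 0 if s1[i - 1] == s2[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost, prev[j] + 1, cur[j - 1] + 1, inf)
        # Early exit: values never decrease along a path through the table,
        # so a row entirely above k can never lead back to a value <= k.
        if min(cur[lo - 1:hi + 1]) > k:
            return False
        prev = cur
    return prev[m] <= k
```

With this sketch, `levenshtein_at_most("kitten", "sitting", 3)` is true and `levenshtein_at_most("kitten", "sitting", 2)` is false. The inner loop touches at most $2k + 1$ cells per row, which is where the gain for small bounds comes from.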

  • 2
    While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – xskxzr Sep 04 '21 at 05:27
  • 1
    Polyleven is a fast Levenshtein distance library for Python. – Pieter Hartel Sep 05 '21 at 07:45