Edit distance of run-length coded strings

H. Bunke, J. Csirik

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

5 Citations (Scopus)

Abstract

We give an algorithm for measuring the similarity of run-length coded strings. In run-length coding, not all individual symbols in a string are listed. Instead, one run of identical consecutive symbols is coded by giving one representative symbol together with its multiplicity. If the strings under consideration consist of long runs of identical symbols, significant reductions in memory and access time can be achieved by run-length coding. Our algorithm determines the minimum cost sequence of edit operations needed to transform one string into another. It uses as basic data structure an edit matrix similar to the classical algorithm of Wagner and Fischer. However, depending on the particular pair of strings to be compared, only a part of this edit matrix usually needs to be computed. In the worst case, our algorithm has a time complexity of O(n·m), where n and m give the lengths of the strings to be compared. In the best case, the time complexity is O(k·l), where k and l are the numbers of runs of identical symbols in the two strings under comparison.
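
For orientation, the two ingredients named in the abstract can be sketched in Python. This is a minimal illustration and not the algorithm of Bunke and Csirik: it shows run-length coding and the classical Wagner-Fischer edit matrix, filled in completely on the decoded strings, whereas the paper's algorithm computes only part of that matrix. Unit costs for insertion, deletion and substitution are an assumption, since the abstract does not state a cost model, and the names rle_encode, rle_decode and edit_distance are hypothetical.

```python
from itertools import groupby


def rle_encode(s):
    """Return s as a list of (symbol, run_length) pairs."""
    return [(sym, len(list(group))) for sym, group in groupby(s)]


def rle_decode(runs):
    """Expand (symbol, run_length) pairs back into a plain string."""
    return "".join(sym * count for sym, count in runs)


def edit_distance(a, b):
    """Classical Wagner-Fischer edit distance with assumed unit costs."""
    n, m = len(a), len(b)
    # d[i][j] holds the minimum cost of transforming a[:i] into b[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete the first i symbols of a
    for j in range(1, m + 1):
        d[0][j] = j  # insert the first j symbols of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,                 # deletion
                d[i][j - 1] + 1,                 # insertion
                d[i - 1][j - 1] + substitution,  # match or substitution
            )
    return d[n][m]


x_runs = rle_encode("aaaabbbcc")  # 9 symbols, 3 runs
y_runs = rle_encode("aaabbbbcc")  # 9 symbols, 3 runs
print(x_runs)
print(edit_distance(rle_decode(x_runs), rle_decode(y_runs)))
```

In this example n = m = 9 and k = l = 3, so the full matrix has (n+1)·(m+1) = 100 cells, while an approach that works per pair of runs would deal with only k·l = 9 run pairs; this is the gap between the worst-case and best-case bounds quoted in the abstract.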

Original language: English
Title of host publication: Applied Computing: Technological Challenges of the 1990's
Publisher: Publ by ACM
Pages: 137-143
Number of pages: 7
ISBN (Print): 089791502X
Publication status: Published - 1992
Event: Proceedings of the 1992 ACM/SIGAPP Symposium on Applied Computing - SAC '92 - Kansas City, KS, USA
Duration: Mar 1 1992 to Mar 3 1992

Other

Other: Proceedings of the 1992 ACM/SIGAPP Symposium on Applied Computing - SAC '92
City: Kansas City, KS, USA
Period: 3/1/92 to 3/3/92

Fingerprint

Data structures
Data storage equipment
Costs

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Bunke, H., & Csirik, J. (1992). Edit distance of run-length coded strings. In Applied Computing: Technological Challenges of the 1990's (pp. 137-143). Publ by ACM.

@inproceedings{fbecacdb7bd648efa286cbd04d3a8c3f,
title = "Edit distance of run-length coded strings",
abstract = "We give an algorithm for measuring the similarity of run-length coded strings. In run-length coding, not all individual symbols in a string are listed. Instead, one run of identical consecutive symbols is coded by giving one representative symbol together with its multiplicity. If the strings under consideration consist of long runs of identical symbols, significant reductions in memory and access time can be achieved by run-length coding. Our algorithm determines the minimum cost sequence of edit operations needed to transform one string into another. It uses as basic data structure an edit matrix similar to the classical algorithm of Wagner and Fischer. However, depending on the particular pair of strings to be compared, only a part of this edit matrix usually needs to be computed. In the worst case, our algorithm has a time complexity of O(n·m), where n and m give the lengths of the strings to be compared. In the best case, the time complexity is O(k·l), where k and l are the numbers of runs of identical symbols in the two strings under comparison.",
author = "H. Bunke and J. Csirik",
year = "1992",
language = "English",
isbn = "089791502X",
pages = "137--143",
booktitle = "Applied Computing: Technological Challenges of the 1990's",
publisher = "Publ by ACM",

}

TY - GEN

T1 - Edit distance of run-length coded strings

AU - Bunke, H.

AU - Csirik, J.

PY - 1992

Y1 - 1992

N2 - We give an algorithm for measuring the similarity of run-length coded strings. In run-length coding, not all individual symbols in a string are listed. Instead, one run of identical consecutive symbols is coded by giving one representative symbol together with its multiplicity. If the strings under consideration consist of long runs of identical symbols, significant reductions in memory and access time can be achieved by run-length coding. Our algorithm determines the minimum cost sequence of edit operations needed to transform one string into another. It uses as basic data structure an edit matrix similar to the classical algorithm of Wagner and Fischer. However, depending on the particular pair of strings to be compared, only a part of this edit matrix usually needs to be computed. In the worst case, our algorithm has a time complexity of O(n·m), where n and m give the lengths of the strings to be compared. In the best case, the time complexity is O(k·l), where k and l are the numbers of runs of identical symbols in the two strings under comparison.

UR - http://www.scopus.com/inward/record.url?scp=0026970734&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0026970734&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0026970734

SN - 089791502X

SP - 137

EP - 143

BT - Applied Computing: Technological Challenges of the 1990's

PB - Publ by ACM

ER -