### Abstract

We give an algorithm for measuring the similarity of run-length coded strings. In run-length coding, the individual symbols of a string are not all listed. Instead, each run of identical consecutive symbols is coded by one representative symbol together with its multiplicity. If the strings under consideration consist of long runs of identical symbols, run-length coding can significantly reduce memory and access time. Our algorithm determines the minimum-cost sequence of edit operations needed to transform one string into another. As its basic data structure it uses an edit matrix, as in the classical algorithm of Wagner and Fischer. However, depending on the particular pair of strings being compared, usually only part of this edit matrix needs to be computed. In the worst case, our algorithm has a time complexity of O(n·m), where n and m are the lengths of the strings to be compared. In the best case, the time complexity is O(k·l), where k and l are the numbers of runs of identical symbols in the two strings.
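To make the quantities in the abstract concrete, the following minimal Python sketch shows run-length coding and the classical Wagner-Fischer edit matrix on uncompressed strings. It is not the authors' algorithm, which works on the run-length coded form and skips parts of the matrix. The function names and the unit edit costs are assumptions made for illustration, since the abstract does not fix a cost function.

```python
from itertools import groupby


def run_length_encode(s):
    """Return one (symbol, multiplicity) pair per run of identical symbols."""
    return [(symbol, len(list(group))) for symbol, group in groupby(s)]


def run_length_decode(runs):
    """Expand (symbol, multiplicity) pairs back into the plain string."""
    return "".join(symbol * count for symbol, count in runs)


def edit_distance(a, b):
    """Classical Wagner-Fischer dynamic program.

    Assumes unit costs for insertion, deletion, and substitution.
    Fills the full (n+1) x (m+1) edit matrix, so it takes O(n*m) time.
    """
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete the first i symbols of a
    for j in range(1, m + 1):
        d[0][j] = j  # insert the first j symbols of b
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,                 # deletion
                d[i][j - 1] + 1,                 # insertion
                d[i - 1][j - 1] + substitution,  # match or substitution
            )
    return d[n][m]


a = "aaaabbbcc"
b = "aaabbbbcc"
runs_a = run_length_encode(a)  # k runs
runs_b = run_length_encode(b)  # l runs

print(runs_a, runs_b)
print("distance:", edit_distance(a, b))
print("n*m =", len(a) * len(b), " k*l =", len(runs_a) * len(runs_b))
```

For these two strings the full matrix has n·m = 81 inner cells, while the number of run pairs is k·l = 9, which is the gap between the worst-case and best-case bounds stated above.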

Original language | English |
---|---|
Title of host publication | Applied Computing: Technological Challenges of the 1990's |
Publisher | Publ by ACM |
Pages | 137-143 |
Number of pages | 7 |
ISBN (Print) | 089791502X |
Publication status | Published - 1992 |
Event | Proceedings of the 1992 ACM/SIGAPP Symposium on Applied Computing - SAC '92 - Kansas City, KS, USA; Duration: Mar 1 1992 → Mar 3 1992 |

### Other

Other | Proceedings of the 1992 ACM/SIGAPP Symposium on Applied Computing - SAC '92 |
---|---|
City | Kansas City, KS, USA |
Period | 3/1/92 → 3/3/92 |

### ASJC Scopus subject areas

- Engineering(all)

### Cite this

*Applied Computing: Technological Challenges of the 1990's* (pp. 137-143). Publ by ACM.

**Edit distance of run-length coded strings.** / Bunke, H.; Csirik, J.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

*Applied Computing: Technological Challenges of the 1990's.* Publ by ACM, pp. 137-143, Proceedings of the 1992 ACM/SIGAPP Symposium on Applied Computing - SAC '92, Kansas City, KS, USA, 3/1/92.

TY - GEN

T1 - Edit distance of run-length coded strings

AU - Bunke, H.

AU - Csirik, J.

PY - 1992

Y1 - 1992

UR - http://www.scopus.com/inward/record.url?scp=0026970734&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0026970734&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:0026970734

SN - 089791502X

SP - 137

EP - 143

BT - Applied Computing: Technological Challenges of the 1990's

PB - Publ by ACM

ER -