The analysis of double hashing

Leo J. Guibas, E. Szemerédi

Research output: Contribution to journal › Article

42 Citations (Scopus)

Abstract

In this paper we analyze the performance of double hashing, a well-known hashing algorithm in which we probe the hash table along arithmetic progressions where the initial element and the increment of the progression are chosen randomly and independently depending only on the key K of the search. We prove that double hashing is asymptotically equivalent to uniform probing for load factors α not exceeding a certain constant α0 = 0.31.... Uniform hashing refers to a technique which exhibits no clustering and is known to be optimal in a certain sense. Our proof method has a different flavor from those previously used in algorithmic analysis. We begin by showing that the tail of the hypergeometric distribution a fixed percentage away from the mean is exponentially small. We use this result to prove that random subsets of the finite ring of integers modulo m of cardinality αm always have nearly the expected number of arithmetic progressions of length k, except with exponentially small probability. We then use this theorem to start up a process (called the extension process) of looking at snapshots of the table as it fills up with double hashing. Between steps of the extension process we can show that the effect of clustering is negligible, and that we therefore never depart too far from the truly random situation.
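
The probing scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`probe_sequence`, `insert`, `average_insert_probes`) are hypothetical, random draws stand in for the two key-dependent hash values, and the table size is assumed to be prime so that every nonzero increment visits all slots. The comparison value `uniform_estimate` is the standard textbook approximation for uniform probing, (1/α) ln(1/(1 − α)) probes per insertion averaged over a fill up to load factor α; it is not taken from the abstract.

```python
import math
import random


def probe_sequence(h1, h2, m):
    """Yield the slots h1, h1 + h2, h1 + 2*h2, ... taken modulo m."""
    for i in range(m):
        yield (h1 + i * h2) % m


def insert(table, h1, h2):
    """Occupy the first free slot on the progression; return the probes used."""
    m = len(table)
    for probes, slot in enumerate(probe_sequence(h1, h2, m), start=1):
        if not table[slot]:
            table[slot] = True
            return probes
    raise RuntimeError("table is full")


def average_insert_probes(m, alpha, seed=0):
    """Fill a table of size m to load factor alpha; return mean probes per insert."""
    rng = random.Random(seed)
    table = [False] * m
    n = int(alpha * m)
    total = 0
    for _ in range(n):
        h1 = rng.randrange(m)      # initial element (assumed uniform in 0..m-1)
        h2 = rng.randrange(1, m)   # increment (assumed uniform in 1..m-1)
        total += insert(table, h1, h2)
    return total / n


def uniform_estimate(alpha):
    """Textbook approximation of mean insertion cost under uniform probing."""
    return math.log(1.0 / (1.0 - alpha)) / alpha


if __name__ == "__main__":
    m, alpha = 10007, 0.3  # 10007 is prime; 0.3 is just below the stated bound
    print("double hashing (simulated):", average_insert_probes(m, alpha))
    print("uniform probing (estimate):", uniform_estimate(alpha))
```

At a load factor just below the constant quoted in the abstract, the simulated average should land close to the uniform-probing estimate, which is the kind of agreement the paper proves asymptotically.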

Original language: English
Pages (from-to): 226-274
Number of pages: 49
Journal: Journal of Computer and System Sciences
Volume: 16
Issue number: 2
DOIs: 10.1016/0022-0000(78)90046-6
Publication status: Published - 1978

Fingerprint

Flavors
Hashing
Arithmetic sequence
Table
Clustering
Hypergeometric Distribution
Finite Rings
Asymptotically equivalent
Start-up
Progression
Increment
Percentage
Modulo
Tail
Cardinality
Probe
Integer
Subset
Theorem

ASJC Scopus subject areas

  • Computational Theory and Mathematics

Cite this

The analysis of double hashing. / Guibas, Leo J.; Szemerédi, E.

In: Journal of Computer and System Sciences, Vol. 16, No. 2, 1978, p. 226-274.

@article{71d7d0b67fc74965840f7628e9b8c74c,
title = "The analysis of double hashing",
abstract = "In this paper we analyze the performance of double hashing, a well-known hashing algorithm in which we probe the hash table along arithmetic progressions where the initial element and the increment of the progression are chosen randomly and independently depending only on the key K of the search. We prove that double hashing is asymptotically equivalent to uniform probing for load factors α not exceeding a certain constant α0 = 0.31.... Uniform hashing refers to a technique which exhibits no clustering and is known to be optimal in a certain sense. Our proof method has a different flavor from those previously used in algorithmic analysis. We begin by showing that the tail of the hypergeometric distribution a fixed percentage away from the mean is exponentially small. We use this result to prove that random subsets of the finite ring of integers modulo m of cardinality αm always have nearly the expected number of arithmetic progressions of length k, except with exponentially small probability. We then use this theorem to start up a process (called the extension process) of looking at snapshots of the table as it fills up with double hashing. Between steps of the extension process we can show that the effect of clustering is negligible, and that we therefore never depart too far from the truly random situation.",
author = "Guibas, {Leo J.} and E. Szemer{\'e}di",
year = "1978",
doi = "10.1016/0022-0000(78)90046-6",
language = "English",
volume = "16",
pages = "226--274",
journal = "Journal of Computer and System Sciences",
issn = "0022-0000",
publisher = "Academic Press Inc.",
number = "2",

}

TY - JOUR

T1 - The analysis of double hashing

AU - Guibas, Leo J.

AU - Szemerédi, E.

PY - 1978

Y1 - 1978

N2 - In this paper we analyze the performance of double hashing, a well-known hashing algorithm in which we probe the hash table along arithmetic progressions where the initial element and the increment of the progression are chosen randomly and independently depending only on the key K of the search. We prove that double hashing is asymptotically equivalent to uniform probing for load factors α not exceeding a certain constant α0 = 0.31.... Uniform hashing refers to a technique which exhibits no clustering and is known to be optimal in a certain sense. Our proof method has a different flavor from those previously used in algorithmic analysis. We begin by showing that the tail of the hypergeometric distribution a fixed percentage away from the mean is exponentially small. We use this result to prove that random subsets of the finite ring of integers modulo m of cardinality αm always have nearly the expected number of arithmetic progressions of length k, except with exponentially small probability. We then use this theorem to start up a process (called the extension process) of looking at snapshots of the table as it fills up with double hashing. Between steps of the extension process we can show that the effect of clustering is negligible, and that we therefore never depart too far from the truly random situation.

UR - http://www.scopus.com/inward/record.url?scp=0003640232&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0003640232&partnerID=8YFLogxK

U2 - 10.1016/0022-0000(78)90046-6

DO - 10.1016/0022-0000(78)90046-6

M3 - Article

AN - SCOPUS:0003640232

VL - 16

SP - 226

EP - 274

JO - Journal of Computer and System Sciences

JF - Journal of Computer and System Sciences

SN - 0022-0000

IS - 2

ER -