R3D3

A doubly opportunistic data structure for compressing and indexing massive data

Máté Nagy, J. Tapolcai, Gábor Rétvári

Research output: Contribution to journal (article)

Abstract

Opportunistic data structures are used extensively in big data practice to break down the massive storage space requirements of processing large volumes of information. A data structure is called (singly) opportunistic if it takes advantage of the redundancy in the input in order to store it in information-theoretically minimum space. Yet, efficient data processing requires a separate index alongside the data, whose size often substantially exceeds that of the compressed information. In this paper, we introduce doubly opportunistic data structures to not only attain best possible compression on the input data but also on the index. We present R3D3 that encodes a bitvector of length n and Shannon entropy H0 to nH0 bits and the accompanying index to nH0(1/2 + O(log C/C)) bits, thus attaining provably minimum space (up to small error terms) on both the data and the index, and supports a rich set of queries to arbitrary position in the compressed bitvector in O(C) time when C = o(log n). Our R3D3 prototype attains several times space reduction beyond known compression techniques on a wide range of synthetic and real data sets, while it supports operations on the compressed data at comparable speed.
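To make the space bounds in the abstract concrete, the sketch below evaluates the leading-order terms nH0 (data) and nH0/2 (index) for a small bitvector. It is only an illustration and not the R3D3 encoder: it assumes H0 means the empirical zero-order entropy of the bitvector, it drops the O(log C/C) error term, and the function names are hypothetical.

```python
import math

def zero_order_entropy(bits):
    """Empirical zero-order entropy H0 (bits per symbol) of a 0/1 sequence.

    Assumption: this is the quantity the abstract calls "Shannon entropy H0".
    """
    n = len(bits)
    ones = sum(bits)
    h0 = 0.0
    for count in (ones, n - ones):
        if count:  # skip empty classes, treating 0 * log(0) as 0
            p = count / n
            h0 -= p * math.log2(p)
    return h0

def data_bits(bits):
    """Leading-order size of the compressed data, n * H0, in bits."""
    return len(bits) * zero_order_entropy(bits)

def index_bits_leading_term(bits):
    """Leading term n * H0 / 2 of the index bound (O(log C / C) term omitted)."""
    return data_bits(bits) / 2

# Hypothetical inputs: a sparse vector compresses well, a balanced one does not.
sparse = [1] + [0] * 15
balanced = [0, 1] * 8
for name, vec in (("sparse", sparse), ("balanced", balanced)):
    print(name, len(vec), data_bits(vec), index_bits_leading_term(vec))
```

The balanced vector has H0 = 1, so the bound allows no savings, while the sparse vector needs far fewer than its 16 raw bits; this is the redundancy that an opportunistic structure exploits.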

Original language: English
Pages (from-to): 58-66
Number of pages: 9
Journal: Infocommunications Journal
Volume: 11
Issue number: 2
Publication status: Published - Jan 1 2019

Fingerprint

Data structures
Redundancy
Entropy
Processing
Big data

Keywords

  • Big data
  • Compressed self-indexes
  • Packet forwarding
  • Succinct and compressed data structures

ASJC Scopus subject areas

  • Computer Science (all)
  • Electrical and Electronic Engineering

Cite this

R3D3: A doubly opportunistic data structure for compressing and indexing massive data. / Nagy, Máté; Tapolcai, J.; Rétvári, Gábor.

In: Infocommunications Journal, Vol. 11, No. 2, 01.01.2019, p. 58-66.

@article{4faf8b909ab147f89d2705e76c2ca4a4,
title = "R3D3: A doubly opportunistic data structure for compressing and indexing massive data",
abstract = "Opportunistic data structures are used extensively in big data practice to break down the massive storage space requirements of processing large volumes of information. A data structure is called (singly) opportunistic if it takes advantage of the redundancy in the input in order to store it in information-theoretically minimum space. Yet, efficient data processing requires a separate index alongside the data, whose size often substantially exceeds that of the compressed information. In this paper, we introduce doubly opportunistic data structures to not only attain best possible compression on the input data but also on the index. We present R3D3 that encodes a bitvector of length n and Shannon entropy H0 to nH0 bits and the accompanying index to nH0(1/2 + O(log C/C)) bits, thus attaining provably minimum space (up to small error terms) on both the data and the index, and supports a rich set of queries to arbitrary position in the compressed bitvector in O(C) time when C = o(log n). Our R3D3 prototype attains several times space reduction beyond known compression techniques on a wide range of synthetic and real data sets, while it supports operations on the compressed data at comparable speed.",
keywords = "Big data, Compressed self-indexes, Packet forwarding, Succinct and compressed data structures",
author = "M{\'a}t{\'e} Nagy and J. Tapolcai and G{\'a}bor R{\'e}tv{\'a}ri",
year = "2019",
month = "1",
day = "1",
language = "English",
volume = "11",
pages = "58--66",
journal = "Infocommunications Journal",
issn = "2061-2079",
publisher = "Scientific Association for Infocommunications",
number = "2",

}

TY - JOUR

T1 - R3D3

T2 - A doubly opportunistic data structure for compressing and indexing massive data

AU - Nagy, Máté

AU - Tapolcai, J.

AU - Rétvári, Gábor

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Opportunistic data structures are used extensively in big data practice to break down the massive storage space requirements of processing large volumes of information. A data structure is called (singly) opportunistic if it takes advantage of the redundancy in the input in order to store it in information-theoretically minimum space. Yet, efficient data processing requires a separate index alongside the data, whose size often substantially exceeds that of the compressed information. In this paper, we introduce doubly opportunistic data structures to not only attain best possible compression on the input data but also on the index. We present R3D3 that encodes a bitvector of length n and Shannon entropy H0 to nH0 bits and the accompanying index to nH0(1/2 + O(log C/C)) bits, thus attaining provably minimum space (up to small error terms) on both the data and the index, and supports a rich set of queries to arbitrary position in the compressed bitvector in O(C) time when C = o(log n). Our R3D3 prototype attains several times space reduction beyond known compression techniques on a wide range of synthetic and real data sets, while it supports operations on the compressed data at comparable speed.

AB - Opportunistic data structures are used extensively in big data practice to break down the massive storage space requirements of processing large volumes of information. A data structure is called (singly) opportunistic if it takes advantage of the redundancy in the input in order to store it in information-theoretically minimum space. Yet, efficient data processing requires a separate index alongside the data, whose size often substantially exceeds that of the compressed information. In this paper, we introduce doubly opportunistic data structures to not only attain best possible compression on the input data but also on the index. We present R3D3 that encodes a bitvector of length n and Shannon entropy H0 to nH0 bits and the accompanying index to nH0(1/2 + O(log C/C)) bits, thus attaining provably minimum space (up to small error terms) on both the data and the index, and supports a rich set of queries to arbitrary position in the compressed bitvector in O(C) time when C = o(log n). Our R3D3 prototype attains several times space reduction beyond known compression techniques on a wide range of synthetic and real data sets, while it supports operations on the compressed data at comparable speed.

KW - Big data

KW - Compressed self-indexes

KW - Packet forwarding

KW - Succinct and compressed data structures

UR - http://www.scopus.com/inward/record.url?scp=85071002249&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071002249&partnerID=8YFLogxK

M3 - Article

VL - 11

SP - 58

EP - 66

JO - Infocommunications Journal

JF - Infocommunications Journal

SN - 2061-2079

IS - 2

ER -