ND-GIST

A novel method for disk-resident k-mer indexing

János Márk Szalai-Gindl, Attila Kiss, Gábor Halász, László Dobos, I. Csabai

Research output: Conference contribution

Abstract

Metagenomics poses several challenges, one of which is data management. A central concept in this context is the k-mer: a subsequence of length k of a DNA (sub)sequence. This work focuses on indexing k-mers and supporting box queries, in which a query string of length k may allow multiple nucleobases at each position. A novel index structure, ND-GiST, is introduced that can handle box queries. Compared with a full table scan and the traditional B-tree, the performance results of ND-GiST are encouraging.
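To make the notion of a box query concrete, the following is a minimal Python sketch of its semantics, evaluated by brute force in the spirit of the full-scan baseline the abstract mentions. It is not the ND-GiST index itself and is not taken from the paper; the function names (`kmers`, `matches_box`, `box_query_scan`) and the sample sequence are illustrative assumptions.

```python
from typing import Iterator, List, Sequence, Set


def kmers(sequence: str, k: int) -> Iterator[str]:
    """Yield every length-k substring (k-mer) of a DNA sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


def matches_box(kmer: str, box: Sequence[Set[str]]) -> bool:
    """Return True if each position of kmer holds a base allowed at that position."""
    return len(kmer) == len(box) and all(
        base in allowed for base, allowed in zip(kmer, box)
    )


def box_query_scan(sequence: str, box: Sequence[Set[str]]) -> List[str]:
    """Hypothetical full-scan evaluation: test every k-mer against the box."""
    return [km for km in kmers(sequence, len(box)) if matches_box(km, box)]


# A box query of length 3: 'A', then 'C' or 'G', then 'T'.
box = [{"A"}, {"C", "G"}, {"T"}]
hits = box_query_scan("ACTAGTACGT", box)
print(hits)
```

An index that supports box queries aims to return the same result set without testing every stored k-mer, which is where a scan like this one becomes the point of comparison.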

Original language: English
Title of host publication: New Knowledge in Information Systems and Technologies - Volume 2
Editors: Álvaro Rocha, Sandra Costanzo, Hojjat Adeli, Luís Paulo Reis
Publisher: Springer Verlag
Pages: 663-672
Number of pages: 10
ISBN (Print): 9783030161835
DOIs: https://doi.org/10.1007/978-3-030-16184-2_63
Publication status: Published - Jan 1 2019
Event: World Conference on Information Systems and Technologies, WorldCIST 2019 - Galicia, Spain
Duration: Apr 16 2019 to Apr 19 2019

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 931
ISSN (Print): 2194-5357

Conference

Conference: World Conference on Information Systems and Technologies, WorldCIST 2019
Country: Spain
City: Galicia
Period: 4/16/19 to 4/19/19

Fingerprint

Information management
DNA

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Science(all)

Cite this

Szalai-Gindl, J. M., Kiss, A., Halász, G., Dobos, L., & Csabai, I. (2019). ND-GIST: A novel method for disk-resident k-mer indexing. In Á. Rocha, S. Costanzo, H. Adeli, & L. P. Reis (Eds.), New Knowledge in Information Systems and Technologies - Volume 2 (pp. 663-672). (Advances in Intelligent Systems and Computing; Vol. 931). Springer Verlag. https://doi.org/10.1007/978-3-030-16184-2_63

@inproceedings{5033edc00a6c471a865ac93c65a411a6,
title = "ND-GIST: A novel method for disk-resident k-mer indexing",
abstract = "Several challenges are related to metagenomics, one of which is the data management. A related central concept is k-mer which means a possible subsequence of length k from a DNA (sub)sequence. In this work, the focus is on indexing k-mers and supporting box queries where a query string of length k might have multiple allowed nucleobases per position. A novel index structure: ND-GiST is introduced which has capability to handle box queries. Comparing it with full table scan and the traditional B-tree, the performance results of ND-GiST are encouraging.",
keywords = "Box query, Genome data, GiST, Indexing, Metagenomics, ND-tree, PostgreSQL",
author = "Szalai-Gindl, {J{\'a}nos M{\'a}rk} and Attila Kiss and G{\'a}bor Hal{\'a}sz and L{\'a}szl{\'o} Dobos and I. Csabai",
year = "2019",
month = "1",
day = "1",
doi = "10.1007/978-3-030-16184-2_63",
language = "English",
isbn = "9783030161835",
series = "Advances in Intelligent Systems and Computing",
publisher = "Springer Verlag",
pages = "663--672",
editor = "{\'A}lvaro Rocha and Sandra Costanzo and Hojjat Adeli and Reis, {Lu{\'i}s Paulo}",
booktitle = "New Knowledge in Information Systems and Technologies - Volume 2",

}

TY - GEN

T1 - ND-GIST

T2 - A novel method for disk-resident k-mer indexing

AU - Szalai-Gindl, János Márk

AU - Kiss, Attila

AU - Halász, Gábor

AU - Dobos, László

AU - Csabai, I.

PY - 2019/1/1

Y1 - 2019/1/1

N2 - Several challenges are related to metagenomics, one of which is the data management. A related central concept is k-mer which means a possible subsequence of length k from a DNA (sub)sequence. In this work, the focus is on indexing k-mers and supporting box queries where a query string of length k might have multiple allowed nucleobases per position. A novel index structure: ND-GiST is introduced which has capability to handle box queries. Comparing it with full table scan and the traditional B-tree, the performance results of ND-GiST are encouraging.

KW - Box query

KW - Genome data

KW - GiST

KW - Indexing

KW - Metagenomics

KW - ND-tree

KW - PostgreSQL

UR - http://www.scopus.com/inward/record.url?scp=85065070174&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065070174&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-16184-2_63

DO - 10.1007/978-3-030-16184-2_63

M3 - Conference contribution

SN - 9783030161835

T3 - Advances in Intelligent Systems and Computing

SP - 663

EP - 672

BT - New Knowledge in Information Systems and Technologies - Volume 2

A2 - Rocha, Álvaro

A2 - Costanzo, Sandra

A2 - Adeli, Hojjat

A2 - Reis, Luís Paulo

PB - Springer Verlag

ER -