SkyQuery

An implementation of a parallel probabilistic join engine for cross-identification of multiple astronomical databases

László Dobos, Tamás Budavári, Nolan Li, Alexander S. Szalay, I. Csabai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

7 Citations (Scopus)

Abstract

Multi-wavelength astronomical studies require cross-identification of detections of the same celestial objects in multiple catalogs based on spherical coordinates and other properties. Because of the large data volumes and spherical geometry, the symmetric N-way association of astronomical detections is a computationally intensive problem, even when sophisticated indexing schemes are used to exclude obviously false candidates. Legacy astronomical catalogs already contain detections of more than a hundred million objects, while ongoing and future surveys will produce catalogs of billions of objects, with multiple detections of each at different times. A one-time, pairwise cross-identification of these large catalogs is not sufficient for many astronomical scenarios. Consequently, a new system is needed that can cross-identify multiple catalogs on demand, efficiently and reliably. In this paper, we present our solution based on a cluster of commodity servers and ordinary relational databases. The cross-identification problems are formulated in a language based on SQL but extended with special clauses. These special queries are partitioned spatially by coordinate ranges and compiled into a complex workflow of ordinary SQL queries. The workflows are then executed by a parallel framework on a cluster of servers hosting identical mirrors of the same data sets.
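The abstract does not spell out the matching score or the partitioning scheme, so the following is only a minimal, single-machine sketch of the ideas it names, under explicit assumptions. It assumes (1) fixed-height declination zones as the spatial "coordinate ranges", each widened by the search radius so that matches straddling a zone boundary are not lost; (2) a Gaussian-error Bayes factor for n detections as the probabilistic association score, ln B = (n-1) ln 2 + Σ ln w_i - ln Σ w_i - Σ_{i<j} w_i w_j ψ_ij² / (2 Σ w_i), with w_i = 1/σ_i² and ψ_ij the angular separation; whether SkyQuery uses exactly this form, and the B > 1 acceptance threshold, are assumptions here; and (3) a thread pool standing in for the cluster of mirrored servers. All names (`xmatch`, `log_bayes_factor`, `zones_for`, `home_zone`) are hypothetical. Per the abstract, the real system compiles such logic into a workflow of ordinary SQL queries rather than running Python.

```python
# Hypothetical sketch: declination-zone partitioning plus a Gaussian-error
# Bayes factor for N-way association. Not SkyQuery's actual implementation.
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations, product

ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond in radians


def unit_vector(ra_deg, dec_deg):
    """Cartesian unit vector for an (ra, dec) position given in degrees."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def separation(p, q):
    """Great-circle angle in radians; atan2 stays accurate at tiny angles."""
    a, b = unit_vector(p[0], p[1]), unit_vector(q[0], q[1])
    cross = (a[1] * b[2] - a[2] * b[1],
             a[2] * b[0] - a[0] * b[2],
             a[0] * b[1] - a[1] * b[0])
    dot = sum(x * y for x, y in zip(a, b))
    return math.atan2(math.sqrt(sum(c * c for c in cross)), dot)


def log_bayes_factor(dets):
    """ln B for n detections (ra_deg, dec_deg, sigma_arcsec), Gaussian errors."""
    w = [1.0 / (d[2] * ARCSEC) ** 2 for d in dets]  # weights 1/sigma^2 in rad^-2
    sw = sum(w)
    quad = sum(w[i] * w[j] * separation(dets[i], dets[j]) ** 2
               for i, j in combinations(range(len(dets)), 2))
    return ((len(dets) - 1) * math.log(2.0) + sum(math.log(x) for x in w)
            - math.log(sw) - quad / (2.0 * sw))


def zones_for(dec, height, margin, nzones):
    """Zones whose declination range, widened by `margin`, contains `dec`."""
    lo = math.floor((dec + 90.0 - margin) / height)
    hi = math.floor((dec + 90.0 + margin) / height)
    return range(max(lo, 0), min(hi, nzones - 1) + 1)


def home_zone(dec, height, nzones):
    """The single zone whose unwidened range contains `dec`."""
    return min(math.floor((dec + 90.0) / height), nzones - 1)


def xmatch(catalogs, height_deg=1.0, radius_arcsec=10.0):
    """Return (index_tuple, ln_B) for every accepted N-way association."""
    nzones = math.ceil(180.0 / height_deg)
    margin = radius_arcsec / 3600.0  # search radius in degrees
    # Partition step: zone -> one list of (index, detection) per catalog.
    parts = {}
    for c, cat in enumerate(catalogs):
        for i, det in enumerate(cat):
            for z in zones_for(det[1], height_deg, margin, nzones):
                parts.setdefault(z, [[] for _ in catalogs])[c].append((i, det))

    def match_zone(z):
        found = []
        for combo in product(*parts[z]):
            idx = tuple(i for i, _ in combo)
            dets = [d for _, d in combo]
            # A tuple is owned by its first member's home zone, so a match
            # seen in two overlapping zones is reported only once.
            if home_zone(dets[0][1], height_deg, nzones) != z:
                continue
            # Cheap geometric cut before computing the Bayes factor.
            if any(separation(p, q) > radius_arcsec * ARCSEC
                   for p, q in combinations(dets, 2)):
                continue
            lnb = log_bayes_factor(dets)
            if lnb > 0.0:  # assumed threshold: B > 1 favours a common source
                found.append((idx, lnb))
        return found

    # Zones are independent, so they can be processed concurrently.
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(match_zone, sorted(parts)))
    return [m for zone in results for m in zone]
```

Calling `xmatch([A, B])` on catalogs given as lists of `(ra_deg, dec_deg, sigma_arcsec)` tuples returns index tuples with their ln B. Declination zones are chosen here because each zone spans the full right-ascension circle, which sidesteps the RA wrap-around at 0/360 degrees; the widening margin trades some duplicated detections for never missing a boundary-straddling match.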

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 159-167
Number of pages: 9
Volume: 7338 LNCS
DOIs: https://doi.org/10.1007/978-3-642-31235-9_10
Publication status: Published - 2012
Event: 24th International Conference on Scientific and Statistical Database Management, SSDBM 2012 - Chania, Crete, Greece
Duration: Jun 25 2012 to Jun 27 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7338 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 24th International Conference on Scientific and Statistical Database Management, SSDBM 2012
Country: Greece
City: Chania, Crete
Period: 6/25/12 to 6/27/12

Keywords

  • astronomical catalogs
  • computational statistics
  • probabilistic join
  • query optimization and languages
  • workflow

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Dobos, L., Budavári, T., Li, N., Szalay, A. S., & Csabai, I. (2012). SkyQuery: An implementation of a parallel probabilistic join engine for cross-identification of multiple astronomical databases. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7338 LNCS, pp. 159-167). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7338 LNCS). https://doi.org/10.1007/978-3-642-31235-9_10

SkyQuery: An implementation of a parallel probabilistic join engine for cross-identification of multiple astronomical databases. / Dobos, László; Budavári, Tamás; Li, Nolan; Szalay, Alexander S.; Csabai, I.

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 7338 LNCS 2012. p. 159-167 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7338 LNCS).

Dobos, L, Budavári, T, Li, N, Szalay, AS & Csabai, I 2012, SkyQuery: An implementation of a parallel probabilistic join engine for cross-identification of multiple astronomical databases. in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). vol. 7338 LNCS, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 7338 LNCS, pp. 159-167, 24th International Conference on Scientific and Statistical Database Management, SSDBM 2012, Chania, Crete, Greece, 6/25/12. https://doi.org/10.1007/978-3-642-31235-9_10
Dobos L, Budavári T, Li N, Szalay AS, Csabai I. SkyQuery: An implementation of a parallel probabilistic join engine for cross-identification of multiple astronomical databases. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 7338 LNCS. 2012. p. 159-167. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-642-31235-9_10
Dobos, László ; Budavári, Tamás ; Li, Nolan ; Szalay, Alexander S. ; Csabai, I. / SkyQuery: An implementation of a parallel probabilistic join engine for cross-identification of multiple astronomical databases. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Vol. 7338 LNCS 2012. pp. 159-167 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{a7c9548c48ed478d8decb9ec7e7831cf,
title = "SkyQuery: An implementation of a parallel probabilistic join engine for cross-identification of multiple astronomical databases",
abstract = "Multi-wavelength astronomical studies require cross-identification of detections of the same celestial objects in multiple catalogs based on spherical coordinates and other properties. Because of the large data volumes and spherical geometry, the symmetric N-way association of astronomical detections is a computationally intensive problem, even when sophisticated indexing schemes are used to exclude obviously false candidates. Legacy astronomical catalogs already contain detections of more than a hundred million objects while ongoing and future surveys will produce catalogs of billions of objects with multiple detections of each at different times. One time, pair-wise cross-identification of these large catalogs is not sufficient for many astronomical scenarios. Consequently, a novel system is necessary that can cross-identify multiple catalogs on-demand, efficiently and reliably. In this paper, we present our solution based on a cluster of commodity servers and ordinary relational databases. The cross-identification problems are formulated in a language based on SQL, but extended with special clauses. These special queries are partitioned spatially by coordinate ranges and compiled into a complex workflow of ordinary SQL queries. Workflows are then executed in a parallel framework using a cluster of servers hosting identical mirrors of the same data sets.",
keywords = "astronomical catalogs, computational statistics, probabilistic join, query optimization and languages, workflow",
author = "L{\'a}szl{\'o} Dobos and Tam{\'a}s Budav{\'a}ri and Nolan Li and Szalay, {Alexander S.} and I. Csabai",
year = "2012",
doi = "10.1007/978-3-642-31235-9_10",
language = "English",
isbn = "9783642312342",
volume = "7338 LNCS",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "159--167",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - SkyQuery

T2 - An implementation of a parallel probabilistic join engine for cross-identification of multiple astronomical databases

AU - Dobos, László

AU - Budavári, Tamás

AU - Li, Nolan

AU - Szalay, Alexander S.

AU - Csabai, I.

PY - 2012

Y1 - 2012

N2 - Multi-wavelength astronomical studies require cross-identification of detections of the same celestial objects in multiple catalogs based on spherical coordinates and other properties. Because of the large data volumes and spherical geometry, the symmetric N-way association of astronomical detections is a computationally intensive problem, even when sophisticated indexing schemes are used to exclude obviously false candidates. Legacy astronomical catalogs already contain detections of more than a hundred million objects while ongoing and future surveys will produce catalogs of billions of objects with multiple detections of each at different times. One time, pair-wise cross-identification of these large catalogs is not sufficient for many astronomical scenarios. Consequently, a novel system is necessary that can cross-identify multiple catalogs on-demand, efficiently and reliably. In this paper, we present our solution based on a cluster of commodity servers and ordinary relational databases. The cross-identification problems are formulated in a language based on SQL, but extended with special clauses. These special queries are partitioned spatially by coordinate ranges and compiled into a complex workflow of ordinary SQL queries. Workflows are then executed in a parallel framework using a cluster of servers hosting identical mirrors of the same data sets.

AB - Multi-wavelength astronomical studies require cross-identification of detections of the same celestial objects in multiple catalogs based on spherical coordinates and other properties. Because of the large data volumes and spherical geometry, the symmetric N-way association of astronomical detections is a computationally intensive problem, even when sophisticated indexing schemes are used to exclude obviously false candidates. Legacy astronomical catalogs already contain detections of more than a hundred million objects while ongoing and future surveys will produce catalogs of billions of objects with multiple detections of each at different times. One time, pair-wise cross-identification of these large catalogs is not sufficient for many astronomical scenarios. Consequently, a novel system is necessary that can cross-identify multiple catalogs on-demand, efficiently and reliably. In this paper, we present our solution based on a cluster of commodity servers and ordinary relational databases. The cross-identification problems are formulated in a language based on SQL, but extended with special clauses. These special queries are partitioned spatially by coordinate ranges and compiled into a complex workflow of ordinary SQL queries. Workflows are then executed in a parallel framework using a cluster of servers hosting identical mirrors of the same data sets.

KW - astronomical catalogs

KW - computational statistics

KW - probabilistic join

KW - query optimization and languages

KW - workflow

UR - http://www.scopus.com/inward/record.url?scp=84863433955&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84863433955&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-31235-9_10

DO - 10.1007/978-3-642-31235-9_10

M3 - Conference contribution

SN - 9783642312342

VL - 7338 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 159

EP - 167

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -