SkyQuery: An implementation of a parallel probabilistic join engine for cross-identification of multiple astronomical databases

László Dobos, Tamás Budavári, Nolan Li, Alexander S. Szalay, István Csabai

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

Multi-wavelength astronomical studies require cross-identification of detections of the same celestial objects in multiple catalogs based on spherical coordinates and other properties. Because of the large data volumes and spherical geometry, the symmetric N-way association of astronomical detections is a computationally intensive problem, even when sophisticated indexing schemes are used to exclude obviously false candidates. Legacy astronomical catalogs already contain detections of more than a hundred million objects, while ongoing and future surveys will produce catalogs of billions of objects with multiple detections of each at different times. One-time, pairwise cross-identification of these large catalogs is not sufficient for many astronomical scenarios. Consequently, a novel system is necessary that can cross-identify multiple catalogs on demand, efficiently and reliably. In this paper, we present our solution based on a cluster of commodity servers and ordinary relational databases. The cross-identification problems are formulated in a language based on SQL, but extended with special clauses. These special queries are partitioned spatially by coordinate ranges and compiled into a complex workflow of ordinary SQL queries. Workflows are then executed in a parallel framework using a cluster of servers hosting identical mirrors of the same data sets.
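
The spatial partitioning mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the SkyQuery implementation: it replaces the probabilistic join with a plain distance cut, works on in-memory lists rather than SQL, and all names (`dec_partitions`, `match_partition`, `cross_match`) are hypothetical. It assumes two catalogs of (id, RA, Dec) tuples in degrees, splits the sky into declination ranges, and matches each range independently, using a buffer of one match radius so that pairs straddling a range boundary are not lost.

```python
import math
from typing import List, Tuple

Detection = Tuple[int, float, float]  # (id, ra_deg, dec_deg); hypothetical layout


def angular_separation_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees, computed with the haversine formula."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def dec_partitions(n: int) -> List[Tuple[float, float]]:
    """Split the declination range [-90, 90] into n equal coordinate ranges."""
    step = 180.0 / n
    return [(-90.0 + i * step, -90.0 + (i + 1) * step) for i in range(n)]


def match_partition(cat_a: List[Detection], cat_b: List[Detection],
                    lo: float, hi: float, radius_deg: float,
                    is_last: bool) -> List[Tuple[int, int]]:
    """Match the detections of catalog A that fall in [lo, hi) against catalog B."""
    # Each A detection is owned by exactly one range, so no pair is reported twice.
    own_a = [a for a in cat_a if a[2] >= lo and (is_last or a[2] < hi)]
    # B candidates include a buffer: a separation <= radius implies |dDec| <= radius.
    near_b = [b for b in cat_b if lo - radius_deg <= b[2] <= hi + radius_deg]
    return [(a[0], b[0]) for a in own_a for b in near_b
            if angular_separation_deg(a[1], a[2], b[1], b[2]) <= radius_deg]


def cross_match(cat_a: List[Detection], cat_b: List[Detection],
                radius_deg: float, n_partitions: int = 4) -> List[Tuple[int, int]]:
    """Run every partition in turn and merge the results."""
    parts = dec_partitions(n_partitions)
    pairs: List[Tuple[int, int]] = []
    for i, (lo, hi) in enumerate(parts):
        pairs.extend(match_partition(cat_a, cat_b, lo, hi, radius_deg,
                                     is_last=(i == len(parts) - 1)))
    return sorted(pairs)


if __name__ == "__main__":
    cat_a = [(1, 10.0, 44.9999), (2, 200.0, -30.0)]
    cat_b = [(10, 10.0, 45.0001), (11, 200.0002, -30.0), (12, 50.0, 0.0)]
    print(cross_match(cat_a, cat_b, radius_deg=1.0 / 3600.0))
```

Because the partitions share no state, each call to `match_partition` could in principle run on a different worker, which is the property a parallel framework would exploit. The result does not depend on the number of partitions chosen.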

Original language: English
Title of host publication: Scientific and Statistical Database Management - 24th International Conference, SSDBM 2012, Proceedings
Pages: 159-167
Number of pages: 9
DOI: https://doi.org/10.1007/978-3-642-31235-9_10
Publication status: Published - Jul 9 2012
Event: 24th International Conference on Scientific and Statistical Database Management, SSDBM 2012 - Chania, Crete, Greece
Duration: Jun 25 2012 to Jun 27 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7338 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 24th International Conference on Scientific and Statistical Database Management, SSDBM 2012
Country: Greece
City: Chania, Crete
Period: 6/25/12 to 6/27/12

Keywords

  • astronomical catalogs
  • computational statistics
  • probabilistic join
  • query optimization and languages
  • workflow

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Dobos, L., Budavári, T., Li, N., Szalay, A. S., & Csabai, I. (2012). SkyQuery: An implementation of a parallel probabilistic join engine for cross-identification of multiple astronomical databases. In Scientific and Statistical Database Management - 24th International Conference, SSDBM 2012, Proceedings (pp. 159-167). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7338 LNCS). https://doi.org/10.1007/978-3-642-31235-9_10