ALICE (A Large Ion Collider Experiment) is a heavy-ion experiment studying the physics of strongly interacting matter and the quark-gluon plasma at the CERN LHC (Large Hadron Collider). The ALICE DAQ (Data Acquisition System) is based on a large farm of commodity hardware consisting of more than 600 devices (Linux PCs, storage, network switches). The DAQ reads the data transferred from the detectors through 500 dedicated optical links at an aggregated and sustained rate of up to 10 gigabytes per second and stores it at up to 2.5 gigabytes per second. The infoLogger is the logging system that centrally collects the messages issued by the thousands of processes running on the DAQ machines. It allows errors to be reported on the fly and keeps a trace of runtime execution for later investigation. More than 500,000 messages are stored every day in a MySQL database, in a structured table that records 16 indexing fields for each message (e.g. time, host, user). The total volume of logs for 2012 exceeds 75 GB of data and 150 million rows. In this paper we present the architecture and implementation of this distributed logging system, which consists of a client programming API, local data collector processes, a central server, and interactive human interfaces. We review the operational experience during the 2012 run, in particular the actions taken to ensure that shifters receive manageable and relevant content from the main log stream. Finally, we present the performance of this log system and its future evolution.
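To make the idea of a structured, indexed message table concrete, the following is a minimal illustrative sketch and not the infoLogger implementation or its API. It assumes Python's built-in `sqlite3` as a stand-in for MySQL, and it uses only three of the indexing fields named in the abstract (time, host, user); the `severity` and `message` columns, the class name `LogMessage`, and all function names are hypothetical. The two helper functions at the end turn the quoted figures into rough averages; since the abstract gives those figures as lower bounds ("more than", "exceeds"), the results are only indicative.

```python
import sqlite3
from dataclasses import dataclass, astuple


@dataclass
class LogMessage:
    """One structured log record (hypothetical subset of the 16 fields)."""
    timestamp: float  # "time" field named in the abstract
    host: str         # "host" field named in the abstract
    username: str     # "user" field named in the abstract
    severity: str     # assumption: not listed in the abstract
    message: str      # assumption: free-text payload


def open_store() -> sqlite3.Connection:
    """Create an in-memory table with an index on (host, timestamp)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages ("
        "timestamp REAL, host TEXT, username TEXT, severity TEXT, message TEXT)"
    )
    conn.execute("CREATE INDEX idx_host_time ON messages (host, timestamp)")
    return conn


def store(conn: sqlite3.Connection, msg: LogMessage) -> None:
    """Insert one record; field order matches the column order above."""
    conn.execute("INSERT INTO messages VALUES (?, ?, ?, ?, ?)", astuple(msg))


def errors_for_host(conn: sqlite3.Connection, host: str) -> list[str]:
    """Return error texts for one host, oldest first."""
    rows = conn.execute(
        "SELECT message FROM messages "
        "WHERE host = ? AND severity = 'E' ORDER BY timestamp",
        (host,),
    )
    return [row[0] for row in rows]


def average_row_bytes(total_bytes: float, rows: float) -> float:
    """Average storage per row, e.g. 75e9 bytes over 150e6 rows."""
    return total_bytes / rows


def average_rate_per_second(messages_per_day: float) -> float:
    """Average message rate, assuming messages are spread over 24 hours."""
    return messages_per_day / 86_400
```

With the quoted figures, `average_row_bytes(75e9, 150e6)` gives about 500 bytes per row (taking 1 GB as 10^9 bytes), and `average_rate_per_second(500_000)` gives a little under 6 messages per second on average. Actual rates during data taking are presumably burstier than this daily average suggests.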
|Journal||Journal of Physics: Conference Series|
|Issue number||TRACK 1|
|Publication status||Published - Jan 1 2014|
|Event||20th International Conference on Computing in High Energy and Nuclear Physics, CHEP 2013 - Amsterdam, Netherlands|
|Duration||Oct 14 2013 → Oct 18 2013|
ASJC Scopus subject areas
- Physics and Astronomy(all)