The Compact Muon Solenoid (CMS) is a general-purpose detector at the Large Hadron Collider (LHC). It has a broad physics programme ranging from studying the Standard Model (including the Higgs boson) to searching for extra dimensions and particles that could make up dark matter. Although it has the same scientific goals as the ATLAS experiment, it uses different technical solutions and a different magnet-system design.
The CMS detector is built around a huge solenoid magnet. This takes the form of a cylindrical coil of superconducting cable that generates a field of 4 tesla, about 100,000 times the magnetic field of the Earth. The field is confined by a steel “yoke” that forms the bulk of the detector’s 14,000-tonne weight.
Follow WLCG procedures and practices
Checksums verified in every file transfer
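The checksum verification in transfers can be sketched as follows. This is an illustrative sketch, not CMS production code: it assumes Adler-32 checksums (commonly used in CMS/WLCG file catalogues), and the function names are hypothetical.

```python
import zlib

def adler32_of_file(path, chunk_size=1024 * 1024):
    """Compute the Adler-32 checksum of a file, reading it in chunks
    so that arbitrarily large files fit in memory."""
    checksum = 1  # Adler-32 initial value
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    return checksum & 0xFFFFFFFF

def verify_transfer(path, expected_hex):
    """Compare a transferred file's checksum against the catalogue
    value (hexadecimal string); hypothetical helper for illustration."""
    return adler32_of_file(path) == int(expected_hex, 16)
```

Running the checksum incrementally over chunks gives the same result as checksumming the whole file at once, which is what makes verification of multi-gigabyte RAW files practical.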
RAW data stored at two different T0 sites:
- 0.35 PB (2010)
- 0.56 PB (2011)
- 2.2 PB (2012)
- 0.8 PB heavy-ion (2010-2013)
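As a back-of-the-envelope check on the figures above (an illustrative sketch, not an official accounting), the total unique RAW volume and the tape footprint under the two-copy policy are:

```python
# RAW data volumes in PB, from the figures above.
raw_pb = {"2010": 0.35, "2011": 0.56, "2012": 2.2, "heavy-ion 2010-2013": 0.8}

total = sum(raw_pb.values())   # unique RAW volume, about 3.9 PB
two_copies = 2 * total         # tape footprint with two copies (2xRAW), about 7.8 PB

print(f"{total:.2f} PB unique, {two_copies:.2f} PB with two copies")
```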
Legacy reconstructed data (AOD):
- 60 TB of 2010 data, reprocessed in 2011 with CMSSW 4.2 (no corresponding MC)
- 200 TB of 2011 data and 800 TB of 2012 data, reprocessed in 2013 with CMSSW 5.3 (partial corresponding MC for 2011, full MC for 2012)
Several reconstruction reprocessings
The current plan: keep a complete AOD reprocessing (in addition to 2×RAW)
- no reconstructed collision data have yet been deleted, but deletion campaigns are planned.
- most Run 2 analyses will use MiniAODs, which are significantly smaller
Open data: 28 TB of 2010 collision data released in 2014, and 130 TB of 2011 collision data to be released in 2015, available in the CERN Open Data Portal (CODP)
Further public releases will follow.
Data provenance is included in the data files; further information is collected in the CMS Data Aggregation System (DAS)
Analysis approval procedure followed in CADI
Notes and drafts stored in CDS
Presentations in Indico
User documentation in TWiki serves mainly the current operation and usage
Basic documentation and examples provided for open data users in CODP
A set of benchmark analyses reproducing published results with open data is in preparation, to be added to CODP
CMSSW is open source, available on GitHub and in CVMFS
Open data: a VM image (CernVM), which builds the appropriate environment from CVMFS, available in CODP
Main usage: analysis within the collaboration
Open data: education, outreach, analysis by external users
Main target: collaboration members
Open data: easy access to old data for collaboration members and external users
Data-taking and analysis are ongoing; more than 400 publications by CMS
Open data: educational and scientific value, societal impact
Unique: only the LHC can provide such data on any foreseeable time-scale
Storage within the current computing resources
Open data: storage for the 2010-2011 open data is provided by CERN IT; further requests are to be allocated through the RRB
Bit preservation is guaranteed in the medium term within the CMS computing model and the agreements with the computing tiers, but long-term preservation beyond the lifetime of the experiment is not yet addressed (storage, agreements, responsibilities)
The open data release has resulted in:
- data and software access independent of experiment-specific resources
- a timely capture of the basic documentation, which, although limited and incomplete, makes long-term data reuse possible
- common solutions and services
Data preservation competes with the already scarce resources needed by an active experiment.
Knowledge preservation: the lack of persistent information about intermediate analysis steps is to be addressed by the CERN Analysis Preservation framework (CAP)
- CMS has provided input for the data model and user-interface design, and is defining pipelines for automated ingestion from CMS services.
- The CAP use cases are well acknowledged by CMS.
- CAP will be a valuable tool for starting data preservation while an analysis is still active.
Long-term reusability: freezing the environment (VM) versus evolving the data; both approaches will be followed, and CMS aims to address the complexity of the CMS data format
The impact of the open data release has been very positive:
- well received by the public and the funding agencies
- no unexpected additional workload for the collaboration
- the data are in use
Excellent collaboration with the CERN services developing data-preservation and open-access services, and with DASPOS
- Common projects are essential for long-term preservation
- Benefit from expertise in digital archiving and library services
- Fruitful discussions with other experiments.
Long-term vision and planning is difficult for ongoing experiments:
- DPHEP offers a unique viewpoint.
Next steps for CMS:
- stress-test CERN Open Data Portal with the new data release
- develop and deploy the CMS-specific interface to CERN Analysis Preservation framework