The Compact Muon Solenoid (CMS) is a general-purpose detector at the Large Hadron Collider (LHC). It has a broad physics programme ranging from studying the Standard Model (including the Higgs boson) to searching for extra dimensions and particles that could make up dark matter. Although it has the same scientific goals as the ATLAS experiment, it uses different technical solutions and a different magnet-system design.

The CMS detector is built around a huge solenoid magnet. This takes the form of a cylindrical coil of superconducting cable that generates a field of 4 tesla, about 100,000 times the magnetic field of the Earth. The field is confined by a steel “yoke” that forms the bulk of the detector’s 14,000-tonne weight.

Bit Preservation: 

Follow WLCG procedures and practices

Checksums are verified in every file transfer


RAW data stored in two copies at different sites (2×RAW):

  • 0.35 PB (2010)
  • 0.56 PB (2011)
  • 2.2 PB (2012)
  • 0.8 PB heavy-ion (2010–2013)

Legacy reconstructed data (AOD):

  • 60 TB of 2010 data reprocessed in 2011 with CMSSW_4_2 (no corresponding MC)
  • 200 TB of 2011 data and 800 TB of 2012 data reprocessed in 2013 with CMSSW_5_3 (with partial corresponding MC for 2011, and full MC for 2012)

Several reconstruction reprocessings have been performed

The current plan: keep one complete AOD reprocessing (in addition to 2×RAW)

  • No reconstructed collision data have yet been deleted, but deletion campaigns are planned.
  • Most Run 2 analyses will use MiniAODs, which are significantly smaller in size.

Open data: 28 TB of 2010 collision data were released in 2014, and 130 TB of 2011 collision data are to be released in 2015, available through the CERN Open Data Portal (CODP)

Further public releases will follow.


Data provenance is included in the data files; further information is collected in the CMS Data Aggregation System (DAS)

Analysis approval procedure followed in CADI

Notes and drafts stored in CDS

Presentations in Indico

User documentation in TWiki serves mainly the current operation and usage

Basic documentation and examples provided for open data users in CODP

A set of benchmark analyses reproducing published results with open data is in preparation, to be added to CODP


CMSSW is open source and available on GitHub and in CVMFS

Open data: a VM image (CernVM), which builds the appropriate environment from CVMFS, is available in CODP

Use Cases: 

Main usage: analysis within the collaboration

Open data: education, outreach, analysis by external users


Main target: collaboration members

Open data: easy access to old data for collaboration members and external users


Data-taking and analysis are ongoing, with more than 400 publications by CMS

Open data: educational and scientific value, societal impact


Unique: only the LHC can provide such data on any foreseeable time-scale


Storage within the current computing resources

Open data: storage for the 2010–2011 open data is provided by CERN IT; further requests are to be allocated through the RRB


Bit preservation is guaranteed in the medium term within the CMS computing model and agreements with the computing tiers, but long-term preservation beyond the lifetime of the experiment is not yet addressed (storage, agreements, responsibilities)

The open data release has resulted in

  • data and software access independent of experiment-specific resources
  • a timely capture of the basic documentation which, although limited and incomplete, makes long-term data reuse possible
  • common solutions and services.

Data preservation competes with the already scarce resources needed by an active experiment.

Knowledge preservation: the lack of persistent information on intermediate analysis steps is to be addressed by the CERN Analysis Preservation framework (CAP)

  • CMS has provided input for the data model and user interface design, and is defining pipelines for automated ingestion from CMS services.
  • The CAP use cases are well acknowledged by CMS.
  • CAP will be a valuable tool to start data preservation while an analysis is still active.

Long-term reusability: freezing the environment (VM) versus evolving the data. Both approaches will be followed, and CMS tries to address the complexity of the CMS data format


The impact of the open data release has been very positive:

  • well received by the public and the funding agencies
  • no unexpected additional workload for the collaboration
  • the data are in use.

Excellent collaboration with the CERN services developing data preservation and open access services, and with DASPOS

  • Common projects are essential for long-term preservation
  • Benefit from expertise in digital archiving and library services
  • Fruitful discussions with other experiments.

Long-term vision and planning is difficult for ongoing experiments:

  • DPHEP offers a unique viewpoint.

Next steps for CMS:

  • stress-test CERN Open Data Portal with the new data release
  • develop and deploy the CMS-specific interface to CERN Analysis Preservation framework
Host Lab: 