Data Handling

This section refers to the bye_splits/data_handle directory. It includes the definition and functionality of event and geometry data, within appropriate classes, both inheriting form the same base class. Additional helper functions are also included.

../../_images/flowchart.png

Code dataflow structure. At an initial stage, event and geometry data are separately processed. A “cache” system is put in place to minimize expensive reads from file. The two data sources are combined to provide a basis for all algorithms, which can be exploited and extended by the user (“client”).

Event data

Implemented in data_handle/event.py, in the EventData class.

The core idea is to provide event-related data as fast as possible while minimizing the number of events to be loaded at any given time. For an algorithm where one knows in advance which events will be processes, one wants to load as many events as possible at once. However, for an interactive event display (see Data Plotting), one does not know in advance which specific events will be needed, and loading all might not be feasible, or fast. Thus, a simple system was set in place which detects whether a new event being requested has been requested before, reading it from the input file only if it is being requested for the first time. Events to load at start can be specifically mentioned via the default_events argument.

By inspecting the provide() and store() functions, one should get a good idea on how the overall structure works.

Note

Some EventData methods differ in their name by a single “s”. This singular/plural difference aims at promoting code reuse as much as possible, while providing a set of convenience functions for the user, which can process one or mor events using a (hopefully) clear syntax. See for instance provide_event() and provide_events().

Geometry data

Implemented in data_handle/geometry.py, in the GeometryData class.

It aims to mimic the official CMSSW HGCAL geometry, originally implemented in C++. Since it is conceived to be used for visualization only, only topological information is processed, while all (extremely complex) details on physics simulations are not present. Even the shapes are considered in a simplified manner; the result should not be seen as a replacement to the original geometry. The goal is to provide a reasonable hand-wavy visualization of what an algorithm is doing, potentiating faster algorithmic developments.

The geometry encodes the location of all TCs, silicon modules and scintillator tiles. It has been validated with respect to the CMSSW implementation. TCs have a fixed diamond shape for the silicon section, and a variable-sized annular-wedge shape for the scintillator section. This PR provides a good discussion on both shapes. This one instead discusses a compromise on the positioning of silicon TCs, given the fact that real TCs can have varying shapes.

Warning

The TPG was developed with more than one geometry definition in mind. This repository mimics the TPG geometry at the time of its development. It is thus possible that future geometries might introduce differences. As an example, this was the employed definition of the u and v module indices.

For the CE-E, the cells are drawn using the _display_silicon_tcs() method, while for the CE-H I used the _display_scintillator_tcs() method. The provide() and store() follows the same mechanism as in in Event data.

Other utilities

The InputData in data_handle/data_input.py class serves as a convenient data type, used only within data_handle/. The data_handle/data_process.py file instead stores functions that are used by the Tasks or Scripts.

The baseline_selection() provides an example on how to merge the generator information with cluster information, the latter referring to the clusters already formed by CMSSW, using its default reconstruction chain (mimicked by scripts/run_default_chain.py). For an example of how it is used, see the tasks/fill.py task. The function should be used for the first task of any “chain of tasks”, as executed by a script (see Scripts for details).

Note

The baseline_selection() function includes some “cluster-split-centered” options. Future studies are welcome to add similar functions for different needs.

An example on how to use the EventData class is provided in get_data_reco_chain_start(), which returns the three datasets of interest to the reconstruction chain: TCs, CMSSW-built clusters and generator level information. The function takes advantage of EventDataParticle, which is a helper function returning a EventaData object for a specific particle type and pile-up scenario.

Note

We invite the user to define its own version of get_data_reco_chain_start() depeding on their needs, namely the specific variables needed. Currently used variables represent a reasonable default.