Production (skimming)

This section refers to the bye_splits/produce/ directory. The C++ code is kept for reference only, including the include/ and src/ directory; you should use the produce.py (single) executable.

To get instructions on how to run this step, please go to the Skimming section. In the following the produce.py script is described.

The code is implemented in Python to ease the production task for the user, skipping the need to compile the source code with the required ROOT packages, and avoiding the usage of two programming languages in the same context. This enables the user, for instance, to use the same config.yaml file for everything, without additional conversion steps. On the negative side, the ROOT.gInterpreter.Declare(...) syntax is employed for some RDataFrame operations which require C++ code, making it harder to debug.

Note

RDataFrame is preferred over uproot due to its loading speed and parallel processing. It is instead inferior from a convenience, readability an functionality point of view, which justifies uproot’s choice for all the Tasks. The conversion from one to the other requires the convertInt/Uint/Float(...) functions, otherwise uproot cannot read the skim output files.

The aim of the code is to reduce the size of the input files, easing the task of subsequent steps. This is the more important for files including pile-up. The selection cuts are applied both per-event and within an event. For instance, events without clusters are discarded, and clusters with negative pseudo-rapidity within an event are as well removed (in this case to focus on a single endcap).

Selection within an event is performed using the following characteristic (and convoluted) syntax:

df = df.Define("filtered_object", "object[mask]")

where object is a vector referring to a single event and containing some event-related properties (energy, momentum, …) and mask is a boolean vector with the same length as object which selects only the events passing a given condition. We mostly deal with vectors since an event can have multiple generated particles, clusters, and so on.

Warning

The choice of the used RDataFrame methods is often constrained by the version of RDataFrame that is available. Keep in mind that recent versions often include more intuitive methods to apply the same selections. Recent versions are also better suited for other functionalities, as for instance the inclusion of a progress bar, which is quite handy during the (somewhat lengthy) processing of hundreds of thousands of events.

The following selections are applied:

positive endcap, for simplicity, with tc_zside == 1 and cl3d_eta > 0
look only at odd layers, where TCs are located, using disconnectedTriggerLayers
remove various issues in the simulation step with genpart_gen != -1
select only particles with a specific PID with genpart_pid
select converted or unconverted photons, based on the reachedEE variable
perform generator matching with the calcDeltaR function, using deltarThreshold as maximum radial distance threshold
minimum TC energy with tc_mipPt, to mimic what the actual TPG does

Warning

The framework was never tested with the negative endcap. Small element coordinate’s misalignments might introduce unexpected effects, as well as unwanted dependences on the sign of the x, y, or z, coordinates. We recommend anyone testing the negative endcap to verify the outputs of every single task in the Tasks section.

Note

Vectors with the gen_ suffix refer to the generator step, while the genpart_ suffix refer to Geant4 (simulation). As an example, the Higgs boson is included in the former but not in the latter.

Note

The reachedEE selection can take three values:

0: converted photons (photons converting before reaching the surface of HGCAL)
1: photons that missed HGCAL
2: photons that hit HGCAL