gnn_tracking.preprocessing.point_cloud_builder#

Build point clouds from the input data files.

Attributes#

DEFAULT_FEATURES

_DEFAULT_FEATURE_SCALE

MD_FEATURES

MD_COLS

LS_COLS

Classes#

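DatasetReader

Protocol defining the interface for dataset readers.

BasePointCloudBuilder

Build point clouds, that is, read the input data files and convert them to PyTorch Geometric data objects (without any edges yet).

TrackMLPointCloudBuilder

Build point clouds, that is, read the input data files and convert them to PyTorch Geometric data objects (without any edges yet).

CMSPointCloudBuilder

Build point clouds, that is, read the input data files and convert them to PyTorch Geometric data objects (without any edges yet).

MDPointCloudBuilder

Build point clouds, that is, read the input data files and convert them to PyTorch Geometric data objects (without any edges yet).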

Module Contents#

gnn_tracking.preprocessing.point_cloud_builder.DEFAULT_FEATURES = ('r', 'phi', 'z', 'eta_rz', 'u', 'v', 'charge_frac', 'leta', 'lphi', 'lx', 'ly', 'lz', 'geta', 'gphi')#
gnn_tracking.preprocessing.point_cloud_builder._DEFAULT_FEATURE_SCALE#
gnn_tracking.preprocessing.point_cloud_builder.MD_FEATURES = ['MD_0_r', 'MD_1_r', 'MD_0_z', 'MD_1_z', 'MD_eta', 'MD_phi', 'MD_dphichange']#
gnn_tracking.preprocessing.point_cloud_builder.MD_COLS = ['MD_0_r', 'MD_1_r', 'MD_0_z', 'MD_1_z', 'MD_eta', 'MD_phi', 'MD_dphichange', 'MD_layer']#
gnn_tracking.preprocessing.point_cloud_builder.LS_COLS = ['LS_MD_idx0', 'LS_MD_idx1', 'LS_isInTrueTC', 'LS_TCidx', 'LS_sim_pt', 'LS_sim_eta']#
class gnn_tracking.preprocessing.point_cloud_builder.DatasetReader#

Bases: Protocol

Protocol defining the interface for dataset readers.

read_event(event_id: int) pandas.DataFrame#

Read a single event from the dataset.
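
A minimal sketch of how a class can satisfy a reader protocol of this shape through structural typing. The names EventReader, InMemoryReader, and Rows are hypothetical and not part of gnn_tracking, and a list of dicts stands in for pandas.DataFrame so that the example needs only the standard library.

```python
from typing import Protocol, runtime_checkable

# Stand-in for pandas.DataFrame so the sketch needs only the standard library.
Rows = list[dict[str, float]]


@runtime_checkable
class EventReader(Protocol):
    """Hypothetical mirror of the DatasetReader interface."""

    def read_event(self, event_id: int) -> Rows: ...


class InMemoryReader:
    """Hypothetical reader that serves events from a dict; it inherits from nothing."""

    def __init__(self, events: dict[int, Rows]) -> None:
        self._events = events

    def read_event(self, event_id: int) -> Rows:
        return self._events[event_id]


reader = InMemoryReader({1: [{"x": 1.0, "y": 0.0, "z": 2.0}]})
# Structural typing: the class conforms because it provides read_event.
conforms = isinstance(reader, EventReader)
hits = reader.read_event(1)
```

Because the check is structural, any object with a matching read_event method can be passed where such a reader is expected, without subclassing.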

class gnn_tracking.preprocessing.point_cloud_builder.BasePointCloudBuilder(*, outdir: str | pathlib.PurePath, indir: str | pathlib.PurePath, detector_config: pathlib.PurePath, n_sectors: int, redo: bool = True, pixel_only: bool = True, sector_di: float = 0.0001, sector_ds: float = 1.1, measurement_mode: bool = False, thld: float = 0.5, remove_noise: bool = False, write_output: bool = True, log_level=logging.INFO, collect_data: bool = False, feature_names: tuple = DEFAULT_FEATURES, feature_scale: tuple = _DEFAULT_FEATURE_SCALE, add_true_edges: bool = False, return_data: bool = False, data_type: str = Literal['TrackML', 'CMS_MC', 'MD', 'CMS_Run3'])#

Bases: abc.ABC

Build point clouds, that is, read the input data files and convert them to PyTorch Geometric data objects (without any edges yet).

Parameters:
  • outdir – Directory for the output files

  • indir – Directory for the input files

  • detector_config – Path to the detector configuration file

  • n_sectors – Total number of sectors

  • redo – Re-compute the point cloud even if it is found

  • pixel_only – Construct tracks only from pixel layers

  • sector_di – The intercept offset for the extended sector

  • sector_ds – The slope offset for the extended sector

  • measurement_mode – Produce statistics about the sectorization

  • thld – Threshold pt for measurements

  • remove_noise – Remove hits with particle_id==0

  • write_output – Store the point clouds in a torch .pt file

  • log_level – Specify INFO (0) or DEBUG (>0)

  • collect_data – Collect data in memory

  • feature_names – Names of features to add

  • feature_scale – Scale of features

  • add_true_edges – Add true edges to the point cloud

outdir#
initial_outdir#
indir#
n_sectors#
redo = True#
pixel_only = True#
sector_di = 0.0001#
sector_ds = 1.1#
measurement_mode = False#
thld = 0.5#
stats#
remove_noise = False#
measurements: list[dict[str, Any]] = []#
write_output = True#
feature_names = ['r', 'phi', 'z', 'eta_rz', 'u', 'v', 'charge_frac', 'leta', 'lphi', 'lx', 'ly', 'lz', 'geta', 'gphi']#
feature_scale#
return_data = False#
data_type#
prefixes: list[pathlib.Path] = []#
exists: dict[str, bool]#
outfiles#
data_list: list[torch_geometric.data.Data] = []#
logger#
_collect_data = False#
add_true_edges = False#
_detector#
static calc_eta(r: numpy.ndarray, z: numpy.ndarray) numpy.ndarray#

Compute pseudorapidity (spatial).
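
For reference, a scalar sketch of the standard pseudorapidity definition, eta = -ln(tan(theta / 2)), with the polar angle theta taken from r and z. The function name pseudorapidity is hypothetical; calc_eta itself operates on numpy arrays and its implementation is not shown on this page, so details may differ.

```python
import math


def pseudorapidity(r: float, z: float) -> float:
    """Spatial pseudorapidity for transverse radius r > 0 and longitudinal z."""
    theta = math.atan2(r, z)  # polar angle measured from the z axis
    return -math.log(math.tan(theta / 2.0))


eta_central = pseudorapidity(1.0, 0.0)   # perpendicular to the beam axis
eta_forward = pseudorapidity(1.0, 1.0)   # 45 degrees from the beam axis
eta_backward = pseudorapidity(1.0, -1.0)
```

For r > 0 this is equivalent to asinh(z / r), so the value is zero at z = 0 and antisymmetric in z.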

abstractmethod read_event(event_id: int) pandas.DataFrame#

Read a single event from the dataset.

abstractmethod process_event(event_id: int)#
append_features(hits: pandas.DataFrame) pandas.DataFrame#

Add additional features to the hits dataframe and return it.

append_cell_features(hits: pandas.DataFrame, cells: pandas.DataFrame) pandas.DataFrame#

This method works for TrackML and CMS_MC data but not for MD data.

static get_truth_edge_index(pids: numpy.ndarray) numpy.ndarray#
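
The source does not describe get_truth_edge_index. One plausible reading, given here only as an assumption, is that truth edges connect hits that share a particle ID. The sketch below uses the hypothetical helper same_particle_edges on plain lists and skips particle ID 0, which the remove_noise parameter treats as noise.

```python
from itertools import combinations


def same_particle_edges(pids: list[int]) -> list[list[int]]:
    """Return [sources, targets] for all hit pairs that share a nonzero particle ID."""
    by_pid: dict[int, list[int]] = {}
    for hit_index, pid in enumerate(pids):
        if pid != 0:  # assumption: 0 marks noise hits
            by_pid.setdefault(pid, []).append(hit_index)
    sources: list[int] = []
    targets: list[int] = []
    for indices in by_pid.values():
        # Every unordered pair of hits from the same particle becomes one edge.
        for i, j in combinations(indices, 2):
            sources.append(i)
            targets.append(j)
    return [sources, targets]


edges = same_particle_edges([7, 0, 7, 3, 7, 3])
```
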
save_output_file(name: str, hits: pandas.DataFrame)#
to_pyg_data(hits: pandas.DataFrame) torch_geometric.data.Data#

Build the output data structure.

process(start: int | None = None, stop: int | None = None)#

Process input files from self.input_files and write output files to self.output_files.

Parameters:
  • start – Index of the first file to process

  • stop – Index of the last file to process (or None). May exceed the total number of files.
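
The remark that stop can be higher than the total number of files matches Python slice semantics, where an out-of-range bound is clamped. A minimal sketch, assuming (this page does not confirm it) that the files are selected with a slice; note that a slice treats stop as exclusive. The helper select_files and the file names are hypothetical.

```python
from typing import Optional


def select_files(
    files: list[str], start: Optional[int] = None, stop: Optional[int] = None
) -> list[str]:
    """Pick the files in [start, stop); None means "from the beginning" or "to the end"."""
    return files[start:stop]


files = ["event0", "event1", "event2"]
all_files = select_files(files)              # no bounds: everything
tail = select_files(files, start=1, stop=100)  # stop beyond the end is clamped
```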

collect_measurements(hits: pandas.DataFrame, event_id: int)#
_get_edge_index(particle_id: numpy.ndarray) torch.Tensor#
static append_n_layers_hit(hits: pandas.DataFrame)#
class gnn_tracking.preprocessing.point_cloud_builder.TrackMLPointCloudBuilder(**kwargs)#

Bases: BasePointCloudBuilder

Build point clouds, that is, read the input data files and convert them to PyTorch Geometric data objects (without any edges yet).

Parameters:
  • outdir – Directory for the output files

  • indir – Directory for the input files

  • detector_config – Path to the detector configuration file

  • n_sectors – Total number of sectors

  • redo – Re-compute the point cloud even if it is found

  • pixel_only – Construct tracks only from pixel layers

  • sector_di – The intercept offset for the extended sector

  • sector_ds – The slope offset for the extended sector

  • measurement_mode – Produce statistics about the sectorization

  • thld – Threshold pt for measurements

  • remove_noise – Remove hits with particle_id==0

  • write_output – Store the point clouds in a torch .pt file

  • log_level – Specify INFO (0) or DEBUG (>0)

  • collect_data – Collect data in memory

  • feature_names – Names of features to add

  • feature_scale – Scale of features

  • add_true_edges – Add true edges to the point cloud

static load_trackml_event(base_path: pathlib.Path, event: str = 'event000000001', suffix: str = '.csv.gz')#
read_event(event_id: int, ignore_loading_errors: bool = False) pandas.DataFrame#

Read a single event from the dataset.

Parameters:
  • event_id – The ID of the event to read

Returns:

DataFrame containing the hits data

restrict_to_subdetectors(hits: pandas.DataFrame) pandas.DataFrame#

Relabel each (volume, layer) pair with an integer label. If pixel_only is set, restrict the data to the pixel detector.

sector_hits(hits: pandas.DataFrame, sector_id: int) pandas.DataFrame#

Break an event into (optionally) extended sectors.
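
As a rough illustration of sectorization, the sketch below assigns a hit to one of n_sectors equal azimuthal wedges. This is an assumption about the general idea only: the function sector_of is hypothetical, and the extended-sector boundaries controlled by sector_di and sector_ds are not reproduced.

```python
import math


def sector_of(x: float, y: float, n_sectors: int) -> int:
    """Return the index of the azimuthal wedge that contains the point (x, y)."""
    phi = math.atan2(y, x)  # azimuthal angle in [-pi, pi]
    width = 2.0 * math.pi / n_sectors  # angular width of one wedge
    # Shift phi to [0, 2*pi]; the modulo folds phi == pi back into wedge 0.
    return int((phi + math.pi) // width) % n_sectors


hits = [(1.0, 1.0), (-1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]
sectors = [sector_of(x, y, n_sectors=4) for x, y in hits]
```

An extended sector would additionally admit hits slightly outside the wedge, so that tracks crossing a boundary are not cut; how far it extends is what the slope and intercept offsets configure.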

get_measurements() dict[str, float]#
process_event(event_id: int, ignore_loading_errors=False)#
_process_sectors(hits: pandas.DataFrame, evtid: int)#
class gnn_tracking.preprocessing.point_cloud_builder.CMSPointCloudBuilder(**kwargs)#

Bases: BasePointCloudBuilder

Build point clouds, that is, read the input data files and convert them to PyTorch Geometric data objects (without any edges yet).

Parameters:
  • outdir – Directory for the output files

  • indir – Directory for the input files

  • detector_config – Path to the detector configuration file

  • n_sectors – Total number of sectors

  • redo – Re-compute the point cloud even if it is found

  • pixel_only – Construct tracks only from pixel layers

  • sector_di – The intercept offset for the extended sector

  • sector_ds – The slope offset for the extended sector

  • measurement_mode – Produce statistics about the sectorization

  • thld – Threshold pt for measurements

  • remove_noise – Remove hits with particle_id==0

  • write_output – Store the point clouds in a torch .pt file

  • log_level – Specify INFO (0) or DEBUG (>0)

  • collect_data – Collect data in memory

  • feature_names – Names of features to add

  • feature_scale – Scale of features

  • add_true_edges – Add true edges to the point cloud

process_event(evt_num)#
read_event(evt_num)#

Read a single event from the dataset.

load_new_cms_mc_file(evt_num)#
static assign_background_track_ids(hits)#
class gnn_tracking.preprocessing.point_cloud_builder.MDPointCloudBuilder(**kwargs)#

Bases: BasePointCloudBuilder

Build point clouds, that is, read the input data files and convert them to PyTorch Geometric data objects (without any edges yet).

Parameters:
  • outdir – Directory for the output files

  • indir – Directory for the input files

  • detector_config – Path to the detector configuration file

  • n_sectors – Total number of sectors

  • redo – Re-compute the point cloud even if it is found

  • pixel_only – Construct tracks only from pixel layers

  • sector_di – The intercept offset for the extended sector

  • sector_ds – The slope offset for the extended sector

  • measurement_mode – Produce statistics about the sectorization

  • thld – Threshold pt for measurements

  • remove_noise – Remove hits with particle_id==0

  • write_output – Store the point clouds in a torch .pt file

  • log_level – Specify INFO (0) or DEBUG (>0)

  • collect_data – Collect data in memory

  • feature_names – Names of features to add

  • feature_scale – Scale of features

  • add_true_edges – Add true edges to the point cloud

input_tree#
feature_names = ['MD_0_r', 'MD_1_r', 'MD_0_z', 'MD_1_z', 'MD_eta', 'MD_phi', 'MD_dphichange']#
feature_scale#
read_event(event_id: int) pandas.DataFrame#

Read a single event from the dataset.

process_event(event_id: int)#