gnn_tracking.preprocessing.point_cloud_builder#
Build point clouds from the input data files.
Attributes#
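Classes#
DatasetReader | Protocol defining the interface for dataset readers |
BasePointCloudBuilder | Build point clouds, that is, read the input data files and convert them to PyTorch Geometric data objects (without any edges yet). |
TrackMLPointCloudBuilder | Build point clouds, that is, read the input data files and convert them to PyTorch Geometric data objects (without any edges yet). |
CMSPointCloudBuilder | Build point clouds, that is, read the input data files and convert them to PyTorch Geometric data objects (without any edges yet). |
MDPointCloudBuilder | Build point clouds, that is, read the input data files and convert them to PyTorch Geometric data objects (without any edges yet). |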
Module Contents#
- gnn_tracking.preprocessing.point_cloud_builder.DEFAULT_FEATURES = ('r', 'phi', 'z', 'eta_rz', 'u', 'v', 'charge_frac', 'leta', 'lphi', 'lx', 'ly', 'lz', 'geta', 'gphi')#
- gnn_tracking.preprocessing.point_cloud_builder._DEFAULT_FEATURE_SCALE#
- gnn_tracking.preprocessing.point_cloud_builder.MD_FEATURES = ['MD_0_r', 'MD_1_r', 'MD_0_z', 'MD_1_z', 'MD_eta', 'MD_phi', 'MD_dphichange']#
- gnn_tracking.preprocessing.point_cloud_builder.MD_COLS = ['MD_0_r', 'MD_1_r', 'MD_0_z', 'MD_1_z', 'MD_eta', 'MD_phi', 'MD_dphichange', 'MD_layer']#
- gnn_tracking.preprocessing.point_cloud_builder.LS_COLS = ['LS_MD_idx0', 'LS_MD_idx1', 'LS_isInTrueTC', 'LS_TCidx', 'LS_sim_pt', 'LS_sim_eta']#
- class gnn_tracking.preprocessing.point_cloud_builder.DatasetReader#
Bases: Protocol
Protocol defining the interface for dataset readers.
- read_event(event_id: int) pandas.DataFrame#
Read a single event from the dataset
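Because `DatasetReader` is a protocol, any object with a matching `read_event` method should satisfy it structurally, without inheriting from it. The following is a minimal sketch of that idea. `DatasetReaderLike` is a local stand-in for the protocol, and `CsvEventReader` with its file-naming scheme is hypothetical, not part of gnn_tracking.

```python
from pathlib import Path
from typing import TYPE_CHECKING, Protocol, runtime_checkable

if TYPE_CHECKING:  # only needed for type annotations
    import pandas as pd


@runtime_checkable
class DatasetReaderLike(Protocol):
    """Local stand-in mirroring the documented DatasetReader interface."""

    def read_event(self, event_id: int) -> "pd.DataFrame":
        """Read a single event from the dataset."""
        ...


class CsvEventReader:
    """Hypothetical reader; the file naming below is an assumption."""

    def __init__(self, indir: str) -> None:
        self.indir = Path(indir)

    def read_event(self, event_id: int) -> "pd.DataFrame":
        # Imported lazily so that defining the class does not require pandas.
        import pandas as pd

        return pd.read_csv(self.indir / f"event{event_id:09d}-hits.csv")


reader = CsvEventReader("data/")
# runtime_checkable only verifies that the method exists, not its signature.
is_reader = isinstance(reader, DatasetReaderLike)
```

Note that `CsvEventReader` never subclasses the protocol; the `isinstance` check succeeds purely because the method name matches.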
- class gnn_tracking.preprocessing.point_cloud_builder.BasePointCloudBuilder(*, outdir: str | pathlib.PurePath, indir: str | pathlib.PurePath, detector_config: pathlib.PurePath, n_sectors: int, redo: bool = True, pixel_only: bool = True, sector_di: float = 0.0001, sector_ds: float = 1.1, measurement_mode: bool = False, thld: float = 0.5, remove_noise: bool = False, write_output: bool = True, log_level=logging.INFO, collect_data: bool = False, feature_names: tuple = DEFAULT_FEATURES, feature_scale: tuple = _DEFAULT_FEATURE_SCALE, add_true_edges: bool = False, return_data: bool = False, data_type: str = Literal['TrackML', 'CMS_MC', 'MD', 'CMS_Run3'])#
Bases: abc.ABC
Build point clouds, that is, read the input data files and convert them to PyTorch Geometric data objects (without any edges yet).
- Parameters:
outdir – Directory for the output files
indir – Directory for the input files
detector_config – Path to the detector configuration file
n_sectors – Total number of sectors
redo – Re-compute the point cloud even if it is found
pixel_only – Construct tracks only from pixel layers
sector_di – The intercept offset for the extended sector
sector_ds – The slope offset for the extended sector
measurement_mode – Produce statistics about the sectorization
thld – Threshold pt for measurements
remove_noise – Remove hits with particle_id==0
write_output – Store the point clouds in a torch .pt file
log_level – Specify INFO (0) or DEBUG (>0)
collect_data – Collect data in memory
feature_names – Names of features to add
feature_scale – Scale of features
add_true_edges – Add true edges to the point cloud
- outdir#
- initial_outdir#
- indir#
- n_sectors#
- redo = True#
- pixel_only = True#
- sector_di = 0.0001#
- sector_ds = 1.1#
- measurement_mode = False#
- thld = 0.5#
- stats#
- remove_noise = False#
- measurements: list[dict[str, Any]] = []#
- write_output = True#
- feature_names = ['r', 'phi', 'z', 'eta_rz', 'u', 'v', 'charge_frac', 'leta', 'lphi', 'lx', 'ly', 'lz', 'geta', 'gphi']#
- feature_scale#
- return_data = False#
- data_type#
- prefixes: list[pathlib.Path] = []#
- exists: dict[str, bool]#
- outfiles#
- data_list: list[torch_geometric.data.Data] = []#
- logger#
- _collect_data = False#
- add_true_edges = False#
- _detector#
- static calc_eta(r: numpy.ndarray, z: numpy.ndarray) numpy.ndarray#
Compute pseudorapidity (spatial).
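For reference, the standard definition of pseudorapidity in terms of the polar angle θ is η = -ln tan(θ/2), with θ measured from the beam (z) axis. The sketch below applies that definition to cylindrical coordinates; `calc_eta_sketch` is an illustrative name, and the library's own implementation may differ in detail.

```python
import numpy as np


def calc_eta_sketch(r: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Pseudorapidity from cylindrical coordinates (r, z)."""
    theta = np.arctan2(r, z)  # polar angle measured from the +z axis
    return -np.log(np.tan(theta / 2.0))


r = np.array([1.0, 1.0, 1.0])
z = np.array([0.0, 1.0, -1.0])
eta = calc_eta_sketch(r, z)
```

For r > 0 this is mathematically equivalent to `np.arcsinh(z / r)`, so hits at z = 0 have η close to 0 and the sign of η follows the sign of z.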
- abstractmethod read_event(event_id: int) pandas.DataFrame#
Read a single event from the dataset
- abstractmethod process_event(event_id: int)#
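Since `read_event` and `process_event` are abstract, a concrete builder has to implement both before it can be instantiated. The sketch below illustrates this with a local stand-in class; `BuilderLike`, `IncompleteBuilder`, and `ToyBuilder` are hypothetical names, and the dictionary return value is only a placeholder for real event data.

```python
from abc import ABC, abstractmethod


class BuilderLike(ABC):
    """Local stand-in with the two abstract methods documented above."""

    @abstractmethod
    def read_event(self, event_id: int): ...

    @abstractmethod
    def process_event(self, event_id: int): ...


class IncompleteBuilder(BuilderLike):
    """Implements only read_event, so it is still abstract."""

    def read_event(self, event_id: int):
        return {"event_id": event_id}


class ToyBuilder(IncompleteBuilder):
    """Implements the remaining abstract method."""

    def process_event(self, event_id: int):
        return self.read_event(event_id)


try:
    IncompleteBuilder()
    instantiated = True
except TypeError:
    # Python refuses to instantiate a class with unimplemented abstract methods.
    instantiated = False

result = ToyBuilder().process_event(3)
```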
- append_features(hits: pandas.DataFrame) pandas.DataFrame#
Add additional features to the hits dataframe and return it.
- append_cell_features(hits: pandas.DataFrame, cells: pandas.DataFrame) pandas.DataFrame#
Works for the TrackML and CMS_MC data types, but not for MD.
- static get_truth_edge_index(pids: numpy.ndarray) numpy.ndarray#
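The page does not describe what `get_truth_edge_index` returns. One plausible reading, sketched below as an assumption, is that it connects every pair of hits sharing the same particle ID and skips noise hits (IDs that are not positive). `truth_edge_index_sketch` is an illustrative name, and the actual method may pair hits differently.

```python
from itertools import combinations

import numpy as np


def truth_edge_index_sketch(pids: np.ndarray) -> np.ndarray:
    """Return a (2, n_edges) array of hit-index pairs with equal particle ID."""
    edges = []
    for pid in np.unique(pids[pids > 0]):  # assumption: non-positive IDs are noise
        hit_idx = np.flatnonzero(pids == pid)
        edges.extend(combinations(hit_idx, 2))  # all pairs (i, j) with i < j
    if not edges:
        return np.empty((2, 0), dtype=np.int64)
    return np.array(edges, dtype=np.int64).T


pids = np.array([7, 0, 7, 3, 7, 3])
edge_index = truth_edge_index_sketch(pids)
```

In this example, particle 3 contributes one edge and particle 7 contributes three, so `edge_index` has shape `(2, 4)`.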
- save_output_file(name: str, hits: pandas.DataFrame)#
- to_pyg_data(hits: pandas.DataFrame) torch_geometric.data.Data#
Build the output data structure
- process(start: int | None = None, stop: int | None = None)#
Process input files from self.input_files and write output files to self.output_files
- Parameters:
start – index of first file to process
stop – index of last file to process (or None). Can be higher than total number of files.
Returns:
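The note that `stop` can exceed the total number of files matches how Python slicing behaves. The sketch below assumes slice-like semantics, where `stop` is exclusive; the page does not say whether the library treats `stop` as inclusive or exclusive, and `select_files` is a hypothetical helper.

```python
from typing import Optional, Sequence


def select_files(
    files: Sequence[str],
    start: Optional[int] = None,
    stop: Optional[int] = None,
) -> list:
    """Pick files[start:stop]; slicing clamps an oversized stop to len(files)."""
    return list(files[start:stop])


files = [f"event{i:09d}" for i in range(5)]
first_two = select_files(files, stop=2)
tail = select_files(files, start=3, stop=100)  # stop beyond the end is fine
```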
- collect_measurements(hits: pandas.DataFrame, event_id: int)#
- _get_edge_index(particle_id: numpy.ndarray) torch.Tensor#
- static append_n_layers_hit(hits: pandas.DataFrame)#
- class gnn_tracking.preprocessing.point_cloud_builder.TrackMLPointCloudBuilder(**kwargs)#
Bases: BasePointCloudBuilder
Build point clouds, that is, read the input data files and convert them to PyTorch Geometric data objects (without any edges yet).
- Parameters:
outdir – Directory for the output files
indir – Directory for the input files
detector_config – Path to the detector configuration file
n_sectors – Total number of sectors
redo – Re-compute the point cloud even if it is found
pixel_only – Construct tracks only from pixel layers
sector_di – The intercept offset for the extended sector
sector_ds – The slope offset for the extended sector
measurement_mode – Produce statistics about the sectorization
thld – Threshold pt for measurements
remove_noise – Remove hits with particle_id==0
write_output – Store the point clouds in a torch .pt file
log_level – Specify INFO (0) or DEBUG (>0)
collect_data – Collect data in memory
feature_names – Names of features to add
feature_scale – Scale of features
add_true_edges – Add true edges to the point cloud
- static load_trackml_event(base_path: pathlib.Path, event: str = 'event000000001', suffix: str = '.csv.gz')#
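The default arguments suggest that file names are assembled from an event prefix and a suffix. TrackML events are conventionally split into `hits`, `cells`, `particles`, and `truth` files, so a path helper might look like the sketch below. `trackml_event_paths` is a hypothetical name, and whether `load_trackml_event` reads all four parts is an assumption.

```python
from pathlib import Path

# Parts of a TrackML event as distributed in the public dataset.
TRACKML_PARTS = ("hits", "cells", "particles", "truth")


def trackml_event_paths(
    base_path: Path,
    event: str = "event000000001",
    suffix: str = ".csv.gz",
) -> dict:
    """Map each event part to its expected file path."""
    return {part: base_path / f"{event}-{part}{suffix}" for part in TRACKML_PARTS}


paths = trackml_event_paths(Path("data/trackml"))
```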
- read_event(event_id: int, ignore_loading_errors: bool = False) pandas.DataFrame#
Read a single event from the dataset.
- Parameters:
event_id – The ID of the event to read
- Returns:
DataFrame containing the hits data
- restrict_to_subdetectors(hits: pandas.DataFrame) pandas.DataFrame#
Relabel each (volume, layer) pair with an integer label. If pixel_only is set, restrict the data to the pixel detector.
- sector_hits(hits: pandas.DataFrame, sector_id: int) pandas.DataFrame#
Break an event into (optionally) extended sectors.
- get_measurements() dict[str, float]#
- process_event(event_id: int, ignore_loading_errors=False)#
- _process_sectors(hits: pandas.DataFrame, evtid: int)#
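A typical call sequence for this class might look like the sketch below. It assumes that the keyword arguments are forwarded to `BasePointCloudBuilder`, as the shared parameter list suggests. The helper name `build_trackml_point_clouds` and the chosen argument values are illustrative only.

```python
def build_trackml_point_clouds(indir, outdir, detector_config, n_events=10):
    """Build point clouds for the first n_events input files (sketch)."""
    # Imported here so that defining this helper does not require gnn_tracking.
    from gnn_tracking.preprocessing.point_cloud_builder import (
        TrackMLPointCloudBuilder,
    )

    builder = TrackMLPointCloudBuilder(
        indir=indir,  # directory with the input files
        outdir=outdir,  # directory for the output files
        detector_config=detector_config,  # path to the detector configuration
        n_sectors=1,  # total number of sectors
        pixel_only=True,  # keep only pixel layers
        redo=False,  # skip point clouds that already exist
        remove_noise=True,  # drop hits with particle_id == 0
    )
    builder.process(start=0, stop=n_events)
    return builder
```

All arguments are passed by keyword because the base class signature is keyword-only.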
- class gnn_tracking.preprocessing.point_cloud_builder.CMSPointCloudBuilder(**kwargs)#
Bases: BasePointCloudBuilder
Build point clouds, that is, read the input data files and convert them to PyTorch Geometric data objects (without any edges yet).
- Parameters:
outdir – Directory for the output files
indir – Directory for the input files
detector_config – Path to the detector configuration file
n_sectors – Total number of sectors
redo – Re-compute the point cloud even if it is found
pixel_only – Construct tracks only from pixel layers
sector_di – The intercept offset for the extended sector
sector_ds – The slope offset for the extended sector
measurement_mode – Produce statistics about the sectorization
thld – Threshold pt for measurements
remove_noise – Remove hits with particle_id==0
write_output – Store the point clouds in a torch .pt file
log_level – Specify INFO (0) or DEBUG (>0)
collect_data – Collect data in memory
feature_names – Names of features to add
feature_scale – Scale of features
add_true_edges – Add true edges to the point cloud
- process_event(evt_num)#
- read_event(evt_num)#
Read a single event from the dataset
- load_new_cms_mc_file(evt_num)#
- static assign_background_track_ids(hits)#
- class gnn_tracking.preprocessing.point_cloud_builder.MDPointCloudBuilder(**kwargs)#
Bases: BasePointCloudBuilder
Build point clouds, that is, read the input data files and convert them to PyTorch Geometric data objects (without any edges yet).
- Parameters:
outdir – Directory for the output files
indir – Directory for the input files
detector_config – Path to the detector configuration file
n_sectors – Total number of sectors
redo – Re-compute the point cloud even if it is found
pixel_only – Construct tracks only from pixel layers
sector_di – The intercept offset for the extended sector
sector_ds – The slope offset for the extended sector
measurement_mode – Produce statistics about the sectorization
thld – Threshold pt for measurements
remove_noise – Remove hits with particle_id==0
write_output – Store the point clouds in a torch .pt file
log_level – Specify INFO (0) or DEBUG (>0)
collect_data – Collect data in memory
feature_names – Names of features to add
feature_scale – Scale of features
add_true_edges – Add true edges to the point cloud
- input_tree#
- feature_names = ['MD_0_r', 'MD_1_r', 'MD_0_z', 'MD_1_z', 'MD_eta', 'MD_phi', 'MD_dphichange']#
- feature_scale#
- read_event(event_id: int) pandas.DataFrame#
Read a single event from the dataset
- process_event(event_id: int)#