gnn_tracking.postprocessing.dbscanscanner#

Classes#

ClusterScanner

Base class for cluster scanners. Use any of its subclasses.

DBSCANFastRescan

Class to perform DBSCAN clustering with fast rescanning.

OCScanResults

Results of DBSCANHyperParamScanner and friends.

DBSCANHyperParamScanner

Scan for hyperparameters of DBSCAN. Use this scanner for validation.

DBSCANHyperParamScannerFixed

Scan a fixed grid of hyperparameters of DBSCAN. For use in detailed testing rather than validation.

DBSCANPerformanceDetails

Get information about detailed performance for fixed DBSCAN parameters.

Functions#

flatten_track_metrics(→ dict[str, float])

Flatten the result of custom_metrics by using pt suffixes to arrive at a flat dictionary.

tracking_metric_df(→ pandas.DataFrame)

Label clusters as double majority/perfect/LHC.

tracking_metrics(→ dict[float, TrackingMetrics])

Calculate 'custom' metrics for matching tracks and hits.

add_key_prefix(→ dict[str, _P])

Return a copy of the dictionary with the prefix added to all keys.

dbscan(→ numpy.ndarray)

Convenience wrapper around sklearn's DBSCAN implementation.

Module Contents#

gnn_tracking.postprocessing.dbscanscanner.flatten_track_metrics(custom_metrics_result: dict[float, dict[str, float]]) dict[str, float]#

Flatten the result of custom_metrics by using pt suffixes to arrive at a flat dictionary, rather than a nested one.
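A minimal sketch of this flattening (not the library implementation): the key format `"double_majority_pt0.9"` is inferred from the `guide` defaults used by the scanners below.

```python
# Sketch: flatten {pt_threshold: {metric_name: value}} into
# {f"{metric_name}_pt{threshold}": value}.
def flatten_track_metrics(nested: dict[float, dict[str, float]]) -> dict[str, float]:
    return {
        f"{name}_pt{thld}": value
        for thld, metrics in nested.items()
        for name, value in metrics.items()
    }

flat = flatten_track_metrics({0.9: {"double_majority": 0.81}})
# {"double_majority_pt0.9": 0.81}
```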

gnn_tracking.postprocessing.dbscanscanner.tracking_metric_df(h_df: pandas.DataFrame, predicted_count_thld=3) pandas.DataFrame#

Label clusters as double majority/perfect/LHC.

Parameters:
  • h_df – Hit information dataframe

  • predicted_count_thld – Number of hits a cluster must have to be considered a valid cluster

Returns:

cluster dataframe with columns such as “double_majority” etc.
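The double-majority criterion, as commonly defined in tracking ML, can be sketched as follows (an illustrative helper, not the library's code): a cluster is double-majority matched to a particle if the majority of the cluster's hits come from that particle *and* the cluster contains the majority of that particle's hits.

```python
from collections import Counter

def is_double_majority(cluster_pids, all_pid_counts):
    """cluster_pids: particle IDs of the hits in one cluster;
    all_pid_counts: Counter of particle IDs over all hits."""
    pid, n_maj = Counter(cluster_pids).most_common(1)[0]
    # strict majority of the cluster AND strict majority of the particle's hits
    return n_maj > len(cluster_pids) / 2 and n_maj > all_pid_counts[pid] / 2

counts = Counter([1, 1, 1, 2, 2, 3])
is_double_majority([1, 1, 2], counts)  # True: 2/3 of cluster, 2/3 of PID 1's hits
```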

gnn_tracking.postprocessing.dbscanscanner.tracking_metrics(*, truth: numpy.ndarray, predicted: numpy.ndarray, pts: numpy.ndarray, reconstructable: numpy.ndarray, eta: numpy.ndarray, pt_thlds: Iterable[float], predicted_count_thld=3, max_eta=4) dict[float, TrackingMetrics]#

Calculate ‘custom’ metrics for matching tracks and hits.

Parameters:
  • truth – Truth labels/PIDs for each hit

  • predicted – Predicted labels/cluster index for each hit. Negative labels are interpreted as noise (because this is how DBSCAN outputs it) and are ignored

  • pts – true pt value of particle belonging to each hit

  • reconstructable – Whether the hit belongs to a “reconstructable track” (this usually implies a cut on the number of layers that are being hit, etc.)

  • eta – true pseudorapidity of particle belonging to each hit

  • pt_thlds – pt thresholds to calculate the metrics for

  • predicted_count_thld – Minimal number of hits in a cluster for it to not be rejected.

  • max_eta – Maximum eta value to count

Returns:

See TrackingMetrics
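The per-threshold structure of the result can be illustrated with a toy sketch (hypothetical inputs; the real function works per hit and also applies the reconstructable, max_eta, and cluster-size cuts):

```python
import numpy as np

pt = np.array([0.3, 0.7, 1.2, 2.0])            # true pt per particle
matched = np.array([True, False, True, True])  # double-majority matched?

# One metrics dict per pt threshold, keyed by the threshold itself
metrics = {
    thld: {"double_majority": float(matched[pt >= thld].mean())}
    for thld in (0.0, 0.5, 0.9, 1.5)
}
# metrics[0.0]["double_majority"] == 0.75, metrics[1.5]["double_majority"] == 1.0
```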

class gnn_tracking.postprocessing.dbscanscanner.ClusterScanner(*args, **kwargs)#

Bases: pytorch_lightning.core.mixins.hparams_mixin.HyperparametersMixin, abc.ABC

Base class for cluster scanners. Use any of its subclasses.

abstract __call__(data: torch_geometric.data.Data, out: dict[str, torch.Tensor], i_batch: int) None#
reset() None#
get_foms() dict[str, Any]#
class gnn_tracking.postprocessing.dbscanscanner.DBSCANFastRescan(x: numpy.ndarray, max_eps: float = 1.0, *, n_jobs: int | None = None)#

Class to perform DBSCAN clustering with fast rescanning.

Parameters:
  • x – Data to cluster

  • max_eps – Maximum epsilon to use during rescanning. Set to as low as possible to save time.

  • n_jobs – The number of parallel jobs to run.

_reset_graph(max_eps: float) None#

Set and store the radius_neighbors graph to use for clustering.

cluster(eps: float = 1.0, min_pts: int = 1)#

Perform clustering on the given data with DBSCAN.

Parameters:
  • eps – Epsilon to use for clustering

  • min_pts – Minimum number of points to form a cluster
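One plausible implementation of the fast-rescan idea (assumed, not the library source): build the neighbor graph once at max_eps, then rescan at any eps ≤ max_eps by feeding the precomputed distances to DBSCAN instead of recomputing neighbors each time.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import radius_neighbors_graph

x = np.array([[0.0, 0], [0.1, 0], [0.2, 0], [5.0, 0], [5.1, 0], [5.2, 0]])

# Built once at max_eps; reused for every subsequent eps <= max_eps
graph = radius_neighbors_graph(x, radius=1.0, mode="distance")

# Rescanning at a smaller eps only thresholds the stored distances
labels = DBSCAN(eps=0.5, min_samples=1, metric="precomputed").fit_predict(graph)
# two well-separated groups -> two clusters
```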

gnn_tracking.postprocessing.dbscanscanner.add_key_prefix(dct: dict[str, _P], prefix: str = '') dict[str, _P]#

Return a copy of the dictionary with the prefix added to all keys.
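A sketch of the behavior (trivial, but it pins down that the original dict is not mutated):

```python
def add_key_prefix(dct, prefix=""):
    # Copy the dict, prepending the prefix to every key
    return {f"{prefix}{k}": v for k, v in dct.items()}

add_key_prefix({"eps": 0.3}, "best_")  # {"best_eps": 0.3}
```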

gnn_tracking.postprocessing.dbscanscanner.dbscan(graphs: numpy.ndarray, eps=0.99, min_samples=1) numpy.ndarray#

Convenience wrapper around sklearn’s DBSCAN implementation.
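The wrapper presumably amounts to something like the following sketch (parameter handling assumed; only the sklearn call is grounded):

```python
import numpy as np
from sklearn.cluster import DBSCAN

def dbscan(coords: np.ndarray, eps=0.99, min_samples=1) -> np.ndarray:
    # Fit sklearn's DBSCAN and return the per-point cluster labels
    # (noise points get the label -1)
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(coords)

labels = dbscan(np.array([[0.0], [0.1], [5.0], [5.1]]), eps=0.5)
```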

class gnn_tracking.postprocessing.dbscanscanner.OCScanResults(df: pandas.DataFrame)#

Results of DBSCANHyperParamScanner and friends.

property df: pandas.DataFrame#
property df_mean: pandas.DataFrame#

Mean and std grouped by hyperparameters.

get_foms(guide='double_majority_pt0.9') dict[str, float]#

Get figures of merit.

get_n_best_trials(n: int, guide='double_majority_pt0.9') list[dict[str, float]]#
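How df_mean and best-trial selection might fit together, as a sketch (the row layout is an assumption; the doc only states that results are grouped by hyperparameters and ranked by a guide metric):

```python
import pandas as pd

# Assumed layout: one row per (trial, batch) with hyperparameters + guide metric
df = pd.DataFrame({
    "eps":                   [0.10, 0.10, 0.20, 0.20],
    "min_samples":           [1, 1, 1, 1],
    "double_majority_pt0.9": [0.80, 0.82, 0.70, 0.72],
})

# Average the metric over batches for each hyperparameter combination
df_mean = df.groupby(["eps", "min_samples"]).mean()

# Rank by the guide metric and read off the best hyperparameters
best = df_mean.sort_values("double_majority_pt0.9", ascending=False)
best_trial = dict(zip(df_mean.index.names, best.index[0]))
# {'eps': 0.1, 'min_samples': 1}
```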
class gnn_tracking.postprocessing.dbscanscanner.DBSCANHyperParamScanner(*, eps_range=(0, 1), min_samples_range=(1, 4), n_trials=10, keep_best=0, n_jobs: int | None = None, guide: str = 'double_majority_pt0.9', pt_thlds=(0.0, 0.5, 0.9, 1.5), max_eta: float = 4.0)#

Bases: gnn_tracking.postprocessing.clusterscanner.ClusterScanner

Scan for hyperparameters of DBSCAN. Use this scanner for validation. Even with few trials, it will eventually sample the best region more finely, because it keeps the best trials from the previous epoch (make sure to choose a non-zero keep_best).

Parameters:
  • eps_range – Range of DBSCAN radii to scan

  • min_samples_range – Range (INCLUSIVE!) of minimum number of samples for DBSCAN

  • n_trials – Total number of trials

  • keep_best – Keep this number of the best (eps, min_samples) pairs from the current epoch and make sure to scan over them again in the next epoch.

  • n_jobs – Number of jobs to use for parallelization

  • guide – Report tracking metrics for parameters that maximize this metric

  • pt_thlds – list of pT thresholds for the tracking metrics

  • max_eta – Max eta for tracking metrics
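The keep-best sampling strategy described above can be sketched with a hypothetical helper (`sample_trials` is illustrative, not part of the library):

```python
import random

def sample_trials(n_trials, kept, eps_range=(0.0, 1.0), min_samples_range=(1, 4)):
    # Re-seed the epoch with the best trials kept from the previous epoch...
    trials = [dict(t) for t in kept]
    # ...then fill the remaining budget with random draws from the search ranges
    while len(trials) < n_trials:
        trials.append({
            "eps": random.uniform(*eps_range),
            "min_samples": random.randint(*min_samples_range),  # inclusive, as documented
        })
    return trials

kept = [{"eps": 0.31, "min_samples": 2}]
trials = sample_trials(10, kept)
```

Over several epochs this concentrates trials around whichever region keeps winning on the guide metric.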

get_results() OCScanResults#
get_foms() dict[str, float]#
_get_best_trials() list[dict[str, float]]#
_reset_trials() None#
reset()#

Reset the results. Will be automatically called every time we run on a batch with i_batch == 0.

__call__(data: torch_geometric.data.Data, out: dict[str, torch.Tensor], i_batch: int, *, progress=False)#
class gnn_tracking.postprocessing.dbscanscanner.DBSCANHyperParamScannerFixed(trials: list[dict[str, float]], *, n_jobs: int | None = None, pt_thlds=(0.0, 0.5, 0.9, 1.5), max_eta: float = 4.0)#

Bases: DBSCANHyperParamScanner

Scan a fixed grid of hyperparameters of DBSCAN. While DBSCANHyperParamScanner is for use in validation steps, this is for use in detailed testing.

Parameters:
  • trials – List of trials to run

  • n_jobs – Number of jobs to use for parallelization

  • pt_thlds – list of pT thresholds for the tracking metrics

  • max_eta – Max eta for tracking metrics

_reset_trials() None#
class gnn_tracking.postprocessing.dbscanscanner.DBSCANPerformanceDetails(eps: float, min_samples: int)#

Bases: DBSCANHyperParamScanner

Get information about detailed performance for fixed DBSCAN parameters. See get_results for outputs.

Parameters:
  • eps – DBSCAN epsilon

  • min_samples – DBSCAN min_samples

__call__(data: torch_geometric.data.Data, out: dict[str, torch.Tensor], i_batch: int) None#
get_results() tuple[list[pandas.DataFrame], list[pandas.DataFrame]]#

Get results

Returns:

Tuple of (h_dfs, c_dfs), where h_dfs is a list of dataframes with information about all hits and c_dfs is a list of dataframes with information about all clusters. See tracking_metric_df for details about both dataframes.

get_foms() dict[str, float]#