gnn_tracking.metrics.cluster_metrics
Metrics evaluating the quality of clustering, i.e., the usefulness of the algorithm for tracking.
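Attributes
- _tracking_metrics_nan_results
- common_metrics
Classes
- ClusterMetricType: Function type that calculates a clustering metric.
- TrackingMetrics: TypedDict of tracking metrics.
Functions
- tracking_metric_df: Label clusters as double majority/perfect/LHC.
- count_tracking_metrics: Calculate TrackingMetrics from cluster and hit information.
- tracking_metrics: Calculate 'custom' metrics for matching tracks and hits.
- tracking_metrics_data: Convenience function to apply tracking_metrics to a Data object.
- tracking_metrics_vs_pt: Calculate tracking metrics for pt slices.
- tracking_metrics_vs_eta
- flatten_track_metrics: Flatten the result of tracking_metrics by using pt suffixes to arrive at a flat dictionary, rather than a nested one.
- count_hits_per_cluster: Count number of hits per cluster.
- hits_per_cluster_count_to_flat_dict: Turn result array from count_hits_per_cluster into a dictionary with cumulative counts.
- _sklearn_signature_wrap: A decorator to make an sklearn cluster metric function accept the arguments from ClusterMetricType.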
Module Contents
- class gnn_tracking.metrics.cluster_metrics.ClusterMetricType
Bases: Protocol
Function type that calculates a clustering metric.
- __call__(*, truth: numpy.ndarray, predicted: numpy.ndarray, pts: numpy.ndarray, reconstructable: numpy.ndarray, pt_thlds: list[float]) -> float | dict[str, float]
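To illustrate the keyword-only call signature above, here is a minimal sketch of a function that conforms to it. The metric `fraction_clustered` and its `pt<threshold>` key format are hypothetical and not part of gnn_tracking; plain Python sequences stand in for the numpy arrays so that the sketch has no dependencies.

```python
from collections.abc import Sequence


def fraction_clustered(
    *,
    truth: Sequence[int],
    predicted: Sequence[int],
    pts: Sequence[float],
    reconstructable: Sequence[bool],
    pt_thlds: list[float],
) -> dict[str, float]:
    """Hypothetical metric: fraction of hits assigned to any cluster.

    Negative predicted labels are treated as noise. One value is returned
    per pt threshold, counting only hits with pt above that threshold.
    `truth` and `reconstructable` are accepted only to match the signature.
    """
    result = {}
    for thld in pt_thlds:
        selected = [label for label, pt in zip(predicted, pts) if pt > thld]
        n_clustered = sum(1 for label in selected if label >= 0)
        # Avoid dividing by zero when no hit passes the threshold
        result[f"pt{thld}"] = n_clustered / len(selected) if selected else float("nan")
    return result


example = fraction_clustered(
    truth=[1, 1, 2, 2],
    predicted=[0, 0, -1, 1],
    pts=[1.0, 1.0, 0.5, 0.5],
    reconstructable=[True, True, True, True],
    pt_thlds=[0.0, 0.9],
)
```

Because all arguments are keyword-only, a metric that needs only some of them still has to accept the full set.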
- class gnn_tracking.metrics.cluster_metrics.TrackingMetrics
Bases: TypedDict
- n_particles: int
- n_cleaned_clusters: int
- perfect: float
- double_majority: float
- lhc: float
- fake_perfect: float
- fake_double_majority: float
- fake_lhc: float
- gnn_tracking.metrics.cluster_metrics._tracking_metrics_nan_results: TrackingMetrics
- gnn_tracking.metrics.cluster_metrics.tracking_metric_df(h_df: pandas.DataFrame, predicted_count_thld=3) -> pandas.DataFrame
Label clusters as double majority/perfect/LHC.
- Parameters:
h_df – Hit information dataframe
predicted_count_thld – Number of hits a cluster must have to be considered a valid cluster
- Returns:
cluster dataframe with columns such as “double_majority” etc.
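The three labels can be illustrated with a small dependency-free sketch. It assumes the definitions commonly used for these criteria: "perfect" means the cluster contains all hits of one particle and nothing else, "double majority" means more than 50% of the cluster's hits come from one particle and more than 50% of that particle's hits are in the cluster, and "LHC" means more than 75% of the cluster's hits come from one particle. The function name `label_clusters`, the exact thresholds, and the use of `>=` for `predicted_count_thld` are assumptions of this sketch, not the library implementation.

```python
from collections import Counter


def label_clusters(truth, predicted, predicted_count_thld=3):
    """Label each predicted cluster (illustrative sketch only)."""
    hits_per_particle = Counter(truth)

    # Count, for every cluster, how many hits each particle contributes
    clusters = {}
    for pid, cid in zip(truth, predicted):
        if cid < 0:  # negative labels are noise and are ignored
            continue
        clusters.setdefault(cid, Counter())[pid] += 1

    labels = {}
    for cid, pid_counts in clusters.items():
        n_hits = sum(pid_counts.values())
        maj_pid, maj_hits = pid_counts.most_common(1)[0]
        purity = maj_hits / n_hits  # share of cluster hits from majority particle
        coverage = maj_hits / hits_per_particle[maj_pid]  # share of particle hits in cluster
        valid = n_hits >= predicted_count_thld
        labels[cid] = {
            "perfect": valid and purity == 1.0 and coverage == 1.0,
            "double_majority": valid and purity > 0.5 and coverage > 0.5,
            "lhc": valid and purity > 0.75,
        }
    return labels


truth = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4]
predicted = [0, 0, 0, 0, 1, 1, 1, 0, 2, 2, 2, 3, 3]
cluster_labels = label_clusters(truth, predicted)
```

In this example, cluster 2 is perfect, clusters 0 and 1 pass only the double majority and LHC criteria, and cluster 3 is rejected because it has fewer than three hits.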
- gnn_tracking.metrics.cluster_metrics.count_tracking_metrics(c_df: pandas.DataFrame, h_df: pandas.DataFrame, c_mask: numpy.ndarray, h_mask: numpy.ndarray) -> TrackingMetrics
Calculate TrackingMetrics from cluster and hit information.
- Parameters:
c_df – Output dataframe from tracking_metric_df
h_df – Hit information dataframe
c_mask – Cluster mask
h_mask – Hit mask
- Returns:
TrackingMetrics dictionary.
- gnn_tracking.metrics.cluster_metrics.tracking_metrics(*, truth: numpy.ndarray, predicted: numpy.ndarray, pts: numpy.ndarray, reconstructable: numpy.ndarray, eta: numpy.ndarray, pt_thlds: Iterable[float], predicted_count_thld=3, max_eta=4) -> dict[float, TrackingMetrics]
Calculate ‘custom’ metrics for matching tracks and hits.
- Parameters:
truth – Truth labels/PIDs for each hit
predicted – Predicted labels/cluster index for each hit. Negative labels are interpreted as noise (because this is how DBSCAN outputs it) and are ignored
pts – true pt value of particle belonging to each hit
reconstructable – Whether the hit belongs to a “reconstructable track” (this usually implies a cut on the number of layers that are being hit etc.)
eta – true pseudorapidity of particle belonging to each hit
pt_thlds – pt thresholds to calculate the metrics for
predicted_count_thld – Minimal number of hits in a cluster for it to not be rejected.
max_eta – Maximum eta value to count
- Returns:
See TrackingMetrics
- gnn_tracking.metrics.cluster_metrics.tracking_metrics_data(data: torch_geometric.data.Data, labels, pt_thlds: Iterable[float], predicted_count_thld=3, max_eta=4) -> dict[float, TrackingMetrics]
Convenience function to apply tracking_metrics to a Data object.
- Parameters:
data – Data object
labels – Predicted labels/cluster index for each hit. Negative labels are treated as noise
pt_thlds – pt thresholds to calculate the metrics for
predicted_count_thld – Minimal number of hits in a cluster for it to not be rejected.
max_eta – Maximum eta value to count
- gnn_tracking.metrics.cluster_metrics.tracking_metrics_vs_pt(h_dfs: list[pandas.DataFrame], c_dfs: list[pandas.DataFrame], pts: list[float], *, max_eta: float = 4.0) -> pandas.DataFrame
Calculate tracking metrics for pt slices.
- Parameters:
h_dfs – List of hit dataframes for different batches (see tracking_metric_df)
c_dfs – List of cluster dataframes for different batches (see tracking_metric_df)
pts – List of pt points to calculate the metrics for
max_eta – Maximum eta value to count
- Returns:
Dataframe with tracking metrics for each pt slice
- gnn_tracking.metrics.cluster_metrics.tracking_metrics_vs_eta(h_dfs: list[pandas.DataFrame], c_dfs: list[pandas.DataFrame], etas: list[float], pt_thld: float = 0.9) -> pandas.DataFrame
Calculate tracking metrics for eta slices.
- Parameters:
h_dfs – List of hit dataframes for different batches (see tracking_metric_df)
c_dfs – List of cluster dataframes for different batches (see tracking_metric_df)
etas – Eta points to calculate metrics for
pt_thld – pt threshold
- Returns:
Dataframe with tracking metrics for each eta slice
- gnn_tracking.metrics.cluster_metrics.flatten_track_metrics(custom_metrics_result: dict[float, dict[str, float]]) -> dict[str, float]
Flatten the result of tracking_metrics by using pt suffixes to arrive at a flat dictionary, rather than a nested one.
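The flattening described above can be sketched as follows. The function name `flatten_by_pt` and the `_pt<value>` suffix format are hypothetical; the key names produced by the library may differ.

```python
def flatten_by_pt(nested: dict[float, dict[str, float]]) -> dict[str, float]:
    """Turn {pt_threshold: {metric: value}} into {metric_suffix: value}."""
    flat = {}
    for pt, metrics in nested.items():
        # Hypothetical suffix format; the library may format keys differently
        suffix = f"_pt{pt:.1f}"
        for name, value in metrics.items():
            flat[f"{name}{suffix}"] = value
    return flat


nested_result = {
    0.0: {"perfect": 0.5},
    0.9: {"perfect": 0.8, "lhc": 0.9},
}
flat_result = flatten_by_pt(nested_result)
# flat_result == {"perfect_pt0.0": 0.5, "perfect_pt0.9": 0.8, "lhc_pt0.9": 0.9}
```

A flat dictionary of this kind is convenient for loggers that expect scalar values under string keys.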
- gnn_tracking.metrics.cluster_metrics.count_hits_per_cluster(predicted: numpy.ndarray) -> numpy.ndarray
Count number of hits per cluster.
- gnn_tracking.metrics.cluster_metrics.hits_per_cluster_count_to_flat_dict(counts: numpy.ndarray, min_max=10) -> dict[str, float]
Turn result array from count_hits_per_cluster into a dictionary with cumulative counts.
- Parameters:
counts – Result from count_hits_per_cluster
min_max – Pad the counts with zeros to at least this length
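The following sketch shows one possible reading of these two functions, since the documentation above does not specify the layout of the counts array or the dictionary keys. It assumes that the counts form a histogram of cluster sizes (entry n is the number of clusters with exactly n hits), that noise hits with negative labels are excluded, and that "cumulative" means "clusters with at least n hits". The function names and the key format are hypothetical.

```python
from collections import Counter
from itertools import accumulate


def cluster_size_histogram(predicted):
    """hist[n] = number of clusters with exactly n hits (noise excluded)."""
    sizes = Counter(label for label in predicted if label >= 0)
    hist = [0] * (max(sizes.values(), default=0) + 1)
    for size in sizes.values():
        hist[size] += 1
    return hist


def cumulative_size_dict(hist, min_max=10):
    """Pad hist with zeros to at least min_max entries and accumulate."""
    padded = list(hist) + [0] * max(0, min_max - len(hist))
    # Accumulate from the right: number of clusters with at least n hits
    at_least = list(accumulate(reversed(padded)))[::-1]
    return {f"n_clusters_ge_{n}_hits": float(c) for n, c in enumerate(at_least)}


hist = cluster_size_histogram([0, 0, 0, 1, 1, 2, -1])
cumulative = cumulative_size_dict(hist)
```

Padding to a fixed minimum length keeps the set of dictionary keys stable across events, even when an event contains only small clusters.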
- gnn_tracking.metrics.cluster_metrics._sklearn_signature_wrap(func: Callable) -> ClusterMetricType
A decorator to make an sklearn cluster metric function accept the arguments from ClusterMetricType.
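A decorator of this kind can be sketched as below. It assumes that the wrapped function follows the usual sklearn convention of taking `(labels_true, labels_pred)` positionally, as `sklearn.metrics.adjusted_rand_score` does, and that the remaining keyword arguments are simply ignored. The names `keyword_signature_wrap` and `exact_match_fraction` are hypothetical stand-ins, and the actual implementation may differ.

```python
import functools


def keyword_signature_wrap(func):
    """Adapt func(labels_true, labels_pred) to the keyword-only signature."""

    @functools.wraps(func)
    def wrapped(*, truth, predicted, pts, reconstructable, pt_thlds):
        # Extra arguments are accepted for signature compatibility and ignored
        return func(truth, predicted)

    return wrapped


def exact_match_fraction(labels_true, labels_pred):
    """Stand-in for an sklearn-style metric taking (labels_true, labels_pred)."""
    matches = sum(t == p for t, p in zip(labels_true, labels_pred))
    return matches / len(labels_true)


wrapped_metric = keyword_signature_wrap(exact_match_fraction)
score = wrapped_metric(
    truth=[0, 0, 1],
    predicted=[0, 1, 1],
    pts=[1.0, 1.0, 1.0],
    reconstructable=[True, True, True],
    pt_thlds=[0.9],
)
```

Wrapping the metrics this way lets generic and tracking-specific metrics share one dictionary of callables and be invoked with identical keyword arguments.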
- gnn_tracking.metrics.cluster_metrics.common_metrics: dict[str, ClusterMetricType]