gnn_tracking.metrics.cluster_metrics#

Metrics evaluating the quality of clustering, i.e., the usefulness of the algorithm for tracking.

Module Contents#

Classes#

ClusterMetricType

Function type that calculates a clustering metric.

TrackingMetrics

Typed dictionary holding the results of the tracking metrics.

Functions#

tracking_metric_df(→ pandas.DataFrame)

Label clusters as double majority/perfect/LHC.

count_tracking_metrics(→ TrackingMetrics)

Calculate TrackingMetrics from cluster and hit information.

tracking_metrics(→ dict[float, TrackingMetrics])

Calculate 'custom' metrics for matching tracks and hits.

tracking_metrics_data(→ dict[float, TrackingMetrics])

Convenience function to apply tracking_metrics to a Data object.

tracking_metrics_vs_pt(→ pandas.DataFrame)

Calculate tracking metrics for pt slices.

tracking_metrics_vs_eta(→ pandas.DataFrame)

Calculate tracking metrics for eta slices.

flatten_track_metrics(→ dict[str, float])

Flatten the result of custom_metrics by using pt suffixes to arrive at a flat dictionary, rather than a nested one.

count_hits_per_cluster(→ numpy.ndarray)

Count number of hits per cluster

hits_per_cluster_count_to_flat_dict(→ dict[str, float])

Turn result array from count_hits_per_cluster into a dictionary with cumulative counts.

_sklearn_signature_wrap(→ ClusterMetricType)

A decorator to make an sklearn cluster metric function accept the arguments from ClusterMetricType.

Attributes#

_tracking_metrics_nan_results

common_metrics

class gnn_tracking.metrics.cluster_metrics.ClusterMetricType#

Bases: Protocol

Function type that calculates a clustering metric.

__call__(*, truth: numpy.ndarray, predicted: numpy.ndarray, pts: numpy.ndarray, reconstructable: numpy.ndarray, pt_thlds: list[float]) float | dict[str, float]#
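
As an illustration of this protocol, a conforming function only needs to accept the five keyword-only arguments and return a float or a flat dictionary. The sketch below assumes NumPy is available; `fraction_noise_hits` is a hypothetical metric made up for this example and is not part of gnn_tracking.

```python
import numpy as np


def fraction_noise_hits(
    *,
    truth: np.ndarray,
    predicted: np.ndarray,
    pts: np.ndarray,
    reconstructable: np.ndarray,
    pt_thlds: list[float],
) -> float:
    """Hypothetical metric: fraction of hits with a negative (noise) label."""
    # Only `predicted` is used here; the other arguments are accepted so that
    # the call signature matches the one documented for ClusterMetricType.
    return float(np.mean(predicted < 0))


example_value = fraction_noise_hits(
    truth=np.array([1, 1, 2, 2]),
    predicted=np.array([0, 0, -1, 1]),
    pts=np.array([1.0, 1.0, 0.5, 0.5]),
    reconstructable=np.array([True, True, True, True]),
    pt_thlds=[0.9],
)
```

Because every argument is keyword-only, callers cannot accidentally swap `truth` and `predicted`.
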
class gnn_tracking.metrics.cluster_metrics.TrackingMetrics#

Bases: TypedDict

Typed dictionary holding the results of the tracking metrics. The fields are listed below.

n_particles: int#
n_cleaned_clusters: int#
perfect: float#
double_majority: float#
lhc: float#
fake_perfect: float#
fake_double_majority: float#
fake_lhc: float#
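
Since TrackingMetrics is a TypedDict, an instance is an ordinary dict at runtime and its fields are read by key. The sketch below uses a local stand-in class, `TrackingMetricsSketch`, that mirrors the fields listed above instead of importing the library class; the numbers are placeholders chosen only to show the shape of the dictionary.

```python
from typing import TypedDict


class TrackingMetricsSketch(TypedDict):
    """Local stand-in mirroring the documented fields (not the library class)."""

    n_particles: int
    n_cleaned_clusters: int
    perfect: float
    double_majority: float
    lhc: float
    fake_perfect: float
    fake_double_majority: float
    fake_lhc: float


# Placeholder values, not real results.
example: TrackingMetricsSketch = {
    "n_particles": 100,
    "n_cleaned_clusters": 90,
    "perfect": 0.5,
    "double_majority": 0.75,
    "lhc": 0.625,
    "fake_perfect": 0.5,
    "fake_double_majority": 0.25,
    "fake_lhc": 0.375,
}

# Fields are accessed like any other dictionary entry.
perfect_value = example["perfect"]
```
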
gnn_tracking.metrics.cluster_metrics._tracking_metrics_nan_results: TrackingMetrics#
gnn_tracking.metrics.cluster_metrics.tracking_metric_df(h_df: pandas.DataFrame, predicted_count_thld=3) pandas.DataFrame#

Label clusters as double majority/perfect/LHC.

Parameters:
  • h_df – Hit information dataframe

  • predicted_count_thld – Number of hits a cluster must have to be considered a valid cluster

Returns:

cluster dataframe with columns such as “double_majority” etc.
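
This page does not define the three match criteria. The pure-Python sketch below assumes commonly used definitions: "perfect" means the cluster contains all hits of one particle and nothing else, "double majority" means more than 50% of the cluster's hits come from one particle and more than 50% of that particle's hits are in the cluster, and "LHC" means more than 75% of the cluster's hits come from one particle. The helper `label_clusters`, these thresholds, and the `>=` comparison against `predicted_count_thld` are assumptions made for illustration, not the library's implementation.

```python
from collections import Counter


def label_clusters(truth, predicted, predicted_count_thld=3):
    """Label each cluster under the ASSUMED match definitions described above."""
    # Total hits of each particle, including hits that were labelled as noise.
    hits_per_particle = Counter(truth)

    # For every non-noise cluster, count how many hits each particle contributes.
    clusters = {}
    for pid, cid in zip(truth, predicted):
        if cid < 0:  # negative labels are treated as noise
            continue
        clusters.setdefault(cid, Counter())[pid] += 1

    labels = {}
    for cid, pid_counts in clusters.items():
        cluster_size = sum(pid_counts.values())
        majority_pid, majority_hits = pid_counts.most_common(1)[0]
        in_cluster = majority_hits / cluster_size  # purity of the cluster
        of_particle = majority_hits / hits_per_particle[majority_pid]
        valid = cluster_size >= predicted_count_thld  # assumed comparison
        labels[cid] = {
            "valid": valid,
            "perfect": valid and in_cluster == 1.0 and of_particle == 1.0,
            "double_majority": valid and in_cluster > 0.5 and of_particle > 0.5,
            "lhc": valid and in_cluster > 0.75,
        }
    return labels


truth = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3]
predicted = [0, 0, 0, 0, 1, 1, 1, -1, 1, 1, 2]
example_labels = label_clusters(truth, predicted)
```

In this toy input, cluster 0 holds exactly the four hits of particle 1, cluster 1 mixes three hits of particle 2 with two hits of particle 3, and cluster 2 has a single hit and therefore falls below the size threshold.
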

gnn_tracking.metrics.cluster_metrics.count_tracking_metrics(c_df: pandas.DataFrame, h_df: pandas.DataFrame, c_mask: numpy.ndarray, h_mask: numpy.ndarray) TrackingMetrics#

Calculate TrackingMetrics from cluster and hit information.

Parameters:
  • c_df – Output dataframe from tracking_metric_df

  • h_df – Hit information dataframe

  • c_mask – Cluster mask

  • h_mask – Hit mask

Returns:

TrackingMetrics dictionary.

gnn_tracking.metrics.cluster_metrics.tracking_metrics(*, truth: numpy.ndarray, predicted: numpy.ndarray, pts: numpy.ndarray, reconstructable: numpy.ndarray, eta: numpy.ndarray, pt_thlds: Iterable[float], predicted_count_thld=3, max_eta=4) dict[float, TrackingMetrics]#

Calculate ‘custom’ metrics for matching tracks and hits.

Parameters:
  • truth – Truth labels/PIDs for each hit

  • predicted – Predicted labels/cluster index for each hit. Negative labels are interpreted as noise (because this is how DBSCAN outputs it) and are ignored

  • pts – true pt value of particle belonging to each hit

  • reconstructable – Whether the hit belongs to a “reconstructable track” (this usually implies a cut on the number of layers that are being hit etc.)

  • eta – true pseudorapidity of particle belonging to each hit

  • pt_thlds – pt thresholds to calculate the metrics for

  • predicted_count_thld – Minimal number of hits in a cluster for it to not be rejected.

  • max_eta – Maximum eta value to count

Returns:

See TrackingMetrics
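
The parameters above describe several per-hit selections (noise labels, reconstructability, an eta limit, and one pt threshold per result entry). How the real function combines them is not shown on this page; the sketch below is one plausible reading, with a hypothetical helper `hit_masks_by_pt` that builds one boolean hit mask per threshold and keys the result by threshold, mirroring the `dict[float, ...]` return type. It assumes NumPy is available.

```python
import numpy as np


def hit_masks_by_pt(*, predicted, pts, reconstructable, eta, pt_thlds, max_eta=4.0):
    """Build one boolean hit mask per pt threshold (assumed selection logic)."""
    not_noise = predicted >= 0  # negative labels are interpreted as noise
    in_acceptance = np.abs(eta) <= max_eta
    base = not_noise & in_acceptance & reconstructable.astype(bool)
    # One entry per threshold; a hit passes if its particle's pt is high enough.
    return {float(thld): base & (pts >= thld) for thld in pt_thlds}


masks = hit_masks_by_pt(
    predicted=np.array([0, 0, -1, 1]),
    pts=np.array([1.0, 0.5, 2.0, 1.5]),
    reconstructable=np.array([True, True, True, False]),
    eta=np.array([0.1, -1.0, 2.0, 5.0]),
    pt_thlds=[0.0, 0.9],
)
```
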

gnn_tracking.metrics.cluster_metrics.tracking_metrics_data(data: torch_geometric.data.Data, labels, pt_thlds: Iterable[float], predicted_count_thld=3, max_eta=4) dict[float, TrackingMetrics]#

Convenience function to apply tracking_metrics to a Data object.

Parameters:
  • data – Data object

  • labels – Predicted labels/cluster index for each hit. Negative labels are treated as noise

  • pt_thlds – pt thresholds to calculate the metrics for

  • predicted_count_thld – Minimal number of hits in a cluster for it to not be rejected.

  • max_eta – Maximum eta value to count

gnn_tracking.metrics.cluster_metrics.tracking_metrics_vs_pt(h_dfs: list[pandas.DataFrame], c_dfs: list[pandas.DataFrame], pts: list[float], *, max_eta: float = 4.0) pandas.DataFrame#

Calculate tracking metrics for pt slices.

Parameters:
  • h_dfs – List of hit dataframes for different batches (see tracking_metric_df)

  • c_dfs – List of cluster dataframes for different batches (see tracking_metric_df)

  • pts – List of pt points to calculate the metrics for

  • max_eta – Maximum eta value to count

Returns:

Dataframe with tracking metrics for each pt slice

gnn_tracking.metrics.cluster_metrics.tracking_metrics_vs_eta(h_dfs: list[pandas.DataFrame], c_dfs: list[pandas.DataFrame], etas: list[float], pt_thld: float = 0.9) pandas.DataFrame#

Calculate tracking metrics for eta slices.

Parameters:
  • h_dfs – List of hit dataframes for different batches (see tracking_metric_df)

  • c_dfs – List of cluster dataframes for different batches (see tracking_metric_df)

  • etas – Eta points to calculate metrics for

  • pt_thld – pt threshold to calculate the metrics for

Returns:

Dataframe with tracking metrics for each eta slice

gnn_tracking.metrics.cluster_metrics.flatten_track_metrics(custom_metrics_result: dict[float, dict[str, float]]) dict[str, float]#

Flatten the result of custom_metrics by using pt suffixes to arrive at a flat dictionary, rather than a nested one.
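
A minimal sketch of this kind of flattening follows. The helper `flatten_by_pt` and the `_pt{value}` key format are hypothetical; the suffix convention actually used by the library may differ.

```python
def flatten_by_pt(nested: dict[float, dict[str, float]]) -> dict[str, float]:
    """Turn {pt_threshold: {metric: value}} into {metric_pt<threshold>: value}."""
    flat = {}
    for pt_thld, metrics in nested.items():
        for name, value in metrics.items():
            # Hypothetical key format chosen for this example only.
            flat[f"{name}_pt{pt_thld:.1f}"] = value
    return flat


nested_example = {0.0: {"perfect": 0.5}, 0.9: {"perfect": 0.75}}
flat_example = flatten_by_pt(nested_example)
```

A flat dictionary of this shape is convenient for loggers that expect one scalar per key.
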

gnn_tracking.metrics.cluster_metrics.count_hits_per_cluster(predicted: numpy.ndarray) numpy.ndarray#

Count number of hits per cluster
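
One way to count hits per cluster with NumPy is sketched below. The helper `hits_per_cluster` is hypothetical, and it assumes that negative labels denote noise and are excluded. Whether the real function returns per-cluster sizes, as here, or a histogram of sizes is not stated on this page.

```python
import numpy as np


def hits_per_cluster(predicted: np.ndarray) -> np.ndarray:
    """Return the number of hits in each non-noise cluster, ordered by label."""
    # np.unique sorts the labels and, with return_counts=True, also returns
    # how often each label occurs.
    _, counts = np.unique(predicted[predicted >= 0], return_counts=True)
    return counts


example_counts = hits_per_cluster(np.array([0, 0, 1, -1, 1, 1, 2]))
```
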

gnn_tracking.metrics.cluster_metrics.hits_per_cluster_count_to_flat_dict(counts: numpy.ndarray, min_max=10) dict[str, float]#

Turn result array from count_hits_per_cluster into a dictionary with cumulative counts.

Parameters:
  • counts – Result from count_hits_per_cluster

  • min_max – Pad the counts with zeros to at least this length
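
The padding and accumulation described above can be sketched as follows. This is an interpretation, not the library code: "cumulative" is read here as a running sum from the start of the zero-padded array, and both the helper `padded_cumulative_dict` and its key names are hypothetical. It assumes NumPy is available.

```python
import numpy as np


def padded_cumulative_dict(counts, min_max=10):
    """Zero-pad `counts` to at least `min_max` entries and accumulate them."""
    counts = np.asarray(counts)
    pad = max(0, min_max - len(counts))
    padded = np.pad(counts, (0, pad))  # default mode pads with zeros
    cumulative = np.cumsum(padded)
    # Hypothetical key names; one entry per position of the padded array.
    return {f"cumulative_{i}": float(c) for i, c in enumerate(cumulative)}


example_dict = padded_cumulative_dict([2, 3, 1], min_max=5)
```

Padding to a fixed minimum length keeps the set of keys stable from one event to the next, which helps when the values are logged over time.
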

gnn_tracking.metrics.cluster_metrics._sklearn_signature_wrap(func: Callable) ClusterMetricType#

A decorator to make an sklearn cluster metric function accept/take the arguments from ClusterMetricType.
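
A decorator of this kind can be sketched as below. scikit-learn clustering metrics such as `adjusted_rand_score` take two positional arguments, `labels_true` and `labels_pred`. The wrapper name `keyword_signature_wrap` and the toy metric `agreement_fraction` are hypothetical stand-ins, used so that the example runs without scikit-learn; this is not the library's implementation.

```python
import functools


def keyword_signature_wrap(func):
    """Adapt func(labels_true, labels_pred) to the keyword-only signature."""

    @functools.wraps(func)
    def wrapped(*, truth, predicted, pts, reconstructable, pt_thlds):
        # The extra arguments are accepted for compatibility and ignored.
        return func(truth, predicted)

    return wrapped


@keyword_signature_wrap
def agreement_fraction(labels_true, labels_pred):
    """Toy stand-in for an sklearn-style metric: fraction of equal labels."""
    matches = sum(t == p for t, p in zip(labels_true, labels_pred))
    return matches / len(labels_true)


score = agreement_fraction(
    truth=[0, 0, 1, 1],
    predicted=[0, 0, 1, 0],
    pts=[1.0, 1.0, 2.0, 2.0],
    reconstructable=[True, True, True, True],
    pt_thlds=[0.9],
)
```

`functools.wraps` preserves the name and docstring of the wrapped metric, which keeps the entries of a dictionary such as common_metrics easy to identify.
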

gnn_tracking.metrics.cluster_metrics.common_metrics: dict[str, ClusterMetricType]#