# Metrics and Scoring Rules
Metrics and scoring rules are quantitative measures used to evaluate the performance of a machine learning model. They help determine how well the model's predictions match the actual ground-truth values.
## Metrics Table
A comprehensive collection of metric implementations and descriptions can be found in torchmetrics: https://torchmetrics.readthedocs.io/en/latest/.
| Problem Type | Metric | Description |
|---|---|---|
| Classification | Accuracy | The proportion of correct predictions made by a model, out of all predictions. |
| Classification | Precision | The proportion of true positive predictions (i.e., cases where the model correctly predicts the positive class) out of all positive predictions made by the model. |
| Classification | Recall | The proportion of true positive predictions made by the model out of all actual positive samples. |
| Classification | F1-Score | The harmonic mean of precision and recall, providing a single value that balances both metrics. |
| Classification | AUC-ROC | The area under the Receiver Operating Characteristic curve, which measures the ability of a model to distinguish between positive and negative classes. |
| Regression | Mean Squared Error (MSE) | The mean of the squared differences between the predicted values and the true values. |
| Regression | Mean Absolute Error (MAE) | The mean of the absolute differences between the predicted values and the true values. |
| Regression | R-squared | The proportion of variance in the dependent variable that is explained by the independent variables. |
| Clustering | Silhouette Score | Measures how similar each sample is to its own cluster compared to the nearest neighbouring cluster, based on mean intra-cluster and nearest-cluster distances. |
| Clustering | Calinski-Harabasz Index | The ratio of between-cluster variance to within-cluster variance, used to evaluate the quality of a clustering solution. |
| Clustering | Davies-Bouldin Index | The average similarity between each cluster and its most similar cluster, where lower values indicate better-separated clusters. |
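As a minimal sketch of how the classification metrics in the table can be computed in practice, the snippet below uses scikit-learn (an assumed choice; torchmetrics offers equivalent classes) on made-up toy labels and probabilities.

```python
# Classification metrics on toy binary labels (illustrative only).
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

y_true = [0, 1, 1, 0, 1, 1, 0, 0]                   # ground-truth labels
y_prob = [0.1, 0.8, 0.4, 0.3, 0.9, 0.7, 0.6, 0.2]   # predicted P(class = 1)
y_pred = [int(p >= 0.5) for p in y_prob]            # hard predictions at a 0.5 threshold

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))
print("F1-score :", f1_score(y_true, y_pred))
print("AUC-ROC  :", roc_auc_score(y_true, y_prob))  # uses probabilities, not hard labels
```

Note that AUC-ROC is computed from the predicted probabilities, while the other metrics compare thresholded predictions against the ground truth.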
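The regression metrics follow the same pattern; again a sketch with scikit-learn and toy values.

```python
# Regression metrics on toy predictions (illustrative only).
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5,  0.0, 2.0, 8.0]

print("MSE:", mean_squared_error(y_true, y_pred))   # mean of squared errors
print("MAE:", mean_absolute_error(y_true, y_pred))  # mean of absolute errors
print("R^2:", r2_score(y_true, y_pred))             # proportion of explained variance
```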
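The clustering metrics differ in that they take the feature matrix and the assigned cluster labels rather than ground-truth targets. A minimal sketch, assuming scikit-learn, KMeans, and synthetic data:

```python
# Internal clustering metrics: no ground-truth labels are required, only
# the feature matrix X and the cluster assignments produced by a model.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (silhouette_score, calinski_harabasz_score,
                             davies_bouldin_score)

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)      # synthetic data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

print("Silhouette       :", silhouette_score(X, labels))         # higher is better, in [-1, 1]
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels))  # higher is better
print("Davies-Bouldin   :", davies_bouldin_score(X, labels))     # lower is better
```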