In classification tasks in machine learning, evaluating model performance involves various metrics. The F1 Score is one of the most commonly used, especially when dealing with imbalanced classes. Here's an overview of the F1 Score and other key metrics:
Accuracy:
Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN)
Use case: When classes are balanced.
Limitation: Accuracy is misleading when classes are imbalanced.
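As a quick sketch (the labels below are made up purely for illustration), accuracy can be computed with scikit-learn:
from sklearn.metrics import accuracy_score
y_true = [1, 0, 1, 1, 0, 1, 0]  # illustrative ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1]  # illustrative predictions
# 5 of the 7 predictions match, so (TP + TN) / total = 5/7
print(accuracy_score(y_true, y_pred))  # ~0.714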
Precision:
Formula: Precision = TP / (TP + FP)
Use case: When false positives are costly (e.g., spam detection).
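A minimal sketch with the same made-up labels, using scikit-learn's precision_score:
from sklearn.metrics import precision_score
y_true = [1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1]
# Of the 4 positive predictions, 3 are correct: TP / (TP + FP) = 3/4
print(precision_score(y_true, y_pred))  # 0.75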
Recall (Sensitivity or True Positive Rate):
Formula: Recall = TP / (TP + FN)
Use case: When false negatives are costly (e.g., medical diagnosis).
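Again as a sketch with the same made-up labels, using recall_score:
from sklearn.metrics import recall_score
y_true = [1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1]
# Of the 4 actual positives, 3 are found: TP / (TP + FN) = 3/4
print(recall_score(y_true, y_pred))  # 0.75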
F1 Score:
Formula: F1 = 2 × (Precision × Recall) / (Precision + Recall)
Use case: When you want a balance between precision and recall, especially for imbalanced datasets.
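Continuing the same made-up example, where precision and recall are both 0.75, the harmonic mean works out to 0.75 as well:
from sklearn.metrics import f1_score
y_true = [1, 0, 1, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1]
# F1 = 2 * (0.75 * 0.75) / (0.75 + 0.75) = 0.75
print(f1_score(y_true, y_pred))  # 0.75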
ROC-AUC (Receiver Operating Characteristic - Area Under Curve):
Use case: Measures the model's ability to distinguish between classes across all classification thresholds.
A higher AUC indicates a better model.
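ROC-AUC is computed from predicted scores or probabilities rather than hard labels. A minimal sketch, with made-up probabilities, using scikit-learn's roc_auc_score:
from sklearn.metrics import roc_auc_score
y_true = [1, 0, 1, 1, 0, 1, 0]
y_scores = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6]  # illustrative predicted probabilities
# AUC is the probability that a random positive is scored above a random negative
print(roc_auc_score(y_true, y_scores))  # ~0.917 for these scores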
Confusion Matrix
A confusion matrix is a performance measurement tool used in classification tasks in machine learning. It gives a detailed breakdown of how well a classification model is performing by showing actual vs. predicted class results.
For binary classification, the confusion matrix looks like:
                      Predicted: Positive     Predicted: Negative
Actual: Positive      True Positive (TP)      False Negative (FN)
Actual: Negative      False Positive (FP)     True Negative (TN)
True Positive (TP): Correctly predicted positive class
True Negative (TN): Correctly predicted negative class
False Positive (FP): Incorrectly predicted positive (Type I error)
False Negative (FN): Incorrectly predicted negative (Type II error)
The code for computing a confusion matrix and classification report with scikit-learn is given below:
from sklearn.metrics import confusion_matrix, classification_report
y_true = [1, 0, 1, 1, 0, 1, 0]  # actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1]  # predicted labels
# Confusion Matrix
cm = confusion_matrix(y_true, y_pred)
print("Confusion Matrix:\n", cm)
# Classification Report
print("\nClassification Report:\n", classification_report(y_true, y_pred))
When Is It Especially Beneficial?
When dealing with imbalanced datasets (e.g., fraud detection, medical diagnosis).
When overall accuracy is misleading — confusion matrix shows where the model fails.
In multi-class classification, the confusion matrix shows errors across all classes, as in the sketch below.
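A minimal multi-class sketch, with made-up labels for a three-class problem:
from sklearn.metrics import confusion_matrix
y_true = [0, 1, 2, 2, 1, 0, 2, 1]  # illustrative ground-truth labels for classes 0, 1, 2
y_pred = [0, 2, 2, 1, 1, 0, 2, 1]  # illustrative predictions
# Each row is an actual class and each column a predicted class,
# so the off-diagonal cells show which classes get confused with each other
print(confusion_matrix(y_true, y_pred))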