Model Evaluation Metrics
"How good is my model?" depends on what you're measuring. Different tasks need different metrics.
Regression Metrics
MSE (Mean Squared Error)
— Average of squared differences. Penalizes large errors heavily.
MAE (Mean Absolute Error)
— Average of absolute differences. More robust to outliers.
R-squared
— Proportion of variance explained. 1.0 = perfect, 0.0 = no better than predicting the mean.
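The three regression metrics above can be computed directly from a list of errors. The following sketch uses made-up predictions purely for illustration:

```python
# Toy data: true targets and model predictions (illustrative values).
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

n = len(y_true)
errors = [t - p for t, p in zip(y_true, y_pred)]

# MSE: mean of squared errors -- large misses dominate the total.
mse = sum(e ** 2 for e in errors) / n

# MAE: mean of absolute errors -- every miss counts linearly.
mae = sum(abs(e) for e in errors) / n

# R-squared: 1 - (residual sum of squares / total sum of squares).
mean_y = sum(y_true) / n
ss_res = sum(e ** 2 for e in errors)
ss_tot = sum((t - mean_y) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(f"MSE: {mse:.3f}  MAE: {mae:.3f}  R^2: {r2:.3f}")
```

Note how the one large error (2.5 vs 4.0) contributes 2.25 to the MSE sum but only 1.5 to the MAE sum, which is why MSE punishes outliers harder.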
Classification Metrics
Accuracy
— Fraction of correct predictions. Misleading with imbalanced classes.
Precision
— Of all positive predictions, how many were actually positive?
Recall
— Of all actual positives, how many did we correctly identify?
F1 Score
— Harmonic mean of precision and recall. Balances both.
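All four classification metrics fall out of the four counts in a binary confusion matrix: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). A minimal sketch with made-up labels:

```python
# Toy binary labels: 1 = positive, 0 = negative (illustrative data).
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted 1, was 1
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted 1, was 0
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted 0, was 1
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # predicted 0, was 0

accuracy = (tp + tn) / len(y_true)          # all correct / all predictions
precision = tp / (tp + fp)                  # of predicted positives, correct
recall = tp / (tp + fn)                     # of actual positives, found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"accuracy={accuracy}  precision={precision}  recall={recall}  f1={f1}")
```

With an extremely imbalanced dataset (say 99% negatives), a model that always predicts 0 would score 0.99 accuracy but 0 recall, which is why accuracy alone is misleading.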
Which Metric to Use?
Balanced classes
→ Accuracy is fine
Imbalanced classes
(e.g., fraud detection) → Use precision, recall, or F1
Cost of false positives is high
(e.g., spam filter) → Optimize precision
Cost of false negatives is high
(e.g., cancer detection) → Optimize recall
Key Takeaways
No single metric works for all problems
Understand what errors cost in your specific domain
Always look at multiple metrics, not just accuracy
Confusion matrices give the full picture for classification
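To see what a confusion matrix "full picture" looks like, the counts can be tallied with nothing more than the standard library. This sketch reuses the toy labels from above:

```python
from collections import Counter

# Toy labels (illustrative): 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Each cell of the matrix is the count of one (actual, predicted) pair.
counts = Counter(zip(y_true, y_pred))

print("           pred 0   pred 1")
print(f"actual 0   {counts[(0, 0)]:6d}   {counts[(0, 1)]:6d}")
print(f"actual 1   {counts[(1, 0)]:6d}   {counts[(1, 1)]:6d}")
```

The off-diagonal cells are the two error types: (actual 0, pred 1) counts false positives and (actual 1, pred 0) counts false negatives, so the matrix shows at a glance which kind of mistake the model makes.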