Model Evaluation Metrics

"How good is my model?" depends on what you're measuring. Different tasks need different metrics.

Regression Metrics

  • MSE (Mean Squared Error): average of squared differences between predictions and targets. Penalizes large errors heavily.
  • MAE (Mean Absolute Error): average of absolute differences. More robust to outliers.
  • R-squared: proportion of variance explained. 1.0 = perfect, 0.0 = no better than predicting the mean.
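The three regression metrics above can be computed in a few lines. This is a minimal sketch using made-up sample values; real code would typically call a library such as scikit-learn instead.

```python
# Regression metrics from first principles (sample values are invented).
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

n = len(y_true)
# MSE: mean of squared errors -- squaring punishes large misses.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
# MAE: mean of absolute errors -- each error counts linearly.
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
# R^2 = 1 - SS_res / SS_tot, relative to predicting the mean.
mean_y = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean_y) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(f"MSE: {mse:.3f}")  # 0.375
print(f"MAE: {mae:.3f}")  # 0.500
print(f"R^2: {r2:.3f}")
```

Note that because MSE squares each error, a single prediction that is off by 2 contributes as much as four predictions that are each off by 1.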

Classification Metrics

  • Accuracy: fraction of correct predictions. Misleading with imbalanced classes.
  • Precision: of all positive predictions, how many were actually positive?
  • Recall: of all actual positives, how many did we predict?
  • F1 Score: harmonic mean of precision and recall. Balances both.
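All four classification metrics derive from the counts of true/false positives and negatives. A minimal sketch with invented binary labels (1 = positive class):

```python
# Classification metrics from the confusion-matrix counts
# (labels below are made up for illustration; 1 = positive class).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)          # all correct / all
precision = tp / (tp + fp)                  # of predicted positives
recall = tp / (tp + fn)                     # of actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"accuracy={accuracy}, precision={precision}, "
      f"recall={recall}, f1={f1}")
```

The harmonic mean in F1 means a model cannot score well by maximizing one of precision or recall while ignoring the other.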

Which Metric to Use?

  • Balanced classes → Accuracy is fine
  • Imbalanced classes (e.g., fraud detection) → Use precision, recall, or F1
  • Cost of false positives is high (e.g., spam filter) → Optimize precision
  • Cost of false negatives is high (e.g., cancer detection) → Optimize recall
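The fraud-detection case above is worth seeing concretely. This sketch uses a synthetic dataset with 1% positives and a degenerate classifier that always predicts "not fraud":

```python
# Accuracy is misleading on imbalanced data (synthetic example):
# 1% of cases are fraud, and the model always predicts "not fraud".
y_true = [1] * 10 + [0] * 990   # 10 fraud cases out of 1000
y_pred = [0] * 1000             # always predict the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)

print(accuracy)  # 0.99 -- looks excellent
print(recall)    # 0.0  -- catches zero fraud
```

Accuracy of 99% sounds impressive, but recall exposes that the model never catches a single fraudulent case.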

Key Takeaways

  • No single metric works for all problems
  • Understand what errors cost in your specific domain
  • Always look at multiple metrics, not just accuracy
  • Confusion matrices give the full picture for classification
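A confusion matrix tabulates every (actual, predicted) pair, so all the metrics above can be read off it. A minimal sketch for the binary case, with toy labels:

```python
# Building a 2x2 confusion matrix by counting (actual, predicted)
# pairs (toy labels for illustration).
from collections import Counter

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

counts = Counter(zip(y_true, y_pred))

print("            pred 0  pred 1")
print(f"actual 0    {counts[(0, 0)]:6d}  {counts[(0, 1)]:6d}")
print(f"actual 1    {counts[(1, 0)]:6d}  {counts[(1, 1)]:6d}")
```

The off-diagonal cells are the two error types: the top-right cell counts false positives, the bottom-left counts false negatives.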