Model Evaluation Metrics

"How good is my model?" depends on what you're measuring. Different tasks need different metrics.

Regression Metrics

  • MSE (Mean Squared Error): average of squared differences between predictions and targets. Penalizes large errors heavily.
  • MAE (Mean Absolute Error): average of absolute differences. More robust to outliers.
  • R-squared: proportion of variance explained. 1.0 = perfect, 0.0 = no better than predicting the mean.
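The three regression metrics above can be computed in a few lines. This is a minimal sketch using made-up sample values; real code would typically call a library such as scikit-learn instead.

```python
# Regression metrics from first principles (sample values are invented).
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

n = len(y_true)
# MSE: mean of squared errors -- squaring punishes large misses.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
# MAE: mean of absolute errors -- each error counts linearly.
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
# R^2 = 1 - SS_res / SS_tot, relative to predicting the mean.
mean_y = sum(y_true) / n
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
ss_tot = sum((t - mean_y) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(f"MSE: {mse:.3f}")  # 0.375
print(f"MAE: {mae:.3f}")  # 0.500
print(f"R^2: {r2:.3f}")
```

Note that because MSE squares each error, a single prediction that is off by 2 contributes as much as four predictions that are each off by 1.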

Classification Metrics

  • Accuracy: fraction of correct predictions. Misleading with imbalanced classes.
  • Precision: of all positive predictions, how many were actually positive?
  • Recall: of all actual positives, how many did we predict?
  • F1 Score: harmonic mean of precision and recall. Balances both.
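All four classification metrics derive from the counts of true/false positives and negatives. A minimal sketch with invented binary labels (1 = positive class):

```python
# Classification metrics from the confusion-matrix counts
# (labels below are made up for illustration; 1 = positive class).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)          # all correct / all
precision = tp / (tp + fp)                  # of predicted positives
recall = tp / (tp + fn)                     # of actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(f"accuracy={accuracy}, precision={precision}, "
      f"recall={recall}, f1={f1}")
```

The harmonic mean in F1 means a model cannot score well by maximizing one of precision or recall while ignoring the other.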

Which Metric to Use?

  • Balanced classes → Accuracy is fine
  • Imbalanced classes (e.g., fraud detection) → Use precision, recall, or F1
  • Cost of false positives is high (e.g., spam filter) → Optimize precision
  • Cost of false negatives is high (e.g., cancer detection) → Optimize recall
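The fraud-detection case above is worth seeing concretely. This sketch uses a synthetic dataset with 1% positives and a degenerate classifier that always predicts "not fraud":

```python
# Accuracy is misleading on imbalanced data (synthetic example):
# 1% of cases are fraud, and the model always predicts "not fraud".
y_true = [1] * 10 + [0] * 990   # 10 fraud cases out of 1000
y_pred = [0] * 1000             # always predict the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)

print(accuracy)  # 0.99 -- looks excellent
print(recall)    # 0.0  -- catches zero fraud
```

Accuracy of 99% sounds impressive, but recall exposes that the model never catches a single fraudulent case.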

Key Takeaways

  • No single metric works for all problems
  • Understand what errors cost in your specific domain
  • Always look at multiple metrics, not just accuracy
  • Confusion matrices give the full picture for classification
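A confusion matrix tabulates every (actual, predicted) pair, so all the metrics above can be read off it. A minimal sketch for the binary case, with toy labels:

```python
# Building a 2x2 confusion matrix by counting (actual, predicted)
# pairs (toy labels for illustration).
from collections import Counter

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0, 0]

counts = Counter(zip(y_true, y_pred))

print("            pred 0  pred 1")
print(f"actual 0    {counts[(0, 0)]:6d}  {counts[(0, 1)]:6d}")
print(f"actual 1    {counts[(1, 0)]:6d}  {counts[(1, 1)]:6d}")
```

The off-diagonal cells are the two error types: the top-right cell counts false positives, the bottom-left counts false negatives.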