Supervised Learning

Supervised learning is the most common type of machine learning. The idea is simple: you show the model examples with correct answers, and it learns to predict the answer for new, unseen examples.

Think of it like studying with a textbook that has an answer key. You practice on problems where you know the answer, then take a test with new problems.

Two Types of Supervised Learning

Classification — Predict a category (discrete output)

  • Is this email spam or not spam?
  • Is this tumor malignant or benign?
  • What digit (0–9) is in this image?

Regression — Predict a number (continuous output)

  • What will this house sell for?
  • How many units will we sell next quarter?
  • What temperature will it be tomorrow?

Classification Example: Iris Dataset

The Iris dataset is the "hello world" of ML. It contains measurements of 150 iris flowers from 3 species. Let's build a classifier that predicts the species from the measurements.

Let's look at how a simple distance-based classifier works.
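Here is a minimal sketch of such a classifier. It uses two made-up measurements per flower rather than the real Iris data, and the helper names (`fit_centroids`, `predict`) are just illustrative:

```python
import numpy as np

# Illustrative 2-D measurements (e.g. sepal length/width); not the real Iris data.
X_train = np.array([
    [5.1, 3.5], [4.9, 3.0], [5.0, 3.4],   # species 0
    [7.0, 3.2], [6.4, 3.2], [6.9, 3.1],   # species 1
])
y_train = np.array([0, 0, 0, 1, 1, 1])

def fit_centroids(X, y):
    """Training: compute the mean (centroid) of each class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(centroids, x):
    """Prediction: assign x to the class with the nearest centroid."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

centroids = fit_centroids(X_train, y_train)
print(predict(centroids, np.array([5.0, 3.3])))  # falls near species 0's centroid
print(predict(centroids, np.array([6.8, 3.1])))  # falls near species 1's centroid
```

Note that "training" here is just averaging: the entire learned model is one point per class.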

This is a nearest centroid classifier — one of the simplest classification algorithms. It computes the center of each class and assigns new points to the nearest center.

Real-world classifiers (logistic regression, random forests, neural networks) are more sophisticated, but they follow the same principle: learn a decision boundary from labeled data, then use it to classify new points.
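To make "learn a decision boundary" concrete, here is a bare-bones logistic regression trained by gradient descent on toy 1-D data. Everything below (the data, learning rate, and iteration count) is illustrative, not from the lesson:

```python
import numpy as np

# Toy 1-D data: the true label is 1 whenever the feature exceeds 0.5.
rng = np.random.default_rng(1)
x = rng.random(200)
y = (x > 0.5).astype(float)

# Logistic regression: one weight, one bias, trained on log-loss.
w, b = 0.0, 0.0
lr = 0.5
for _ in range(2000):
    p = 1 / (1 + np.exp(-(w * x + b)))  # predicted probability of class 1
    w -= lr * np.mean((p - y) * x)      # gradient of the log-loss w.r.t. w
    b -= lr * np.mean(p - y)            # gradient of the log-loss w.r.t. b

# The learned decision boundary is where w*x + b = 0, i.e. x = -b/w.
print("boundary at x =", -b / w)
```

The model never sees the rule "label 1 when x > 0.5"; it recovers a boundary near 0.5 purely from the labeled examples.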

The Supervised Learning Workflow

1. Collect labeled data (inputs + correct outputs)

2. Split data into training set and test set

3. Train the model on the training set

4. Evaluate on the test set (data the model hasn't seen)

5. Iterate — try different models, features, or hyperparameters

Step 4 is critical — we'll cover train/test splits and evaluation metrics in upcoming lessons.
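The five steps can be sketched end to end. The data, split sizes, and the nearest-centroid "model" below are all illustrative choices, not the lesson's own code:

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Collect labeled data: one feature, label 1 when it exceeds 0.5 (toy data).
X = rng.random(100)
y = (X > 0.5).astype(int)

# 2. Split into a training set (80 points) and a test set (20 points).
idx = rng.permutation(100)
train_idx, test_idx = idx[:80], idx[80:]

# 3. Train on the training set: mean feature value per class (nearest centroid).
centroids = {c: X[train_idx][y[train_idx] == c].mean() for c in (0, 1)}

# 4. Evaluate on the test set — data the model never saw during training.
preds = np.array([min(centroids, key=lambda c: abs(v - centroids[c]))
                  for v in X[test_idx]])
print("test accuracy:", np.mean(preds == y[test_idx]))

# 5. Iterate: if accuracy is too low, try other models, features, or settings.
```

The key design choice is the split in step 2: accuracy computed on the training points would flatter the model, while the held-out test set estimates how it handles new data.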

Key Takeaways

  • Supervised learning uses labeled examples to learn a mapping from inputs to outputs
  • Classification predicts categories, regression predicts numbers
  • Even simple algorithms (nearest centroid) can be effective
  • Always evaluate on data the model hasn't seen during training