Supervised learning is the most common type of machine learning. The idea is simple: you show the model examples with correct answers, and it learns to predict the answer for new, unseen examples.
Think of it like studying with a textbook that has an answer key. You practice on problems where you know the answer, then take a test with new problems.
Classification — Predict a category (discrete output), e.g. "spam" vs. "not spam"
Regression — Predict a number (continuous output), e.g. a house price
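To make the distinction concrete, here is a toy sketch (the functions and the rules inside them are made up for illustration, not real trained models): a classifier returns a label, a regressor returns a number.

```python
# Hypothetical illustration: two kinds of supervised predictions.

def classify_email(word_count: int, has_link: bool) -> str:
    """Toy classifier: maps features to a discrete label."""
    return "spam" if has_link and word_count < 20 else "not spam"

def predict_price(square_feet: float) -> float:
    """Toy regressor: maps a feature to a continuous value (made-up linear rule)."""
    return 150.0 * square_feet + 20_000.0

print(classify_email(10, True))   # a category
print(predict_price(1200.0))      # a number
```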
The Iris dataset is the "hello world" of ML. It contains measurements of 150 iris flowers from 3 species. Let's build a classifier that predicts the species from the measurements.
Run the code to see how a simple distance-based classifier works:
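If you don't have the lesson's runnable snippet handy, here is a minimal sketch of the same idea using NumPy. It uses synthetic 2-D clusters as a stand-in for the Iris measurements (class counts, centers, and noise scale are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for Iris: 3 classes, 2 features, 50 points each.
centers = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 4.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
y = np.repeat([0, 1, 2], 50)

# "Training": compute the centroid (mean) of each class.
centroids = np.array([X[y == k].mean(axis=0) for k in range(3)])

def predict(points):
    """Assign each point to the class with the nearest centroid."""
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

accuracy = (predict(X) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

With well-separated clusters like these, the centroids land near the true centers and nearly every point is classified correctly.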
This is a nearest centroid classifier — one of the simplest classification algorithms. It computes the center of each class and assigns new points to the nearest center.
Real-world classifiers (logistic regression, random forests, neural networks) are more sophisticated, but they follow the same principle: learn a decision boundary from labeled data, then use it to classify new points.
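As one example of that shared principle, here is a from-scratch sketch of logistic regression trained by gradient descent on synthetic data (the dataset, learning rate, and iteration count are arbitrary choices for illustration). Instead of class centers, it learns a linear decision boundary directly:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two roughly linearly separable classes in 2-D.
X = np.vstack([rng.normal([-2, -2], 1.0, (100, 2)),
               rng.normal([2, 2], 1.0, (100, 2))])
y = np.repeat([0.0, 1.0], 100)

# Learn weights w and bias b so sigmoid(X @ w + b) approximates P(y = 1 | x).
w, b = np.zeros(2), 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
    grad_w = X.T @ (p - y) / len(y)          # gradient of the log loss
    grad_b = (p - y).mean()
    w -= lr * grad_w
    b -= lr * grad_b

preds = (1.0 / (1.0 + np.exp(-(X @ w + b))) > 0.5).astype(float)
print("accuracy:", (preds == y).mean())
```

The boundary here is the line where `X @ w + b = 0`; points on either side get different labels, which is exactly the "learn a boundary, then classify new points" recipe.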
1. Collect labeled data (inputs + correct outputs)
2. Split data into training set and test set
3. Train the model on the training set
4. Evaluate on the test set (data the model hasn't seen)
5. Iterate — try different models, features, or hyperparameters
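The workflow above can be sketched end to end in a few lines. This uses synthetic data and the nearest centroid model again (the 80/20 split ratio and the data are illustrative choices, not prescriptions):

```python
import numpy as np

rng = np.random.default_rng(42)

# Step 1: labeled data (synthetic stand-in: 2 classes, 2 features).
X = np.vstack([rng.normal([0, 0], 0.6, (60, 2)),
               rng.normal([3, 3], 0.6, (60, 2))])
y = np.repeat([0, 1], 60)

# Step 2: shuffle, then split 80/20 into train and test sets.
idx = rng.permutation(len(y))
split = int(0.8 * len(y))
train, test = idx[:split], idx[split:]

# Step 3: "train" a nearest centroid model on the training set only.
centroids = np.array([X[train][y[train] == k].mean(axis=0) for k in (0, 1)])

# Step 4: evaluate on the held-out test set.
dists = np.linalg.norm(X[test][:, None] - centroids[None], axis=2)
test_acc = (dists.argmin(axis=1) == y[test]).mean()
print(f"test accuracy: {test_acc:.2f}")
```

Note that the centroids are computed from the training rows only; the test rows stay unseen until step 4, which is what makes the accuracy estimate honest.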
Step 4 is critical — we'll cover train/test splits and evaluation metrics in upcoming lessons.