Unsupervised Learning

In supervised learning, every example comes with a label. In unsupervised learning, you have only the data, and the goal is to find hidden structure or patterns in it.

Common tasks:

  • Clustering: group similar data points together (customer segments, document topics)
  • Dimensionality reduction: compress high-dimensional data while preserving its structure (PCA, t-SNE)
  • Anomaly detection: find unusual data points (fraud detection, defect detection)

Clustering with K-Means

You already know K-Means from the problem bank! Let's see it in action on a simple dataset.

Here the task is to cluster a set of 2D points into 3 groups.
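A minimal, dependency-free sketch of that clustering, using K-Means in its standard form (Lloyd's algorithm): assign each point to its nearest centroid, move each centroid to the mean of its assigned points, and repeat until the centroids stop moving. The helper names `kmeans` and `farthest_point_init`, the three synthetic blobs, and the farthest-point initialization are all assumptions for illustration; library implementations such as scikit-learn's `KMeans` default to k-means++ initialization instead.

```python
import math
import random


def farthest_point_init(points, k):
    # Hypothetical helper: start from the first point, then repeatedly add the
    # point farthest from every centroid chosen so far (deterministic heuristic).
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids


def kmeans(points, k, max_iters=100):
    """Hypothetical helper: cluster 2D points into k groups with Lloyd's algorithm."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids (and therefore labels) stopped changing
        centroids = new_centroids
    return labels, centroids


# Three synthetic blobs of 20 points each (assumed toy data).
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [
    (cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1))
    for cx, cy in centers
    for _ in range(20)
]

labels, centroids = kmeans(points, k=3)
print(labels)
print([(round(x, 2), round(y, 2)) for x, y in centroids])
```

Note that `kmeans` receives only the coordinates, never the blob each point came from; the grouping comes entirely from distances between points.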

Notice that we never told the algorithm which points belong to which group; it found the structure on its own. That's the power of unsupervised learning.

When to Use Unsupervised Learning

  • You have lots of data but no labels (labels are expensive to create)
  • You want to explore and understand your data before building a supervised model
  • The task is inherently about finding structure (market segmentation, topic modeling)

Supervised vs. Unsupervised: A Comparison

|            | Supervised                         | Unsupervised                       |
|------------|------------------------------------|------------------------------------|
| Data       | Labeled (X, y)                     | Unlabeled (X only)                 |
| Goal       | Predict y for new X                | Find patterns in X                 |
| Evaluation | Compare predictions to true labels | Harder; needs domain knowledge     |
| Examples   | Classification, regression         | Clustering, PCA, anomaly detection |
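PCA appears in the table as an unsupervised method. As a minimal sketch of dimensionality reduction, the code below reduces 2D points to 1D by projecting them onto their first principal component, the direction of greatest variance. It is plain Python with a closed-form eigen-decomposition of the 2x2 covariance matrix; the function name `pca_2d_to_1d` and the toy data are hypothetical, and in practice a library implementation such as scikit-learn's `PCA` would usually be used instead.

```python
import math


def pca_2d_to_1d(points):
    """Hypothetical helper: project 2D points onto their first principal component."""
    n = len(points)
    # Center the data: PCA works on deviations from the mean.
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form.
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
    # A matching eigenvector; fall back to a coordinate axis when b == 0.
    if b != 0:
        vx, vy = lam - c, b
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # Each point's 1D coordinate is its projection onto that direction.
    scores = [x * vx + y * vy for x, y in centered]
    explained = lam / (a + c)  # fraction of total variance kept by 1 component
    return scores, (vx, vy), explained


# Toy data (assumed): points lying close to the line y = 2x.
points = [(float(t), 2.0 * t + (0.1 if t % 2 else -0.1)) for t in range(10)]
scores, direction, explained = pca_2d_to_1d(points)
print([round(s, 2) for s in scores])
print(direction, explained)
```

For this data the returned direction should be close to (1, 2)/√5 and the explained-variance fraction close to 1, since the points lie almost exactly on a line: one coordinate per point keeps nearly all of the structure.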

Key Takeaways

  • Unsupervised learning finds patterns in data without labels
  • Clustering groups similar points together; K-Means is one of the simplest approaches
  • Unsupervised learning is useful for exploration, segmentation, and when labels aren't available
  • Evaluation is harder than in supervised learning because there's no "correct answer" to compare against