Convolutional Neural Networks (CNNs)
Best for: Image and video data (2D or 3D grids of data). CNNs are the workhorses of computer vision. They are specifically designed to recognize patterns in spatial data, like the pixels in an image. Instead of looking at every pixel individually, CNNs look at images through a series of “filters” or “kernels.”How It Works: The Core Idea
Imagine you have a magnifying glass that can only see a small 3x3 pixel square. You slide this magnifying glass across an entire image, one patch at a time. This sliding filter is called a convolution. Each filter is designed to detect a specific feature, like a horizontal edge, a vertical edge, a specific color, or a curve.- Convolution Layers: These layers apply multiple filters to the input image, creating “feature maps” that highlight where those specific features were detected.
- Pooling Layers: After detecting features, these layers downsample the feature maps, making them smaller and more manageable. This process retains the most important information while reducing computational load. It’s like creating a lower-resolution summary of the important features.
- Fully-Connected Layers: Finally, after several convolution and pooling layers have extracted and summarized the key features, the output is fed into a standard fully-connected network (like we’ve already learned about) to perform the final classification (e.g., “this is a cat”).

