Modern Convolutional Neural Networks
Now that we understand the basics of wiring together CNNs, let's take a tour of modern CNN architectures. This tour is, by necessity, incomplete, owing to the plethora of exciting new designs constantly being added. Their importance derives from the fact that not only can they be used directly for vision tasks, but they also serve as basic feature generators for more advanced tasks such as tracking (Zhang et al., 2021), segmentation (Long et al., 2015), object detection (Redmon and Farhadi, 2018), and style transfer (Gatys et al., 2016).

In this chapter, most sections correspond to a significant CNN architecture that was at some point (or currently is) the base model upon which many research projects and deployed systems were built. Each of these networks was briefly a dominant architecture, and many were winners or runners-up in the ImageNet competition, which has served as a barometer of progress in supervised learning in computer vision since 2010. Only recently have Transformers begun to displace CNNs, starting with the Vision Transformer of Dosovitskiy et al. (2021), followed by the Swin Transformer (Liu et al., 2021). We will cover this development later, in the chapter on Attention Mechanisms and Transformers (Zhang et al., 2021).