24. Convolutional Neural Networks
A Bit of History
Once upon a time...
Convolutional Neural Networks

- Gradient-based learning applied to document recognition
- LeCun, Bottou, Bengio, Haffner 1998
The First Strong Result in Deep Learning

- ImageNet classification with deep convolutional neural networks
- Also known as "AlexNet"
- Alex Krizhevsky, Ilya Sutskever, Geoffrey E Hinton, 2012
- Winner of ILSVRC (ImageNet Large Scale Visual Recognition Challenge) 2012
Famous Image Datasets: ImageNet
Famous Image Datasets: MNIST

- The MNIST database
- "Modified National Institute of Standards and Technology database"
- A large database of handwritten digits commonly used for training various image processing systems
- Contains 60,000 training images and 10,000 testing images
Famous Image Datasets: CIFAR 10 vs. CIFAR 100
CIFAR-10
- 60,000 32x32 color images in 10 classes, with 6,000 images per class
- 50,000 training images and 10,000 test images
CIFAR-100
- Similar to CIFAR-10, but with 100 classes containing 600 images each
- 500 training images and 100 testing images per class
Fast-Forward to Today: ConvNets Are Everywhere
- Classification
- Retrieval
- Detection
- Segmentation
- Image Captioning
- Style Transfer
- Self-driving Cars
- Diffusion models
Convolutional Neural Networks
Convolutional Layer
Recap: Fully Connected Layer (Simple FF Networks)

- stretch to
Convolution Layer

preserve spatial structure
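
To make "preserve spatial structure" concrete, here is a minimal sketch of one filter sliding over a small single-channel input. The function name `conv2d_single` and the toy values are illustrative assumptions; a real convolution layer works on multi-channel volumes with many filters and adds a bias.

```python
def conv2d_single(image, kernel, stride=1):
    """Slide `kernel` over `image` (both lists of lists), with no padding."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(0, h - kh + 1, stride):
        row = []
        for j in range(0, w - kw + 1, stride):
            # Dot product between the filter and the local patch under it.
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# Toy 4x4 input and 2x2 filter (made-up values).
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 0],
    [7, 8, 9, 0],
    [0, 0, 0, 0],
]
kernel = [
    [1, 0],
    [0, -1],
]
activation_map = conv2d_single(image, kernel)
# activation_map is a 3x3 grid: [[-4, -4, 3], [-4, -4, 6], [7, 8, 9]]
```

Each output value depends only on a local patch, and neighbouring outputs come from neighbouring patches, so the 2D layout of the input carries over to the activation map.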





Convolutional Networks

A Closer Look at Spatial Dimensions

In Practice: Common to Zero Pad the Border
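
As a rough sketch of the spatial arithmetic, the helper below assumes the commonly used relation output = (N + 2P - F) / S + 1 for input size N, filter size F, stride S, and zero padding P. The name `conv_output_size` and the 7x7 example are assumptions chosen for illustration.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Output width (or height) for input size n and filter size f."""
    span = n + 2 * pad - f
    if span < 0 or span % stride != 0:
        # The filter does not tile the (padded) input evenly.
        raise ValueError("filter does not fit with this stride/padding")
    return span // stride + 1


no_pad = conv_output_size(7, 3, stride=1)        # 5: the map shrinks
strided = conv_output_size(7, 3, stride=2)       # 3
same = conv_output_size(7, 3, stride=1, pad=1)   # 7: padding keeps the size
```

With stride 1, padding of (F - 1) / 2 keeps the output the same size as the input, which is one reason zero padding the border is common.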
Caution

Summary: Convolutional Layer
- Assume the input is a volume of a given size
- A Conv layer requires four hyperparameters:
- Number of filters
- The filter size
- The stride
- The zero padding
- Produces an output volume whose size follows from the input size and these hyperparameters
- Number of parameters: weights + biases
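
The summary above can be sketched as a small calculator. The symbols are assumptions following common notation, since they are not spelled out here: input `w1 x h1 x d1`, `k` filters of size `f x f`, stride `s`, zero padding `p`. Each filter is assumed to span the full input depth and carry one bias.

```python
def conv_layer_summary(w1, h1, d1, k, f, s, p):
    """Return (output_shape, num_parameters) for a conv layer."""
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    d2 = k                      # one activation map per filter
    weights = f * f * d1 * k    # each filter covers the full input depth
    biases = k                  # one bias per filter
    return (w2, h2, d2), weights + biases


# Assumed example: 32x32x3 input, ten 5x5 filters, stride 1, padding 2.
shape, params = conv_layer_summary(32, 32, 3, k=10, f=5, s=1, p=2)
# shape == (32, 32, 10), params == 760
```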
Pooling Layer

- Makes the representations smaller and more manageable
- Operates independently over each activation map
Max Pooling
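
A minimal sketch of max pooling on a single activation map, assuming a 2x2 window with stride 2. The function name `max_pool2d` and the input values are illustrative.

```python
def max_pool2d(x, size=2, stride=2):
    """Take the maximum over each size x size window of a 2D map."""
    h, w = len(x), len(x[0])
    return [
        [
            max(x[i + di][j + dj] for di in range(size) for dj in range(size))
            for j in range(0, w - size + 1, stride)
        ]
        for i in range(0, h - size + 1, stride)
    ]


x = [
    [1, 1, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]
pooled = max_pool2d(x)
# pooled == [[6, 8], [3, 4]]
```

The 4x4 map becomes 2x2, and no weights are learned: the operation is fixed once the window size and stride are chosen.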

Summary: Pooling Layer
- Assume the input is a volume of a given size
- A pooling layer requires two hyperparameters:
- The spatial extent
- The stride
- Produces an output volume whose size follows from the input size and these hyperparameters
- Number of parameters: 0
Fully-Connected (FC) Layer
- Contains neurons that connect to the entire input volume, as in ordinary feedforward neural networks
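
As a hedged illustration, a fully-connected layer can be sketched as flattening the input volume and taking one dot product per neuron. The helper names `flatten` and `fully_connected`, the tiny 2x2x2 volume, and the weights are all made up for the example.

```python
def flatten(volume):
    """Stretch a nested H x W x C volume into one flat list."""
    return [v for row in volume for pixel in row for v in pixel]


def fully_connected(x, weights, biases):
    """Each output neuron sees every input value."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, biases)
    ]


volume = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # 2 x 2 x 2 input
x = flatten(volume)                             # 8 values
weights = [
    [1, 1, 1, 1, 1, 1, 1, 1],   # neuron 0 sums everything
    [0, 1, 0, 1, 0, 1, 0, 1],   # neuron 1 sums every second value
]
biases = [0, -1]
out = fully_connected(x, weights, biases)
# out == [36, 19]
```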
Summary
- ConvNets stack CONV, POOL, and FC layers
- Historical architectures looked like:
[(CONV-RELU)*N-POOL?]*M, (FC-RELU)*K, SOFTMAX
- where N is usually up to 5 and M is large
- Trend towards smaller filters and deeper architectures
- Trend towards getting rid of POOL/FC layers (just CONV)
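
To see how the pattern [(CONV-RELU)*N-POOL?]*M, (FC-RELU)*K, SOFTMAX expands, here is a small sketch that builds the layer sequence as a list of names. The function `convnet_pattern` and the chosen values of N, M, and K are illustrative assumptions, not a specific published network.

```python
def convnet_pattern(n, m, k, pool=True):
    """Expand [(CONV-RELU)*n-POOL?]*m, (FC-RELU)*k, SOFTMAX into layer names."""
    layers = []
    for _ in range(m):
        layers += ["CONV", "RELU"] * n
        if pool:                 # the "?" marks POOL as optional
            layers.append("POOL")
    layers += ["FC", "RELU"] * k
    layers.append("SOFTMAX")
    return layers


layers = convnet_pattern(n=2, m=2, k=1)
# CONV RELU CONV RELU POOL CONV RELU CONV RELU POOL FC RELU SOFTMAX
```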
Representative CNN Architectures
GoogLeNet, VGGNet, ResNet, ...
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) Winners

VGGNet

- Small filters, deeper networks
- 8 layers (AlexNet) → 16-19 layers (VGG16Net)
- Uses only 3x3 CONV with stride 1, pad 1 and 2x2 MAX POOL with stride 2
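
A short sketch of why this pairing is convenient: a 3x3 convolution with stride 1 and pad 1 leaves the spatial size unchanged, and a 2x2 max pool with stride 2 halves it. The 224-pixel input and five pooling stages below are assumptions for illustration, and `vgg_style_spatial_sizes` is a hypothetical helper.

```python
def vgg_style_spatial_sizes(input_size, num_stages):
    """Track the spatial size through stages of 3x3 convs followed by a 2x2 pool."""
    sizes = [input_size]
    size = input_size
    for _ in range(num_stages):
        size = (size + 2 * 1 - 3) // 1 + 1   # 3x3 conv, stride 1, pad 1: unchanged
        size = (size - 2) // 2 + 1           # 2x2 max pool, stride 2: halved
        sizes.append(size)
    return sizes


sizes = vgg_style_spatial_sizes(224, 5)
# sizes == [224, 112, 56, 28, 14, 7]
```

Because the convolutions never change the spatial size, any number of them can be stacked within a stage, and only the pooling layers control downsampling.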
ResNet

- Very deep networks using residual connections
- 152-layer model for ImageNet; winner of ILSVRC'15 classification
- Swept all classification and detection competitions in ILSVRC'15 and COCO'15
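
The idea of a residual connection can be sketched as output = F(x) + x, where F is a small stack of layers. To stay short, the sketch below swaps in plain linear maps on vectors where ResNet uses convolutions, and all names (`relu`, `linear`, `residual_block`) and weights are illustrative assumptions.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def linear(v, weights):
    """Matrix-vector product; stands in for a conv layer in this sketch."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x) with F = linear -> relu -> linear."""
    f = linear(relu(linear(x, w1)), w2)
    # The skip connection adds the input back onto the block's output.
    return relu([fi + xi for fi, xi in zip(f, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

passthrough = residual_block([1.0, 2.0], zeros, zeros)      # [1.0, 2.0]
doubled = residual_block([1.0, 2.0], identity, identity)    # [2.0, 4.0]
```

With all-zero weights the block simply passes its (non-negative) input through, which hints at why adding more such blocks need not make a network harder to fit.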
CNN for Text Classification

- Convolutional Neural Networks for Sentence Classification
- Yoon Kim, EMNLP 2014
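
As a rough sketch of how convolution carries over to text, a filter can slide over windows of consecutive word embeddings, followed by max-over-time pooling that keeps the strongest response. The 2-dimensional embeddings, the filter values, and the choice of ReLU as the non-linearity are assumptions made for this toy example.

```python
def conv1d_over_words(embeddings, filt):
    """Apply one filter spanning len(filt) consecutive words, then ReLU."""
    window = len(filt)
    dim = len(filt[0])
    features = []
    for i in range(len(embeddings) - window + 1):
        s = sum(
            filt[k][d] * embeddings[i + k][d]
            for k in range(window)
            for d in range(dim)
        )
        features.append(max(0.0, s))
    return features


def max_over_time(features):
    """Keep the single strongest response across all word positions."""
    return max(features)


# A 4-word "sentence" with made-up 2-dimensional word embeddings.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
filt = [[1.0, -1.0], [0.5, 0.5]]   # one filter over windows of 2 words

features = conv1d_over_words(embeddings, filt)   # [1.5, 0.0, 0.0]
sentence_feature = max_over_time(features)       # 1.5
```

Pooling over time turns sentences of any length into one value per filter, so a fixed-size vector can be passed on to a classifier.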

