Naresh Kumar Devulapally

CSE 4/573: Computer Vision and Image Processing

Apr 29, 2025

Convolutional Neural Networks

code at: naresh-ub.github.io


• What are Image Filters?
• Features help Neural Networks
• Can we LEARN Image Filters?
• Convolutional Neural Networks

\text{What is convolution?}


Why Convolution?

What are Neural Networks?


Convolution Recap

\text{Image} \; * \; \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \; = \; \text{Blurred Image}


Different Filters = Different Transformations

\text{Blur:} \quad \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\qquad
\text{Vertical Edges:} \quad \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}
\qquad
\text{Horizontal Edges:} \quad \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}
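The effect of the edge filters above can be checked directly. The sketch below, using a tiny synthetic 5×5 image (an assumption for illustration, not from the slides), slides the vertical-edge kernel over every patch and shows that it responds strongly exactly where the intensity jumps:

```python
import numpy as np

# Hypothetical 5x5 grayscale image: dark on the left, bright on the right.
img = np.array([[0, 0, 10, 10, 10]] * 5, dtype=float)

sobel_v = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def filter2d(img, K):
    # Valid-mode cross-correlation: sum of elementwise products
    # between the kernel and every 3x3 patch of the image.
    kh, kw = K.shape
    H = img.shape[0] - kh + 1
    W = img.shape[1] - kw + 1
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * K)
    return out

edges = filter2d(img, sobel_v)
# Strong response (40) at the columns spanning the edge, 0 in flat regions.
```

Running the same loop with the blur kernel instead would average each patch, illustrating that the only difference between "blur" and "edge detection" is the numbers in the kernel.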


Parameters during Convolution

\text{Padding: 0, Stride: 1}
\text{Padding: 1, Stride: 1}
\text{Padding: 1, Stride: 2}
H_{\text{out}} = \left\lfloor \frac{H + 2P - K_h}{S} \right\rfloor + 1
W_{\text{out}} = \left\lfloor \frac{W + 2P - K_w}{S} \right\rfloor + 1
Y(i, j) = \sum_{m=0}^{K_h - 1} \sum_{n=0}^{K_w - 1} X(i \cdot S + m, j \cdot S + n) \cdot K(m, n)
\text{Stride: } S
\text{Padding: } P


\text{A quick coding sample}
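A minimal sketch of the formulas above in plain numpy (a teaching implementation, not an optimized one): the input is zero-padded by P, the kernel is slid with stride S, and the output size matches H_out = ⌊(H + 2P − K_h)/S⌋ + 1.

```python
import numpy as np

def conv2d(X, K, padding=0, stride=1):
    # Zero-pad the input, then slide the kernel with the given stride.
    X = np.pad(X, padding)
    Kh, Kw = K.shape
    H_out = (X.shape[0] - Kh) // stride + 1   # X is already padded: H + 2P
    W_out = (X.shape[1] - Kw) // stride + 1
    Y = np.zeros((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            patch = X[i*stride:i*stride+Kh, j*stride:j*stride+Kw]
            Y[i, j] = np.sum(patch * K)
    return Y

X = np.arange(25, dtype=float).reshape(5, 5)
K = np.ones((3, 3))

Y1 = conv2d(X, K)                       # padding 0, stride 1 -> 3x3 output
Y2 = conv2d(X, K, padding=1, stride=2)  # (5 + 2 - 3)//2 + 1 = 3 -> 3x3 output
```

The same configurations map directly onto `torch.nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=P, stride=S)` when moving to PyTorch.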


From Images to Classification Problem

\text{Let the input space be $\mathcal{X} \subseteq \mathbb{R}^d$}
\text{and the label space be $\mathcal{Y} = \{1, 2, \dots, C\}$ for a $C$-class classification problem.}
\text{Dataset:}
\mathcal{D} = \{(x^{(i)}, y^{(i)})\}_{i=1}^N, \quad x^{(i)} \in \mathbb{R}^d, \; y^{(i)} \in \mathcal{Y}
\text{Our goal is to learn a function:}
f_\theta: \mathbb{R}^d \rightarrow \mathbb{R}^C
\text{such that:}
\hat{y} = \argmax_{c \in \{1,\dots,C\}} \hat{z}_c
\text{Given the true label $y \in \{1, \dots, C\}$,}
\text{we encode it as a one-hot vector $\mathbf{y} \in \{0,1\}^C$ such that $y_c = 1$ if $c = y$.}
\mathcal{L}_{\text{CE}}(\mathbf{y}, \hat{\mathbf{p}}) = -\sum_{c=1}^C y_c \log(\hat{p}_c)
\mathcal{L}_{\text{avg}}(\theta) = \frac{1}{N} \sum_{i=1}^N \mathcal{L}_{\text{CE}}(\hat{p}^{(i)}, y^{(i)}) = -\frac{1}{N} \sum_{i=1}^N \log \hat{p}^{(i)}_{y^{(i)}}
\theta^* = \arg\min_{\theta} \; \mathcal{L}_{\text{avg}}(\theta)
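The cross-entropy loss above reduces to the negative log-probability of the true class, since the one-hot vector zeroes every other term. A short numpy sketch (names are illustrative):

```python
import numpy as np

def softmax(z):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Average of -log p_hat[true class] over the batch: the L_avg formula.
    p = softmax(logits)
    N = logits.shape[0]
    return -np.mean(np.log(p[np.arange(N), labels]))

# Uniform logits over C = 3 classes give p_hat = 1/3 everywhere,
# so the loss equals log(3) regardless of the label.
logits = np.zeros((1, 3))
labels = np.array([0])
loss = cross_entropy(logits, labels)
```

In PyTorch, `torch.nn.CrossEntropyLoss` fuses the softmax and the log into one numerically stable call on raw logits.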


Neural Networks

\text{Brief intro to Neural Networks}
\text{[Figure: a network mapping an input image to the prediction ``Dog'']}




Neural Networks - Importance of Features

\text{Prediction without new features}
\text{Prediction WITH new features}


\text{Demo to show that features help prediction}
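The demo's point can be reproduced in a few lines. In this sketch (a toy setup assumed for illustration, not the slide's exact demo), class 1 lies inside a circle, so no straight line separates the classes; plain logistic regression on (x1, x2) barely beats guessing, while adding the engineered feature r² = x1² + x2² makes the problem linearly separable:

```python
import numpy as np

def train_logreg(X, y, lr=0.5, steps=2000):
    # Plain logistic regression trained with full-batch gradient descent.
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y                      # gradient of the cross-entropy loss
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def accuracy(X, y, w, b):
    return np.mean(((X @ w + b) > 0) == y)

# Toy data: class 1 inside the unit circle, class 0 outside.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 1).astype(float)

w, b = train_logreg(X, y)
acc_raw = accuracy(X, y, w, b)               # raw coordinates only

r2 = (X[:, 0]**2 + X[:, 1]**2)[:, None]      # the new feature
X_feat = np.hstack([X, r2])
w2, b2 = train_logreg(X_feat, y)
acc_feat = accuracy(X_feat, y, w2, b2)       # with the engineered feature
```

This is exactly the motivation for CNNs: rather than hand-crafting features like r², the convolutional layers learn the useful features from data.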


Convolutional Neural Networks

\text{Why Pooling?}



After convolution, the feature maps still contain a lot of redundant information, including slight variations or small shifts.


Max pooling helps by:

  • Keeping only the strongest activations (important features)
  • Making the network more robust to small translations or distortions
  • Reducing the spatial size of the feature maps, thus reducing computation and parameters.
O(i, j) = \max_{0 \le u, v < p} I(i \cdot p + u, \; j \cdot p + v)
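The max over non-overlapping p×p windows can be written in a few lines of numpy by reshaping the image into blocks (a minimal sketch; it assumes the image dimensions are trimmed to multiples of p):

```python
import numpy as np

def max_pool(I, p=2):
    # Non-overlapping p x p max pooling (stride equals the window size).
    H, W = I.shape
    H, W = H - H % p, W - W % p            # drop rows/cols that don't fit
    blocks = I[:H, :W].reshape(H // p, p, W // p, p)
    return blocks.max(axis=(1, 3))         # max within each p x p block

I = np.array([[ 1,  2,  5,  6],
              [ 3,  4,  7,  8],
              [ 9, 10, 13, 14],
              [11, 12, 15, 16]])
P = max_pool(I)   # each 2x2 block collapses to its maximum
```

Halving each spatial dimension quarters the number of activations, which is where the computational savings come from.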


Convolutional Neural Networks

\text{Exercise:}
\text{Implementing a simple CNN using PyTorch}
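One possible starting point for the exercise, assuming MNIST-sized 28×28 grayscale inputs and 10 classes (these shapes are an assumption, not fixed by the slides): two conv→ReLU→pool blocks followed by a linear classifier, combining everything from this lecture.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    # Minimal CNN: learned filters + pooling, then a linear classifier
    # over the flattened feature maps.
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1x28x28 -> 16x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))   # logits, one per class

model = SimpleCNN()
logits = model(torch.randn(4, 1, 28, 28))   # batch of 4 grayscale images
```

Training it is the classification recipe from earlier: feed the logits and integer labels to `torch.nn.CrossEntropyLoss` and minimize with an optimizer such as `torch.optim.SGD`.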