Naresh Kumar Devulapally

CSE 4/573: Computer Vision and Image Processing

Apr 29, 2025

Convolutional Neural Networks

code at: naresh-ub.github.io


• What are Image Filters?
• Features help Neural Networks
• Can we LEARN Image Filters?
• Convolutional Neural Networks

\text{What is convolution?}


Why Convolution?

What are Neural Networks?


Convolution Recap

\text{Image} \; * \; \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \; = \; \text{Blurred Image}


Different Filters = Different Transformations

\text{Blur:} \quad \frac{1}{9} \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}
\qquad
\text{Vertical Edges:} \quad \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}
\qquad
\text{Horizontal Edges:} \quad \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}
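The effect of the edge filters above can be checked directly. The sketch below, using a tiny synthetic 5×5 image (an assumption for illustration, not from the slides), slides the vertical-edge kernel over every patch and shows that it responds strongly exactly where the intensity jumps:

```python
import numpy as np

# Hypothetical 5x5 grayscale image: dark on the left, bright on the right.
img = np.array([[0, 0, 10, 10, 10]] * 5, dtype=float)

sobel_v = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)

def filter2d(img, K):
    # Valid-mode cross-correlation: sum of elementwise products
    # between the kernel and every 3x3 patch of the image.
    kh, kw = K.shape
    H = img.shape[0] - kh + 1
    W = img.shape[1] - kw + 1
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * K)
    return out

edges = filter2d(img, sobel_v)
# Strong response (40) at the columns spanning the edge, 0 in flat regions.
```

Running the same loop with the blur kernel instead would average each patch, illustrating that the only difference between "blur" and "edge detection" is the numbers in the kernel.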


Parameters during Convolution

\text{Padding: 0, Stride: 1}
\text{Padding: 1, Stride: 1}
\text{Padding: 1, Stride: 2}
H_{\text{out}} = \left\lfloor \frac{H + 2P - K_h}{S} \right\rfloor + 1
W_{\text{out}} = \left\lfloor \frac{W + 2P - K_w}{S} \right\rfloor + 1
Y(i, j) = \sum_{m=0}^{K_h - 1} \sum_{n=0}^{K_w - 1} X(i \cdot S + m, j \cdot S + n) \cdot K(m, n)
\text{Stride: } S
\text{Padding: } P


\text{A quick coding sample}
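A minimal sketch of the formulas above in plain numpy (a teaching implementation, not an optimized one): the input is zero-padded by P, the kernel is slid with stride S, and the output size matches H_out = ⌊(H + 2P − K_h)/S⌋ + 1.

```python
import numpy as np

def conv2d(X, K, padding=0, stride=1):
    # Zero-pad the input, then slide the kernel with the given stride.
    X = np.pad(X, padding)
    Kh, Kw = K.shape
    H_out = (X.shape[0] - Kh) // stride + 1   # X is already padded: H + 2P
    W_out = (X.shape[1] - Kw) // stride + 1
    Y = np.zeros((H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            patch = X[i*stride:i*stride+Kh, j*stride:j*stride+Kw]
            Y[i, j] = np.sum(patch * K)
    return Y

X = np.arange(25, dtype=float).reshape(5, 5)
K = np.ones((3, 3))

Y1 = conv2d(X, K)                       # padding 0, stride 1 -> 3x3 output
Y2 = conv2d(X, K, padding=1, stride=2)  # (5 + 2 - 3)//2 + 1 = 3 -> 3x3 output
```

The same configurations map directly onto `torch.nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=P, stride=S)` when moving to PyTorch.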


From Images to Classification Problem

\text{Let the input space be $\mathcal{X} \subseteq \mathbb{R}^d$}
\text{and the label space be $\mathcal{Y} = \{1, 2, \dots, C\}$ for a $C$-class classification problem.}
\text{Dataset:}
\mathcal{D} = \{(x^{(i)}, y^{(i)})\}_{i=1}^N, \quad x^{(i)} \in \mathbb{R}^d, \; y^{(i)} \in \mathcal{Y}
\text{Our goal is to learn a function:}
f_\theta: \mathbb{R}^d \rightarrow \mathbb{R}^C
\text{such that:}
\hat{y} = \argmax_{c \in \{1,\dots,C\}} \hat{z}_c
\text{Given the true label $y \in \{1, \dots, C\}$,}
\text{we encode it as a one-hot vector $\mathbf{y} \in \{0,1\}^C$ such that $y_c = 1$ if $c = y$.}
\mathcal{L}_{\text{CE}}(\mathbf{y}, \hat{\mathbf{p}}) = -\sum_{c=1}^C y_c \log(\hat{p}_c)
\mathcal{L}_{\text{avg}}(\theta) = \frac{1}{N} \sum_{i=1}^N \mathcal{L}_{\text{CE}}(\hat{p}^{(i)}, y^{(i)}) = -\frac{1}{N} \sum_{i=1}^N \log \hat{p}^{(i)}_{y^{(i)}}
\theta^* = \arg\min_{\theta} \; \mathcal{L}_{\text{avg}}(\theta)
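The cross-entropy loss above reduces to the negative log-probability of the true class, since the one-hot vector zeroes every other term. A short numpy sketch (names are illustrative):

```python
import numpy as np

def softmax(z):
    # Subtract the row max before exponentiating for numerical stability.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Average of -log p_hat[true class] over the batch: the L_avg formula.
    p = softmax(logits)
    N = logits.shape[0]
    return -np.mean(np.log(p[np.arange(N), labels]))

# Uniform logits over C = 3 classes give p_hat = 1/3 everywhere,
# so the loss equals log(3) regardless of the label.
logits = np.zeros((1, 3))
labels = np.array([0])
loss = cross_entropy(logits, labels)
```

In PyTorch, `torch.nn.CrossEntropyLoss` fuses the softmax and the log into one numerically stable call on raw logits.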


Neural Networks

\text{Brief intro to Neural Networks}
\text{[Figure: a network mapping an input image to the prediction ``Dog'']}




Neural Networks - Importance of Features

\text{Prediction without new features}
\text{Prediction WITH new features}


\text{Demo to show that features help prediction}
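The demo's point can be reproduced in a few lines. In this sketch (a toy setup assumed for illustration, not the slide's exact demo), class 1 lies inside a circle, so no straight line separates the classes; plain logistic regression on (x1, x2) barely beats guessing, while adding the engineered feature r² = x1² + x2² makes the problem linearly separable:

```python
import numpy as np

def train_logreg(X, y, lr=0.5, steps=2000):
    # Plain logistic regression trained with full-batch gradient descent.
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y                      # gradient of the cross-entropy loss
        w -= lr * X.T @ g / len(y)
        b -= lr * g.mean()
    return w, b

def accuracy(X, y, w, b):
    return np.mean(((X @ w + b) > 0) == y)

# Toy data: class 1 inside the unit circle, class 0 outside.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 1).astype(float)

w, b = train_logreg(X, y)
acc_raw = accuracy(X, y, w, b)               # raw coordinates only

r2 = (X[:, 0]**2 + X[:, 1]**2)[:, None]      # the new feature
X_feat = np.hstack([X, r2])
w2, b2 = train_logreg(X_feat, y)
acc_feat = accuracy(X_feat, y, w2, b2)       # with the engineered feature
```

This is exactly the motivation for CNNs: rather than hand-crafting features like r², the convolutional layers learn the useful features from data.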


Convolutional Neural Networks

\text{Why Pooling?}



After convolution, the feature maps still contain a lot of redundant information, including slight variations or small shifts.


Max pooling helps by:

  • Keeping only the strongest activations (important features)
  • Making the network more robust to small translations or distortions
  • Reducing the spatial size of the feature maps, thus reducing computation and parameters.
O(i, j) = \max_{0 \le u, v < p} I(i \cdot p + u, \; j \cdot p + v)
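The max over non-overlapping p×p windows can be written in a few lines of numpy by reshaping the image into blocks (a minimal sketch; it assumes the image dimensions are trimmed to multiples of p):

```python
import numpy as np

def max_pool(I, p=2):
    # Non-overlapping p x p max pooling (stride equals the window size).
    H, W = I.shape
    H, W = H - H % p, W - W % p            # drop rows/cols that don't fit
    blocks = I[:H, :W].reshape(H // p, p, W // p, p)
    return blocks.max(axis=(1, 3))         # max within each p x p block

I = np.array([[ 1,  2,  5,  6],
              [ 3,  4,  7,  8],
              [ 9, 10, 13, 14],
              [11, 12, 15, 16]])
P = max_pool(I)   # each 2x2 block collapses to its maximum
```

Halving each spatial dimension quarters the number of activations, which is where the computational savings come from.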


Convolutional Neural Networks

\text{Exercise:}
\text{Implementing a simple CNN using PyTorch}
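One possible starting point for the exercise, assuming MNIST-sized 28×28 grayscale inputs and 10 classes (these shapes are an assumption, not fixed by the slides): two conv→ReLU→pool blocks followed by a linear classifier, combining everything from this lecture.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    # Minimal CNN: learned filters + pooling, then a linear classifier
    # over the flattened feature maps.
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1x28x28 -> 16x28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))   # logits, one per class

model = SimpleCNN()
logits = model(torch.randn(4, 1, 28, 28))   # batch of 4 grayscale images
```

Training it is the classification recipe from earlier: feed the logits and integer labels to `torch.nn.CrossEntropyLoss` and minimize with an optimizer such as `torch.optim.SGD`.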