\text{Lecture 13: Intro to AutoEncoders and Variational AutoEncoders}
\text{Naresh Kumar Devulapally}
\text{CSE 4/573: Computer Vision and Image Processing, Summer 2025}
\text{July 15, 2025}


\text{AutoEncoders}

\( \text{Agenda of this Lecture:} \)
  • Neural Networks (NNs) are feature extractors
  • Feature Extraction using NNs recap
  • AutoEncoders
  • Transpose Convolutions
  • AutoEncoders PyTorch Coding Example
  • Inference in AutoEncoders
  • Variational AutoEncoders
  • Notation in Variational AutoEncoders (VAEs)
  • ELBO in VAEs
  • VAEs PyTorch Coding Example

\text{Feature Extraction using NNs}

Neural Networks have two components:

  • Feature Extractor Module
  • Task-specific head

You can experiment with simple neural networks at TensorFlow Playground (https://playground.tensorflow.org)

Usually, the extracted features are of lower dimension than the input data \( x \).

\text{Intro to GenAI: Data Distribution}

But what does it mean for two images to be close to each other?

Closer in the low-dimensional feature space: the distance between their extracted feature vectors is small.

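As a concrete illustration (a sketch, not the slides' code): given the extracted feature vectors of two images, "closeness" can be measured with a distance or similarity in feature space.

\begin{verbatim}
import torch
import torch.nn.functional as F

# Hypothetical feature vectors for two images (stand-ins for the
# outputs of a trained feature extractor).
f1 = torch.randn(32)
f2 = torch.randn(32)

l2_dist = torch.dist(f1, f2)                  # Euclidean distance
cos_sim = F.cosine_similarity(f1, f2, dim=0)  # cosine similarity
print(l2_dist.item(), cos_sim.item())
\end{verbatim}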
\text{PyTorch Code sample - Feature Extraction - NNs}
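Below is a minimal sketch of MLP-based feature extraction in PyTorch, assuming flattened 28x28 inputs; the layer sizes and module names are illustrative, not the slide's exact code.

\begin{verbatim}
import torch
import torch.nn as nn

class MLPClassifier(nn.Module):
    def __init__(self, in_dim=784, feat_dim=32, num_classes=10):
        super().__init__()
        # Feature Extractor Module: maps x to a lower-dimensional feature
        self.feature_extractor = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim), nn.ReLU(),
        )
        # Task-specific head: maps features to class logits
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        z = self.feature_extractor(x)  # low-dimensional features
        return self.head(z)            # task-specific prediction

model = MLPClassifier()
x = torch.randn(8, 784)                # a batch of flattened images
features = model.feature_extractor(x)  # shape (8, 32)
logits = model(x)                      # shape (8, 10)
\end{verbatim}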
\text{PyTorch Code sample - Feature Extraction - CNNs}
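A corresponding sketch for CNN-based feature extraction (channel counts and input size are illustrative): strided convolutions act as the feature extractor, and a linear head consumes the flattened feature map.

\begin{verbatim}
import torch
import torch.nn as nn

class CNNClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Feature Extractor Module: downsamples 28x28 -> 7x7
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 28 -> 14
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 14 -> 7
        )
        # Task-specific head on the flattened feature map
        self.head = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        z = self.feature_extractor(x)   # shape (B, 32, 7, 7)
        return self.head(z.flatten(1))  # shape (B, num_classes)

model = CNNClassifier()
x = torch.randn(8, 1, 28, 28)
print(model(x).shape)  # torch.Size([8, 10])
\end{verbatim}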
\text{AutoEncoders}

Simple MLP AutoEncoder

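A minimal sketch of a simple MLP autoencoder (layer and latent sizes are illustrative): the encoder compresses the input to a low-dimensional code \( z \), and the decoder reconstructs the input from \( z \); training minimizes the reconstruction error.

\begin{verbatim}
import torch
import torch.nn as nn

class MLPAutoEncoder(nn.Module):
    def __init__(self, in_dim=784, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim), nn.Sigmoid(),  # pixels in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)     # compress to latent code z
        return self.decoder(z)  # reconstruct x from z

model = MLPAutoEncoder()
x = torch.rand(8, 784)
loss = nn.functional.mse_loss(model(x), x)  # reconstruction loss
loss.backward()
\end{verbatim}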
Convolutional AutoEncoder

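The decoder of a convolutional autoencoder uses transpose convolutions (nn.ConvTranspose2d) to upsample the feature map back to image resolution. A minimal sketch, with illustrative channel counts:

\begin{verbatim}
import torch
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder: strided convolutions downsample 28x28 -> 7x7
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Decoder: transpose convolutions upsample 7x7 -> 28x28
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1,
                               output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1,
                               output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ConvAutoEncoder()
x = torch.rand(8, 1, 28, 28)
print(model(x).shape)  # torch.Size([8, 1, 28, 28])
\end{verbatim}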
\text{Variational AutoEncoders}
Two distributions define the model; the encoder approximates the posterior, and the decoder is the generative model:

  • Posterior (encoder): \( P(z \mid x) \)
  • Generative model (decoder): \( P(x \mid z) \)

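In standard VAE notation, the encoder \( q_\phi(z \mid x) \) approximates the true posterior, the decoder \( p_\theta(x \mid z) \) is the generative model, and \( p(z) \) is the prior. Since \( \log p_\theta(x) \) is intractable, training maximizes the evidence lower bound (ELBO):

\[
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr] \;-\; D_{\mathrm{KL}}\bigl(q_\phi(z \mid x)\,\|\,p(z)\bigr)
\]

A minimal PyTorch sketch of a VAE trained with the negative ELBO (layer sizes are illustrative; the reparameterization trick \( z = \mu + \sigma \epsilon \) keeps sampling differentiable):

\begin{verbatim}
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, in_dim=784, latent_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)      # mean of q(z|x)
        self.logvar = nn.Linear(256, latent_dim)  # log-variance of q(z|x)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, in_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        std = torch.exp(0.5 * logvar)
        z = mu + std * torch.randn_like(std)  # reparameterization trick
        return self.dec(z), mu, logvar

def neg_elbo(x, x_hat, mu, logvar):
    recon = F.binary_cross_entropy(x_hat, x, reduction='sum')     # -E[log p(x|z)]
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL(q || p)
    return recon + kl

model = VAE()
x = torch.rand(8, 784)
x_hat, mu, logvar = model(x)
loss = neg_elbo(x, x_hat, mu, logvar)
loss.backward()
\end{verbatim}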
\text{Conditional Variational AutoEncoders}
In a Conditional VAE, both distributions are additionally conditioned on a label \( y \):

  • Posterior (encoder): \( P(z \mid x, y) \)
  • Generative model (decoder): \( P(x \mid z, y) \)

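A minimal sketch of the conditioning mechanism (assuming, as is common, a one-hot label concatenated to both the encoder and decoder inputs):

\begin{verbatim}
import torch
import torch.nn.functional as F

x = torch.rand(8, 784)                                 # images
y = F.one_hot(torch.randint(0, 10, (8,)), 10).float()  # one-hot labels

enc_in = torch.cat([x, y], dim=1)  # encoder sees (x, y) -> q(z | x, y)
z = torch.randn(8, 32)             # latent sample (placeholder)
dec_in = torch.cat([z, y], dim=1)  # decoder sees (z, y) -> p(x | z, y)
\end{verbatim}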
\text{GANs}
\text{Conditional GANs}

Training dynamics (a minimal PyTorch sketch follows this list):

1. Draw a batch of real images \( x_i \) with labels \( y_i \).

2. Sample noise \( z_i \) and random labels \( y_i \).

3. Update Discriminator on real \( (x_i,y_i) \) vs. fake \( (G(z_i,y_i),y_i) \).

4. Update the Generator so that \( D\bigl(G(z_i,y_i),y_i\bigr) \) classifies those fakes as real.

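A sketch of this loop under illustrative assumptions (network sizes, learning rates, and the random stand-ins for real data are not from the slides):

\begin{verbatim}
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative setup: 10 classes, 64-dim noise, 784-dim flat images.
G = nn.Sequential(nn.Linear(64 + 10, 256), nn.ReLU(),
                  nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784 + 10, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()
onehot = lambda y: F.one_hot(y, 10).float()

for step in range(1):  # one illustrative iteration
    # 1. Draw a batch of real images with labels (random stand-ins here)
    x_real = torch.randn(32, 784)
    y_real = torch.randint(0, 10, (32,))

    # 2. Sample noise and random labels for the fakes
    z = torch.randn(32, 64)
    y_fake = torch.randint(0, 10, (32,))
    x_fake = G(torch.cat([z, onehot(y_fake)], dim=1))

    # 3. Discriminator update: real (x, y) -> 1, fake (G(z, y), y) -> 0
    d_real = D(torch.cat([x_real, onehot(y_real)], dim=1))
    d_fake = D(torch.cat([x_fake.detach(), onehot(y_fake)], dim=1))
    loss_d = (bce(d_real, torch.ones_like(d_real)) +
              bce(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 4. Generator update: push D to classify the fakes as real
    d_fake = D(torch.cat([x_fake, onehot(y_fake)], dim=1))
    loss_g = bce(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
\end{verbatim}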