Naresh Kumar Devulapally

But why is Diffusion?

From basics of Distributions to Diffusion Models

CSE 573: CVIP

Introduction

Generative models have become very powerful recently.

Stable Diffusion generated images

stabilityai/stable-diffusion-xl-base-1.0

stabilityai/stable-diffusion-3.5-medium

What to expect from these lectures?

  • Introduction to Generative models (the intuition)
  • Understanding of the DDPM paper
  • Ability to read through diffusion-based papers after the lectures.

What is a Function?

In mathematics, a function from a set \( X \) to a set \( Y \) assigns to each element of \( X \) exactly one element of \( Y \).

- Wikipedia

In simpler words, a function takes an input and maps it to exactly one output.

Generally, a function is represented using \( f(\cdot) \), and a function that maps a point \(x \in X \) to \( y \in Y \) is represented as:

y = f(x)

Why do we need a Function?

y = f(x)

Let's say you are given a bunch of data points:

(1, 2)
(1.5, 3)
(-0.5, -1)

CSE 4/573 Computer Vision

y = 2 \times x

Mar 11, 2025

What is \( y \) for a new input such as \( x = 0.5 \)?

(0.5, ?)
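
A minimal sketch of estimating \( f \) from the three data points, assuming the function is a straight line \( y = a x + b \) fitted by ordinary least squares. The helper name `fit_line` is hypothetical, and this is only one of many possible estimation methods.

```python
# Ordinary least-squares fit of y = a*x + b to the slide's data points.
# `fit_line` is a hypothetical helper written for this illustration.
points = [(1.0, 2.0), (1.5, 3.0), (-0.5, -1.0)]


def fit_line(pts):
    """Return slope a and intercept b minimising the squared error."""
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


a, b = fit_line(points)
y_new = a * 0.5 + b  # prediction for the unseen input x = 0.5
print(f"slope={a:.3f}, f(0.5)={y_new:.3f}")
```

On these points the fit recovers a slope close to 2 and an intercept close to 0, so the prediction at \( x = 0.5 \) is close to 1.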

CSE 555: Pattern Recognition

Estimating a Function

y = f(x)

There are many ways to estimate a function \( y = f(x) \) from data points. A discussion of such methods is outside the scope of this lecture.

However, we will touch upon the basics of a powerful family of function approximators, known as:

\text{Neural Networks}

Neural networks have two components:

  • Feature extractor module
  • Task-specific head

You can experiment with simple neural networks at TensorFlow Playground.

A simple example of a Neural Network

\text{Discriminative vs. Generative models}

y = f(x)

We have very powerful discriminative models:

  • E.g., image classification models

x = f^{-1}(y)

What about generative models?

Given a label (e.g., "cat"), can we generate a data point (an image)?

From functions to data

\text{Generative Models}

Where does your sample come from?

Let's shift our focus from labels to data points.

\text{Data Distribution (unknown)}

What's a Probability Distribution?

\text{Experiment:}

E.g., flipping a fair coin

\text{Outcomes:}
  • Heads
  • Tails
\text{Let X be a random variable}
X = \begin{cases} 1, & \text{(Heads)} \\ 0, & \text{(Tails)} \end{cases}

\text{What is the behaviour of X?}

We can ask questions like:

  • How many times will \( X \) be equal to \(1\) if I flip a coin \( 1000 \) times?
  • Can we even expect some pattern?
    • Why? Why not?
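
One way to explore the behaviour of \( X \) is to simulate it. The sketch below is only an illustration, assuming Python's `random` module as the source of randomness; `count_heads` is a hypothetical helper, and the seed is fixed purely for repeatability.

```python
import random


def count_heads(num_flips, seed=0):
    """Simulate `num_flips` fair coin flips and count how often X = 1 (Heads)."""
    rng = random.Random(seed)
    return sum(1 for _ in range(num_flips) if rng.random() < 0.5)


# The fraction of heads should settle near 0.5 as the number of flips grows.
for n in (10, 1000, 100000):
    heads = count_heads(n)
    print(n, heads, heads / n)
```

No single flip is predictable, but the fraction of heads over many flips is. That stable pattern is what a probability distribution describes.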

\text{Types of Distributions}

\text{Gaussian Distribution}

Mean - \( \mu \)

Variance - \( \sigma^2 \)

f(x) = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2 \sigma^2}\right)
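
The density formula can be evaluated directly. This is a small sketch, assuming a scalar \( x \); the function name `gaussian_pdf` and the crude Riemann-sum check are illustrative choices, not part of the lecture.

```python
import math


def gaussian_pdf(x, mu=0.0, sigma2=1.0):
    """Density of N(mu, sigma2) at x; sigma2 is the variance, not the std."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


# Sanity check: a density should integrate to roughly 1.
# Riemann sum over [-10, 10] for mu = 1, sigma^2 = 4.
step = 0.001
area = sum(gaussian_pdf(-10.0 + i * step, mu=1.0, sigma2=4.0) * step for i in range(20000))
print(gaussian_pdf(0.0), area)
```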

\mathcal{N}(x \mid \mu, \sigma^2)

\( x \) follows a normal distribution with mean \( \mu \) and variance \( \sigma^2 \)

Gaussian Distribution?

\text{Estimating parameters of a distribution}

Mean - \( \mu \)

Variance - \( \sigma^2 \)
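
A minimal sketch of estimating \( \mu \) and \( \sigma^2 \) from samples, assuming the data are drawn independently from a single Gaussian and using the maximum-likelihood estimates (sample mean, and average squared deviation divided by \( n \)). The true parameters and the helper name `estimate_mean_variance` are made up for this illustration.

```python
import random


def estimate_mean_variance(samples):
    """Maximum-likelihood estimates of the mean and variance of a Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n  # divides by n, not n - 1
    return mean, var


rng = random.Random(0)
true_mu, true_sigma = 2.0, 3.0  # assumed ground truth for the demo
samples = [rng.gauss(true_mu, true_sigma) for _ in range(50000)]
mu_hat, var_hat = estimate_mean_variance(samples)
print(mu_hat, var_hat)  # should be near 2 and 9
```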

\text{Useful notations}

Mean - \( \mu \)

Variance - \( \sigma^2 \)

x \sim \mathcal{N}(\mu, \sigma^2 I)
\mathcal{N}(x ; \mu, \sigma^2 I)
\mathcal{N}(x \mid \mu, \sigma^2 I)

All of these refer to the same Gaussian distribution: the first says that \( x \) is drawn from it, and the other two denote its density evaluated at \( x \).

A sample from the above distribution:

z = \mu + \sigma \cdot \varepsilon, \quad \text{where} \quad \varepsilon \sim \mathcal{N}(0, I)
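
The sampling rule \( z = \mu + \sigma \cdot \varepsilon \) can be checked empirically. This is a scalar sketch under the assumption that `random.gauss(0, 1)` supplies the standard normal noise; `sample_gaussian` and the chosen \( \mu, \sigma \) are hypothetical.

```python
import random


def sample_gaussian(mu, sigma, rng):
    """Draw z ~ N(mu, sigma^2) by shifting and scaling standard normal noise."""
    eps = rng.gauss(0.0, 1.0)  # eps ~ N(0, 1)
    return mu + sigma * eps


rng = random.Random(0)
zs = [sample_gaussian(5.0, 0.5, rng) for _ in range(20000)]
mean_z = sum(zs) / len(zs)
var_z = sum((z - mean_z) ** 2 for z in zs) / len(zs)
print(mean_z, var_z)  # should be near 5 and 0.25
```

Writing the sample this way keeps all the randomness in \( \varepsilon \), which is what the forward-process derivation later relies on.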

\text{Useful properties of Gaussian Distribution}

Suppose \( x_1 \sim \mathcal{N}(\mu_1, \sigma_1^2 I) \) and \( x_2 \sim \mathcal{N}(\mu_2, \sigma_2^2 I) \) are independent.

What is the distribution of \( x_1 + x_2 \)?

x_1 + x_2 \sim \mathcal{N}(\mu_1 + \mu_2, (\sigma_1^2 + \sigma_2^2) I)

Suppose \( \boldsymbol{\varepsilon}_1, \boldsymbol{\varepsilon}_2 \sim \mathcal{N}(0, I) \) are independent, and

\( \boldsymbol{x}_1 = \sigma_1 \boldsymbol{\varepsilon}_1 \quad \text{and} \quad \boldsymbol{x}_2 = \sigma_2 \boldsymbol{\varepsilon}_2 \)

Then \( \boldsymbol{x}_1 + \boldsymbol{x}_2 \sim \mathcal{N}(0, (\sigma_1^2 + \sigma_2^2)I) \).

So a sample of the sum needs only a single noise draw (equality in distribution):

\( \boldsymbol{x}_1 + \boldsymbol{x}_2 = \sqrt{\sigma_1^2 + \sigma_2^2} \, \boldsymbol{\varepsilon}, \quad \boldsymbol{\varepsilon} \sim \mathcal{N}(0, I) \)
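
A quick Monte Carlo check of this property, as a sketch in the scalar case. The values \( \sigma_1 = 0.6 \) and \( \sigma_2 = 0.8 \) are arbitrary choices for the demo (they give \( \sigma_1^2 + \sigma_2^2 = 1 \)), and `variance` is a hypothetical helper.

```python
import math
import random


def variance(xs):
    """Average squared deviation from the sample mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


rng = random.Random(0)
sigma1, sigma2 = 0.6, 0.8
n = 50000

# Two independent noise draws per sample ...
two_draws = [sigma1 * rng.gauss(0.0, 1.0) + sigma2 * rng.gauss(0.0, 1.0) for _ in range(n)]
# ... versus one draw scaled by sqrt(sigma1^2 + sigma2^2).
merged_std = math.sqrt(sigma1 ** 2 + sigma2 ** 2)
one_draw = [merged_std * rng.gauss(0.0, 1.0) for _ in range(n)]

print(variance(two_draws), variance(one_draw))  # both should be near 1.0
```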

\text{Useful properties of Gaussian Distribution}
\mathcal{N}(\mu_1, \sigma_1^2) \cdot \mathcal{N}(\mu_2, \sigma_2^2) \propto \mathcal{N}(\mu', \sigma'^2)
\mu' = \frac{\sigma_1^2 \mu_2 + \sigma_2^2 \mu_1}{\sigma_1^2 + \sigma_2^2}, \quad \sigma'^2 = \frac{\sigma_1^2 \sigma_2^2}{\sigma_1^2 + \sigma_2^2}

The product of two Gaussian densities is proportional to a Gaussian density.
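
The formulas for \( \mu' \) and \( \sigma'^2 \) can be verified numerically: if the product of the two densities is proportional to \( \mathcal{N}(\mu', \sigma'^2) \), their ratio must be the same constant at every \( x \). This is a scalar sketch with arbitrarily chosen parameters; `product_params` is a hypothetical helper.

```python
import math


def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) at x (sigma2 is the variance)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def product_params(mu1, var1, mu2, var2):
    """Mean and variance of the Gaussian proportional to N(mu1,var1) * N(mu2,var2)."""
    mu = (var1 * mu2 + var2 * mu1) / (var1 + var2)
    var = (var1 * var2) / (var1 + var2)
    return mu, var


mu_p, var_p = product_params(0.0, 1.0, 2.0, 4.0)

# The ratio below should not depend on x if the proportionality holds.
ratios = [
    gaussian_pdf(x, 0.0, 1.0) * gaussian_pdf(x, 2.0, 4.0) / gaussian_pdf(x, mu_p, var_p)
    for x in (-1.0, 0.0, 0.5, 2.0)
]
print(mu_p, var_p, ratios)
```

Note that the resulting variance is smaller than either input variance, and the mean is pulled toward the more confident (lower-variance) Gaussian.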

Estimating data distribution

P(x)

Boo!! (You know nothing about me)

P(x)
P(z) (our friend)

Let's introduce a variable \( z \) with a simple, known distribution \( P(z) \), and assume \( x \) is generated from \( z \). But why?! And how?!

P(x \mid z): Generative Model
P(z \mid x): Posterior

Diffusion Models

\text{Denoising Diffusion Probabilistic Models (DDPM)}

\text{DDPM}
  • Forward Process
  • Reverse Process

Why multiple steps in forward process?

Data reconstruction using VAEs

Concept of trajectories in Diffusion models

\text{Forward Process}

In the forward process, the transition distribution \( q(x_t \mid x_{t-1}) \) is specifically predefined as:

q(x_t \mid x_{t-1}) = \mathcal{N} \left( x_t ; \sqrt{1 - \beta_t} x_{t-1}, \beta_t I \right)

where \( \{ \beta_t \in (0, 1) \}_{t=1}^T \) and \( \beta_1 \leq \beta_2 \leq \dots \leq \beta_T \)
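
One forward step can be sketched with the sampling rule \( x_t = \sqrt{1 - \beta_t} \, x_{t-1} + \sqrt{\beta_t} \, \epsilon \). The code below is an illustration under stated assumptions: the "image" is a short list of floats, noise is drawn per element, and the constant schedule \( \beta_t = 0.02 \) over 500 steps is chosen only for the demo. `forward_step` is a hypothetical helper.

```python
import math
import random


def forward_step(x_prev, beta_t, rng):
    """Sample x_t ~ N(sqrt(1 - beta_t) * x_prev, beta_t * I), element by element."""
    scale = math.sqrt(1.0 - beta_t)
    noise_std = math.sqrt(beta_t)
    return [scale * x + noise_std * rng.gauss(0.0, 1.0) for x in x_prev]


rng = random.Random(0)
x = [1.0, -2.0, 0.5, 3.0]  # toy stand-in for image pixels
betas = [0.02] * 500       # constant schedule, assumed for illustration

for beta_t in betas:
    x = forward_step(x, beta_t, rng)

print(x)  # after many steps the values look like draws from N(0, 1)
```

Each step shrinks the signal slightly and adds a little fresh noise, so the original values are gradually forgotten.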

"Adding Gaussian Noise iteratively"

Choices of \( \beta_t \)

  • Learned
  • Constant
  • Linearly or quadratically increasing
  • Cosine function

Note that the reverse step \( p_\theta (x_{t-1} \mid x_t ) \) is well approximated by a Gaussian only when \( \beta_t \) is small \( ( \beta_t \ll 1 ) \).
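
Two of the listed choices can be sketched as follows. The default values are assumptions: the linear range \( 10^{-4} \) to \( 0.02 \) is the one commonly quoted for DDPM, and the cosine form (offset \( s = 0.008 \), clipping at \( 0.999 \)) follows the schedule usually attributed to the Improved DDPM work. The function names are hypothetical.

```python
import math


def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linearly increasing beta_t for t = 1..T (defaults are assumed values)."""
    if T == 1:
        return [beta_start]
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cosine_betas(T, s=0.008, max_beta=0.999):
    """Betas implied by a cosine-shaped cumulative product alpha_bar(t)."""

    def alpha_bar(t):
        return math.cos((t / T + s) / (1.0 + s) * math.pi / 2.0) ** 2

    # beta_t = 1 - alpha_bar(t) / alpha_bar(t - 1), clipped to stay below 1.
    return [min(1.0 - alpha_bar(t + 1) / alpha_bar(t), max_beta) for t in range(T)]


linear = linear_betas(1000)
cosine = cosine_betas(1000)
print(linear[0], linear[-1], cosine[0], cosine[-1])
```

Both schedules keep \( \beta_t \) small for most steps, which is the regime in which the Gaussian form of the reverse step is a reasonable approximation.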

\text{Convergence of Forward Process}
\text{Is } q(x_T \mid x_0) = \mathcal{N} \left( x_T ; 0, I \right) ?

Yes! Under certain conditions that we are now going to define.

q(x_t \mid x_{t-1}) = \mathcal{N} \left( x_t ; \sqrt{1 - \beta_t} x_{t-1}, \beta_t I \right)

where \( \{ \beta_t \in (0, 1) \}_{t=1}^T \) and \( \beta_1 \leq \beta_2 \leq \dots \leq \beta_T \)

Let \( \alpha_t = 1 - \beta_t \)

q(x_t \mid x_{t-1}) = \mathcal{N} \left( x_t ; \sqrt{\alpha_t} x_{t-1}, (1-\alpha_t) I \right)

where \( \{ \alpha_t \in (0, 1) \}_{t=1}^T \) and \( \alpha_1 \geq \alpha_2 \geq \dots \geq \alpha_T \)

q(x_T \mid x_0)

Can we derive \( q ( x_t \mid x_0 ) \) from the sequence of one-step transitions \( q( x_{t'} \mid x_{t'-1} ) \), \( t' = 1, \dots, t \)?

q(x_1 \mid x_0) = \mathcal{N}(x_1; \sqrt{\alpha_1} x_0, (1 - \alpha_1) \mathbf{I})
q(x_2 \mid x_1) = \mathcal{N}(x_2; \sqrt{\alpha_2} x_1, (1 - \alpha_2) \mathbf{I})

What is the distribution of \( q(x_2 \mid x_0) \) ?

\text{Reparametrization trick}
x_1 = \sqrt{\alpha_1} x_0 + \sqrt{1 - \alpha_1} \, \epsilon_0
x_2 = \sqrt{\alpha_2} x_1 + \sqrt{1 - \alpha_2} \, \epsilon_1
\epsilon_0, \epsilon_1 \sim \mathcal{N}(0, \mathbf{I})

q(x_2 \mid x_0)
x_2 = \sqrt{\alpha_2} \, \boxed{x_1} + \sqrt{1 - \alpha_2} \, \epsilon_1
= \sqrt{\alpha_2} \, \textcolor{blue}{\left( \sqrt{\alpha_1} x_0 + \sqrt{1 - \alpha_1} \epsilon_0 \right)} + \sqrt{1 - \alpha_2} \, \epsilon_1
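= \sqrt{\alpha_2 \alpha_1} x_0 + \textcolor{red}{\sqrt{\alpha_2(1 - \alpha_1)} \, \epsilon_0 + \sqrt{1 - \alpha_2} \, \epsilon_1}
= \sqrt{\alpha_2 \alpha_1} x_0 + \textcolor{red}{\sqrt{1 - \alpha_2 \alpha_1} \, \bar{\epsilon}_0}, \quad \bar{\epsilon}_0 \sim \mathcal{N}(0, \mathbf{I})

The two independent noise terms merge into one because their variances add: \( \alpha_2(1 - \alpha_1) + (1 - \alpha_2) = 1 - \alpha_2 \alpha_1 \).

\therefore q(x_2 \mid x_0) = \mathcal{N} \left( x_2 ; \sqrt{\alpha_2 \alpha_1} x_0, (1 - \alpha_2 \alpha_1) \mathbf{I} \right)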
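
A Monte Carlo sketch of this result in the scalar case: simulate the two noising steps many times and compare the empirical mean and variance of \( x_2 \) with \( \sqrt{\alpha_2 \alpha_1} \, x_0 \) and \( 1 - \alpha_2 \alpha_1 \). The values of \( \alpha_1 \), \( \alpha_2 \) and \( x_0 \) are arbitrary choices for the demo.

```python
import math
import random

rng = random.Random(0)
alpha1, alpha2 = 0.9, 0.8  # assumed values for illustration
x0 = 2.0
n = 50000

# Run the two one-step transitions explicitly, with independent noise each time.
two_step = []
for _ in range(n):
    x1 = math.sqrt(alpha1) * x0 + math.sqrt(1.0 - alpha1) * rng.gauss(0.0, 1.0)
    x2 = math.sqrt(alpha2) * x1 + math.sqrt(1.0 - alpha2) * rng.gauss(0.0, 1.0)
    two_step.append(x2)

mean_emp = sum(two_step) / n
var_emp = sum((v - mean_emp) ** 2 for v in two_step) / n

# Closed-form parameters of q(x_2 | x_0).
mean_closed = math.sqrt(alpha2 * alpha1) * x0
var_closed = 1.0 - alpha2 * alpha1

print(mean_emp, mean_closed)
print(var_emp, var_closed)
```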

x_t = \sqrt{\alpha_t} x_{t-1} + \sqrt{1 - \alpha_t} \, \epsilon_{t-1}
= \sqrt{\alpha_t} \left( \sqrt{\alpha_{t-1}} x_{t-2} + \sqrt{1 - \alpha_{t-1}} \, \epsilon_{t-2} \right) + \sqrt{1 - \alpha_t} \, \epsilon_{t-1}
= \sqrt{\alpha_t \alpha_{t-1}} x_{t-2} + \sqrt{\alpha_t (1 - \alpha_{t-1})} \, \epsilon_{t-2} + \sqrt{1 - \alpha_t} \, \epsilon_{t-1}
= \sqrt{\alpha_t \alpha_{t-1}} \, x_{t-2} + \sqrt{1 - \alpha_t \alpha_{t-1}} \, \bar{\epsilon}_{t-2}
= \dots
= \sqrt{\prod_{i=1}^{t} \alpha_i} \, x_0 + \sqrt{1 - \prod_{i=1}^{t} \alpha_i} \, \bar{\epsilon}_0

q(x_t \mid x_{t-1}) = \mathcal{N} \left( x_t ; \sqrt{\alpha_t} x_{t-1}, (1 - \alpha_t) \mathbf{I} \right)
q(x_t \mid x_0) = \mathcal{N} \left( x_t ; \sqrt{\bar{\alpha}_t} x_0, (1 - \bar{\alpha}_t) \mathbf{I} \right)
\text{where } \bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i
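
The closed form means \( x_t \) can be sampled from \( x_0 \) in one shot, without simulating the intermediate steps. The sketch below assumes a linearly increasing schedule from \( 10^{-4} \) to \( 0.02 \) over \( T = 1000 \) steps and a toy list of floats as data; `alpha_bars` and `q_sample` are hypothetical names.

```python
import math
import random


def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{i<=t} (1 - beta_i), for t = 1..T."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Sample x_t ~ q(x_t | x_0) directly; t is 1-indexed."""
    a = abar[t - 1]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]  # assumed schedule
abar = alpha_bars(betas)

rng = random.Random(0)
x0 = [1.0, -1.0, 0.5]
x_early = q_sample(x0, 1, abar, rng)  # barely noised
x_late = q_sample(x0, T, abar, rng)   # almost pure noise
print(abar[0], abar[-1])
print(x_early, x_late)
```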

q(x_t \mid x_0) = \mathcal{N} \left(x_t ; \sqrt{\bar{\alpha}_t} x_0, (1 - \bar{\alpha}_t) \mathbf{I} \right)
\text{where } \bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i = \prod_{i=1}^{t} (1 - \beta_i)
\text{Note that: } \bar{\alpha}_1 > \bar{\alpha}_2 > \dots > \bar{\alpha}_T

When \( \{ \beta_t \in (0, 1) \}_{t=1}^T \), what is:

\lim_{T \to \infty} \bar{\alpha}_T = \lim_{T \to \infty} \prod_{t=1}^{T} (1 - \beta_t) \,?

q(x_T \mid x_0) = \mathcal{N} \left( x_T; \sqrt{\bar{\alpha}_T} x_0, (1 - \bar{\alpha}_T) \mathbf{I} \right)

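Because \( \beta_1 \leq \beta_2 \leq \dots \leq \beta_T \) with \( \beta_1 > 0 \), we have \( \bar{\alpha}_T \leq (1 - \beta_1)^T \to 0 \) as \( T \to \infty \). The mean \( \sqrt{\bar{\alpha}_T} x_0 \) therefore goes to \( \mathbf{0} \), the covariance goes to \( \mathbf{I} \), and \( q(x_T \mid x_0) \) converges to the standard normal distribution:

q(x_T \mid x_0) \to \mathcal{N} (x_T; \mathbf{0}, \mathbf{I})
\text{Forward Process converges to Standard Normal Distribution}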
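
The limit can be watched numerically. This sketch assumes a constant \( \beta_t = 0.01 \) purely for illustration and prints \( \bar{\alpha}_T \) together with the mean scale \( \sqrt{\bar{\alpha}_T} \) and the variance \( 1 - \bar{\alpha}_T \) for growing \( T \); `alpha_bar_T` is a hypothetical helper.

```python
import math


def alpha_bar_T(betas):
    """Product of (1 - beta_t) over the whole schedule."""
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - beta
    return prod


results = []
for T in (10, 100, 1000, 10000):
    abar = alpha_bar_T([0.01] * T)  # constant schedule, assumed for the demo
    results.append(abar)
    # mean scale -> 0 and variance -> 1 as T grows
    print(T, abar, math.sqrt(abar), 1.0 - abar)
```

With this schedule the remaining signal is already negligible by \( T = 1000 \), so \( x_T \) is practically independent of \( x_0 \).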

\text{What's next?}

DDPM generated images

\text{Reverse Diffusion Process}
\text{Implementation of Diffusion}
\text{Applications}

Latent Diffusion Model

Diffusers
