Field Level Inference

A biased Review

[Video Credit: N-body simulation Francisco Villaescusa-Navarro]

IAIFI Fellow

Carolina Cuesta-Lazaro

 

What is field-level inference?

A digital twin of our Universe

Observed Galaxy Distribution

Simulated Galaxy Distribution

Field Level Inference

Forward Model

Carolina Cuesta-Lazaro IAIFI/MIT @ Sexten 2025

(= bye bye Cosmic Variance)

+ cosmological parameters \theta = (\Omega_m, \sigma_8, \ldots)

p(\delta_{\mathrm{ICs}}, \theta \mid \delta_{\mathrm{Obs}})

Why field-level inference?


Optimal constraints

p(\mathrm{Cosmology} \mid \delta_{\mathrm{Obs}})

N-point functions

Counts-in-cell

Wavelets

Marked two-point correlation functions

Voids

Do we really need to infer the ICs?

p(\mathrm{Cosmology} \mid \delta_{\mathrm{Obs}})

["On the Connection between Field-Level Inference and N-point Correlation Functions" Schmidt]

An M-th order forward model contains information on N-point functions with N \le M+1

Neural Posterior Estimation -> Optimal Summaries


["Optimal Neural Summarisation for Full-Field Weak Lensing Cosmological Implicit Inference" Lanzieri et al]
Compression: high-dimensional data x \rightarrow low-dimensional summary s = F_\eta(x)

s is sufficient iff p(\theta|x) = p(\theta|s)
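A toy check of the sufficiency condition, under assumptions that are not in the slides: Gaussian data with known noise sigma, a flat prior evaluated on a grid, and the sample mean as the summary s. For this model the sample mean is sufficient, so the grid posteriors from x and from s should coincide, while keeping only the first data point (a non-sufficient summary) gives a different, broader posterior. The function names `grid_posterior` and `compare_posteriors` are hypothetical.

```python
import numpy as np

def grid_posterior(log_like, theta_grid):
    """Normalised posterior on a uniform grid, assuming a flat prior."""
    log_post = log_like(theta_grid)
    weights = np.exp(log_post - log_post.max())  # subtract the max for numerical stability
    return weights / weights.sum()

def compare_posteriors(n_data=50, theta_true=0.7, sigma=1.0, seed=1):
    rng = np.random.default_rng(seed)
    x = rng.normal(theta_true, sigma, size=n_data)  # "high-dimensional" data x
    theta = np.linspace(-3.0, 3.0, 601)

    # Full-data likelihood: product of N(x_i | theta, sigma^2)
    def loglike_full(t):
        return -0.5 * np.sum((x[None, :] - t[:, None]) ** 2, axis=1) / sigma**2

    # Summary likelihood: s = mean(x) ~ N(theta, sigma^2 / n)
    s = x.mean()
    def loglike_mean(t):
        return -0.5 * (s - t) ** 2 / (sigma**2 / n_data)

    # A non-sufficient summary: keep only the first data point
    def loglike_first(t):
        return -0.5 * (x[0] - t) ** 2 / sigma**2

    return (theta,
            grid_posterior(loglike_full, theta),
            grid_posterior(loglike_mean, theta),
            grid_posterior(loglike_first, theta))
```

Calling `theta, p_x, p_s, p_first = compare_posteriors()` gives `p_x` and `p_s` equal up to floating-point error, because the full likelihood differs from the summary likelihood only by a theta-independent factor.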


Neural Compression

Maximise the mutual information between the summary s(x) and the parameters \theta:

I(\theta, s(x)) = D_{\text{KL}}(p(\theta, s(x)) \parallel p(\theta)p(s(x)))

If \theta and s(x) are independent, p(\theta, s(x)) = p(\theta)p(s(x)) and I(\theta, s(x)) = 0: the summary carries no information about \theta.
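For intuition, the KL definition of mutual information can be checked on a case where it is known in closed form: jointly Gaussian, unit-variance theta and s with correlation rho, for which I = -1/2 log(1 - rho^2). This Gaussian setup is an illustrative assumption; in neural compression the densities are unknown and the mutual information is typically maximised through variational bounds rather than computed like this. `gaussian_mi` and `monte_carlo_mi` are hypothetical names.

```python
import numpy as np

def gaussian_mi(rho):
    """Closed-form I(theta; s) in nats for unit-variance jointly Gaussian theta, s with correlation rho."""
    return -0.5 * np.log1p(-rho**2)

def monte_carlo_mi(rho, n=200_000, seed=0):
    """Estimate I = E_{p(theta, s)}[log p(theta, s) - log p(theta) - log p(s)] from joint samples."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    theta, s = rng.multivariate_normal(np.zeros(2), cov, size=n).T
    # Log density of the bivariate normal with unit variances
    log_joint = (-(theta**2 - 2 * rho * theta * s + s**2) / (2 * (1 - rho**2))
                 - np.log(2 * np.pi) - 0.5 * np.log(1 - rho**2))
    # Log density of the product of the two standard-normal marginals
    log_product = -0.5 * (theta**2 + s**2) - np.log(2 * np.pi)
    return np.mean(log_joint - log_product)
```

With rho = 0 both functions return zero, matching the independence case; larger |rho| (a more informative summary) gives larger I.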


Neural Compression: MI

What field level inference isn't

p(\mathrm{Cosmology} \mid \delta_{\mathrm{Obs}})

p(\mathrm{Cosmology} \mid S(\delta_{\mathrm{Obs}}))

["A point cloud approach to generative modeling for galaxy surveys at the field level" Cuesta-Lazaro and Mishra-Sharma arXiv:2311.17141]


Robustness?


Is Field-Level Inference worth it?

Optimal Summaries: p(\theta \mid S(\delta_{\mathrm{Obs}}))

Low dimensional inference: \mathcal{O}(10) parameters from \mathcal{O}(10-100) summaries

Number of simulations needed? Training simulations are IID

FLI: p(\delta_{\mathrm{ICs}}, \theta \mid \delta_{\mathrm{Obs}})

Very high dimensional inference! \mathcal{O}(10^9) latent variables from \mathcal{O}(10^9)-dimensional data

Same pixel-level fidelity required

+

Reconstructing ALL latent variables:

Dark Matter distribution

Entire formation history

Peculiar velocities

Predictive:

Cross-Correlation with other probes without Cosmic Variance

[Image Credit: Yuuki Omori]

 

Constraining Inflation:

Inferring primordial non-Gaussianity


Why field-level inference?


"Systematic-free inference of the cosmic matter density field from SDSS3-BOSS data" Lavaux, Jasche, Leclercq
"Bayesian physical reconstruction of initial conditions from large scale structure surveys" Jasche, Wandelt (2012)

Initial conditions → Final conditions → Galaxies

Bayesian Origin Reconstruction from Galaxies (BORG)

"The Manticore Project I: a digital twin of our cosmic neighbourhood from Bayesian field-level analysis" McAlpine et al


The Local Universe without Cosmic Variance


"The Manticore Project I: a digital twin of our cosmic neighbourhood from Bayesian field-level analysis" McAlpine et al

The Hubble diagram


"Field-level inference of cosmic shear with intrinsic alignments and baryons" Porqueres et al

FLI for Cosmic Shear

"Euclid: Field-level inference of primordial non-Gaussianity and cosmic initial conditions" Andrews et al


FLI for Primordial Non-Gaussianity


EFT at the Field Level

"How much information can be extracted from galaxy clustering at the field level?" Nguyen, Schmidt, Tucci, Reinecke, Kostić


The Beyond2pt Challenge

["A Parameter-Masked Mock Data Challenge for Beyond-Two-Point Galaxy Clustering Statistics" Krause et al arXiv:2405.02252]

 

How well does the forward model fit the data?


"With four parameters I can fit an elephant, and with five I can make him wiggle his trunk."

\mathcal{O}(10^9) free parameters

LCDM fit to DESI


Validating FLI: Testing the mean

"Systematic-free inference of the cosmic matter density field from SDSS3-BOSS data" Lavaux, Jasche, Leclercq

Internal Consistency: initial P(k) consistent with LCDM

Cross-Validation: reconstructing lensing convergence from Planck
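One way to picture the internal-consistency check is to estimate the spherically averaged power spectrum of an inferred initial field and compare it with the prior P(k). The sketch below assumes a periodic box in grid units (cell volume 1) and uses a hypothetical power-law P(k) in place of an LCDM spectrum: it draws a Gaussian field with that spectrum and re-measures it with a simple FFT estimator. It is an illustration only, not the estimator used in the cited analysis; all function names are hypothetical.

```python
import numpy as np

def k_magnitude(n):
    """|k| on an n^3 periodic grid, in radians per cell."""
    k1d = 2 * np.pi * np.fft.fftfreq(n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    return np.sqrt(kx**2 + ky**2 + kz**2)

def gaussian_field(n, pk, seed=0):
    """Gaussian random field whose expected |FFT|^2 / n^3 equals pk(|k|)."""
    rng = np.random.default_rng(seed)
    k = k_magnitude(n)
    amplitude = np.zeros_like(k)
    amplitude[k > 0] = np.sqrt(pk(k[k > 0]))           # the k = 0 (mean) mode is set to zero
    white_k = np.fft.fftn(rng.normal(size=(n, n, n)))   # <|W_k|^2> = n^3
    return np.real(np.fft.ifftn(white_k * amplitude))

def binned_power(field, n_bins=8):
    """Average |delta_k|^2 / n^3 in linear |k| bins up to the 1D Nyquist wavenumber (pi)."""
    n = field.shape[0]
    k = k_magnitude(n)
    power = np.abs(np.fft.fftn(field)) ** 2 / n**3
    edges = np.linspace(0.0, np.pi, n_bins + 1)
    k_mean, p_mean = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (k > lo) & (k <= hi)
        k_mean.append(k[in_bin].mean())
        p_mean.append(power[in_bin].mean())
    return np.array(k_mean), np.array(p_mean)
```

For a fair comparison, the model P(k) should be averaged over the same modes as the measurement; evaluating it only at the bin centre introduces a small bias for steep spectra.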

Validating FLI: Testing the error bars


\int_\Theta \hat{p}(\theta \mid x = x_0) d\theta = 1 - \alpha

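Credible region (CR): any region \Theta satisfying the equation above; not unique

High Posterior Density region (HPD) \mathcal{H}: the CR with the smallest "volume"

For a calibrated posterior, the true value \theta^* lies in the CR with probability 1 - \alpha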

Empirical Coverage Probability (ECP)

\mathrm{ECP} = \mathbb{E}_{p(x,\theta)} \left[ \mathbb{1} \left[ \theta \in \mathcal{H}_{\hat{p}(\theta|x)}(1-\alpha)\right] \right]
["Investigating the Impact of Model Misspecification in Neural Simulation-based Inference" 
Cannon et al arXiv:2209.01845 ]

Underconfident

Overconfident


Ciela-institute/TARP
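A minimal sketch of the ECP formula above with HPD regions estimated from posterior samples. It assumes a toy conjugate model (theta ~ N(0, 1), x | theta ~ N(theta, noise^2)) whose exact posterior is known, and a hypothetical `width_scale` knob that shrinks or inflates the posterior to mimic an overconfident or underconfident estimator. This is the HPD-based coverage test, not the TARP algorithm implemented in Ciela-institute/TARP, which uses random reference points instead.

```python
import numpy as np

def gaussian_logpdf(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)

def hpd_ecp(credibility=(0.68, 0.95), width_scale=1.0, noise=0.5,
            n_tests=2000, n_samples=1000, seed=0):
    """HPD-based empirical coverage for theta ~ N(0, 1), x | theta ~ N(theta, noise^2).

    width_scale < 1 mimics an overconfident posterior, > 1 an underconfident one.
    """
    rng = np.random.default_rng(seed)
    theta_true = rng.normal(size=n_tests)                 # (theta*, x) pairs drawn from p(x, theta)
    x = theta_true + noise * rng.normal(size=n_tests)
    post_var = 1.0 / (1.0 + 1.0 / noise**2)               # exact conjugate posterior
    post_mean = post_var * x / noise**2
    post_std = np.sqrt(post_var) * width_scale
    samples = post_mean[:, None] + post_std * rng.normal(size=(n_tests, n_samples))
    logp_samples = gaussian_logpdf(samples, post_mean[:, None], post_std)
    logp_true = gaussian_logpdf(theta_true, post_mean, post_std)
    # theta* is inside the HPD region of credibility c iff the posterior mass of
    # {theta : p(theta|x) > p(theta*|x)} is <= c; that mass is estimated from samples.
    mass_above = np.mean(logp_samples > logp_true[:, None], axis=1)
    return np.array([np.mean(mass_above <= c) for c in credibility])
```

A calibrated posterior returns ECP values close to the nominal credibility levels; `width_scale=0.5` gives coverage well below nominal (overconfident) and `width_scale=2.0` well above it (underconfident).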

The FLI ingredients:

1) Forward Model for p(\delta_{\mathrm{ICs}}, \theta \mid \delta_{\mathrm{Obs}}), with \theta = (\Omega_m, \sigma_8, \ldots)

A) Gravity

B) Galaxy biasing

C) Survey Systematics

2) Sampling Methods

Gravity

N-body

Particle Mesh

Effective Field Theories

Compute

- Galaxy formation

Compute

- Galaxy formation

Neural Network Corrections

Neural Network Emulators

k_\mathrm{max} \approx 0.5 \, h \, \mathrm{Mpc}^{-1}
\mathbf{F}_\theta(\mathbf{x},a) = \frac{3 \Omega_m}{2} \nabla \left[\phi^\mathrm{PM}(\mathbf{x}) + \phi^\mathrm{corr}_\theta(\mathbf{x}, a, \phi^\mathrm{PM}, \delta^\mathrm{PM}) \right]
["Hybrid Physical Neural ODEs for Fast N-body simulations" Lanzieri, Lanusse, Starck]
["Field Level Neural Network Emulator for Cosmological N-body Simulations" Jamieson et al]
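The \phi^\mathrm{PM} term in the force equation above is the potential of a particle-mesh solver. A minimal NumPy sketch of that step, assuming a periodic grid in cell units, a density contrast already painted onto the mesh, and the convention \nabla^2 \Phi = \delta with an attractive force -(3\Omega_m/2)\nabla\Phi; the sign convention for \phi^\mathrm{PM} in the cited work may differ. Particle painting/readout and the learned correction \phi^\mathrm{corr}_\theta are omitted, and `pm_force` is a hypothetical name.

```python
import numpy as np

def pm_force(delta, omega_m):
    """Force on a periodic n^3 mesh (cell size 1) from the density contrast delta.

    Solves laplacian(Phi) = delta in Fourier space and returns
    F = -(3 * omega_m / 2) * grad(Phi) with shape (n, n, n, 3).
    """
    n = delta.shape[0]
    k1d = 2 * np.pi * np.fft.fftfreq(n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0                      # placeholder to avoid 0/0; the k = 0 mode is zeroed below
    phi_k = -np.fft.fftn(delta) / k2       # the Laplacian becomes -k^2 in Fourier space
    phi_k[0, 0, 0] = 0.0
    prefactor = -1.5 * omega_m
    # Spectral gradient: d/dx_j -> i k_j
    return np.stack([np.real(np.fft.ifftn(prefactor * 1j * k * phi_k)) for k in (kx, ky, kz)],
                    axis=-1)
```

For a single plane wave \delta = \cos(k_0 x), this returns F_x = -(3\Omega_m/2)\sin(k_0 x)/k_0, which points back towards the density peak at x = 0, as expected for gravity.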

Fast

Accurate

Scale Range

Efficient Sampling


Galaxy Bias

Self consistent predictions 

Directly? linked to physical processes

Large Volumes

Large Volumes

MTNG ~ 500 Mpc/h

Robust

Clear assumptions

Large Scales

Galaxy formation?

["Full-shape analysis with simulation-based priors: Constraints on single field inflation from BOSS" Ivanov, Cuesta-Lazaro, Mishra-Sharma, Oblujen, Toomey arXiv:2402.13310]

 

["Differentiable Cosmological Hydrodynamics for Field-Level Inference and High Dimensional Parameter Constraints" Horowitz, Lukic arXiv:2502.02294]

 

Effective Field Theories

Empirical

HOD/SHAM

Fast

Accurate?

Hydrodynamics

Fast

Clear assumptions

Galaxy formation?

[Video credit: Francisco Villaescusa-Navarro]

Gas density and gas temperature for four different subgrid models


Learn a representation for feedback

p(\mathrm{Baryonic\ fields} \mid \mathrm{Dark\ Matter}, z)


Survey Forward Model: The Usual Suspects

Fiber Collisions

Survey Mask

Target Selection


(tracer dependent priorities)

Survey Forward Model: Known Unknowns

"Systematic-free inference of the cosmic matter density field from SDSS3-BOSS data" Lavaux, Jasche, Leclercq


The data likelihood

Poisson: P(n_i \mid \lambda_i) = \frac{\lambda_i^{n_i} e^{-\lambda_i}}{n_i!}

Gaussian: P(n_i \mid \mu_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(n_i - \mu_i)^2}{2\sigma^2}\right)

"Impacts of the physical data model on the forward inference of initial conditions from biased tracers" Nguyen, Schmidt, Lavaux, Jasche
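Both voxel likelihoods can be written directly as summed log-densities. A sketch assuming independent voxels and, for the Gaussian case, a constant sigma; the expected counts lambda_i or mu_i would come from the forward model, for instance a hypothetical linear-bias mean `rate = np.clip(nbar * (1 + b * delta), 1e-8, None)` (clipped so the Poisson rate stays positive). This is not the implementation used by Nguyen et al; the function names are hypothetical.

```python
import math
import numpy as np

# log(n!) = lgamma(n + 1), vectorised over voxels
_log_factorial = np.vectorize(lambda n: math.lgamma(n + 1.0))

def poisson_loglike(counts, rate):
    """Sum over voxels of log P(n_i | lambda_i) for a Poisson likelihood."""
    counts = np.asarray(counts, dtype=float)
    rate = np.asarray(rate, dtype=float)
    return float(np.sum(counts * np.log(rate) - rate - _log_factorial(counts)))

def gaussian_loglike(counts, mean, sigma):
    """Sum over voxels of log P(n_i | mu_i) for a Gaussian likelihood with constant sigma."""
    counts = np.asarray(counts, dtype=float)
    mean = np.asarray(mean, dtype=float)
    return float(np.sum(-0.5 * ((counts - mean) / sigma) ** 2
                        - 0.5 * np.log(2 * np.pi * sigma**2)))
```

The two agree reasonably well for large counts per voxel but differ for sparse tracers, where the discreteness captured by the Poisson form matters.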

More testing in realistic scenarios


"Benchmarking field-level cosmological inference from galaxy redshift surveys" Simon-Onfroy, Lanusse, De Mattia
"Microcanonical Hamiltonian Monte Carlo" Robnik, De Luca, Silverstein, Seljak
"Field-Level Inference with Microcanonical Langevin Monte Carlo" 
Bayer, Seljak, Modi 

Sampling Methods

FLI Forward Models


BORG: 2LPT Gravity, Power-law galaxy bias

1) Current analyses rely on simple forward models pushed to small scales (which must also be differentiable)

2) The likelihood is complex for realistic scenarios, but we can get samples from the simulator

3) Either not sampling cosmology at all, or struggling to

4) Not amortized -> rigorous testing (coverage) becomes extremely hard

[Figure: timeline of generative models, 2006-2023: Deep Belief Networks (2006), GANs, VAEs, Normalising Flows, BigGAN, Diffusion Models, Contrastive Learning; example text-to-image prompt: "A folk music band of anthropomorphic autumn leaves playing bluegrass instruments"]

Meanwhile, on Earth...


p(\mathrm{World}|\mathrm{Prompt})
["Genie 2: A large-scale foundation world model" Parker-Holder et al]
p(\mathrm{Drug}|\mathrm{Properties})
["Generative AI for designing and validating easily synthesizable and structurally novel antibiotics" Swanson et al]

Probabilistic ML has made high-dimensional inference tractable

1024x1024xTime


Learning to sample complex forward models


"Reconstructing Cosmological Initial Conditions from Late-Time Structure with Convolutional Neural Networks"
Shallue, Eisenstein

1) Learning the posterior mean with deterministic models

2) Learning to sample with generative models

"Posterior Sampling of the Initial Conditions of the Universe from Non-linear Large Scale Structures using Score-Based Generative Models" Legin et al

[Figure: true vs. reconstructed initial conditions \delta_\mathrm{ICs} given \delta_\mathrm{Obs}]
"Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants"
Cuesta-Lazaro, Bayer, Albergo et al
NeurIPS ML4PS 2024 Spotlight talk

 

p(\delta_\mathrm{ICs}, \theta|\delta_\mathrm{Obs})


Stochastic Interpolants


Simulators at scale

"Bayesian Inference of Initial Conditions from Non-Linear Cosmic Structures using Field-Level Emulators"

Doeser et al

(Tested on matter, differences likely worse for galaxies)

The speed-up may not be impressive, but the scaling with N-body resolution may be

"Field Level Neural Network Emulator for Cosmological N-body Simulations"

Jamieson et al


Hydro Simulators at scale

"BaryonBridge: Interpolants models for fast hydrodynamical simulations"

Horowitz, Cuesta-Lazaro, Yehia (in prep)

Particle Mesh for Gravity

CAMELS Volumes

25 h^{-1} \mathrm{Mpc}

1000 boxes with varying cosmology and feedback models

Gas Properties

Current model optimised for Lyman Alpha forest

7 GPU minutes for a 50 Mpc simulation

130 million CPU core hours for TNG50

Density

Temperature

Galaxy Distribution

p(\mathrm{Baryons} \mid \mathrm{DM}, \mathcal{C}, \mathcal{A})


The Roadmap

1) Need to develop better validation metrics (requires better validation suites)

2) Assess the robustness of field-level inference via parameter-masked mock challenges in realistic scenarios (e.g. Beyond2pt)

3) Develop open-source ecosystems for more plug-and-play models

Field-level analysis is too complex for one group to develop a robust framework!


FLI Benchmarks

Non-linearity

Simulated Volume

Resolution

"Benchmarking field-level cosmological inference from galaxy redshift surveys" Simon-Onfroy, Lanusse, De Mattia


Sampling Methods

Why do we need gradients to sample in high dimensions?

Image Credit: "Probabilistic Computation" Michael Betancourt
\mathbb{E}_{\pi}[f] = \int_{Q} \mathrm{d} q \, \pi(q) \, f(q)

Distance from the mode

https://betanalpha.github.io/assets/case_studies/probabilistic_computation.html

Typical Set
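The "distance from the mode" picture can be reproduced numerically. Assuming a standard Gaussian target (an illustrative choice), samples in d dimensions concentrate on a thin shell of radius about \sqrt{d} with roughly constant width, so in high dimensions essentially no sample lies near the mode even though the density peaks there. `radius_stats` is a hypothetical helper.

```python
import numpy as np

def radius_stats(dim, n_samples=2000, seed=0):
    """Mean and spread of the distance from the mode, and the fraction of samples within r < 1."""
    rng = np.random.default_rng(seed)
    r = np.linalg.norm(rng.normal(size=(n_samples, dim)), axis=1)
    return r.mean(), r.std(), np.mean(r < 1.0)
```

In one dimension about 68% of samples fall within r < 1; in 1000 dimensions the mean radius is close to \sqrt{1000} \approx 31.6 with a spread below 1, and none fall within r < 1.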

H(q, p) = U(q) + K(p), \quad U(q) = -\log \pi(q), \quad K(p) = \frac{1}{2} p^T p

\frac{\mathrm{d}q}{\mathrm{d}t} = \frac{\partial H}{\partial p} = p, \quad \frac{\mathrm{d}p}{\mathrm{d}t} = -\frac{\partial H}{\partial q} = \nabla_q \log \pi(q)


Random-walk proposals are isotropic, so in high dimensions most of them land outside the typical set

Can we find a sampling algorithm that keeps samples in the typical set whilst moving far?

https://chi-feng.github.io/mcmc-demo/
Image Credit: "The Markov-chain Monte Carlo Interactive Gallery" Chi Feng
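A quick way to see why isotropic random walks struggle: random-walk Metropolis with a fixed step size on a d-dimensional standard Gaussian. Each isotropic step of size \epsilon increases \|q\|^2 by about \epsilon^2 d on average, pushing proposals off the typical-set shell, so the acceptance rate collapses as d grows unless the step shrinks roughly like 1/\sqrt{d}. The target, step size, and the name `rwm_acceptance` are illustrative assumptions.

```python
import numpy as np

def rwm_acceptance(dim, step=0.5, n_steps=2000, seed=0):
    """Acceptance rate of random-walk Metropolis on N(0, I_dim) with a fixed isotropic step."""
    rng = np.random.default_rng(seed)

    def log_pi(q):
        return -0.5 * q @ q

    q = rng.normal(size=dim)                         # start inside the typical set
    accepted = 0
    for _ in range(n_steps):
        proposal = q + step * rng.normal(size=dim)   # isotropic Gaussian proposal
        if np.log(rng.uniform()) < log_pi(proposal) - log_pi(q):
            q, accepted = proposal, accepted + 1
    return accepted / n_steps
```

With the same step size, the acceptance rate is high in 2 dimensions but essentially zero in 1000 dimensions.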


Follow trajectories of (approximately) constant energy H(q, p) that move far from the starting point

Hamiltonian Monte Carlo

https://chi-feng.github.io/mcmc-demo/
Image Credit: "The Markov-chain Monte Carlo Interactive Gallery" Chi Feng
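A minimal vanilla HMC sketch in NumPy, using the split H(q, p) = U(q) + K(p) with U = -\log\pi and a unit mass matrix, a leapfrog integrator, and a Metropolis correction for the integration error. The step size, number of leapfrog steps, and the Gaussian test target are illustrative assumptions, and `leapfrog`/`hmc` are hypothetical names; the samplers used for FLI in practice (e.g. the microcanonical variants cited earlier) differ substantially.

```python
import numpy as np

def leapfrog(q, p, grad_log_pi, step, n_steps):
    """Leapfrog integration of Hamilton's equations with U = -log pi and K = p^T p / 2."""
    p = p + 0.5 * step * grad_log_pi(q)          # half kick
    for _ in range(n_steps - 1):
        q = q + step * p                         # drift
        p = p + step * grad_log_pi(q)            # full kick
    q = q + step * p                             # final drift
    p = p + 0.5 * step * grad_log_pi(q)          # final half kick
    return q, p

def hmc(log_pi, grad_log_pi, q0, n_samples, step=0.1, n_steps=20, seed=0):
    rng = np.random.default_rng(seed)
    q = np.asarray(q0, dtype=float)
    samples, accepted = [], 0
    for _ in range(n_samples):
        p0 = rng.normal(size=q.shape)            # resample momentum, unit mass matrix
        q_new, p_new = leapfrog(q, p0, grad_log_pi, step, n_steps)
        h_old = -log_pi(q) + 0.5 * p0 @ p0       # H = U + K
        h_new = -log_pi(q_new) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:
            q, accepted = q_new, accepted + 1
        samples.append(q)
    return np.array(samples), accepted / n_samples
```

On a 100-dimensional standard Gaussian (`log_pi = lambda q: -0.5 * q @ q`, `grad_log_pi = lambda q: -q`) this keeps a high acceptance rate while each trajectory moves a distance comparable to the width of the target, in contrast with the random walk.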

Learning likelihoods at the field-level


["A point cloud approach to generative modeling for galaxy surveys at the field level" Cuesta-Lazaro and Mishra-Sharma, International Conference on Machine Learning (ICML) AI4Astro 2023, Spotlight talk, arXiv:2311.17141]

Target Distribution

Simulated Galaxy 3d Map

Base Distribution

Prompt:

\Omega_m, \sigma_8

FLI-Sexten-2025

By carol cuesta
