Field-Level Inference
A Biased Review
[Video credit: N-body simulation by Francisco Villaescusa-Navarro]
IAIFI Fellow
Carolina Cuesta-Lazaro

What is field-level inference?
A digital twin of our Universe

Observed Galaxy Distribution
Simulated Galaxy Distribution

Field-Level Inference
Forward Model
Carolina Cuesta-Lazaro IAIFI/MIT @ Sexten 2025
(= bye bye Cosmic Variance)
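To make the forward-modelling picture concrete, here is the joint posterior that field-level inference targets, in generic notation of our own choosing (θ: cosmological parameters, δ_L: the initial linear density field, F: the forward model, d: the observed galaxy field). The Gaussian prior on δ_L is diagonal in Fourier space with variance given by the linear power spectrum P_L(k; θ); the specific likelihood and forward model differ between codes:

```latex
p(\theta, \delta_L \mid d) \;\propto\;
  \mathcal{L}\big(d \,\big|\, F(\delta_L, \theta)\big)\;
  \mathcal{N}\big(\delta_L;\, 0,\, P_L(k;\theta)\big)\;
  p(\theta)
```

Because δ_L is inferred alongside θ, the simulated galaxy distribution is compared with the observed one realization by realization rather than only statistically.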




Why field-level inference?
Optimal constraints
N-point functions
Counts-in-cells
Wavelets
Marked 2PCFs
Voids
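Each of these statistics compresses the field before inference. As an example of the simplest one, here is a minimal FFT estimator of the power spectrum (the Fourier-space two-point function) for a periodic box. It is a sketch that ignores shot noise, mass-assignment windows and survey geometry, and the linear binning is an arbitrary choice:

```python
import numpy as np

def power_spectrum(delta, box_size, n_bins=16):
    """Spherically averaged P(k) of a real overdensity field on a periodic n^3 grid.

    Plain FFT estimator: no shot-noise subtraction, no mass-assignment window correction.
    """
    n = delta.shape[0]
    cell_volume = (box_size / n) ** 3
    # Discrete approximation to the continuous Fourier transform
    delta_k = np.fft.rfftn(delta) * cell_volume
    k_fund = 2 * np.pi / box_size  # fundamental mode of the box
    kx = np.fft.fftfreq(n, d=1.0 / n) * k_fund  # integer wavenumbers times k_fund
    kz = np.fft.rfftfreq(n, d=1.0 / n) * k_fund
    k_mag = np.sqrt(kx[:, None, None] ** 2 + kx[None, :, None] ** 2 + kz[None, None, :] ** 2)
    power = np.abs(delta_k) ** 2 / box_size**3
    # Linear bins from the fundamental mode upwards; k = 0 falls below the first edge
    edges = np.linspace(k_fund, k_mag.max(), n_bins + 1)
    idx = np.digitize(k_mag.ravel(), edges) - 1
    valid = (idx >= 0) & (idx < n_bins)
    counts = np.bincount(idx[valid], minlength=n_bins)
    sums = np.bincount(idx[valid], weights=power.ravel()[valid], minlength=n_bins)
    k_centres = 0.5 * (edges[1:] + edges[:-1])
    return k_centres, sums / np.maximum(counts, 1)

rng = np.random.default_rng(0)
field = rng.standard_normal((32, 32, 32))  # white noise, unit variance per cell
k, pk = power_spectrum(field, box_size=100.0)
expected = (100.0 / 32) ** 3  # flat white-noise level: variance x cell volume
```

For unit-variance white noise the estimate should scatter around a flat P(k) ≈ (box_size / n)³, which makes a quick sanity check.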
Do we really need to infer the ICs?

["On the Connection between Field-Level Inference and N-point Correlation Functions" Schmidt]
M-th order forward model: information on N-point functions with N ≤ M+1

Neural Posterior Estimation -> Optimal Summaries
["Optimal Neural Summarisation for Full-Field Weak Lensing Cosmological Implicit Inference" Lanzieri et al]


High-Dimensional
Low-Dimensional
s is sufficient iff $p(\theta \mid x) = p(\theta \mid s(x))$
Neural Compression
Maximise the mutual information $I(\theta; s(x))$
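A toy check of the compression idea, using a swapped-in setup rather than the cited neural pipeline: θ ~ N(0, 1), each of 64 "pixels" is θ plus unit Gaussian noise, and q(θ | s) is a linear-Gaussian regression instead of a neural network. The Barber-Agakov bound I(θ; s) ≥ H(θ) + E[log q(θ | s)] is what a neural posterior estimator's loss maximises; with a sufficient summary (the pixel mean) it should reach the full-data mutual information, while a single pixel falls well short:

```python
import numpy as np

def mi_lower_bound(theta, s):
    """Barber-Agakov bound: I(theta; s) >= H(theta) + E[log q(theta | s)].

    q is a linear-Gaussian regression of theta on s (a stand-in for a neural
    posterior estimator). Assumes a N(0, 1) prior on theta for H(theta).
    """
    design = np.column_stack([s, np.ones_like(s)])
    coeffs, *_ = np.linalg.lstsq(design, theta, rcond=None)
    resid = theta - design @ coeffs
    var_q = np.mean(resid**2)  # maximum-likelihood variance of q
    log_q = -0.5 * np.log(2 * np.pi * var_q) - 0.5 * resid**2 / var_q
    entropy_prior = 0.5 * np.log(2 * np.pi * np.e)  # H(theta) for N(0, 1)
    return entropy_prior + log_q.mean()

rng = np.random.default_rng(1)
n_sims, n_pix, sigma = 50_000, 64, 1.0
theta = rng.standard_normal(n_sims)
x = theta[:, None] + sigma * rng.standard_normal((n_sims, n_pix))  # toy "field"

mi_mean = mi_lower_bound(theta, x.mean(axis=1))  # sufficient summary
mi_pixel = mi_lower_bound(theta, x[:, 0])         # discards information
mi_full = 0.5 * np.log(1 + n_pix / sigma**2)      # exact I(theta; x)
print(f"mean: {mi_mean:.3f}  one pixel: {mi_pixel:.3f}  full data: {mi_full:.3f}")
```

In this Gaussian toy the bound is tight for the best linear q, so the mean summary lands close to ½ ln 65 ≈ 2.09 and the single pixel close to ½ ln 2 ≈ 0.35.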
Neural Compression: MI
What field-level inference isn't

["A point cloud approach to generative modeling for galaxy surveys at the field level"
Cuesta-Lazaro and Mishra-Sharma
arXiv:2311.17141]

Robustness?
Is Field-Level Inference worth it?
Optimal Summaries
FLI
Same pixel-level fidelity required
Number of simulations needed?
Training simulations are IID
Very high-dimensional inference!
Low-dimensional inference






Reconstructing ALL latent variables:
Dark Matter distribution
Entire formation history
Peculiar velocities
Predictive:
Cross-Correlation with other probes without Cosmic Variance

[Image Credit: Yuuki Omori]
Constraining Inflation:
Inferring primordial non-Gaussianity
Why field-level inference?
"Systematic-free inference of the cosmic matter density field from SDSS3-BOSS data" Lavaux, Jasche, Leclercq
"Bayesian physical reconstruction of initial conditions from large scale structure surveys" Jasche, Wandelt (2012)

Initial conditions → Final density field → Galaxies
Bayesian Origin Reconstruction from Galaxies (BORG)
"The Manticore Project I: a digital twin of our cosmic neighbourhood from Bayesian field-level analysis" McAlpine et al
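The initial conditions → final density field → galaxies chain can be sketched as a toy forward model. This is not BORG: we assume a power-law initial spectrum in grid units, replace 2LPT gravity with a lognormal transform of the linearly grown field, and keep only a power-law bias and Poisson sampling. All parameter values are hypothetical:

```python
import numpy as np

def gaussian_initial_field(n, spectral_index=-2.0, sigma=0.5, rng=None):
    """Gaussian random field on a periodic n^3 grid with P(k) ∝ k**spectral_index.

    Toy normalisation: the field is rescaled to standard deviation `sigma`.
    Assumes spectral_index < 0 (the k = 0 mode is removed).
    """
    rng = np.random.default_rng() if rng is None else rng
    white_k = np.fft.rfftn(rng.standard_normal((n, n, n)))
    kx = np.fft.fftfreq(n)
    kz = np.fft.rfftfreq(n)
    k = np.sqrt(kx[:, None, None] ** 2 + kx[None, :, None] ** 2 + kz[None, None, :] ** 2)
    k[0, 0, 0] = np.inf  # inf ** (negative power) = 0, which removes the mean mode
    delta = np.fft.irfftn(white_k * np.sqrt(k**spectral_index), s=(n, n, n))
    return sigma * delta / delta.std()

def toy_forward_model(delta_init, growth=1.0, bias_exponent=1.5, nbar=2.0, rng=None):
    """Initial conditions -> evolved density -> galaxy counts.

    Gravity: lognormal transform of the linearly grown field (cheap stand-in for 2LPT).
    Bias: power law n_g ∝ rho**bias_exponent, normalised to mean nbar per cell.
    Data: Poisson sampling of the expected counts.
    """
    rng = np.random.default_rng() if rng is None else rng
    delta_lin = growth * delta_init
    rho = np.exp(delta_lin - 0.5 * delta_lin.var())  # 1 + delta > 0 by construction
    weight = rho**bias_exponent
    expected = nbar * weight / weight.mean()
    return expected, rng.poisson(expected)

rng = np.random.default_rng(42)
delta_init = gaussian_initial_field(32, rng=rng)
expected, counts = toy_forward_model(delta_init, rng=rng)
```

Field-level inference would then run this chain in reverse: given `counts`, sample `delta_init` (and the parameters), which in practice also requires the forward model to be differentiable.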
The Local Universe without cosmic variance (CV)




"The Manticore Project I: a digital twin of our cosmic neighbourhood from Bayesian field-level analysis" McAlpine et al
The Hubble diagram
"Field-level inference of cosmic shear with intrinsic alignments and baryons" Porqueres et al


FLI for Cosmic Shear
"Euclid: Field-level inference of primordial non-Gaussianity and cosmic initial conditions" Andrews et al

FLI for Primordial Non-Gaussianity
EFT at the Field Level
"How much information can be extracted from galaxy clustering at the field level?" Nguyen, Schmidt, Tucci, Reinecke, Kostić

The Beyond2pt Challenge
["A Parameter-Masked Mock Data Challenge for Beyond-Two-Point Galaxy Clustering statistics" Krause et al arXiv:2405.02252]


How well does the forward model fit the data?
"With four parameters I can fit an elephant, and with five I can make him wiggle his trunk."
LCDM fit to DESI
Validating FLI: Testing the mean

"Systematic-free inference of the cosmic matter density field from SDSS3-BOSS data" Lavaux, Jasche, Leclercq
Internal Consistency
Initial P(k) consistency with LCDM
Cross-Validation
Reconstructing lensing convergence from Planck
Validating FLI: Testing the error bars
Credible region (CR)
Not unique
Highest Posterior Density region (HPD)
Smallest "volume"

True value lies in the CR with probability $1-\alpha$
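One way to build an HPD interval in 1D from posterior samples, assuming a unimodal posterior: take the shortest window that contains a fraction 1 − α of the sorted samples. For a skewed (exponential) toy posterior it comes out visibly shorter than the equal-tailed interval at the same credibility, which illustrates why credible regions are not unique:

```python
import numpy as np

def hpd_interval(samples, credibility=0.9):
    """Shortest interval containing a fraction `credibility` of 1D samples.

    Approximates the HPD interval for a unimodal posterior; multimodal
    posteriors can have disjoint HPD regions, which this does not capture.
    """
    x = np.sort(np.asarray(samples))
    n_in = int(np.ceil(credibility * len(x)))  # samples inside each candidate window
    widths = x[n_in - 1:] - x[: len(x) - n_in + 1]
    start = int(np.argmin(widths))
    return x[start], x[start + n_in - 1]

rng = np.random.default_rng(0)
posterior = rng.exponential(size=200_000)             # skewed toy posterior
hpd_lo, hpd_hi = hpd_interval(posterior, 0.9)         # roughly (0, ln 10)
eq_lo, eq_hi = np.quantile(posterior, [0.05, 0.95])   # roughly (0.05, 3.00)
print(f"HPD: [{hpd_lo:.2f}, {hpd_hi:.2f}]  equal-tailed: [{eq_lo:.2f}, {eq_hi:.2f}]")
```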

Empirical Coverage Probability (ECP)

["Investigating the Impact of Model Misspecification in Neural Simulation-based Inference" Cannon et al arXiv:2209.01845 ]
Underconfident
Overconfident

Ciela-institute/TARP
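The coverage test can be sketched in the spirit of TARP (Tests of Accuracy with Random Points). This is our own minimal re-implementation, not the API of the Ciela-institute/TARP package. For each simulation, draw a random reference point, measure the posterior mass of the ball centred on it that just reaches the true parameters, and compare the distribution of that mass across simulations with a uniform one. The toy problem (Gaussian prior, unit Gaussian noise, exact posterior N(x/2, I/2)) and the deliberately biased posterior are assumptions for illustration:

```python
import numpy as np

def tarp_coverage(posterior_samples, true_params, alphas, rng):
    """TARP-style expected coverage probability (ECP).

    posterior_samples: (n_sims, n_post, n_dim) draws from the estimated posteriors
    true_params:       (n_sims, n_dim) parameters that generated each simulation
    Returns ECP(alpha) per credibility level; ECP ≈ alpha for a calibrated posterior.
    """
    n_sims, _, n_dim = posterior_samples.shape
    # One random reference point per simulation, uniform over the bounding box of the truths
    lo, hi = true_params.min(axis=0), true_params.max(axis=0)
    refs = rng.uniform(lo, hi, size=(n_sims, n_dim))
    dist_post = np.linalg.norm(posterior_samples - refs[:, None, :], axis=-1)
    dist_true = np.linalg.norm(true_params - refs, axis=-1)
    # Posterior mass inside the ball around the reference whose radius reaches the truth
    f = np.mean(dist_post < dist_true[:, None], axis=1)
    return np.array([np.mean(f < a) for a in alphas])

rng = np.random.default_rng(0)
n_sims, n_post, n_dim = 1000, 500, 2
theta = rng.standard_normal((n_sims, n_dim))     # prior N(0, I)
x = theta + rng.standard_normal((n_sims, n_dim))  # data = theta + unit noise
good = x[:, None, :] / 2 + np.sqrt(0.5) * rng.standard_normal((n_sims, n_post, n_dim))
bad = 5.0 + 0.1 * rng.standard_normal((n_sims, n_post, n_dim))  # biased, overconfident
alphas = np.linspace(0.1, 0.9, 9)
ecp_good = tarp_coverage(good, theta, alphas, rng)
ecp_bad = tarp_coverage(bad, theta, alphas, rng)
```

For the exact posterior, ECP(α) should track the diagonal; for the biased one it saturates near 1 already at small α.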
The FLI ingredients:
1) Forward Model
A) Gravity
B) Galaxy biasing
C) Survey Systematics
2) Sampling Method

Gravity
N-body
Particle Mesh
Effective Field Theories
Compute
- Galaxy formation
Compute
- Galaxy formation
Neural Network Corrections
Neural Network Emulators
["Hybrid Physical Neural ODEs for Fast N-body simulations" Lanzieri, Lanusse, Starck]
["Field Level Neural Network Emulator for Cosmological N-body Simulations" Jamieson et al]
Fast
Accurate
Scale Range
Efficient Sampling
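The core of a particle-mesh gravity step is an FFT Poisson solve on the grid. A minimal sketch, assuming the density contrast has already been assigned to the mesh (the cloud-in-cell deposit and the interpolation back to particles are left out) and using a generic `prefactor` in place of 4πGρ̄a²:

```python
import numpy as np

def pm_potential_and_acceleration(delta, box_size, prefactor=1.0):
    """Solve ∇²φ = prefactor · δ on a periodic grid with FFTs.

    `prefactor` stands in for 4πG ρ̄ a² in whatever code units are used.
    Returns φ and the acceleration g = -∇φ with shape (3, n, n, n).
    """
    n = delta.shape[0]
    k1d = 2 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k1d, k1d, k1d, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    k2[0, 0, 0] = 1.0  # placeholder to avoid 0/0; the mean mode is zeroed next
    phi_k = -prefactor * np.fft.fftn(delta) / k2
    phi_k[0, 0, 0] = 0.0
    phi = np.fft.ifftn(phi_k).real
    # Spectral gradient: ∂φ/∂x_i corresponds to i k_i φ(k)
    acc = np.stack([-np.fft.ifftn(1j * k_i * phi_k).real for k_i in (kx, ky, kz)])
    return phi, acc

n, box = 32, 100.0
x = np.arange(n) * box / n
k0 = 2 * np.pi * 2 / box  # second harmonic of the box
delta = np.broadcast_to(np.cos(k0 * x)[:, None, None], (n, n, n)).copy()
phi, acc = pm_potential_and_acceleration(delta, box)
```

A single plane wave is a convenient check because the exact solution is φ = −cos(k₀x)/k₀² with g_x = −sin(k₀x)/k₀; neural-network corrections or emulators then aim to recover the small-scale accuracy this coarse mesh solve loses.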
Galaxy Bias
Self-consistent predictions
Directly? linked to physical processes
Large Volumes
Large Volumes
MTNG ~ 500 Mpc/h
Robust
Clear assumptions
Large Scales
Galaxy formation?
["Full-shape analysis with simulation-based priors: Constraints on single field inflation from BOSS" Ivanov, Cuesta-Lazaro, Mishra-Sharma, Obuljen, Toomey arXiv:2402.13310]
["Differentiable Cosmological Hydrodynamics for Field-Level Inference and High Dimensional Parameter Constraints" Horowitz, Lukic arXiv:2502.02294]
Effective Field Theories
Empirical
HOD/SHAM
Fast
Accurate?
Hydrodynamics
Fast
Clear assumptions
Galaxy formation?
[Video credit: Francisco Villaescusa-Navarro]
Gas density
Gas temperature
Subgrid model 1
Subgrid model 2
Subgrid model 3
Subgrid model 4
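Empirical models such as HOD populate haloes with galaxies according to halo mass. A sketch using the five-parameter form popularised by Zheng et al. (2007), with satellites modulated by the central occupation (a common variant). The parameter values are hypothetical and the halo masses are random numbers rather than a simulation catalogue:

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def hod_mean_occupation(log10_mass, log10_mmin=12.0, sigma_logm=0.2,
                        log10_m0=12.0, log10_m1=13.3, alpha=1.0):
    """Mean number of central and satellite galaxies per halo (hypothetical parameters)."""
    n_cen = 0.5 * (1.0 + _erf((log10_mass - log10_mmin) / sigma_logm))
    excess = np.clip(10.0**log10_mass - 10.0**log10_m0, 0.0, None)  # zero below M0
    n_sat = n_cen * (excess / 10.0**log10_m1) ** alpha
    return n_cen, n_sat

rng = np.random.default_rng(0)
log_m = rng.uniform(11.0, 15.0, size=10_000)  # hypothetical halo masses
mean_cen, mean_sat = hod_mean_occupation(log_m)
n_central = (rng.random(log_m.size) < mean_cen).astype(int)  # Bernoulli centrals
n_satellite = rng.poisson(mean_sat)                           # Poisson satellites
```

The speed of this step is what makes HOD attractive; its accuracy depends on whether halo mass alone captures the galaxy-halo connection.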

Learn a representation for feedback


Dark Matter
Baryonic fields
Survey Forward Model: The Usual Suspects

Fiber Collisions
Survey Mask
Target Selection
(tracer-dependent priorities)
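A minimal way to fold the survey mask and target selection into the forward model is to multiply the expected counts by a per-cell completeness before Poisson sampling. The circular footprint and Gaussian radial selection below are hypothetical stand-ins; fiber collisions would need a pair-dependent model and are not included:

```python
import numpy as np

def survey_forward(expected, angular_mask, radial_selection, rng):
    """Apply a toy survey window to model expected counts, then Poisson-sample.

    Grid axes are (x, y, line of sight). Completeness per cell is the product of an
    angular footprint over (x, y) and a radial selection along the line of sight.
    """
    completeness = angular_mask[:, :, None] * radial_selection[None, None, :]
    return rng.poisson(expected * completeness), completeness

rng = np.random.default_rng(3)
n = 32
expected = np.full((n, n, n), 2.0)  # e.g. output of a bias model
yy, xx = np.mgrid[:n, :n]
footprint = (((xx - n / 2) ** 2 + (yy - n / 2) ** 2) < (n / 3) ** 2).astype(float)  # hypothetical
radial = np.exp(-((np.arange(n) / (0.6 * n)) ** 2))  # hypothetical n(z)-like selection
counts, completeness = survey_forward(expected, footprint, radial, rng)
```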



Survey Forward Model: Known Unknowns
"Systematic-free inference of the cosmic matter density field from SDSS3-BOSS data" Lavaux, Jasche, Leclercq
The data likelihood
Gaussian
Poisson
"Impacts of the physical data model on the forward inference of initial conditions from biased tracers" Nguyen, Schmidt, Lavaux, Jasche
More testing needed in realistic scenarios
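The Gaussian and Poisson choices already differ for a handful of cells. A sketch for independent cells, where the Gaussian uses a variance equal to the expected counts (a shot-noise-like assumption; real analyses may model the variance differently):

```python
import math
import numpy as np

_lgamma = np.vectorize(math.lgamma)

def poisson_log_likelihood(counts, expected):
    """log P(counts | expected) for independent Poisson cells (expected > 0)."""
    counts = np.asarray(counts, dtype=float)
    return np.sum(counts * np.log(expected) - expected - _lgamma(counts + 1.0))

def gaussian_log_likelihood(counts, expected, variance):
    """Independent Gaussian cells with mean `expected` and the given variance."""
    resid = np.asarray(counts, dtype=float) - expected
    return -0.5 * np.sum(resid**2 / variance + np.log(2 * np.pi * variance))

counts = np.array([0, 1, 2])
expected = np.ones(3)
print(poisson_log_likelihood(counts, expected))             # -3 - ln 2 ≈ -3.693
print(gaussian_log_likelihood(counts, expected, expected))  # ≈ -3.757
```

The Poisson form respects the discreteness and positivity of galaxy counts, which matters most in low-density cells, whereas the Gaussian form also accepts negative model predictions.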

"Benchmarking field-level cosmological inference from galaxy redshift surveys" Simon-Onfroy, Lanusse, De Mattia
"Microcanonical Hamiltonian Monte Carlo" Robnik, De Luca, Silverstein, Seljak
"Field-Level Inference with Microcanonical Langevin Monte Carlo"
Bayer, Seljak, Modi
Sampling Methods
FLI Forward Models
BORG: 2LPT gravity, power-law galaxy bias
1) Current analyses rely on simple forward models pushed to small scales (+ differentiable)
2) The likelihood is complex for realistic scenarios, but we can get samples from the simulator
3) Either not sampling cosmology at all, or struggling to
4) Not amortized -> rigorous testing (coverage) becomes extremely hard

GANs

Deep Belief Networks
2006

VAEs

Normalising Flows

BigGAN

Diffusion Models

2014
2017
2019
2022
A folk music band of anthropomorphic autumn leaves playing bluegrass instruments
Contrastive Learning
2023
Meanwhile, on Earth...
["Genie 2: A large-scale foundation model" Parker-Holder et al]

["Generative AI for designing and validating easily synthesizable and structurally novel antibiotics" Swanson et al]
Probabilistic ML has made high-dimensional inference tractable
1024x1024xTime
Learning to sample complex forward models
"Reconstructing Cosmological Initial Conditions from Late-Time Structure with Convolutional Neural Networks"
Shallue, Eisenstein

1) Learning the posterior mean with deterministic models
2) Learning to sample with generative models

"Posterior Sampling of the Initial Conditions of the Universe from Non-linear Large Scale Structures using Score-Based Generative Models" Legin et al

True
Reconstructed

"Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants"
Cuesta-Lazaro, Bayer, Albergo et al
NeurIPS ML4PS 2024 Spotlight talk

Stochastic Interpolants
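A sketch of the training objective behind a stochastic interpolant, assuming the linear interpolant x_t = (1 − t)x₀ + t x₁ + γ(t)z with γ(t) = c·√(t(1 − t)), one standard choice in the stochastic-interpolant framework; the exact choices in the cited work may differ. In the setting of the talk, x₀ could be a late-time field and x₁ the matching initial conditions; here both are random arrays, and `zero_drift` is a hypothetical placeholder for the neural network b(t, x):

```python
import numpy as np

def interpolant(x0, x1, t, z, gamma_scale=0.1):
    """x_t = (1 - t) x0 + t x1 + gamma(t) z with gamma(t) = gamma_scale * sqrt(t (1 - t))."""
    return (1 - t) * x0 + t * x1 + gamma_scale * np.sqrt(t * (1 - t)) * z

def interpolant_velocity(x0, x1, t, z, gamma_scale=0.1):
    """d x_t / dt, the regression target for the drift b(t, x)."""
    d_gamma = gamma_scale * (1 - 2 * t) / (2 * np.sqrt(t * (1 - t)))
    return x1 - x0 + d_gamma * z

def drift_loss(drift_fn, x0, x1, rng, gamma_scale=0.1, eps=1e-3):
    """Monte Carlo estimate of E|b(t, x_t) - dx_t/dt|^2 with t ~ U(eps, 1 - eps), z ~ N(0, I)."""
    t = rng.uniform(eps, 1 - eps, size=(x0.shape[0],) + (1,) * (x0.ndim - 1))
    z = rng.standard_normal(x0.shape)
    xt = interpolant(x0, x1, t, z, gamma_scale)
    target = interpolant_velocity(x0, x1, t, z, gamma_scale)
    return np.mean((drift_fn(t, xt) - target) ** 2)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((128, 16, 16))  # e.g. batch of late-time fields
x1 = rng.standard_normal((128, 16, 16))  # e.g. matching initial conditions
zero_drift = lambda t, x: np.zeros_like(x)  # hypothetical stand-in for a network
print(drift_loss(zero_drift, x0, x1, rng))
```

Minimising this loss over b gives the conditional expectation E[dx_t/dt | x_t], and integrating dX/dt = b(t, X) from t = 0 to 1 then transports samples of x₀ to samples of x₁.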
Simulators at scale
"Bayesian Inference of Initial Conditions from Non-Linear Cosmic Structures using Field-Level Emulators"
Doeser et al


(Tested on matter, differences likely worse for galaxies)
Speed-up perhaps not so impressive, but its scaling with N-body resolution may be
"Field Level Neural Network Emulator for Cosmological N-body Simulations"
Jamieson et al
Hydro Simulators at scale
"BaryonBridge: Interpolants models for fast hydrodynamical simulations"
Horowitz, Cuesta-Lazaro, Yehia (in prep)

Particle Mesh for Gravity
CAMELS Volumes
1000 boxes with varying cosmology and feedback models

Gas Properties

Current model optimised for the Lyman-alpha forest
7 GPU minutes for a 50 Mpc simulation
130 million CPU core hours for TNG50

Density
Temperature
Galaxy Distribution
The Roadmap
1) Develop better validation metrics (this requires better validation suites)
2) Assess the robustness of field-level inference via parameter-masked mock challenges in realistic scenarios (e.g. Beyond2pt)
3) Develop open-source ecosystems for more plug-and-play models
Field-level analysis is too complex for one group to develop a robust framework!

FLI Benchmarks
Non-linearity
Simulated Volume
Resolution
"Benchmarking field-level cosmological inference from galaxy redshift surveys" Simon-Onfroy, Lanusse, De Mattia
Sampling Methods
Why do we need gradients to sample in high dimensions?



Image Credit: "Probabilistic Computation" Michael Betancourt
Distance from the mode
https://betanalpha.github.io/assets/case_studies/probabilistic_computation.html
Typical Set
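The typical-set picture can be checked numerically for a standard normal in d dimensions (our choice of target): although the density peaks at the mode, the samples sit at a distance of about √d from it, in a shell whose width stays of order one:

```python
import numpy as np

def radius_stats(dim, n_samples=500, rng=None):
    """Mean and spread of the distance from the mode for samples of N(0, I_dim)."""
    rng = np.random.default_rng() if rng is None else rng
    r = np.linalg.norm(rng.standard_normal((n_samples, dim)), axis=1)
    return r.mean(), r.std()

rng = np.random.default_rng(0)
for dim in (1, 10, 100, 1_000, 10_000):
    mean_r, std_r = radius_stats(dim, rng=rng)
    print(f"d = {dim:>6}: distance {mean_r:8.2f} (sqrt(d) = {np.sqrt(dim):7.2f}), spread {std_r:.2f}")
```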


Random walks are isotropic, so they take you outside the typical set
Can we find a sampling algorithm that keeps samples in the typical set whilst moving far?
https://chi-feng.github.io/mcmc-demo/
Image Credit: "The Markov-chain Monte Carlo Interactive Gallery" Chi Feng
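To see why isotropic random walks struggle, here is random-walk Metropolis on a d-dimensional standard normal with a fixed step size (an arbitrary 0.5): as d grows, almost every proposal leaves the typical set and is rejected:

```python
import numpy as np

def rwm_acceptance(dim, step=0.5, n_steps=2_000, rng=None):
    """Acceptance rate of random-walk Metropolis on N(0, I_dim) with a fixed isotropic step."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(dim)  # start inside the typical set
    log_p = -0.5 * x @ x
    n_accept = 0
    for _ in range(n_steps):
        proposal = x + step * rng.standard_normal(dim)  # isotropic proposal
        log_p_new = -0.5 * proposal @ proposal
        if np.log(rng.random()) < log_p_new - log_p:  # Metropolis rule
            x, log_p = proposal, log_p_new
            n_accept += 1
    return n_accept / n_steps

rng = np.random.default_rng(0)
for dim in (1, 10, 100, 1_000):
    print(f"d = {dim:>5}: acceptance {rwm_acceptance(dim, rng=rng):.3f}")
```

Shrinking the step roughly as 1/√d restores acceptance, but each move then covers only a short distance, so the chain needs many more iterations to explore.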
Find trajectories of similar Energy (p) that are far away

Hamiltonian Monte Carlo
https://chi-feng.github.io/mcmc-demo/
Image Credit: "The Markov-chain Monte Carlo Interactive Gallery" Chi Feng
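A minimal Hamiltonian Monte Carlo sketch with a leapfrog integrator and unit mass matrix. Momenta are resampled each iteration, the gradient of log p steers the trajectory along the typical set, and a Metropolis step on the total energy H = −log p(x) + |p|²/2 corrects the integration error. The 1000-dimensional Gaussian target, step size and number of leapfrog steps are illustrative choices:

```python
import numpy as np

def hmc(log_prob, grad_log_prob, x0, n_samples, step_size, n_leapfrog, rng):
    """Hamiltonian Monte Carlo with a leapfrog integrator and identity mass matrix."""
    x = np.array(x0, dtype=float)
    samples = np.empty((n_samples, x.size))
    n_accept = 0
    for i in range(n_samples):
        p = rng.standard_normal(x.size)  # resample momentum each iteration
        x_new, p_new = x.copy(), p.copy()
        # Leapfrog: half kick, alternating drifts and full kicks, final half kick
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        for j in range(n_leapfrog):
            x_new += step_size * p_new
            if j < n_leapfrog - 1:
                p_new += step_size * grad_log_prob(x_new)
        p_new += 0.5 * step_size * grad_log_prob(x_new)
        # Metropolis correction on the energy H = -log p(x) + |p|^2 / 2
        h_old = -log_prob(x) + 0.5 * p @ p
        h_new = -log_prob(x_new) + 0.5 * p_new @ p_new
        if np.log(rng.random()) < h_old - h_new:
            x = x_new
            n_accept += 1
        samples[i] = x
    return samples, n_accept / n_samples

rng = np.random.default_rng(0)
dim = 1_000
log_prob = lambda x: -0.5 * x @ x  # standard normal target, up to a constant
grad_log_prob = lambda x: -x
samples, acceptance = hmc(log_prob, grad_log_prob, rng.standard_normal(dim),
                          n_samples=500, step_size=0.2, n_leapfrog=10, rng=rng)
print(f"acceptance {acceptance:.2f}, sample variance {samples[100:].var():.3f}")
```

Unlike the random walk, the acceptance stays high in 1000 dimensions while each iteration still moves a distance of order one per coordinate.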
Learning likelihoods at the field level

["A point cloud approach to generative modeling for galaxy surveys at the field level"
Cuesta-Lazaro and Mishra-Sharma
International Conference on Machine Learning ICML AI4Astro 2023, Spotlight talk, arXiv:2311.17141]
Target Distribution
Simulated Galaxy 3D Map
Base Distribution
Prompt: