A biased Review
[Video Credit: N-body simulation, Francisco Villaescusa-Navarro]
IAIFI Fellow
Carolina Cuesta-Lazaro
A digital twin of our Universe
Observed Galaxy Distribution
Simulated Galaxy Distribution
Field Level Inference
Forward Model
(= bye bye Cosmic Variance)
Optimal constraints
N-point functions
Counts-in-cell
Wavelets
Marked two-point correlation functions (2PCFs)
Voids
Do we really need to infer the ICs?
["On the Connection between Field-Level Inference and N-point Correlation Functions" Schmidt]
M-th order forward model: information on N-point functions with N <= M+1
["Optimal Neural Summarisation for Full-Field Weak Lensing Cosmological Implicit Inference" Lanzieri et al]
High-Dimensional
Low-Dimensional
s is sufficient iff
Maximise
Mutual Information
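The formula for the sufficiency condition did not survive on this slide. A standard way to write it, offered as a sketch rather than the slide's exact expression, uses the data x, the parameters θ and the summary s(x). The variational posterior q_φ and the prior entropy H(θ) are notation introduced here for illustration.

```latex
\begin{align}
  s \text{ is sufficient} \;&\iff\; p(\theta \mid x) = p(\theta \mid s(x)) \\
  &\iff\; I(s(x); \theta) = I(x; \theta), \\
  I(s(x); \theta) \;&\le\; I(x; \theta)
      \quad \text{(data processing inequality)}, \\
  I(s(x); \theta) \;&\ge\; H(\theta)
      + \mathbb{E}_{p(x,\theta)}\!\left[\log q_\phi(\theta \mid s(x))\right].
\end{align}
```

Since H(θ) does not depend on the summary, maximising the last line over the compressor and q_φ together amounts to minimising the negative log-probability of the true parameters under q_φ. This is one common way to train a summary towards sufficiency.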
["A point cloud approach to generative modeling for galaxy surveys at the field level"
Cuesta-Lazaro and Mishra-Sharma
arXiv:2311.17141]
Robustness?
Optimal Summaries
FLI
Same pixel-level fidelity required
Number of simulations needed?
Training simulations are IID
Very high dimensional inference!
Low dimensional inference
Reconstructing ALL latent variables:
Dark Matter distribution
Entire formation history
Peculiar velocities
Predictive:
Cross-Correlation with other probes without Cosmic Variance
[Image Credit: Yuuki Omori]
Constraining Inflation:
Inferring primordial non-gaussianity
"Systematic-free inference of the cosmic matter density field from SDSS3-BOSS data" Lavaux, Jasche, Lecrerq
"Bayesian physical reconstruction of initial conditions from large scale structure surveys" Jasche, Wandelt (2012)
Initials
Finals
Galaxies
"The Manticore Project I: a digital twin of our cosmic neighbourhood from Bayesian field-level analysis" MacAlpine et al
"The Manticore Project I: a digital twin of our cosmic neighbourhood from Bayesian field-level analysis" MacAlpine et al
"Field-level inference of cosmic shear with intrinsic alignments and baryons" Porqueres et al
"Euclid: Field-level inference of primordial non-Gaussianity and cosmic initial conditions" Andrews et al
"How much information can be extracted from galaxy clustering at the field level?" Nguyen, Schmidt, Tucci, Reinecke, Kostić
["A Parameter-Masked Mock Data Challenge for Beyond-Two-Point Galaxy Clustering statistics" Krause et al arXiv:2405.02252]
"With four parameters I can fit an elephant, and with five I can make him wiggle his trunk."
LCDM fit DESI
"Systematic-free inference of the cosmic matter density field from SDSS3-BOSS data" Lavaux, Jasche, Lecrerq
Internal Consistency
Initial P(k) consistency with LCDM
Cross-Validation
Reconstructing lensing convergence from Planck
Credible region (CR)
Not unique
High Posterior Density region (HPD)
Smallest "volume"
True value in CR with
probability
Empirical Coverage Probability (ECP)
["Investigating the Impact of Model Misspecification in Neural Simulation-based Inference" Cannon et al arXiv:2209.01845]
Underconfident
Overconfident
Ciela-institute/TARP
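A minimal sketch of an empirical coverage test, assuming a toy conjugate-Gaussian problem where the exact posterior is known. It is not the TARP algorithm, and the function name `empirical_coverage` and the `width_scale` knob are hypothetical, introduced only for illustration. For each simulation it estimates the HPD credibility level at which the true value sits, then reports the fraction of simulations covered at each nominal level.

```python
import numpy as np

def empirical_coverage(width_scale=1.0, n_sims=2000, n_post=500, sigma=0.5, seed=0):
    """Empirical coverage probability (ECP) for a toy Gaussian model.

    Prior: theta ~ N(0, 1). Data: x = theta + N(0, sigma^2).
    width_scale rescales the posterior width to mimic a miscalibrated
    estimator: < 1 is overconfident, > 1 is underconfident.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 1.0, n_sims)            # true values drawn from the prior
    x = theta + rng.normal(0.0, sigma, n_sims)      # one observation per truth
    post_mean = x / (1.0 + sigma**2)                # exact posterior mean
    post_std = width_scale * np.sqrt(sigma**2 / (1.0 + sigma**2))
    samples = post_mean[:, None] + post_std * rng.normal(size=(n_sims, n_post))

    # For a Gaussian posterior, "higher density than the truth" means
    # "closer to the posterior mean than the truth". The fraction of such
    # samples is the HPD credibility level at which the truth sits.
    dist_samples = np.abs(samples - post_mean[:, None])
    dist_truth = np.abs(theta - post_mean)[:, None]
    level = np.mean(dist_samples < dist_truth, axis=1)

    alphas = np.linspace(0.05, 0.95, 19)            # nominal credibility levels
    ecp = np.array([(level <= a).mean() for a in alphas])
    return alphas, ecp

alphas, ecp_calibrated = empirical_coverage(width_scale=1.0)
_, ecp_overconfident = empirical_coverage(width_scale=0.5)
_, ecp_underconfident = empirical_coverage(width_scale=2.0)
```

Plotting each ECP curve against `alphas` gives the usual diagnostic: a calibrated posterior follows the diagonal, an overconfident one falls below it, and an underconfident one lies above it.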
A) Gravity
B) Galaxy biasing
Sampling Methods
1) Forward Model
C) Survey Systematics
2) Sampling Method
N-body
Particle Mesh
Effective Field Theories
Compute
- Galaxy formation
Compute
- Galaxy formation
Neural Network Corrections
Neural Network Emulators
["Hybrid Physical Neural ODEs for Fast N-body simulations" Lanzieri, Lanusse, Starck]
["Field Level Neural Network Emulator for Cosmological N-body Simulations" Jamieson et al]
Fast
Accurate
Scale Range
Efficient Sampling
Self consistent predictions
Directly? linked to physical processes
Large Volumes
Large Volumes
MTNG ~ 500 Mpc/h
Robust
Clear assumptions
Large Scales
Galaxy formation?
["Full-shape analysis with simulation-based priors: Constraints on single field inflation from BOSS" Ivanov, Cuesta-Lazaro, Mishra-Sharma, Obuljen, Toomey arXiv:2402.13310]
["Differentiable Cosmological Hydrodynamics for Field-Level Inference and High Dimensional Parameter Constraints" Horowitz, Lukic arXiv:2502.02294]
Effective Field Theories
Empirical
HOD/SHAM
Fast
Accurate?
Hydrodynamics
Fast
Clear assumptions
Galaxy formation?
[Video credit: Francisco Villaescusa-Navarro]
Gas density
Gas temperature
Subgrid model 1
Subgrid model 2
Subgrid model 3
Subgrid model 4
Learn a representation for feedback
Dark Matter
Baryonic fields
Fiber Collisions
Survey Mask
Target Selection
(tracer dependent priorities)
"Systematic-free inference of the cosmic matter density field from SDSS3-BOSS data" Lavaux, Jasche, Lecrerq
Gaussian
Poisson
"Impacts of the physical data model on the forward inference of initial conditions from biased tracers" Nguyen, Schmidt, Lavaux, Jasche
More testing in realistic scenarios
"Benchmarking field-level cosmological inference from galaxy redshift surveys" Simon-Onfroy, Lanusse, De Mattia
"Microcanonical Hamiltonian Monte Carlo" Robnik, De Luca, Silverstein, Seljak
"Field-Level Inference with Microcanonical Langevin Monte Carlo"
Bayer, Seljak, Modi
BORG: 2LPT Gravity, Power-law galaxy bias
1) Current analyses rely on simple forward models pushed to small scales (+ differentiable)
2) The likelihood is complex in realistic scenarios, but we can draw samples from the simulator
3) Analyses either do not sample cosmology at all, or struggle to do so
4) Not amortized -> rigorous testing (coverage) becomes extremely hard
[Timeline figure, tick labels 2006, 2014, 2017, 2019, 2022, 2023: Deep Belief Networks, VAEs, GANs, Normalising Flows, BigGAN, Diffusion Models, Contrastive Learning]
A folk music band of anthropomorphic autumn leaves playing bluegrass instruments
["Genie 2: A large-scale foundation model" Parker-Holder et al]
["Generative AI for designing and validating easily synthesizable and structurally novel antibiotics" Swanson et al]
Probabilistic ML has made high dimensional inference tractable
1024x1024xTime
"Reconstructing Cosmological Initial Conditions from Late-Time Structure with Convolutional Neural Networks"
Shallue, Eisenstein
1) Learning the posterior mean with deterministic models
2) Learning to sample with generative models
"Posterior Sampling of the Initial Conditions of the Universe from Non-linear Large Scale Structures using Score-Based Generative Models" Legin et al
True
Reconstructed
"Joint cosmological parameter inference and initial condition reconstruction with Stochastic Interpolants"
Cuesta-Lazaro, Bayer, Albergo et al
NeurIPS ML4PS 2024, Spotlight talk
Stochastic Interpolants
"Bayesian Inference of Initial Conditions from Non-Linear Cosmic Structures using Field-Level Emulators"
Doeser et al
(Tested on matter, differences likely worse for galaxies)
The speed-up is perhaps not so impressive, but its scaling with N-body resolution may be
"Field Level Neural Network Emulator for Cosmological N-body Simulations"
Jamieson et al
"BaryonBridge: Interpolants models for fast hydrodynamical simulations"
Horowitz, Cuesta-Lazaro, Yehia (in prep)
Particle Mesh for Gravity
CAMELS Volumes
1000 boxes with varying cosmology and feedback models
Gas Properties
Current model optimised for Lyman Alpha forest
7 GPU minutes for a 50 Mpc simulation
130 million CPU core hours for TNG50
Density
Temperature
Galaxy Distribution
Field-level analysis is too complex for one group to develop a robust framework!
1) Need to develop better validation metrics (requires better validation suites)
2) Assess the robustness of field-level inference via parameter-masked mock challenges in realistic scenarios (e.g. Beyond-2pt)
3) Develop open-source ecosystems for more plug-and-play models
Non-linearity
Simulated Volume
Resolution
"Benchmarking field-level cosmological inference from galaxy redshift surveys" Simon-Onfroy, Lanusse, De Mattia
Why do we need gradients to sample in high dimensions?
Image Credit: "Probabilistic Computation" Michael Betancourt
Distance from the mode
https://betanalpha.github.io/assets/case_studies/probabilistic_computation.html
Typical Set
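The concentration of probability mass away from the mode can be checked numerically. The sketch below assumes a standard Gaussian target as a stand-in for a high-dimensional posterior, and the helper name `radius_stats` is hypothetical. It draws samples and measures their distance from the mode at the origin.

```python
import numpy as np

def radius_stats(dim, n_samples=2000, seed=0):
    """Mean, spread and minimum of the distance from the mode for
    samples of a dim-dimensional standard Gaussian."""
    rng = np.random.default_rng(seed)
    radii = np.linalg.norm(rng.normal(size=(n_samples, dim)), axis=1)
    return radii.mean(), radii.std(), radii.min()

for dim in (1, 10, 100, 1000):
    mean, std, rmin = radius_stats(dim)
    print(f"dim={dim:5d}  mean radius={mean:6.2f}  "
          f"sqrt(dim)={np.sqrt(dim):6.2f}  std={std:4.2f}  min={rmin:6.2f}")
```

As the dimension grows, the samples sit in a thin shell of radius close to sqrt(dim), and none land near the mode even though the density is highest there. A sampler therefore has to move along this shell rather than search for the peak.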
Random walks are isotropic and will take you outside the typical set
Can we find a sampling algorithm that keeps samples in the typical set whilst moving far?
https://chi-feng.github.io/mcmc-demo/
Image Credit: "The Markov-chain Monte Carlo Interactive Gallery" Chi Feng
Find trajectories of similar energy (i.e. similar probability p) that end far away
https://chi-feng.github.io/mcmc-demo/
Image Credit: "The Markov-chain Monte Carlo Interactive Gallery" Chi Feng
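A minimal sketch contrasting an isotropic random-walk proposal with gradient-guided Hamiltonian Monte Carlo, assuming a 1000-dimensional standard Gaussian target. All function names, step sizes and trajectory lengths are illustrative choices, not settings taken from any of the analyses cited in this talk.

```python
import numpy as np

def potential(x):
    """Negative log-density of a standard Gaussian, up to a constant."""
    return 0.5 * x @ x

def grad_potential(x):
    return x

def random_walk_step(x, rng, scale):
    """Metropolis step with an isotropic Gaussian proposal."""
    proposal = x + scale * rng.normal(size=x.shape)
    accept = np.log(rng.uniform()) < potential(x) - potential(proposal)
    return (proposal, True) if accept else (x, False)

def hmc_step(x, rng, step=0.1, n_leapfrog=20):
    """HMC step: draw a momentum, integrate with leapfrog, then accept
    or reject on the change in total energy."""
    p = rng.normal(size=x.shape)
    h_old = potential(x) + 0.5 * p @ p
    x_new = x.copy()
    p_new = p - 0.5 * step * grad_potential(x_new)       # initial half kick
    for i in range(n_leapfrog):
        x_new = x_new + step * p_new                     # drift
        if i < n_leapfrog - 1:
            p_new = p_new - step * grad_potential(x_new) # full kick
    p_new = p_new - 0.5 * step * grad_potential(x_new)   # final half kick
    h_new = potential(x_new) + 0.5 * p_new @ p_new
    accept = np.log(rng.uniform()) < h_old - h_new
    return (x_new, True) if accept else (x, False)

def run(step_fn, dim=1000, n_steps=200, seed=0, **kwargs):
    """Return the acceptance rate and the mean distance moved per step."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=dim)                             # start in the typical set
    n_accept, total_jump = 0, 0.0
    for _ in range(n_steps):
        x_next, accepted = step_fn(x, rng, **kwargs)
        total_jump += np.linalg.norm(x_next - x)
        n_accept += accepted
        x = x_next
    return n_accept / n_steps, total_jump / n_steps

rw_accept, rw_jump = run(random_walk_step, scale=0.5)
hmc_accept, hmc_jump = run(hmc_step)
print(f"random walk: acceptance={rw_accept:.2f}, mean jump={rw_jump:.1f}")
print(f"HMC:         acceptance={hmc_accept:.2f}, mean jump={hmc_jump:.1f}")
```

With these settings the random-walk proposal attempts a jump of roughly 16 units and is essentially always rejected, because an isotropic step of that size leaves the typical set. The leapfrog trajectory nearly conserves the energy, so HMC accepts most proposals while moving much further.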
["A point cloud approach to generative modeling for galaxy surveys at the field level"
Cuesta-Lazaro and Mishra-Sharma
International Conference on Machine Learning ICML AI4Astro 2023, Spotlight talk, arXiv:2311.17141]
Target Distribution
Simulated Galaxy 3d Map
Base Distribution
Prompt: