Determining the effects of viral mutations without experiments

 

Jesse Bloom

Fred Hutch Cancer Center / HHMI

 

 

Slides: https://slides.com/jbloom/grc2025

 

Some viruses evolve very rapidly

Determining effects of viral mutations is important

  1. Interpret consequences of mutations seen during viral surveillance.
  2. Inform design of drugs and vaccine updates.
  3. Understand function and mechanisms of viral proteins.

Different patterns of evolution at different sites

Traditional way to determine effect of mutations is experiments

My group tries to do such experiments at large scale via deep mutational scanning

Yeast display or lentiviral pseudotype libraries allow us to measure many mutants at once by pooling them all together and reading out effects of mutations by deep sequencing (Starr et al, 2020; Dadonaite et al, 2023)

Limitations of using experiments to understand effects of mutations

Laborious: even with deep mutational scanning, it's a lot of effort.

Lab assays measure effects of mutations in cells or mice, not humans. This is not the same as fitness in the real world.

Some viral proteins have poorly understood functions that lack good lab assays.

Nature is "testing" effects of viral mutations in humans all the time

Average neutral single-nucleotide mutation has occurred ~30,000 independent times in human-transmitted SARS-CoV-2

  • Viral substitution rate at synonymous sites: ~7.5e-4 substitutions/site/year (Neher, 2022)
  • Typical infection duration: ~5 days = 0.01 years/infection
  • Total human infections with SARS-CoV-2: ~12e9 infections
  • So total synonymous substitutions per site: 7.5e-4 x 0.01 x 12e9 = 90,000
  • There are three possible mutations per site: 90,000 / 3 = 30,000
  • Mutation spectrum uneven, so some mutations have occurred more than others:
    • C->T mutations have occurred ~100,000 times
    • A->C mutations have occurred ~2,000 times
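
The back-of-envelope estimate above can be reproduced in a few lines. This is only a sketch of the arithmetic: the inputs are the rough values quoted in the bullets, and the variable names are illustrative rather than taken from any published code.

```python
# Rough inputs as quoted on the slide (all approximate).
subs_per_site_per_year = 7.5e-4  # synonymous substitution rate (Neher, 2022)
years_per_infection = 0.01       # ~5 days per infection, rounded as on the slide
total_infections = 12e9          # approximate total human SARS-CoV-2 infections

# Total synonymous substitutions expected at a single site across all infections.
subs_per_site = subs_per_site_per_year * years_per_infection * total_infections

# Each site can mutate to three other nucleotides, so divide evenly among them.
# (The slide notes the real mutation spectrum is uneven; this is the average.)
per_mutation = subs_per_site / 3

print(f"substitutions per site: {subs_per_site:,.0f}")
print(f"average occurrences per specific mutation: {per_mutation:,.0f}")
```

Because the mutation spectrum is uneven, the per-mutation value is only an average; the C->T and A->C figures in the bullets show how far individual mutation types can deviate from it.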

We can use publicly available human SARS-CoV-2 sequences to "read out" effects of viral mutations on human transmission

  • We use the ~10 million public sequences in the UShER mutation-annotated tree
  • These sequences represent ~0.1% of all human SARS-CoV-2 infections 
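
As a quick consistency check on the sampling fraction, the two approximate figures used in this talk (about 10 million public sequences and about 12 billion total infections) can be divided directly. Both inputs are order-of-magnitude estimates, so the result should be read the same way.

```python
public_sequences = 10e6   # approximate number of sequences in the UShER tree
total_infections = 12e9   # approximate total human infections (same value as above)

sampled_fraction = public_sequences / total_infections
print(f"fraction of infections sequenced: {sampled_fraction:.2%}")
```

The result is a little under one in a thousand, which rounds to the ~0.1% quoted above.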

First calculate how often each mutation is expected to be observed without selection by analyzing 4-fold degenerate sites
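
One way to picture this step is as an average over presumed-neutral sites. Mutations at 4-fold degenerate sites do not change the encoded amino acid, so their counts are used here as a baseline for each type of nucleotide change. The sketch below uses made-up toy counts and a hypothetical `expected_counts` helper; it is not the actual analysis code, which may handle clades, site context, and filtering differently.

```python
from collections import defaultdict

# Toy, made-up rows: (site, parent_nt, mutant_nt, n_occurrences, is_fourfold_degenerate).
# Sites with zero occurrences must still be listed so they count in the average.
observations = [
    (101, "C", "T", 120, True),
    (104, "C", "T", 80, True),
    (107, "C", "T", 3, False),  # not 4-fold degenerate, excluded from the baseline
    (110, "A", "C", 4, True),
    (113, "A", "C", 2, True),
]

def expected_counts(rows):
    """Mean occurrences per mutation type, using only 4-fold degenerate sites."""
    totals = defaultdict(int)
    n_sites = defaultdict(int)
    for _site, parent, mutant, count, fourfold in rows:
        if fourfold:
            totals[(parent, mutant)] += count
            n_sites[(parent, mutant)] += 1
    return {mut_type: totals[mut_type] / n_sites[mut_type] for mut_type in totals}

expected = expected_counts(observations)
for (parent, mutant), value in sorted(expected.items()):
    print(f"{parent}->{mutant}: expected {value:.1f} occurrences per site")
```

In this toy example the expected count for any C->T mutation is the mean over the two 4-fold degenerate C sites, and the constrained site at position 107 does not influence the baseline.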

We count unique occurrences of mutation, not number of sequences with mutation
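
The distinction between occurrences and sequences can be illustrated with a toy tree. In the sketch below, each branch records the mutations that arose on it and how many sampled sequences descend from it; the data structure, mutation labels, and numbers are all hypothetical and are not UShER's actual format. A single mutation event on a deep branch can be inherited by thousands of sequences, which is presumably why independent events are the more useful quantity here.

```python
from collections import Counter

# Hypothetical toy branches of a mutation-annotated tree (made-up labels and counts).
branches = [
    {"mutations": ["C100T"], "n_descendant_sequences": 5000},
    {"mutations": ["C100T", "A300G"], "n_descendant_sequences": 3},
    {"mutations": ["G200T"], "n_descendant_sequences": 1},
    {"mutations": ["G200T"], "n_descendant_sequences": 2},
    {"mutations": ["G200T"], "n_descendant_sequences": 1},
]

occurrences = Counter()     # independent mutation events: the quantity that is counted
sequences_with = Counter()  # sequences inheriting each event: shown only for contrast

for branch in branches:
    for mutation in branch["mutations"]:
        occurrences[mutation] += 1
        sequences_with[mutation] += branch["n_descendant_sequences"]

for mutation in sorted(occurrences):
    print(mutation, occurrences[mutation], sequences_with[mutation])
```

Here `G200T` arises three separate times but is carried by only four sequences, while `C100T` arises twice but is carried by thousands, so the two ways of counting rank the mutations in opposite order.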

Mutations expected to be observed ~10 to ~700 times in absence of selection

There are enough sequences to calculate effects on a per-mutation basis

We calculate effect as log of actual versus expected mutation counts

fitness effect of mutation = log (actual counts / expected counts)

Effects of zero indicate neutral mutation, negative indicates deleterious mutation
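
A minimal sketch of this calculation follows. Two details are assumptions not stated on the slide: the logarithm is taken to be the natural log, and a small pseudocount is added to both counts so that mutations never observed still get a finite value. The function name and default pseudocount are illustrative.

```python
import math

def fitness_effect(actual, expected, pseudocount=0.5):
    """Log ratio of actual to expected mutation counts.

    The pseudocount is an assumption of this sketch; it keeps the result
    finite when a mutation has never been observed (actual == 0).
    """
    return math.log((actual + pseudocount) / (expected + pseudocount))

# Observed as often as expected: effect of zero (neutral).
print(fitness_effect(actual=100, expected=100))

# Observed far less often than expected: negative effect (deleterious).
print(fitness_effect(actual=2, expected=100))

# Never observed: still finite because of the pseudocount.
print(fitness_effect(actual=0, expected=100))
```

Setting `pseudocount=0` recovers the plain formula shown on the slide, at the cost of an undefined value whenever the actual count is zero.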

Distribution of effects of all mutations

We can see which genes are under strong purifying selection

Among accessory genes, ORF3a is under strongest selection against stop codons

Experiments show that the only accessory gene deletion that strongly attenuates the virus in animal models is ORF3a (McGrath et al, 2022)

Crucially, we see effect of each mutation

Key sites in proteins of unknown function

These maps can identify constrained sites

Estimated mutation effects are robust to sequence sampling location

Estimated mutation effects are robust to viral clade identity

Estimated mutation effects correlate well with deep mutational scanning

Two spike deep mutational scans using different underlying methodologies: lentiviral pseudotyping of spike or yeast display of RBD

Maps of mutation effects to all viral proteins

Areas for future work and limitations

Quantitative relationship between the ratio of observed versus expected counts and fitness depends on sampling intensity

There is additional information in dynamics of mutation after it occurs that our method currently does not leverage

Accuracy of our approach depends critically on:

  1. Having a dataset free of sequencing/bioinformatic errors
  2. Accurately estimating per-site mutation rate

This overall approach could be applied to many viruses / organisms with enough sequencing

Thanks

Estimates of mutation rate

Kelley Harris, Annabel Beichman

 

Assistance with UShER

Angie Hinrichs, Russ Corbett-Detig

Estimating effects of mutations to all SARS-CoV-2 proteins from actual versus expected mutation counts in natural sequences