Curious what you think of the Arc Institute's recent paper describing the generation of viable phage genomes using a fine-tuned foundation model: https://www.biorxiv.org/content/10.1101/2025.09.12.675911v1
Yeah, not sure what to think. I don't have a good sense of how many phage genomes they used for training and how similar they are to each other. There's a long history of recoding phage genomes, decompressing them, rearranging gene order, and phages are generally quite resilient to this type of manipulation. With enough training data, why wouldn't an AI model also be able to come up with reshuffled versions that work?
Hey Claus, I stumbled on this blog by chance, and it turns out I'm trying out these models for predicting the fitness of single-point-mutation variants of HIV-1. How do you think viral epistatic interactions might be modelled? It seems more like a physics problem to me at this point; maybe we need to integrate AI and physics principles at some level?
Sure. The question is how. I don't think anybody knows at this point. You'll just have to try some things and see what works for you and your system.
It is an interesting problem for many downstream applications like gene therapy. I'll continue to explore more along these lines, might get some interesting results someday.
Ito, J., Strange, A., Liu, W. et al. A protein language model for exploring viral fitness landscapes. Nat Commun 16, 4236 (2025). https://doi.org/10.1038/s41467-025-59422-w
An interesting read; it seems they modeled epistatic interactions to an extent.
Do you suspect something inherent in virus-specific evolutionary pressures? I wonder if a higher viral mutation rate leads to greater genetic drift and more chance of mechanistic diversity, which would be more difficult to predict with less data. If you subdivide by, say, RNA viruses, do you see a greater split? An easy way to test this idea would be to compare how well the protein language models predict, say, the results of IgG somatic hypermutation. Really interesting finding, thanks for the post!
I think (but it's difficult to quantify/prove) a big component is that the viral fitness landscapes we assay are somehow more random. If you're predicting fluorescence in GFP, all the relevant factors are inherent to the protein, and that may make the landscape more predictable than, say, antibody escape, where fitness depends on the interaction between the viral protein and an antibody.
Reading around, I found a nice paper supporting that idea; thought I'd leave it here for anybody interested:
Acevedo A, Brodsky L, Andino R. Mutational and fitness landscapes of an RNA virus revealed through population sequencing. Nature. 2014 Jan 30;505(7485):686-90. doi: 10.1038/nature12861. Epub 2013 Nov 27. PMID: 24284629; PMCID: PMC4111796.