Protein language models are bad at mutational…

Mar 19

Biology is hard. Yes, even for AI.

7 Comments

We did the same for predicting antifungal resistance arising from mutations, using ESM-2 embeddings and DMS data. When we split the data by position, performance also dropped significantly but stayed above random guessing. I didn't check the ratio of low- to high-variance sites in our data, though. It's a great idea, and I'll try to do it.
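For readers unfamiliar with the term, a split "by position" can be sketched roughly as below. This is a minimal illustration, not the code used in the post or in the comment above: the `(position, mutant_aa, score)` tuple layout, the function name `split_by_position`, and the toy scores are all assumptions made for the example.

```python
import random


def split_by_position(variants, test_fraction=0.2, seed=0):
    """Split DMS variants so that no sequence position is in both sets.

    `variants` is a list of (position, mutant_aa, score) tuples. This layout
    is hypothetical and chosen only for illustration.
    """
    # Collect the distinct positions and shuffle them reproducibly.
    positions = sorted({pos for pos, _, _ in variants})
    rng = random.Random(seed)
    rng.shuffle(positions)

    # Hold out a fraction of positions (at least one) for testing.
    n_test = max(1, round(test_fraction * len(positions)))
    test_positions = set(positions[:n_test])

    train = [v for v in variants if v[0] not in test_positions]
    test = [v for v in variants if v[0] in test_positions]
    return train, test


# Toy data: 10 positions, 3 substitutions each, with made-up scores.
toy = [(pos, aa, 0.0) for pos in range(1, 11) for aa in "AVL"]
train, test = split_by_position(toy)
```

Unlike a random split over individual variants, every substitution at a held-out position is unseen during training, so a model cannot score well just by memorizing per-site averages.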

Michel Nivard

Mar 19

Hm, wouldn’t splitting the training data by genes be even fairer? Train on 90%, evaluate on the 10% held out? The promise of mutation prediction seems to be that we’d generalize the information captured in saturation mutagenesis models beyond the set of genes that have been studied? Would also be very interesting (but absolutely horrendously expensive, unless there are ESM snapshots partly through pretraining) to plot base training data volume vs R2. I think performance might be related to “volume seen in base training”?

Claus Wilke

Mar 19

DMS data is collected one protein at a time, so you can't meaningfully split it at the protein level.

Michel Nivard

Mar 20

Yes, this is my naivety (I am a statistical geneticist, or genetic epidemiologist; I know just enough to be dangerously wrong, I guess). I thought ppl would finetune a base protein model across different DMS studies, training on DMS data for like 90% of the proteins in protein-bench and then predicting DMS deleteriousness for mutations in holdout proteins… I mean, isn’t what you want to “learn” in some idealized future a model that generalizes to unseen proteins?
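The protein-level holdout described here could look something like the sketch below. It is only an illustration under stated assumptions: the record layout (dicts with `protein`, `mutation`, and `score` keys), the function name `holdout_proteins`, and the toy data are hypothetical, and the sketch assumes that all assays measure a comparable phenotype.

```python
import random


def holdout_proteins(records, test_fraction=0.1, seed=0):
    """Hold out whole proteins, so test proteins are never seen in training.

    `records` is a list of dicts with 'protein', 'mutation', and 'score'
    keys. This layout is hypothetical and chosen only for illustration.
    """
    # Shuffle the distinct protein names reproducibly.
    proteins = sorted({r["protein"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(proteins)

    # Reserve a fraction of proteins (at least one) as the holdout set.
    n_test = max(1, round(test_fraction * len(proteins)))
    test_proteins = set(proteins[:n_test])

    train = [r for r in records if r["protein"] not in test_proteins]
    test = [r for r in records if r["protein"] in test_proteins]
    return train, test


# Toy data: 10 proteins with 5 made-up variants each.
toy_records = [
    {"protein": f"P{i}", "mutation": f"A{j}V", "score": 0.0}
    for i in range(10)
    for j in range(1, 6)
]
train_records, test_records = holdout_proteins(toy_records)
```

Grouping by protein before splitting is what keeps variants of one protein from landing on both sides of the split.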

Claus Wilke

Mar 20

You can do that only if you're measuring the same phenotype across different DMS experiments. That's not normally the case. The one exception is stability effects, where people have done similar things to what you mention. E.g.: https://www.nature.com/articles/s41467-024-49780-2

Michel Nivard

Mar 20

Interesting, I had human-genetics-biased intuitions that this would prob kinda work for broadly defined “fitness” or “binding” phenotypes! But good thing I came across this post, Tnx

Claus Wilke

Mar 20

Your intuition is relevant for the topic of zero-shot predictions, where we don't train supervised models. Will have a paper on that topic soon. Stay tuned.

Genes, Minds, Machines
