Great piece, Claus. Very much enjoying the rebirth of your blog.
"Biology is just physics and chemistry" -- you discuss the "levels of organization" later, but perhaps it's worth saying that "just" in the first quote is bearing more weight than is reasonable, and in the second quote, "levels" may have a more precise definition. There are layers of emergence between the physics/chemistry and the biology, that is, phenomena at some scale which are not reducible to features at smaller scales, even though the phenomena arise purely from smaller-scale features (consciousness from neurons, waterfalls from water molecules, murmurations from starlings, selective permeability from masses of lipids, etc.). As Phil Anderson pointed out, emergence acts as a kind of insulation between layers. So there is nothing to be gained by having a better theory of quantum mechanics when considering the behavior of starlings.
Lacking even any particularly useful theory of emergence, and confronted with the obvious wild variety of emergent phenomena, it seems likely that predictive theories of biology, which rests on a stack of emergence, are unknowably far off. Not that we won't have niche examples where we can do fairly well (cf. AlphaFold), but having an example in one niche helps us (where us includes AI) not at all with the next niche.
I agree to a point. Yes, quantum mechanics probably doesn't help explain the behavior of starlings, but I'm quite confident there will be point mutations in certain proteins that affect their behavior, and biology needs to be able to explain those. And the explanation will ultimately link protein biochemistry to starling behavior.
By invoking insulation between layers you're in effect arguing that we can never mechanistically understand the genetic basis for any effects seen at the higher layers, and yet much of biology is specifically concerned with this question.
Well, I don't agree that I'm "in effect arguing" that we can "never" understand higher-level effects in terms of lower-level phenomena. Instead, in order to understand the genetic basis of behavior (say), one must already command the set of emergence layers in between. I'd argue that the only behavior-from-point-mutations we truly understand are those that submit rather readily to a reductionist approach. Watching folks struggle to put together a proper theory for how proteins form condensates has been a rather informative clinic on what it looks like when reductionist approaches are all you have and the reality is that emergence plays a much more important role.
You ask of the physics begets chemistry begets biology concept what it is not prepared to bear. If that ladder hasn’t failed yet it is not long for this world.
Biology is processual, it is not mechanical, so most of what you will directly derive from chemistry will be relatively trivial (in the grander scheme of biology).
I used to work in a research program whose funders (the US National Science Foundation) anticipated that someday the work would add up to a big computer model of an entire ecosystem. Basically, physics begets chemistry begets biology begets ecology, and let's model all of that, using real field data. Here's a million dollars a year. Go.
I heard scientists complain privately that this was impossible, but they were being threatened with cancellation and had to pretend they were working on it. They gave this absurd presentation to NSF that began with "Here at [redacted], we are studying ecosystem dynamics at every level from the Sun to the genes." YEAH, RIGHT!
We could at least determine with some confidence that when it rains, the plants grow, which is to say (ahem), "climate is a primary driver of ecosystem dynamics."
Predicting clinical toxicology and in vivo binding affinity are also non-trivial problems.
Interesting piece. I certainly know what you mean, and your point about the Baker lab making it seem like protein design was solved a decade ago was spot on. I remember being confused about that and about why we didn’t see more uses in the real world.
I kind of have a totally opposite experience with neural network potentials though. These try to do exactly what you’re saying re simulating from the ground up. I routinely find them remarkably more capable than I would expect. For instance, here I took one trained on crystal structures of small, highly ordered periodic systems and used it to simulate the potassium ion channel, and it reproduced several known features plus some new ones that are consistent with mutational studies (https://arxiv.org/abs/2411.18931), and here I simulated carbonic anhydrase, where it also seemed to do a remarkably good job (https://arxiv.org/abs/2503.13789). Obviously still not perfect, but the newer models are already a lot better than the one I used in those papers. I think they are at the right level of abstraction: the data is simple enough that it can be fully learnt but not so simple that it can be fitted by hand.
Yes. I'm a big fan of using deep learning for very targeted predictions that then are combined with biophysical modeling. See also this editorial piece: https://www.pnas.org/doi/abs/10.1073/pnas.2513608122
Hi Claus,
Totally agree with what you say but I guess that won’t be news to you.
I guess the most sobering conclusion for me is that it seems we haven’t progressed much beyond lookup in large tables. This idea of identifying effective dynamical rules at higher levels has virtually not worked at all in biology. Maybe a little bit for gene regulatory networks. But overall we’re mostly stuck with making lookup tables and identifying very coarse large scale statistical patterns.
Yes. Along the same lines, this just came out: https://www.nature.com/articles/s41467-025-63947-5
Folks keep asking why biology (and sociology, psychology, economics) isn’t more empirically reliable, like physics and chemistry, or systematic like math.
The correct question is why (or whether) physics and chemistry behave so predictively at certain scales. That is the anomaly.
Great read, but there's one part I disagree with. (And correct me if I'm wrong here but) you mention "for this approach and we don’t have the required compute"
Isn't that the piece where we're seeing the most accelerated progress in AI, with these huge DL models shrinking (e.g. the 7M Samsung model, DeepSeek, or SimpleFold)?
From what I understand, the era of comp bio was more about creating tools at the interface (Rosetta and the sort) that use shortcuts, as you say, to perform a task. But now that these tools are getting DL and hardware attention, don't you think that we're actually getting a lot closer to simulating real biology faster? Personally, I'm looking forward to the virtual cell challenge, which will show where the bio ML community stands.
There may be progress in speeding up all-atom simulations. That doesn't mean we can now use such simulations for the wide range of problems biologists try to solve every day. Look at David Shaw. He got a lot of attention for building a special-purpose computer that would speed up MD simulations. And, is anybody using this, or is he making tons of important contributions? No. The impact on the field has been minor.
As a general rule, I'm skeptical about any statement that takes the form "we'll soon be able to ...". If we look at what we can do today, simulations are nowhere near fast enough to do anything other than very detailed studies of individual mutations. Nobody does protein engineering by simulating all the possible variants and picking the ones that appear best in simulation. What happens in the future, we'll see.
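To put rough numbers on the point about simulating all possible variants, here is a minimal sketch that counts how many sequences differ from a wild type at exactly k positions. The protein length of 300 residues is a hypothetical value chosen for illustration, and the function name is made up for this example; the only fixed input is the 20-letter amino acid alphabet.

```python
from math import comb

AMINO_ACIDS = 20  # standard amino acid alphabet


def count_variants(length: int, n_mutations: int) -> int:
    """Count sequences that differ from the wild type at exactly n_mutations positions.

    Choose which positions mutate, then pick one of the 19 other
    amino acids at each chosen position.
    """
    return comb(length, n_mutations) * (AMINO_ACIDS - 1) ** n_mutations


protein_length = 300  # hypothetical length, for illustration only
for k in (1, 2, 3):
    print(f"{k} mutation(s): {count_variants(protein_length, k):,} variants")
```

Even at three substitutions the count is in the tens of billions, so any per-variant simulation cost gets multiplied by a very large number.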
Did they include viral proteins in the training data for AlphaFold2? A major part of AlphaFold2 is building multiple sequence alignments of proteins similar to the query. If there aren't enough sequences (I think fewer than 30), AlphaFold will have a hard time.
Plenty of viral proteins are very different from anything else in nature; they often don't even have the traditional start codon.
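As a rough illustration of the alignment-depth concern, here is a minimal sketch that counts the sequences in a FASTA- or A3M-style alignment and flags it as shallow. The cutoff of 30 is only the recollection stated above, not a documented AlphaFold2 parameter, and the function names are hypothetical.

```python
def count_msa_sequences(alignment_text: str) -> int:
    """Count records in FASTA/A3M-style text by counting '>' header lines."""
    return sum(1 for line in alignment_text.splitlines() if line.startswith(">"))


def is_shallow(alignment_text: str, min_depth: int = 30) -> bool:
    """Return True if the alignment has fewer than min_depth sequences.

    min_depth=30 is an assumed cutoff taken from the comment above.
    """
    return count_msa_sequences(alignment_text) < min_depth


# Toy alignment with three sequences (made-up data).
toy_alignment = ">query\nMKT-LV\n>hit1\nMKTALV\n>hit2\nMRT-LV\n"
print(count_msa_sequences(toy_alignment), is_shallow(toy_alignment))
```

A check like this says nothing about alignment quality or diversity; it only counts how many sequences were found.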
I operate at the other end of biology: field work on beetles, ants and flowering plants. The version where it is just physics and chemistry is a distant dream. The basic questions are "what lives here?" and "what is different about this place, where X is absent, compared to that place, where it is present?". It's a mishmash of soil chemistry, presence of other species, climate, and the history of the site. The only authoritative answer comes from the person who says that they introduced it; everyone else is providing an opinion. It is incredibly hard to train an AI when the published data captures less than 1% of the relevant information. Training in field work requires just that, field work, and until AI has autonomous mobile agents it has no hope.
Great piece! For all the reasons you mentioned, I think challenges/competitions that separate model development from model evaluation (or independent benchmarking studies) are very helpful for getting a more accurate idea of where a given field actually stands. E.g., in single-cell transcriptomics many methods were developed to predict transcriptomes after perturbations, but rigorous benchmarking later on showed that their performance was rather poor…