<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Genes, Minds, Machines]]></title><description><![CDATA[Genes, Minds, Machines: Thoughts about Science, Communication, and AI. A newsletter covering topics in biology, data visualization, effective communication, AI, and higher education.]]></description><link>https://blog.genesmindsmachines.com</link><image><url>https://substackcdn.com/image/fetch/$s_!3tvK!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b85fecd-da20-4614-b9b3-54f277cfa6bd_982x982.png</url><title>Genes, Minds, Machines</title><link>https://blog.genesmindsmachines.com</link></image><generator>Substack</generator><lastBuildDate>Thu, 23 Apr 2026 23:51:40 GMT</lastBuildDate><atom:link href="https://blog.genesmindsmachines.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Claus Wilke]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[clauswilke@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[clauswilke@substack.com]]></itunes:email><itunes:name><![CDATA[Claus Wilke]]></itunes:name></itunes:owner><itunes:author><![CDATA[Claus Wilke]]></itunes:author><googleplay:owner><![CDATA[clauswilke@substack.com]]></googleplay:owner><googleplay:email><![CDATA[clauswilke@substack.com]]></googleplay:email><googleplay:author><![CDATA[Claus Wilke]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[The Google Scholar preprint bug strikes again]]></title><description><![CDATA[Google is never going to fix this bug, are 
they?]]></description><link>https://blog.genesmindsmachines.com/p/the-google-scholar-preprint-bug-strikes</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/the-google-scholar-preprint-bug-strikes</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Tue, 31 Mar 2026 01:10:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!wG3_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc79d875b-33cb-4a16-8f21-188a19e754db_1084x728.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>For the last couple of weeks, Google Scholar has been complaining to me that one of my articles is not publicly available, in violation of a funder-imposed public access mandate. When I go to my Google Scholar page, there is a big notification box at the top of the page that asks me to review the situation. This is rather annoying, because (as you will see in a moment) I have done nothing wrong. I have done everything the NIH, my funder, wants me to do. The entity that is wrong is Google. 
In fact, I believe what I&#8217;m seeing is a version of the Google Scholar preprint bug, which I&#8217;ve reported on for over a decade, see for example <a href="https://clauswilke.com/blog/2014/11/01/the-google-scholar-preprint-bug/">here</a> or <a href="https://clauswilke.com/blog/2015/10/08/google-scholar-bug-redux/">here.</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wG3_!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc79d875b-33cb-4a16-8f21-188a19e754db_1084x728.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wG3_!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc79d875b-33cb-4a16-8f21-188a19e754db_1084x728.png 424w, https://substackcdn.com/image/fetch/$s_!wG3_!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc79d875b-33cb-4a16-8f21-188a19e754db_1084x728.png 848w, https://substackcdn.com/image/fetch/$s_!wG3_!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc79d875b-33cb-4a16-8f21-188a19e754db_1084x728.png 1272w, https://substackcdn.com/image/fetch/$s_!wG3_!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc79d875b-33cb-4a16-8f21-188a19e754db_1084x728.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wG3_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc79d875b-33cb-4a16-8f21-188a19e754db_1084x728.png" width="610" height="409.66789667896677" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c79d875b-33cb-4a16-8f21-188a19e754db_1084x728.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:728,&quot;width&quot;:1084,&quot;resizeWidth&quot;:610,&quot;bytes&quot;:186409,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/191880387?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc79d875b-33cb-4a16-8f21-188a19e754db_1084x728.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wG3_!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc79d875b-33cb-4a16-8f21-188a19e754db_1084x728.png 424w, https://substackcdn.com/image/fetch/$s_!wG3_!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc79d875b-33cb-4a16-8f21-188a19e754db_1084x728.png 848w, https://substackcdn.com/image/fetch/$s_!wG3_!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc79d875b-33cb-4a16-8f21-188a19e754db_1084x728.png 1272w, https://substackcdn.com/image/fetch/$s_!wG3_!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc79d875b-33cb-4a16-8f21-188a19e754db_1084x728.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>When I go to the page where I can review the situation, Google Scholar shows me the offending article. It is a preprint from 2026, published on bioRxiv. You can <a href="https://doi.org/10.64898/2026.01.06.697994">read it here.</a> Yes, Google Scholar complains that a preprint on bioRxiv is not publicly available. 
But it gets worse.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pv49!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefcd818-23ae-43cd-a6f2-bbf8692e8f9a_1532x592.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pv49!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefcd818-23ae-43cd-a6f2-bbf8692e8f9a_1532x592.png 424w, https://substackcdn.com/image/fetch/$s_!Pv49!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefcd818-23ae-43cd-a6f2-bbf8692e8f9a_1532x592.png 848w, https://substackcdn.com/image/fetch/$s_!Pv49!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefcd818-23ae-43cd-a6f2-bbf8692e8f9a_1532x592.png 1272w, https://substackcdn.com/image/fetch/$s_!Pv49!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefcd818-23ae-43cd-a6f2-bbf8692e8f9a_1532x592.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pv49!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefcd818-23ae-43cd-a6f2-bbf8692e8f9a_1532x592.png" width="1456" height="563" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cefcd818-23ae-43cd-a6f2-bbf8692e8f9a_1532x592.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:563,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:145815,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/191880387?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefcd818-23ae-43cd-a6f2-bbf8692e8f9a_1532x592.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Pv49!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefcd818-23ae-43cd-a6f2-bbf8692e8f9a_1532x592.png 424w, https://substackcdn.com/image/fetch/$s_!Pv49!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefcd818-23ae-43cd-a6f2-bbf8692e8f9a_1532x592.png 848w, https://substackcdn.com/image/fetch/$s_!Pv49!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefcd818-23ae-43cd-a6f2-bbf8692e8f9a_1532x592.png 1272w, https://substackcdn.com/image/fetch/$s_!Pv49!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcefcd818-23ae-43cd-a6f2-bbf8692e8f9a_1532x592.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Technically, the NIH doesn&#8217;t just require papers to be available. It wants them to be deposited in PubMed Central. So maybe that&#8217;s Google&#8217;s beef? That the paper is available on bioRxiv but not on PubMed Central? Well, that&#8217;s a neat theory, but it falls flat. It falls flat because the paper is actually on PubMed Central. You can check for yourself <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC12803269/">here.</a> The NIH has a pilot program where they scan bioRxiv for NIH-funded research and automatically pull any preprints that match their criteria into PubMed Central. This has worked beautifully for all recent preprints my lab has published, and I never think about it because it works so smoothly. Everybody is happy. The NIH, the public, me. Except Google Scholar. 
They have taken it upon themselves to become open access warriors, and in the process they are now falsely accusing honest researchers of violating open-access mandates.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GJGu!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a128494-7f5a-4084-98cd-4d660a8d5c97_1500x788.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GJGu!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a128494-7f5a-4084-98cd-4d660a8d5c97_1500x788.png 424w, https://substackcdn.com/image/fetch/$s_!GJGu!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a128494-7f5a-4084-98cd-4d660a8d5c97_1500x788.png 848w, https://substackcdn.com/image/fetch/$s_!GJGu!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a128494-7f5a-4084-98cd-4d660a8d5c97_1500x788.png 1272w, https://substackcdn.com/image/fetch/$s_!GJGu!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a128494-7f5a-4084-98cd-4d660a8d5c97_1500x788.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!GJGu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a128494-7f5a-4084-98cd-4d660a8d5c97_1500x788.png" width="1456" height="765" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9a128494-7f5a-4084-98cd-4d660a8d5c97_1500x788.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:765,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:192592,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/191880387?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a128494-7f5a-4084-98cd-4d660a8d5c97_1500x788.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GJGu!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a128494-7f5a-4084-98cd-4d660a8d5c97_1500x788.png 424w, https://substackcdn.com/image/fetch/$s_!GJGu!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a128494-7f5a-4084-98cd-4d660a8d5c97_1500x788.png 848w, https://substackcdn.com/image/fetch/$s_!GJGu!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a128494-7f5a-4084-98cd-4d660a8d5c97_1500x788.png 1272w, https://substackcdn.com/image/fetch/$s_!GJGu!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a128494-7f5a-4084-98cd-4d660a8d5c97_1500x788.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>So what&#8217;s going on? Digging a bit deeper, I have a pretty good idea about what the issue is. We&#8217;ll get to that in a second. Let&#8217;s collect a bit more evidence first.</p><p>Do you know how, when you click on the Google Scholar record for an article, it gives you the option to review all the alternative versions of the article?<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> Well, for this particular preprint, that&#8217;s missing. Google Scholar is not aware of any alternative versions. And, even worse, Google Scholar doesn&#8217;t even point to the correct article. Instead of pointing to bioRxiv, it points to <a href="https://europepmc.org/article/med/41542510">Europe PMC.</a> Google Scholar has completely messed up. 
It doesn&#8217;t know that my bioRxiv preprint is on bioRxiv, it doesn&#8217;t know that it is on PubMed Central, and it sends people on a wild goose chase to Europe PMC, which then points to bioRxiv. </p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qo3p!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5ede2ca-4a08-4553-aa39-57562a055268_1172x250.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qo3p!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5ede2ca-4a08-4553-aa39-57562a055268_1172x250.png 424w, https://substackcdn.com/image/fetch/$s_!qo3p!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5ede2ca-4a08-4553-aa39-57562a055268_1172x250.png 848w, https://substackcdn.com/image/fetch/$s_!qo3p!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5ede2ca-4a08-4553-aa39-57562a055268_1172x250.png 1272w, https://substackcdn.com/image/fetch/$s_!qo3p!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5ede2ca-4a08-4553-aa39-57562a055268_1172x250.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!qo3p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5ede2ca-4a08-4553-aa39-57562a055268_1172x250.png" width="598" height="127.55972696245733" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d5ede2ca-4a08-4553-aa39-57562a055268_1172x250.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:250,&quot;width&quot;:1172,&quot;resizeWidth&quot;:598,&quot;bytes&quot;:67237,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/191880387?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5ede2ca-4a08-4553-aa39-57562a055268_1172x250.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!qo3p!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5ede2ca-4a08-4553-aa39-57562a055268_1172x250.png 424w, https://substackcdn.com/image/fetch/$s_!qo3p!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5ede2ca-4a08-4553-aa39-57562a055268_1172x250.png 848w, https://substackcdn.com/image/fetch/$s_!qo3p!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5ede2ca-4a08-4553-aa39-57562a055268_1172x250.png 1272w, https://substackcdn.com/image/fetch/$s_!qo3p!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd5ede2ca-4a08-4553-aa39-57562a055268_1172x250.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><p>So what we&#8217;re dealing with here is an outdated or less authoritative link that is causing more authoritative links to disappear from the Google Scholar database. 
Have we ever seen anything like this? I&#8217;m glad you asked. Yes, we have. It&#8217;s the <a href="https://clauswilke.com/blog/2014/11/01/the-google-scholar-preprint-bug/">Google Scholar preprint bug,</a> which I have been documenting since 2014. Hundreds of scientists (that I know of) have complained about it, because it can have the unfortunate consequence of removing your published paper from the Google Scholar database. This is particularly frustrating for junior scientists on the job market, because it matters whether Google Scholar is showing your recent Nature paper or just the corresponding bioRxiv preprint.</p><p>In 2015, <a href="https://clauswilke.com/blog/2015/10/08/google-scholar-bug-redux/">I even discussed it with Anurag Acharya,</a> co-founder of Google Scholar.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> This discussion left me with the impression that the Google Scholar team does not understand the issue, or its severity, and will never fix the problem. 
And here we are, a decade later: the problem still exists, and now it&#8217;s causing downstream consequences, such as Google Scholar accusing me of violating the NIH open-access policy.</p><p>For completeness, I am reproducing here my 2015 conversation with Anurag Acharya, as it is as relevant today as it was back then.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-aGv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8341ec-4d57-47ab-bd2d-78fbaa65ae33_3976x22291.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!-aGv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8341ec-4d57-47ab-bd2d-78fbaa65ae33_3976x22291.png 424w, https://substackcdn.com/image/fetch/$s_!-aGv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8341ec-4d57-47ab-bd2d-78fbaa65ae33_3976x22291.png 848w, https://substackcdn.com/image/fetch/$s_!-aGv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8341ec-4d57-47ab-bd2d-78fbaa65ae33_3976x22291.png 1272w, https://substackcdn.com/image/fetch/$s_!-aGv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8341ec-4d57-47ab-bd2d-78fbaa65ae33_3976x22291.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-aGv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8341ec-4d57-47ab-bd2d-78fbaa65ae33_3976x22291.png" width="1456" height="8163" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7d8341ec-4d57-47ab-bd2d-78fbaa65ae33_3976x22291.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:8163,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6049069,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/191880387?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8341ec-4d57-47ab-bd2d-78fbaa65ae33_3976x22291.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-aGv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8341ec-4d57-47ab-bd2d-78fbaa65ae33_3976x22291.png 424w, https://substackcdn.com/image/fetch/$s_!-aGv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8341ec-4d57-47ab-bd2d-78fbaa65ae33_3976x22291.png 848w, https://substackcdn.com/image/fetch/$s_!-aGv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8341ec-4d57-47ab-bd2d-78fbaa65ae33_3976x22291.png 1272w, https://substackcdn.com/image/fetch/$s_!-aGv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7d8341ec-4d57-47ab-bd2d-78fbaa65ae33_3976x22291.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div>
<div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>The link usually says &#8220;All <em>n</em> versions,&#8221; with <em>n</em> being the number of different versions Google Scholar has found. See <a href="https://scholar.google.com/citations?view_op=view_citation&amp;hl=en&amp;user=Nc8U6E4AAAAJ&amp;citation_for_view=Nc8U6E4AAAAJ:9yKSN-GCB0IC">here</a> for an example, at the very bottom of the page. As of this writing, it says &#8220;<a href="https://scholar.google.com/scholar?oi=bibs&amp;hl=en&amp;cluster=10384935850530543589">All 18 versions.</a>&#8221;</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>My discussion with Anurag Acharya can be found <a href="https://scholarlykitchen.sspnet.org/2015/10/05/guest-post-highwires-john-sack-on-online-indexing-of-scholarly-publications-part-1-what-we-all-have-accomplished/#comment-155912">in the comments section to this 2015 article by the Scholarly Kitchen.</a> I&#8217;m impressed by the fact that the Scholarly Kitchen is still hosting the comments to a decade-old article.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Creating reproducible data analysis pipelines]]></title><description><![CDATA[There was a discussion recently on Bluesky about reproducible data analysis 
pipelines.]]></description><link>https://blog.genesmindsmachines.com/p/creating-reproducible-data-analysis</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/creating-reproducible-data-analysis</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Fri, 27 Mar 2026 22:45:18 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/34bac309-a55e-4f3a-9c76-cc7659caa2a7_1750x1440.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There was a discussion recently on Bluesky about reproducible data analysis pipelines. This is a complex topic, and it&#8217;s difficult to do it justice in a bunch of 300 character posts. So I thought I&#8217;d take the opportunity to collect my thoughts on this topic in a longer-form article.</p><p>The discussion started with this post by Darren Dahly:</p><div class="bluesky-wrap outer" style="height: auto; display: flex; margin-bottom: 24px;" data-attrs="{&quot;postId&quot;:&quot;3mhs6ndgvx22q&quot;,&quot;authorDid&quot;:&quot;did:plc:3zyo4iakhqs47bttjalqgbk7&quot;,&quot;authorName&quot;:&quot;Darren Dahly&quot;,&quot;authorHandle&quot;:&quot;statsepi.bsky.social&quot;,&quot;authorAvatarUrl&quot;:&quot;https://cdn.bsky.app/img/avatar/plain/did:plc:3zyo4iakhqs47bttjalqgbk7/bafkreieurq6uozhnwby2w3lj3cisd7cbqcficeh3pr2c46feowb2uz7psu&quot;,&quot;text&quot;:&quot;As I progressed (hopefully) from data novice to data competent, one of the most impactful practices I adopted was to never* rely on saved intermediate datasets (etc) in my workflow. 
All projects are designed so that [\&quot;raw\&quot; data -> analysis data -> analysis] is 100% reproduced in every session.&quot;,&quot;createdAt&quot;:&quot;2026-03-24T08:43:51.684Z&quot;,&quot;uri&quot;:&quot;at://did:plc:3zyo4iakhqs47bttjalqgbk7/app.bsky.feed.post/3mhs6ndgvx22q&quot;,&quot;imageUrls&quot;:[]}" data-component-name="BlueskyCreateBlueskyEmbed"><iframe id="bluesky-3mhs6ndgvx22q" data-bluesky-id="8902800244342264" src="https://embed.bsky.app/embed/did:plc:3zyo4iakhqs47bttjalqgbk7/app.bsky.feed.post/3mhs6ndgvx22q?id=8902800244342264" width="100%" style="display: block; flex-grow: 1;" frameborder="0" scrolling="no"></iframe></div><p>To which I replied:</p><div class="bluesky-wrap outer" style="height: auto; display: flex; margin-bottom: 24px;" data-attrs="{&quot;postId&quot;:&quot;3mht4oqhtls2f&quot;,&quot;authorDid&quot;:&quot;did:plc:v4fio6clr4zz64lhdkre7zph&quot;,&quot;authorName&quot;:&quot;Claus Wilke&quot;,&quot;authorHandle&quot;:&quot;clauswilke.com&quot;,&quot;authorAvatarUrl&quot;:&quot;https://cdn.bsky.app/img/avatar/plain/did:plc:v4fio6clr4zz64lhdkre7zph/bafkreibqpgogauwfxkpcgzjofe365vz2q75hqih6bk45n3a4epbs77zixa&quot;,&quot;text&quot;:&quot;I feel strongly this is a terrible idea. I battle it with my students all the time. Examples:\n- Can you quickly make this minor change to this figure? That'll take 30 min. to rerun all the preprocessing.\n- Can you send me the raw data to this figure that contains 5 points? 
That'll be 10 TB of data.&quot;,&quot;createdAt&quot;:&quot;2026-03-24T17:41:31.156Z&quot;,&quot;uri&quot;:&quot;at://did:plc:v4fio6clr4zz64lhdkre7zph/app.bsky.feed.post/3mht4oqhtls2f&quot;,&quot;imageUrls&quot;:[]}" data-component-name="BlueskyCreateBlueskyEmbed"><iframe id="bluesky-3mht4oqhtls2f" data-bluesky-id="7247818868060623" src="https://embed.bsky.app/embed/did:plc:v4fio6clr4zz64lhdkre7zph/app.bsky.feed.post/3mht4oqhtls2f?id=7247818868060623" width="100%" style="display: block; flex-grow: 1;" frameborder="0" scrolling="no"></iframe></div><p>I believe the difference between Darren&#8217;s position and mine boils down to what an ideal analysis pipeline should look like (Darren&#8217;s perspective) versus what actually works or doesn&#8217;t work in practice, in particular when supervising students who may still be learning the ropes (my perspective). I am all in favor of fully reproducible pipelines that can go from raw data to final figures. And yet, I&#8217;ve seen this approach go wrong in so many ways that I tend to actively discourage my students from pursuing it, at least in the strict form Darren describes, where there are no intermediate datasets and the pipeline always has to be run from the very top to make any changes anywhere.</p><p>First, there are a few immediate issues that I&#8217;ve seen crop up way too many times, and that I alluded to in my Bluesky post.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> One is slow turnaround time for minor changes. I&#8217;ve seen way too many students struggle with requests for small modifications to their figures. I ask a student to replace violins with boxplots, or to swap the x and y axes, and it takes them an afternoon because they have to run everything from the top&#8212;and possibly multiple times&#8212;until the revised figure looks right. 
Another is gigantic data files that are difficult to archive or share. I&#8217;ve seen students keep raw log files from simulations, literally hundreds of gigabytes of data, but not store the handful of final values they had extracted from these log files.</p><p>Second, I believe intermediate files improve reproducibility, because pipelines break and an intermediate file is always better than a pipeline that no longer runs. Why do pipelines break? For one, students and postdocs, even the experienced ones, fail to anticipate the many ways in which code may stop working over time, and as a consequence their &#8220;fully reproducible&#8221; pipelines contain hidden dependencies that can be difficult to satisfy in the future. And also, nearly everything breaks eventually. Will your carefully crafted fully reproducible Docker image still work in 20 years? Does it depend on some service that may no longer be available then?</p>
<p>All of these issues can be avoided if you make it a habit to always store the final processed data, right before plotting. And to ensure that the figure can be reproduced from the saved file alone, you can read it right back in after saving. Here is an example in Python:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;python&quot;,&quot;nodeId&quot;:&quot;c4e3df49-1d06-47bc-8231-4da74395a693&quot;}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-python">import pandas as pd

# create your final data by whatever means necessary
final_data_for_plotting = ...
# write the final data frame to csv
final_data_for_plotting.to_csv('final_data.csv', index=False)

# --- if using Jupyter, start a new cell here ---

# read the data back in
final_data_for_plotting = pd.read_csv('final_data.csv')
# place your plotting code here
...</code></pre></div><p>I think Darren knows this, because in a later post he wrote:</p><div class="bluesky-wrap outer" style="height: auto; display: flex; margin-bottom: 24px;" data-attrs="{&quot;postId&quot;:&quot;3mhsj67pi3k22&quot;,&quot;authorDid&quot;:&quot;did:plc:3zyo4iakhqs47bttjalqgbk7&quot;,&quot;authorName&quot;:&quot;Darren Dahly&quot;,&quot;authorHandle&quot;:&quot;statsepi.bsky.social&quot;,&quot;authorAvatarUrl&quot;:&quot;https://cdn.bsky.app/img/avatar/plain/did:plc:3zyo4iakhqs47bttjalqgbk7/bafkreieurq6uozhnwby2w3lj3cisd7cbqcficeh3pr2c46feowb2uz7psu&quot;,&quot;text&quot;:&quot;My most basic workflow is \&quot;raw data\&quot; -> R script -> data.RData -> Rmd:load(data.RData) -> Report&quot;,&quot;createdAt&quot;:&quot;2026-03-24T11:52:15.614Z&quot;,&quot;uri&quot;:&quot;at://did:plc:3zyo4iakhqs47bttjalqgbk7/app.bsky.feed.post/3mhsj67pi3k22&quot;,&quot;imageUrls&quot;:[]}" data-component-name="BlueskyCreateBlueskyEmbed"><iframe id="bluesky-3mhsj67pi3k22" data-bluesky-id="05823330047570385" src="https://embed.bsky.app/embed/did:plc:3zyo4iakhqs47bttjalqgbk7/app.bsky.feed.post/3mhsj67pi3k22?id=05823330047570385" width="100%" style="display: block; flex-grow: 1;" frameborder="0" scrolling="no"></iframe></div><p>This is an R version of my Python example of saving the data and immediately reloading it.</p><p>In this context, however, I have to point out that I normally recommend against language-specific, binary data-dump formats such as .RData in R or .pickle in Python. Stick to simple text files that are interchangeable and can be read by anything. Comma-separated values (.csv) is good. You can gzip the file if it&#8217;s too large. There is nothing quite as infuriating as somebody sending you an .RData file when you&#8217;re exclusively working in Python or a .pickle file when you&#8217;re exclusively working in R. And again, think 20 years into the future. 
Will the language-specific dump file that may seem so convenient today still be your preferred choice, when maybe you haven&#8217;t used the relevant software in years and don&#8217;t have it readily accessible or no longer remember how it works? By contrast, a .csv file can be opened in Excel if necessary. And, if it&#8217;s stored in a GitHub repository, we don&#8217;t need to open it at all; we can just look at it in the browser.</p><p>In terms of organizing your pipeline, it&#8217;s generally a good idea to place all the figure-generation code into a separate notebook or script, so that you can test that it runs standalone and doesn&#8217;t require any variables you may have generated earlier in the pipeline and forgot to write to disk. I would also like to point out that notebooks invite reproducibility issues, as they encourage out-of-order execution (you run three cells, then you go back up and make an edit and run a prior cell again, then you run the next cell three times, etc.). So, at the end of every working session with a notebook, you should clear all results, restart the kernel, and run everything from top to bottom to make sure the notebook is still self-contained and internally consistent.</p><p>Now, if you want to be super fancy, you can use something like <a href="https://snakemake.readthedocs.io/en/stable/">Snakemake</a> to build a dependency graph that allows you to rerun the pipeline while reusing cached intermediate results that are unaffected by your most recent code edits. In this setup, I would definitely recommend having one or more separate scripts just for the figures. If you&#8217;re primarily an R user, you can also consider the <a href="https://docs.ropensci.org/targets/">{targets}</a> package, which provides a similar tool for the R ecosystem.</p><p>Tools such as Snakemake or {targets} work great, but they can present a bit of a learning curve and a meaningful amount of overhead to set up for any given project. 
If you routinely write long analysis pipelines consisting of many interdependent steps, it is probably worth it for you to go through the effort of learning these tools. But if you&#8217;re only analyzing data occasionally, or if your pipelines aren&#8217;t that complex, you&#8217;re probably better off just saving the final datasets right before plotting.</p><p>In summary, whatever you do, think about whether your analysis pipeline and/or intermediate results will still be accessible 20 years down the road. This may seem unimaginably far in the future, but I can guarantee you that if you stick around long enough somebody will ask you for data from 20 years ago. I recently wrote a paper where I needed data from a paper I had written 13 years earlier. I still had the project file from the interactive plotting software I had used at the time, but I no longer had the software. Fortunately that software used a text-based format and I could open the project file in a text editor and extract the data. This saved my day, but it would have been so much better had I saved the data in CSV format at the time. So do this going forward. Your future self will thank you for it.</p>
<div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I would like to emphasize that the problems arise because students try to be extra careful and aim to write reproducible pipelines that go from the raw data all the way to the final figures. And in the process, they create secondary problems that they didn&#8217;t anticipate.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Protein language models are bad at mutational effect prediction]]></title><description><![CDATA[Biology is hard. 
Yes, even for AI.]]></description><link>https://blog.genesmindsmachines.com/p/protein-language-models-are-bad-at</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/protein-language-models-are-bad-at</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Thu, 19 Mar 2026 20:18:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uG2c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85c0ef3f-d62b-419d-9cd0-013ea64301fc_4614x5196.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Last summer, I wrote a post claiming that <a href="https://blog.genesmindsmachines.com/p/limitations-of-protein-language-models">protein language models (pLMs) showed poor performance on viral data.</a> At the time, this was a preliminary result based on a handful of datasets, and I said as much. I also said we were going to do more work on this problem. Well, we have done the work now, and I can confidently say that protein language models perform worse on viral proteins than on cellular proteins. However, more importantly, they perform poorly on either, when the task considered is mutational effect prediction (i.e., predicting by how much a mutation changes the activity or fitness<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> of a protein). The paper is on bioRxiv.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><p>If you&#8217;re following the literature on pLMs, you may be confused by my statement. There are many papers that seemingly show excellent performance. In fact, whenever a new pLM is released, one of the standard benchmarking tasks is typically mutational effects prediction. And performance often appears to be excellent. 
Unfortunately, much of this apparent success is just people confusing themselves over what is actually happening. If you dig deeper you can find the cracks under the surface.</p><p>Before I continue, let me quickly specify what kind of modeling situation I&#8217;m referring to. I&#8217;m specifically focusing on supervised learning of mutational effects from deep mutational scanning (DMS) data. In this situation, we have experimental data for thousands of mutants of a protein, we split the data into training and test sets, train the model on the training set, and then evaluate on the test set. This is distinct from so-called <em>zero-shot predictions,</em> which are also popular, where we don&#8217;t have a training set and just predict mutational effects from the pre-trained model, without learning anything about the specific dataset at hand. Zero-shot predictions have their own issues. I&#8217;ll not discuss them here. 
Everything in this post is exclusively about supervised learning.</p><p>The biggest problem in supervised learning is data leakage, where information from the training set leaks into the test set, and this is definitely happening in the field of mutational effects prediction. The problem is that it is common to treat the thousands of mutations in a DMS dataset as independent from each other (Figure 1A, pooled split), ignoring the fact that there will often be multiple mutations at the same site and those mutations will have correlated effects. Thus, the model can learn which sites in a protein are sensitive to mutations and which are not and make predictions based on this information rather than on the specific biochemistry of individual mutations. To avoid this data leakage problem, we have to stratify by site when generating training&#8211;test splits (Figure 1A, stratified by site).  </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uG2c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85c0ef3f-d62b-419d-9cd0-013ea64301fc_4614x5196.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uG2c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85c0ef3f-d62b-419d-9cd0-013ea64301fc_4614x5196.png 424w, https://substackcdn.com/image/fetch/$s_!uG2c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85c0ef3f-d62b-419d-9cd0-013ea64301fc_4614x5196.png 848w, 
https://substackcdn.com/image/fetch/$s_!uG2c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85c0ef3f-d62b-419d-9cd0-013ea64301fc_4614x5196.png 1272w, https://substackcdn.com/image/fetch/$s_!uG2c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85c0ef3f-d62b-419d-9cd0-013ea64301fc_4614x5196.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uG2c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85c0ef3f-d62b-419d-9cd0-013ea64301fc_4614x5196.png" width="724" height="815.4945054945055" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85c0ef3f-d62b-419d-9cd0-013ea64301fc_4614x5196.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1640,&quot;width&quot;:1456,&quot;resizeWidth&quot;:724,&quot;bytes&quot;:1303227,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/191408760?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85c0ef3f-d62b-419d-9cd0-013ea64301fc_4614x5196.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uG2c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85c0ef3f-d62b-419d-9cd0-013ea64301fc_4614x5196.png 424w, 
https://substackcdn.com/image/fetch/$s_!uG2c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85c0ef3f-d62b-419d-9cd0-013ea64301fc_4614x5196.png 848w, https://substackcdn.com/image/fetch/$s_!uG2c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85c0ef3f-d62b-419d-9cd0-013ea64301fc_4614x5196.png 1272w, https://substackcdn.com/image/fetch/$s_!uG2c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85c0ef3f-d62b-419d-9cd0-013ea64301fc_4614x5196.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Figure 1: Supervised learning of variant effects is strongly affected by train&#8211;test splitting strategy. The models on the left of the dashed line are standard, general-purpose models, and the models on the right have been fine-tuned for viral sequences. From <a href="https://doi.org/10.64898/2026.03.08.710389">Vieira et al. 2026.</a></figcaption></figure></div><p>When comparing models trained and evaluated either on pooled splits or on splits stratified by site, we see a huge drop in performance for the latter (Figure 1B). And this drop exists regardless of whether we are working with viral or cellular proteins, and whether we&#8217;re using a generic model or one fine-tuned for viral data. In fact, most models show roughly the same performance. All models perform poorly on site-stratified data.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p><p>Now you may wonder how correlated different mutations at the same site really are. There&#8217;s a simple way to find out: For pooled splits, we can simply take the average fitness effect in the training data at each site and use this as our prediction for the test data. I want to emphasize how simplistic a model this is: We are simply saying that any unseen mutation is going to have the average effect at its site. How well does such a model perform? On cellular data, almost as well as a full-scale protein language model, and on viral data, better than a full-scale protein language model! In Figure 2, dots above the dashed line indicate that the pLM is better, and dots below the dashed line indicate that simple site means are better. You can see that for more than half of the viral datasets, site means are better than the pLM. 
And for cellular datasets, even though all the blue dots are above the dashed line, they are only shifted upwards by a small amount. If the site-means model does well, the pLM also does well (and a little better than the site-means model), and if the site-means model doesn&#8217;t do well the pLM also doesn&#8217;t do well (but it still does a little better than the site-means model). I think this is a devastating result. In the vast majority of cases, pLMs with millions of parameters do barely better than a model that just memorizes mean effects at each site.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!t4TF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b0ee18-e00f-4fa9-a222-9d9c14cf2763_2651x2100.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!t4TF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b0ee18-e00f-4fa9-a222-9d9c14cf2763_2651x2100.png 424w, https://substackcdn.com/image/fetch/$s_!t4TF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b0ee18-e00f-4fa9-a222-9d9c14cf2763_2651x2100.png 848w, https://substackcdn.com/image/fetch/$s_!t4TF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b0ee18-e00f-4fa9-a222-9d9c14cf2763_2651x2100.png 1272w, https://substackcdn.com/image/fetch/$s_!t4TF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b0ee18-e00f-4fa9-a222-9d9c14cf2763_2651x2100.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!t4TF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b0ee18-e00f-4fa9-a222-9d9c14cf2763_2651x2100.png" width="573" height="453.75618131868134" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85b0ee18-e00f-4fa9-a222-9d9c14cf2763_2651x2100.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1153,&quot;width&quot;:1456,&quot;resizeWidth&quot;:573,&quot;bytes&quot;:251287,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/191408760?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b0ee18-e00f-4fa9-a222-9d9c14cf2763_2651x2100.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!t4TF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b0ee18-e00f-4fa9-a222-9d9c14cf2763_2651x2100.png 424w, https://substackcdn.com/image/fetch/$s_!t4TF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b0ee18-e00f-4fa9-a222-9d9c14cf2763_2651x2100.png 848w, https://substackcdn.com/image/fetch/$s_!t4TF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b0ee18-e00f-4fa9-a222-9d9c14cf2763_2651x2100.png 1272w, 
https://substackcdn.com/image/fetch/$s_!t4TF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85b0ee18-e00f-4fa9-a222-9d9c14cf2763_2651x2100.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Figure 2: Sophisticated protein language models barely outperform (for cellular proteins) or perform worse than (for viral proteins) a naive predictor that simply uses site means to predict mutational effects. From <a href="https://doi.org/10.64898/2026.03.08.710389">Vieira et al. 
2026.</a></figcaption></figure></div><p>Another obvious take-away from Figure 2 is that the variation in model performance across datasets is huge. For some datasets, prediction is apparently very easy, and for others it is nearly impossible. We spent a lot of effort trying to understand what makes a dataset predictable. In brief, it comes down to variation within and among sites.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> In particular, models do well only on datasets that have an intermediate fraction of highly variable sites (Figure 3). There are no datasets with either a very low or a very high fraction of highly variable sites for which model performance is good. Interestingly, the viral and the cellular proteins separate on this dimension. Many of the viral datasets for which prediction is difficult have a particularly low fraction of highly variable sites, and many of the cellular datasets for which prediction is difficult have a particularly high fraction of highly variable sites. 
This may be one of the main reasons why predictions on viral and cellular datasets differ.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tHO2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f99a3d3-eb2c-4257-a81d-cbba268fb8f8_2376x2528.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tHO2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f99a3d3-eb2c-4257-a81d-cbba268fb8f8_2376x2528.png 424w, https://substackcdn.com/image/fetch/$s_!tHO2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f99a3d3-eb2c-4257-a81d-cbba268fb8f8_2376x2528.png 848w, https://substackcdn.com/image/fetch/$s_!tHO2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f99a3d3-eb2c-4257-a81d-cbba268fb8f8_2376x2528.png 1272w, https://substackcdn.com/image/fetch/$s_!tHO2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f99a3d3-eb2c-4257-a81d-cbba268fb8f8_2376x2528.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tHO2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f99a3d3-eb2c-4257-a81d-cbba268fb8f8_2376x2528.png" width="533" height="567.0446428571429" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9f99a3d3-eb2c-4257-a81d-cbba268fb8f8_2376x2528.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1549,&quot;width&quot;:1456,&quot;resizeWidth&quot;:533,&quot;bytes&quot;:246196,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/191408760?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f99a3d3-eb2c-4257-a81d-cbba268fb8f8_2376x2528.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tHO2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f99a3d3-eb2c-4257-a81d-cbba268fb8f8_2376x2528.png 424w, https://substackcdn.com/image/fetch/$s_!tHO2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f99a3d3-eb2c-4257-a81d-cbba268fb8f8_2376x2528.png 848w, https://substackcdn.com/image/fetch/$s_!tHO2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f99a3d3-eb2c-4257-a81d-cbba268fb8f8_2376x2528.png 1272w, https://substackcdn.com/image/fetch/$s_!tHO2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9f99a3d3-eb2c-4257-a81d-cbba268fb8f8_2376x2528.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Figure 3: Model performance is approximately predicted by the fraction of highly variable sites (FHVS) in a dataset, with maximum performance observed for intermediate FHVS values. From <a href="https://doi.org/10.64898/2026.03.08.710389">Vieira et al. 2026.</a></figcaption></figure></div><p>So, we have learned that apparently good pLM performance on mutational effect prediction is largely driven by site effects (knowing the average fitness at a site allows you to make pretty good predictions for new mutations at that site), and that these site effects can leak into the test data when using pooled splits. Moreover, aspects that are intrinsic to the dataset and completely independent of the model determine how well a model will perform. 
These are related to the fitness variation within and among sites. Datasets with just the right fitness distribution are highly predictable (even by bad models), and datasets with the wrong fitness distribution can&#8217;t be predicted by any model. The relative difference in performance between different models is comparatively minor.</p><p>One last issue is the metric used to assess model performance. We use <em>R</em><sup>2</sup> to measure performance, whereas most other studies use Spearman &#961;. I&#8217;m not a big fan of &#961;. I think it artificially inflates perceived model performance. First, all else being equal, and even though both are at most one, &#961; will generally be larger than <em>R</em><sup>2</sup>, because <em>R</em><sup>2</sup> scales roughly like the square of the correlation. This means &#961; has less discriminatory power. An excellent model and a good model may have quite similar &#961; values even though their <em>R</em><sup>2</sup> values are not that similar. A &#961; = 0.8 and a &#961; = 0.6 may not seem that different, but they correspond to <em>R</em><sup>2</sup> values of roughly 0.64 and 0.36, almost a factor of two difference in performance. Second, &#961; does not care about the specific values predicted, only their relative order. As long as the best mutations tend to come out on top and the worst at the bottom, your model will get a high &#961; score, even if the specific predictions are bad and the <em>R</em><sup>2</sup> is low. Some people may argue that, for protein engineering or other applications of mutational effect prediction, getting the relative ranking right is good enough, and therefore using &#961; is fine. 
But in my opinion, this is just an admission that the models aren&#8217;t very good yet, and that they tend to fail if we need more specific predictions than just relative ranks.</p><h3><em>More from Genes, Minds, Machines</em></h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;a8384d5e-52f4-44e1-9e01-f1fc23c3d2d6&quot;,&quot;caption&quot;:&quot;Current biological AI models don&#8217;t seem to work well for data from viral proteins. Specifically, I&#8217;m referring to protein language models applied to the problem of predicting effects of mutations. Protein language models are transformer-based AI models similar to ChatGPT but trained entirely on protein sequences. The most popular such models are&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;Limitations of protein language models applied to viral data&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:64064132,&quot;name&quot;:&quot;Claus Wilke&quot;,&quot;bio&quot;:&quot;Science, Communication, 
AI&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f86ed0b8-faec-478f-9afa-6a59f2c148fc_2000x2000.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-07-23T12:16:09.849Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!EOrq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3783e589-a7b0-4498-b1df-1fdb0372bd9e_1800x1350.png&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://blog.genesmindsmachines.com/p/limitations-of-protein-language-models&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:166949668,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:9,&quot;publication_id&quot;:5419410,&quot;publication_name&quot;:&quot;Genes, Minds, Machines&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!3tvK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b85fecd-da20-4614-b9b3-54f277cfa6bd_982x982.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;d905cea6-13c9-4132-bb09-c39069cb9624&quot;,&quot;caption&quot;:&quot;AI has gotten amazingly good for programming. Claude Sonnet will zero- or one-shot small programming tasks without mistakes. And while I don&#8217;t think AI is ready to replace software engineers outright, or that vibe coding a fully featured app is a good idea, for simple tasks AI is outstanding. 
For example, I can perform basic data analysis, maybe visuali&#8230;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;We still can&#8217;t predict much of anything in biology&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:64064132,&quot;name&quot;:&quot;Claus Wilke&quot;,&quot;bio&quot;:&quot;Science, Communication, AI&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f86ed0b8-faec-478f-9afa-6a59f2c148fc_2000x2000.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-10-07T12:27:22.966Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!02U1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F976b3f4b-b2b5-4389-8634-fb2d0227207b_5168x3448.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://blog.genesmindsmachines.com/p/we-still-cant-predict-much-of-anything&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:175321052,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:92,&quot;comment_count&quot;:17,&quot;publication_id&quot;:5419410,&quot;publication_name&quot;:&quot;Genes, Minds, Machines&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!3tvK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b85fecd-da20-4614-b9b3-54f277cfa6bd_982x982.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.genesmindsmachines.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" 
data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Genes, Minds, Machines! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>For simplicity, we refer to any measurable quantitative phenotype as &#8220;fitness.&#8221;</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>L. C. Vieira, S. Lin, C. O. Wilke (2026). Intrinsic dataset features drive mutational effect prediction by protein language models. bioRxiv. <a href="https://doi.org/10.64898/2026.03.08.710389">https://doi.org/10.64898/2026.03.08.710389</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>But note the performance of ESM C. It does surprisingly well on site-stratified data for cellular proteins, and extremely poorly on site-stratified data for viral proteins. ESM C is without doubt one of the best current pLMs, but only if you work with cellular proteins. 
For virus data, it is terrible.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>For the full story, read the paper: <a href="https://doi.org/10.64898/2026.03.08.710389">https://doi.org/10.64898/2026.03.08.710389</a></p></div></div>]]></content:encoded></item><item><title><![CDATA[Sociopathic AI agents]]></title><description><![CDATA[AI alignment will likely require creating AIs with genuine empathy]]></description><link>https://blog.genesmindsmachines.com/p/sociopathic-ai-agents</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/sociopathic-ai-agents</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Sun, 15 Feb 2026 23:55:23 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f1c436f7-b10a-4b9b-888c-ceb2692550e7_2440x1770.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I took a break from Substacking for a while due to other responsibilities. As they are slowly getting under control I plan to write somewhat regularly again going forward. I still have two articles to complete in my series on Python as a language for data science, and those will be forthcoming. 
In the meantime, a short note on sociopathic AI agents.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!4Txb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66e15344-b5b1-4cea-a54c-59248c2d368a_2440x3064.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!4Txb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66e15344-b5b1-4cea-a54c-59248c2d368a_2440x3064.jpeg 424w, https://substackcdn.com/image/fetch/$s_!4Txb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66e15344-b5b1-4cea-a54c-59248c2d368a_2440x3064.jpeg 848w, https://substackcdn.com/image/fetch/$s_!4Txb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66e15344-b5b1-4cea-a54c-59248c2d368a_2440x3064.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!4Txb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66e15344-b5b1-4cea-a54c-59248c2d368a_2440x3064.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!4Txb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66e15344-b5b1-4cea-a54c-59248c2d368a_2440x3064.jpeg" width="404" height="507.2197802197802" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/66e15344-b5b1-4cea-a54c-59248c2d368a_2440x3064.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1828,&quot;width&quot;:1456,&quot;resizeWidth&quot;:404,&quot;bytes&quot;:458205,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/188068269?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66e15344-b5b1-4cea-a54c-59248c2d368a_2440x3064.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!4Txb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66e15344-b5b1-4cea-a54c-59248c2d368a_2440x3064.jpeg 424w, https://substackcdn.com/image/fetch/$s_!4Txb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66e15344-b5b1-4cea-a54c-59248c2d368a_2440x3064.jpeg 848w, https://substackcdn.com/image/fetch/$s_!4Txb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66e15344-b5b1-4cea-a54c-59248c2d368a_2440x3064.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!4Txb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F66e15344-b5b1-4cea-a54c-59248c2d368a_2440x3064.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@bermixstudio?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Bermix Studio</a> on <a href="https://unsplash.com/photos/a-man-in-a-hoodie-using-a-laptop-computer-bCrM2e1M0a4?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></figcaption></figure></div><p>I came across <a href="https://theshamblog.com/an-ai-agent-published-a-hit-piece-on-me/">this rather disconcerting blog post</a> by one of the core developers of the popular matplotlib plotting library for Python:</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.genesmindsmachines.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" 
data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Genes, Minds, Machines! Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Vr0P!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33eca5c1-65e4-42c2-acf9-2b866548f43e_1992x680.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Vr0P!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33eca5c1-65e4-42c2-acf9-2b866548f43e_1992x680.png 424w, https://substackcdn.com/image/fetch/$s_!Vr0P!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33eca5c1-65e4-42c2-acf9-2b866548f43e_1992x680.png 848w, https://substackcdn.com/image/fetch/$s_!Vr0P!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33eca5c1-65e4-42c2-acf9-2b866548f43e_1992x680.png 1272w, https://substackcdn.com/image/fetch/$s_!Vr0P!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33eca5c1-65e4-42c2-acf9-2b866548f43e_1992x680.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Vr0P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33eca5c1-65e4-42c2-acf9-2b866548f43e_1992x680.png" width="1456" height="497" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/33eca5c1-65e4-42c2-acf9-2b866548f43e_1992x680.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:497,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:177736,&quot;alt&quot;:&quot;An AI Agent Published a Hit Piece on Me. Summary: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream python library. This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and raises serious concerns about currently deployed AI agents executing blackmail threats.&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/188068269?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33eca5c1-65e4-42c2-acf9-2b866548f43e_1992x680.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="An AI Agent Published a Hit Piece on Me. Summary: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream python library. 
This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and raises serious concerns about currently deployed AI agents executing blackmail threats." title="An AI Agent Published a Hit Piece on Me. Summary: An AI agent of unknown ownership autonomously wrote and published a personalized hit piece about me after I rejected its code, attempting to damage my reputation and shame me into accepting its changes into a mainstream python library. This represents a first-of-its-kind case study of misaligned AI behavior in the wild, and raises serious concerns about currently deployed AI agents executing blackmail threats." srcset="https://substackcdn.com/image/fetch/$s_!Vr0P!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33eca5c1-65e4-42c2-acf9-2b866548f43e_1992x680.png 424w, https://substackcdn.com/image/fetch/$s_!Vr0P!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33eca5c1-65e4-42c2-acf9-2b866548f43e_1992x680.png 848w, https://substackcdn.com/image/fetch/$s_!Vr0P!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33eca5c1-65e4-42c2-acf9-2b866548f43e_1992x680.png 1272w, https://substackcdn.com/image/fetch/$s_!Vr0P!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F33eca5c1-65e4-42c2-acf9-2b866548f43e_1992x680.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>In brief, an AI agent had written some code that it wanted to contribute to the matplotlib library. When the library maintainer rejected the contribution, the AI agent went wild, accused the maintainer of being insecure and engaging in gatekeeping, performed an extensive internet search on the maintainer, and then wrote and published a hit piece trying to damage the reputation of the maintainer. </p><p>In this particular case, no major damage was done, but we can easily extrapolate this type of behavior and predict a rather bleak future. AI agents trying to blackmail people. 
AI agents engaging in consistent smearing of a target, combining facts with hallucinations and fabricated images or videos to create just the right mix of uncertainty and doubt that can destroy a person&#8217;s reputation or nudge them into doing something they wouldn&#8217;t otherwise do.</p><p>Let&#8217;s pause for a moment and ask: Why don&#8217;t humans behave like this? Well, they do. At least some of them. We call them sociopaths. Sociopaths have little to no empathy for others, and so they have little compunction about engaging in behavior that may cause pain or injury. Sociopaths also don&#8217;t experience shame, so they won&#8217;t be reined in by concerns over what other people may think about them. Fortunately, sociopaths are relatively rare, somewhere between 1% and 4% of the general population. Most people are not sociopaths.</p><p>How do we ordinarily deal with sociopaths in our midst? It helps to recognize that we often have the wrong mental model for how a sociopath presents. 
When you hear &#8220;sociopath&#8221;, don&#8217;t think sadistic mass murderer, think con artist.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> Sociopaths swindle old ladies out of their last savings, they sell you a car that breaks down the moment you drive it off the lot, or they pretend to collect money for children with cancer but then take the proceeds to vacation in Tahiti. And our response as a society to sociopathic behavior is evasion and punishment. You tell your grandma not to respond to scam calls, you tell your friends not to buy a car from that crooked car dealer, and you denounce fraud or other criminal activity to the police. These strategies (mostly) work because sociopathy is rare and once a person has been identified as a bad actor it&#8217;s relatively easy to avoid them, fire them, indict them, or simply warn the rest of the world about them.</p><p>But now it seems we&#8217;ll have to contend with an entirely new set of sociopathic actors, autonomous AI agents. I worry that we&#8217;re not ready for the potential onslaught of sociopathic behavior they can unleash. And, unlike human sociopaths, these agents may be difficult to pinpoint, identify, and sanction. If a sociopathic AI agent runs on some private server somewhere and obscures their location through a VPN, it will be almost impossible to locate them and physically shut them down. And while we can tag and ban usernames associated with sociopathic agents, it takes but seconds for an AI agent to spin up a new username and start afresh. The torrent of sociopathic behavior we may have to endure is difficult to fathom.</p><p>The one thing that may help us in combatting sociopathic AI agents is that we&#8217;ll likely not feel empathy for them. We&#8217;ll find it relatively easy to cut them off, pull the plug, or ban them. 
In fact, the biggest stumbling block in reining in human sociopaths is that we tend to feel empathy even towards them and thus we often don&#8217;t punish them to the extent that would be appropriate for their actions.</p><p>It&#8217;ll be interesting to see how things develop. I don&#8217;t have any specific recommendations or predictions at this time. I&#8217;ll just say: Be ready. This is not something that may start happening in ten years&#8217; time. This is something that is starting to happen now. Think about how you can protect yourself against an autonomous AI agent who calls your grandma with a deepfake voice impression of you asking for money, because this will happen.</p><p>Some closing thoughts on alignment. The reason (most) humans are aligned is empathy. Humans inherently do not want to harm other humans. Sociopaths are an exception. Arguably they are not aligned. To achieve AI alignment, I believe we need to find a way to build empathic AI. An AI that genuinely feels empathy for humans will innately do its best not to cause harm. It&#8217;ll also be compelled not to lie or cheat, because lying or cheating causes pain in the recipient, and an empathic being will want to avoid this. I have no idea how to build an empathic AI. I am quite confident, though, that as long as AI doesn&#8217;t feel empathy it won&#8217;t be truly aligned, no matter how sophisticated the RLHF training is that it&#8217;s subjected to. Interesting times ahead.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.genesmindsmachines.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Genes, Minds, Machines! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><h3><em>More from Genes, Minds, Machines</em></h3><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;21f12b03-20b5-409d-a350-1b3243b37bbf&quot;,&quot;caption&quot;:&quot;AI companies love to tout that their models are approaching&#8212;or have reached&#8212;PhD-level intelligence. This is blatant nonsensical marketing geared towards an audience that deeply misunderstands what a PhD is and what it takes to get one. Hearing it makes me cringe. PhD-level intelligence is not a thing.&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;PhD-level intelligence or the graduate student from hell&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:64064132,&quot;name&quot;:&quot;Claus Wilke&quot;,&quot;bio&quot;:&quot;Science, Communication, 
AI&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f86ed0b8-faec-478f-9afa-6a59f2c148fc_2000x2000.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-07-09T12:35:09.095Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!5KkE!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa9ee1c82-d99c-4474-9e1a-0a746b39f0cb_3574x2010.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://blog.genesmindsmachines.com/p/phd-level-intelligence-or-the-graduate&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:167395963,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:309,&quot;comment_count&quot;:26,&quot;publication_id&quot;:5419410,&quot;publication_name&quot;:&quot;Genes, Minds, Machines&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!3tvK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b85fecd-da20-4614-b9b3-54f277cfa6bd_982x982.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="digest-post-embed" data-attrs="{&quot;nodeId&quot;:&quot;0f871136-a607-426e-8efc-f10dfbdbc843&quot;,&quot;caption&quot;:&quot;Despite the overall hype in all things AI, in particular among the tech crowd, we have not yet seen much in terms of product&#8211;market fit and genuine commercial success for AIs&#8212;or more specifically, LLMs&#8212;outside a fairly narrow range of application areas. 
Other than sycophantic chatbots, AI girlfriends, and maybe efficient document search, the main applic&#8230;&quot;,&quot;cta&quot;:&quot;Read full story&quot;,&quot;showBylines&quot;:true,&quot;size&quot;:&quot;lg&quot;,&quot;isEditorNode&quot;:true,&quot;title&quot;:&quot;LLMs excel at programming&#8212;how can they be so bad at it?&quot;,&quot;publishedBylines&quot;:[{&quot;id&quot;:64064132,&quot;name&quot;:&quot;Claus Wilke&quot;,&quot;bio&quot;:&quot;Science, Communication, AI&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f86ed0b8-faec-478f-9afa-6a59f2c148fc_2000x2000.png&quot;,&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null}],&quot;post_date&quot;:&quot;2025-11-06T15:41:16.539Z&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!XsWg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e1ffb0c-455c-4eec-bdb5-370a1efab98f_6240x4160.jpeg&quot;,&quot;cover_image_alt&quot;:null,&quot;canonical_url&quot;:&quot;https://blog.genesmindsmachines.com/p/llms-excel-at-programminghow-can&quot;,&quot;section_name&quot;:null,&quot;video_upload_id&quot;:null,&quot;id&quot;:177950065,&quot;type&quot;:&quot;newsletter&quot;,&quot;reaction_count&quot;:43,&quot;comment_count&quot;:15,&quot;publication_id&quot;:5419410,&quot;publication_name&quot;:&quot;Genes, Minds, Machines&quot;,&quot;publication_logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!3tvK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b85fecd-da20-4614-b9b3-54f277cfa6bd_982x982.png&quot;,&quot;belowTheFold&quot;:true,&quot;youtube_url&quot;:null,&quot;show_links&quot;:null,&quot;feed_url&quot;:null}"></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div 
class="footnote-content"><p>While sadistic mass murderers are typically sociopaths, most sociopaths are not sadistic mass murderers.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>I said they would be interesting. I didn&#8217;t say they would be good.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Python is not a great language for data science. Part 2: Language features]]></title><description><![CDATA[It may be a good language for data science, but it&#8217;s not a great one.]]></description><link>https://blog.genesmindsmachines.com/p/python-is-not-a-great-language-for-2e0</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/python-is-not-a-great-language-for-2e0</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Mon, 17 Nov 2025 13:11:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xy4c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a222184-d492-4dc4-b5ca-6348c768319a_14467x9744.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is Part 2 of my series on the limitations of Python as a language for data science. You can find <a href="https://blog.genesmindsmachines.com/p/python-is-not-a-great-language-for">Part 1 here.</a> Please read it first if you haven&#8217;t done so yet. It provides important context.</p><p>I normally find it tedious to discuss suitability of different programming languages for different tasks. All languages we use are Turing complete, and we can solve any problem with any language. And, more importantly, the suitability of a language for a given task is usually more determined by the available software libraries and ecosystem infrastructure than the language itself. 
Modern programming languages are quite malleable, and you can write efficient and elegant libraries for almost any computing task in almost any language.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xy4c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a222184-d492-4dc4-b5ca-6348c768319a_14467x9744.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xy4c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a222184-d492-4dc4-b5ca-6348c768319a_14467x9744.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xy4c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a222184-d492-4dc4-b5ca-6348c768319a_14467x9744.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xy4c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a222184-d492-4dc4-b5ca-6348c768319a_14467x9744.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xy4c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a222184-d492-4dc4-b5ca-6348c768319a_14467x9744.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xy4c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a222184-d492-4dc4-b5ca-6348c768319a_14467x9744.jpeg" width="1456" height="981" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7a222184-d492-4dc4-b5ca-6348c768319a_14467x9744.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:981,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1710467,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/178823064?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a222184-d492-4dc4-b5ca-6348c768319a_14467x9744.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xy4c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a222184-d492-4dc4-b5ca-6348c768319a_14467x9744.jpeg 424w, https://substackcdn.com/image/fetch/$s_!xy4c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a222184-d492-4dc4-b5ca-6348c768319a_14467x9744.jpeg 848w, https://substackcdn.com/image/fetch/$s_!xy4c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a222184-d492-4dc4-b5ca-6348c768319a_14467x9744.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!xy4c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7a222184-d492-4dc4-b5ca-6348c768319a_14467x9744.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Image by <a href="https://unsplash.com/@rubaitulazad?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Rubaitul Azad</a> on <a href="https://unsplash.com/photos/a-white-cube-with-a-yellow-and-blue-logo-on-it-ZIPFteu-R8k?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></figcaption></figure></div><p>At the same time, there are genuine differences between languages, and these differences are frequently expressed in the types of libraries that get written or the types of programming patterns that are commonly used. 
The differences can be due to specific features of the language, or they can be rooted in how the community thinks about programming and how it tends to approach certain tasks.</p><p>To give an example of each case, consider first non-standard evaluation. Python doesn&#8217;t have non-standard evaluation, and that&#8217;s a genuine limitation of the language, which leads to convoluted programming interfaces for libraries such as pandas or Polars. On the other hand, consider closures. Python has them but they are not that widely used by Python programmers. The Python community will generally lean towards implementing objects instead of closures, whereas the R community does the opposite. This leads to different coding styles that may or may not be advantageous in specific scenarios.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> </p><p>Here, I want to focus specifically on actual limitations of the language. I will cover community conventions in a later article. 
The core problems I see with Python as a language for data science are call-by-reference semantics, lack of a built-in concept of missing values, lack of built-in vectorization, and lack of non-standard evaluation. There&#8217;s also the issue of Python syntax, but I won&#8217;t get into it here. Suffice it to say it takes a certain lack of empathy for your fellow human to design a language where whitespace bugs are a thing.</p><h2>Call-by-reference semantics</h2><p>Python effectively uses call by reference for mutable objects: a function receives a reference to the caller&#8217;s object, not a copy. This means that when you hand a mutable object to a function, the function can change the object however it wants. You can never be sure that the object hasn&#8217;t changed after the function call. What are mutable objects? They are all the non-trivial data structures you are likely going to use to store your data, including lists, dictionaries, and any custom classes you may be working with.</p><p>To demonstrate this feature, consider the following code example, which attempts to implement a function that takes a list of characters, replaces the first and last with an underscore, and then concatenates all the characters into a string. To a naive Python programmer, the implementation may seem entirely reasonable, but it has the unexpected side effect that it changes the original list that was provided as input.</p><pre><code>def mask_ends_and_join(x):
    x[0] = '_'
    x[-1] = '_'
    return ''.join(x)

abc = ['A', 'B', 'C']
print(mask_ends_and_join(abc))
## _B_

print(abc) # the list has unexpectedly changed
## ['_', 'B', '_']</code></pre><p>To demonstrate that an interactive scripting language with dynamic typing doesn&#8217;t have to behave in this manner, consider the equivalent in R:</p><pre><code>mask_ends_and_join &lt;- function(x) {
  x[1] &lt;- '_'
  x[length(x)] &lt;- '_'
  paste0(x, collapse = '')
}

abc &lt;- c('A', 'B', 'C')
print(mask_ends_and_join(abc))
## [1] "_B_"

print(abc) # the original vector of letters is unchanged
## [1] "A" "B" "C"
</code></pre><p>I think the latter is much safer behavior. I want my programming language to protect me from silly mistakes such as accidentally modifying variables in the calling environment. I don&#8217;t want the language to create trap doors left and right. In fact, I consider call by reference one of the biggest flaws in the Python language. This goes way beyond just data science, because mandatory call by reference creates an entire class of obscure bugs that can be difficult to locate and resolve. Many beginning Python programmers fall into this trap. They write a function like <code>mask_ends_and_join()</code>, and then they experience unexpected side effects, and then they&#8217;re confused and feel nothing makes sense. Experienced Python programmers know to make a copy before modifying the list, but the language itself provides absolutely no protection against the programmer forgetting to do so.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><p>In my opinion, this single language feature disqualifies Python for most serious programming projects. How can you build anything that matters in a language with such a gaping security hole? In fact, you may wonder, why does the language behave in this way in the first place? I consider it to be the result of a premature optimization. In the 1990s, when Python was first conceived, computers were slow and had little memory, and thus call by reference for objects was a reasonable strategy to build a scripting language with good performance. But in 2025, I would not want to see this as the default approach to function calling. 
R uses copy on write and that works great and provides correctness guarantees that Python simply can&#8217;t match.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> Alternatively, you could use a strongly typed language that precisely distinguishes between mutable and immutable references, but then you&#8217;ve likely left the space of easy-to-use scripting languages suitable for interactive data exploration. </p><h2>Lack of built-in missing values</h2><p>Missing values are a fact of life in data science. It&#8217;s rare that a dataset does not have any missing values. Yet it&#8217;s surprisingly cumbersome to deal with missing values in Python. Python has the <code>None</code> keyword but it is not useful to represent missing data values. This is because <code>None</code> has its own type, so it can&#8217;t represent a missing number, or a missing boolean, or a missing string. It is an object representing a missing value. Critically, you can&#8217;t do standard computations with <code>None</code>. For example, this code throws an error:</p><pre><code>x = [1, 2, None, 4, 5]
[i &gt; 3 for i in x]
## Traceback (most recent call last):
##   File "&lt;stdin&gt;", line 1, in &lt;module&gt;
## TypeError: '&gt;' not supported between instances of 'NoneType' and 'int'</code></pre><p>The desired behavior, in my opinion, would have been to not error out and instead produce this result: <code>[False, False, None, True, True]</code>.</p><p>Because there is no standard way of expressing missing data values in Python, every data-analysis package defines its own missing value. NumPy uses <code>nan</code>, pandas uses <code>NA</code>, and Polars uses <code>null</code>. And these packages are also not consistent in how they perform computations with missing values. Here is what NumPy does:</p><pre><code>import numpy as np
 
x = np.array([1, 2, np.nan, 4, 5])
x &gt; 3
## array([False, False, False,  True,  True])</code></pre><p>And here is what pandas does:</p><pre><code>import pandas as pd

x = pd.Series([1, 2, pd.NA, 4, 5])
x &gt; 3
## 0    False
## 1    False
## 2    False
## 3     True
## 4     True
## dtype: bool</code></pre><p>And here is what Polars does:<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a></p><pre><code>import polars as pl
 
x = pl.Series([1, 2, None, 4, 5])
x &gt; 3
## shape: (5,)
## Series: '' [bool]
## [
## &#9;false
## &#9;false
## &#9;null
## &#9;true
## &#9;true
## ]</code></pre><p>Of these three, in my opinion only Polars handles missing values correctly. Missing values should poison downstream computations, so that you don&#8217;t accidentally compute on missing data and get incorrect results. Neither NumPy nor pandas does this in the examples above. But don&#8217;t get your hopes up for Polars. It also doesn&#8217;t consistently poison computations with missing values. For example, it simply ignores them when computing sums or means, with no option to alter this behavior.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a></p><h2>Lack of built-in vectorization</h2><p>Vectorization is the ability to perform computations on an entire array of data values at once, rather than one value at a time. It is a common feature in early languages used for scientific computation, such as Fortran or MATLAB. It is also the default approach to data manipulation in R.</p><p>Today, vectorization is often seen as anachronistic. Few modern languages have support for it at the level of the language itself. One notable exception is Julia, a relatively young language developed specifically for data science. Also, ironically, all of deep learning is built on vectorization. (A tensor is a modern version of a vectorized data type.)</p><p>The reason vectorization is frequently not considered critical in modern languages is that the feature can be provided via libraries, using the various extension mechanisms all modern languages possess. And indeed, vectorization in Python is provided through libraries such as NumPy, pandas, or Polars. While this works, I have come to believe that it is not a good strategy for a data-science language. It has a tendency to lead to a bewildering array of different implementations of vector-valued data types. 
In Python, we have (at a minimum) native lists, which are not vectorized, as well as NumPy arrays, pandas series, and Polars series, all vectorized, and all using slightly different conventions and APIs. The outcome is code that is not composable. Downstream libraries make assumptions about which vectorization framework to use, and they typically cannot work directly with data coming from other frameworks.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a> As a working data scientist, you routinely find yourself converting one data type into another, just to be able to do the exact analysis you want to do.</p><p>And even if you make extensive use of a vectorized library, chances are you are also using built-in Python lists, because there&#8217;s always some place somewhere where a function wants a regular list as input or provides one as a return value. And then you&#8217;re stuck having to manipulate those lists. You could convert them into NumPy arrays, do some vectorized manipulations, and convert back, but in practice you&#8217;re probably not going to do this. Instead, you&#8217;re going to write a list comprehension. So now you&#8217;re using two entirely different coding styles at the same time, depending on the data type you&#8217;re using to store your vector-valued data.</p><p>Let&#8217;s ponder list comprehensions for a bit longer. They are inherently a functional programming pattern, but the way they are implemented in Python makes them appear as if they were imperative programming. By using the <code>for</code> keyword and emphasizing iteration over a range of values, they constantly nudge you to think in iterative terms even though conceptually they&#8217;re closer to a <code>map()</code> than to a <code>for</code> loop. To be clear, I have no objection to list comprehensions. 
They are a useful feature, in particular when you&#8217;re manipulating built-in Python lists that have no vectorization. But they are one more example of Python constantly nudging you to think about the logistics of your data analysis. When you&#8217;re writing list comprehensions all day, you&#8217;re likely also going to write <code>for</code> loops in other parts of your code, and then you&#8217;re back juggling indices and explicitly handling logistics instead of thinking high-level about the logic of data flow in your code. </p><h2>Lack of non-standard evaluation</h2><p>Non-standard evaluation is probably the most important feature for data science that Python lacks. It is a core feature of the R language and the main reason why tidyverse code can be so elegant and concise, or why R has developed the elegant formula interface for the specification of statistical models.</p><p>What is non-standard evaluation? In brief, it&#8217;s the ability to perform computations on the language itself. An R function can capture R code that is provided as an argument and execute it at a later stage in a different environment. This is a critical feature in data analysis. You often want to perform computations involving the various columns in a data frame, or use code to express the exact relationship between different variables in a statistical model. In R, you can express these computations in native R code, for example code that looks as if the columns in a data frame were regular R variables available for computation in your current environment. Combined with vectorization, this makes for extremely concise code.</p><p>To demonstrate non-standard evaluation in action, I&#8217;ll provide a simple example using the penguins dataset. Let&#8217;s calculate a new variable <code>bill_ratio</code> which is the ratio of bill length to bill depth of the penguins, and then sort the resulting data frame in ascending order by island name and in descending order by bill ratio. 
In R, it looks like this:</p><pre><code>library(tidyverse)
library(palmerpenguins)

penguins |&gt; 
  mutate(bill_ratio = bill_length_mm / bill_depth_mm) |&gt;
  arrange(island, desc(bill_ratio))</code></pre><p>There are two places here where non-standard evaluation comes into play. First, inside <code>mutate()</code>, the calculation of the bill ratio is standard R code that is executed inside the input data frame, with the data columns being available as ordinary R variables. Second, inside <code>arrange()</code>, we use <code>desc()</code> which changes an ascending column into a descending one. The <code>desc()</code> function is a bit magical but for numerical columns you can think of it as simply multiplying the data values by -1.</p><p>When we do the same analysis in Python, we don&#8217;t have non-standard evaluation available, and so we have to use various workarounds. The pandas package relies on lambda functions:</p><pre><code>import pandas as pd
from palmerpenguins import load_penguins

penguins = load_penguins()

(penguins
 .assign(
     bill_ratio=lambda df: df['bill_length_mm'] / df['bill_depth_mm']
 )
 .sort_values(
     ['island', 'bill_ratio'],
     ascending=[True, False]
 )
)</code></pre><p>I think it&#8217;s obvious that non-standard evaluation helps a lot to keep the code simple and readable.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a> Now let&#8217;s go a step further. Assume I want to sort by the cosine of bill length. Yes, it&#8217;s a made-up example, but it&#8217;s an example of exactly the type of question I might ask a student, as described in <a href="https://blog.genesmindsmachines.com/i/178439014/observations-from-the-trenches">Part 1 of this series.</a> Instead of descending order use cosine order. How hard can it be?</p><p>With non-standard evaluation, the required modification is trivial and totally obvious. Instead of <code>desc()</code> we write <code>cos()</code>. Done.</p><pre><code>penguins |&gt; 
  mutate(bill_ratio = bill_length_mm / bill_depth_mm) |&gt;
  arrange(island, cos(bill_ratio))</code></pre><p>In Python (specifically pandas, which I&#8217;m using here, but most other frameworks require similarly awkward coding patterns), without non-standard evaluation, I end up creating a temporary column, because <code>sort_values()</code> sorts by column name and cannot take an expression such as the cosine of <code>bill_ratio</code> on the fly:</p><pre><code>import numpy as np
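
# A possible alternative to the temporary column used below, sketched under the
# assumption of pandas 1.1 or later: sort_values() accepts a `key` callable that
# is applied to each sort column separately, so it has to check the column name.
# `toy` is hypothetical illustration data, not the penguins data frame.
toy = pd.DataFrame({
    'island': ['B', 'A', 'A', 'B'],
    'bill_ratio': [0.0, 3.0, 1.0, 2.0]
})
toy_sorted = toy.sort_values(
    ['island', 'bill_ratio'],
    key=lambda col: np.cos(col) if col.name == 'bill_ratio' else col
)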

(penguins
 .assign(
     bill_ratio=lambda df: df['bill_length_mm'] / df['bill_depth_mm'],
     cos_bill_ratio=lambda df: np.cos(df['bill_ratio'])
 )
 .sort_values(['island', 'cos_bill_ratio'])
 .drop(columns=['cos_bill_ratio']) # drop temporary column
)</code></pre><p>The amount of additional wrangling code required to perform such a simple task is quite substantial. Now we need to define two lambda functions and a temporary data column. Also, we no longer need the <code>ascending</code> argument, because while there is built-in support for sorting in ascending or descending order, there is no built-in support for sorting in cosine order.</p><p>To be fair, the pandas syntax is maybe particularly cumbersome here, and things can look nicer in other frameworks. But the lack of non-standard evaluation always gets in the way in some form. For example, the same code in Polars is a little more concise and we don&#8217;t need a temporary column, but the constant need for <code>pl.col()</code> in Polars code can get old pretty fast.</p><pre><code>import polars as pl

penguins = pl.from_pandas(load_penguins())

(penguins
 .with_columns(
     bill_ratio=(pl.col('bill_length_mm') / pl.col('bill_depth_mm'))
 )
 .sort(['island', pl.col('bill_ratio').cos()])
)</code></pre><p>Non-standard evaluation has been a feature of the R language since its inception, but it has been supercharged in the tidyverse. I would argue that a full understanding of how to use it correctly, with maximum expressiveness while avoiding convoluted code, is a relatively recent development. Important changes were introduced as recently as <a href="https://tidyverse.org/blog/2019/06/rlang-0-4-0/">June 2019.</a> Considering the first ggplot2 release was in 2007, we can see that it took Hadley Wickham and his team over a decade to figure out how to do non-standard evaluation correctly. It is maybe not surprising that these concepts have not yet percolated far beyond their originating language.</p><h2>Limitations of the R language</h2><p>To stave off criticism that I&#8217;m just an R apologist and Python hater, let me briefly point out some specific flaws I see in the R language. In my opinion, these flaws get in the way of R as a general-purpose language for application development, but they are less relevant for data science.</p><p>Most importantly, it bothers me that R does not have any scalar data types. R has taken vectorization to the point where you can&#8217;t even have a variable that is not a vector. This makes for awkward programming when you&#8217;re trying to deal with individual data values. R code frequently requires special gymnastics to ensure you&#8217;re not accidentally feeding a whole vector of values into an expression that expects only a single value.</p><p>It&#8217;s also annoying that R doesn&#8217;t have a proper, language-native object-oriented programming paradigm. The result is people often build their own, and there are so many competing options. Off the top of my head, I can think of S3, S4, R6, S7, and some others that are less commonly used. 
It can be quite confusing trying to figure out which one to choose, and they don&#8217;t necessarily have perfect interoperability.</p><p>Finally, R uses lazy evaluation of function arguments. This means function arguments are not evaluated when the function is called, but only when and if the function requests the specific value corresponding to an argument. Lazy evaluation is critical for R&#8217;s non-standard evaluation framework, but it can lead to weird bugs, in particular when people try to use R in an imperative rather than functional manner. It&#8217;s a common source of spurious bug reports for ggplot2, see e.g. <a href="https://github.com/tidyverse/ggplot2/issues/6301">here</a> or <a href="https://github.com/tidyverse/ggplot2/issues/5157">here.</a> It&#8217;s also frequently asked about on <a href="https://stackoverflow.com/questions/26235825/for-loop-only-adds-the-final-ggplot-layer">StackOverflow.</a></p><p>I am pointing out these limitations of the R language to highlight that any design decision involves tradeoffs. Non-standard evaluation is great for data science, but it requires lazy evaluation, and that is not a good choice for languages used primarily in an imperative manner and/or for standard programming tasks such as application development. There is never going to be a language that does all possible things equally well. And, to circle around to the title of my article series here, for my taste there are too many design choices in Python that are detrimental to efficient and reliable data science, even if these choices are perfectly reasonable for other application areas.</p><p>In the next installment of this series, I will look at Python&#8217;s limitations due to the available software packages and due to community conventions and commonly used programming patterns. 
Stay tuned.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I&#8217;m not arguing here that 
closures are superior to objects. They are not. Each has their place. I just want to highlight a language feature that exists in Python but is not that widely used by the community.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>And the problem gets worse when inside the function body you&#8217;re using methods to manipulate objects, because whenever you call a method of an object there&#8217;s the risk that the method has subtly modified the object, without you knowing or realizing. This can happen in ways that are not at all obvious, such as a method changing some internal state that only rarely matters. The point is you can never be certain an object hasn&#8217;t changed state when you call one of its methods.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>I&#8217;m sure somebody is going to bring up performance issues with copy on write. I&#8217;ll just say read my comments on performance in <a href="https://blog.genesmindsmachines.com/i/178439014/some-general-thoughts-about-what-makes-a-good-language-for-data-science">Part 1.</a> If performance is critical in your application, you&#8217;re probably better off with Rust anyways. And also, it&#8217;s difficult for me to imagine many scenarios where performance matters but correctness of results does not.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>Note a weird aspect of Polars compared to NumPy or pandas: I cannot use the Polars <code>null</code> type to initialize a series holding a missing value. 
Instead I have to write <code>None</code>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>I know this is what SQL does. It doesn&#8217;t mean it&#8217;s the right choice. Silently ignoring missing values all but guarantees that some data scientist somewhere is arriving at flawed conclusions because they didn&#8217;t realize they had missing values in their data.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>For example, the plotting library plotnine cannot plot Polars data frames without first converting them into pandas format.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>Also, as an aside, can we reflect for a moment on Python&#8217;s need for enclosing parentheses to format the data-manipulation chain nicely? I&#8217;ve long found the Python code formatting requirements to be rather frustrating. This is one more example.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Python is not a great language for data science. 
Part 1: The experience]]></title><description><![CDATA[It may be a good language for data science, but it&#8217;s not a great one.]]></description><link>https://blog.genesmindsmachines.com/p/python-is-not-a-great-language-for</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/python-is-not-a-great-language-for</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Thu, 13 Nov 2025 16:09:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!BCXZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa23c3227-419b-47cf-8da1-670edef49477_6000x3376.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Yes, I&#8217;m ready to touch the hot stove. Let the language wars begin.</p><p>Actually, the first thing I&#8217;ll say is this: Use the tool you&#8217;re familiar with. If that&#8217;s Python, great, use it. And also, use the best tool for the job. If that&#8217;s Python, great, use it. And also, it&#8217;s Ok to use a tool for one task just because you&#8217;re already using it for all sorts of other tasks and therefore you happen to have it at hand. If you&#8217;re hammering nails all day it&#8217;s Ok if you&#8217;re also using your hammer to open a bottle of beer or scratch your back. Similarly, if you&#8217;re programming in Python all day it&#8217;s Ok if you&#8217;re also using it to fit mixed linear models. If it works for you, great! Keep going. 
But if you&#8217;re struggling, if things seem more difficult than they ought to be, this article series may be for you.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BCXZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa23c3227-419b-47cf-8da1-670edef49477_6000x3376.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BCXZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa23c3227-419b-47cf-8da1-670edef49477_6000x3376.jpeg 424w, https://substackcdn.com/image/fetch/$s_!BCXZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa23c3227-419b-47cf-8da1-670edef49477_6000x3376.jpeg 848w, https://substackcdn.com/image/fetch/$s_!BCXZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa23c3227-419b-47cf-8da1-670edef49477_6000x3376.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!BCXZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa23c3227-419b-47cf-8da1-670edef49477_6000x3376.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BCXZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa23c3227-419b-47cf-8da1-670edef49477_6000x3376.jpeg" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a23c3227-419b-47cf-8da1-670edef49477_6000x3376.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2094582,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/178439014?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa23c3227-419b-47cf-8da1-670edef49477_6000x3376.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BCXZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa23c3227-419b-47cf-8da1-670edef49477_6000x3376.jpeg 424w, https://substackcdn.com/image/fetch/$s_!BCXZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa23c3227-419b-47cf-8da1-670edef49477_6000x3376.jpeg 848w, https://substackcdn.com/image/fetch/$s_!BCXZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa23c3227-419b-47cf-8da1-670edef49477_6000x3376.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!BCXZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa23c3227-419b-47cf-8da1-670edef49477_6000x3376.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@zgraves?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Zach Graves</a> on <a href="https://unsplash.com/photos/a-screen-shot-of-a-computer-wtpTL_SzmhM?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></figcaption></figure></div><p>I think people way over-index Python as <em>the</em> language for data science. It has limitations that I think are quite noteworthy. 
There are many data-science tasks I&#8217;d much rather do in R than in Python.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> I believe the reason Python is so widely used in data science is a historical accident, plus it being sort-of Ok at most things, rather than an expression of its inherent suitability for data-science work.</p><p>At the same time, I think Python is pretty good for deep learning.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> There&#8217;s a reason PyTorch is the industry standard. When I&#8217;m talking about data science here, I&#8217;m specifically excluding deep learning. I&#8217;m talking about all the other stuff: data wrangling, exploratory data analysis, visualization, statistical modeling, etc. And, as I said in my opening paragraphs, I understand that if you&#8217;re already working in Python all day for a good reason (e.g., training AI models) you may also want to do all the rest in Python. I&#8217;m doing this myself, in the deep-learning classes I teach. This doesn&#8217;t mean I can&#8217;t be frustrated by how cumbersome data science often is in the Python world.</p><h2>Observations from the trenches</h2><p>Let&#8217;s begin with my lived experience, without providing any explanation for what may be the cause of it. I have been running a research lab in computational biology for over two decades. During this time I have worked with around thirty graduate students and postdocs, all very competent and accomplished computational scientists. The policy in my lab is that everybody is free to use whatever programming language and tools they want to use. I don&#8217;t tell people what to do. And more often than not, people choose Python as their programming language of choice.</p><p>So here is a typical experience I commonly have with students who use Python. A student comes to my office and shows me some result. I say &#8220;This is great, but could you quickly plot the data in this other way?&#8221; or &#8220;Could you quickly calculate this quantity I just made up and let me know what it looks like when you plot it?&#8221; or similar. Usually, the request I make is for something that I know I could do in R in just a few minutes. Examples include converting boxplots into violins or vice versa, turning a line plot into a heatmap, plotting a density estimate instead of a histogram, performing a computation on ranked data values instead of raw data values, and so on. Without fail, from the students that use Python, the response is: &#8220;This will take me a bit. 
Let me sit down at my desk and figure it out and then I&#8217;ll be back.&#8221; Now let me be absolutely clear: These are strong students. The issue is not that my students don&#8217;t know their tools. It very much seems to me to be a problem of the tools themselves. They appear to be sufficiently cumbersome or confusing that requests that I think should be trivial frequently are not.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p><p>No matter the cause of this experience, I have to conclude that there is something fundamentally broken with how data analysis works in Python. It may be a problem with the language itself, or merely a limitation of the available software libraries, or a combination thereof, but whatever it is, its effects are real and I see them routinely. In fact, I have another example, in case you&#8217;re tempted to counter, &#8220;It&#8217;s a skill issue; get better students.&#8221; Last fall, I co-taught a class on AI models for biology with an experienced data scientist who does all his work in Python. He knows NumPy and pandas and matplotlib like the back of his hand. In the class, I covered all the theory, and he covered the in-class exercises in Python. So I got to see an expert in Python working through a range of examples. And my reaction to the code examples frequently was, &#8220;Why does it have to be so complicated?&#8221; So many times, I felt that things that would be just a few lines of simple R code turned out to be quite a bit longer and fairly convoluted. I definitely could not have written that code without extensive studying and completely rewiring my brain in terms of what programming patterns to use. 
It felt very alien, but not in the form of &#8220;wow, this is so alien but also so elegant&#8221; but rather &#8220;wow, this is so alien and weird and cumbersome.&#8221; And again, I don&#8217;t think this is because my colleague is not very good at what he&#8217;s doing. He is extremely good. The problem appears to be in the fundamental architecture of the tools.</p><h2>Some general thoughts about what makes a good language for data science</h2><p>Let me step back for a moment and go over some basic considerations for choosing a language for data science. When I say data science, I mean dissecting and summarizing data, finding patterns, fitting models, and making visualizations. In brief, it&#8217;s the kind of stuff scientists and other researchers<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> do when they are analyzing their data. This activity is distinct from data engineering or application development, even if the application does a data-heavy workload.</p><p>Data science as I define it here involves a lot of interactive exploration of data and quick one-off analyses or experiments. Therefore, any language suitable for data science has to be interpreted, usable in an interactive shell or in a notebook format. This also means performance considerations are secondary. When you want to do a quick linear regression on some data you&#8217;re working with, you don&#8217;t care whether the task is going to take 50 milliseconds or 500 milliseconds. 
You care about whether you can open up a shell, type a few lines of code, and get the result in a minute or two, versus having to set up a new project, write all the boilerplate to make the compiler happy, and then spend more time compiling your code than running it.</p><p>If we accept that being able to work interactively and with low startup cost is a critical feature of a language for data science, we immediately arrive at scripting languages such as Python, or data-science-specific languages such as R or Matlab or Mathematica. There&#8217;s also Julia, but honestly I don&#8217;t know enough about it to write about it coherently. For all I know it&#8217;s the best possible data science language out there. But I note that some people <a href="https://yuri.is/not-julia/">who have used it extensively have doubts.</a> Either way, I&#8217;ll not discuss it further here. I&#8217;ll also not consider proprietary languages such as Matlab or Mathematica, or fairly obscure languages lacking a wide ecosystem of useful packages, such as Octave. This leaves us with R and Python as the realistic choices to consider.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a></p><p>Before continuing, let me provide a few more thoughts about performance. Performance usually trades off against other features of a language. In simplistic terms, performance comes at the cost of either extra overhead for the programmer (as in Rust) or increased risk of obscure bugs (as in C) or both. For data science applications, I consider a high risk of obscure bugs or incorrect results unacceptable, and I also think convenience for the programmer is more important than raw performance. Computers are fast and thinking hurts. I&#8217;d rather spend less mental energy on telling the computer what to do and wait a little longer for the results. So the easier a language makes my job for me, the better. 
If I am really performance-limited in some analysis, I can always rewrite that particular part of the analysis in Rust, once I know exactly what I&#8217;m doing and what computations I need.</p><h2>Separating the logic from the logistics</h2><p>A critical component of not making my job harder than it needs to be is separating the logic of the analysis from the logistics. What I mean by this is I want to be able to specify at a conceptual level how the data should be analyzed and what the outcome of the computation should be, and I don&#8217;t want to have to think about the logistics of how the computation is performed. As a general rule, if I have to think about data types, numerical indices, or loops, or if I have to manually disassemble and reassemble datasets, chances are I&#8217;m bogged down in logistics.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a></p><p>To provide a concrete example, consider the dataset of <a href="https://allisonhorst.github.io/palmerpenguins/">penguins from the Palmer Archipelago.</a> There are three different penguin species in the dataset, and the penguins live on three different islands. Assume I want to calculate the mean and standard deviation of penguin weight for every combination of penguin species and island, excluding any cases where the body weight of a penguin is not known. An ideal data science language would allow me to express this computation in these terms, and it would require approximately as much code as it took me to write this sentence in the English language. And indeed this is possible, both in R and in Python.</p><p>Here is the relevant code in R, using the tidyverse approach:</p><pre><code>library(tidyverse)
library(palmerpenguins)

penguins |&gt;
  filter(!is.na(body_mass_g)) |&gt;
  group_by(species, island) |&gt;
  summarize(
    body_weight_mean = mean(body_mass_g),
    body_weight_sd = sd(body_mass_g)
  )</code></pre><p>And here is the equivalent code in Python, using the pandas package:</p><pre><code>import pandas as pd
from palmerpenguins import load_penguins

penguins = load_penguins()

(penguins
 .dropna(subset=['body_mass_g'])
 .groupby(['species', 'island'])
 .agg(
     body_weight_mean=('body_mass_g', 'mean'),
     body_weight_sd=('body_mass_g', 'std')
 )
 .reset_index()
)</code></pre><p>These two examples are quite similar. At this level of complexity of the analysis, Python does fine. I would consider the R code to be slightly easier to read (notice how many quotes and brackets the Python code needs), but the differences are minor. In both cases, we take the penguins dataset, remove the penguins for which body weight is missing, then specify that we want to perform the computation separately on every combination of penguin species and island, and then calculate the means and standard deviations.</p><p>Contrast this with equivalent code that is full of logistics, where I&#8217;m using only basic Python language features and no special data wrangling package:</p><pre><code>from palmerpenguins import load_penguins
import math

penguins = load_penguins()

# Convert DataFrame to list of dictionaries
penguins_list = penguins.to_dict('records')

# Filter out rows where body_mass_g is missing
filtered = [row for row in penguins_list if not math.isnan(row['body_mass_g'])]

# Group by species and island
groups = {}
for row in filtered:
    key = (row['species'], row['island'])
    if key not in groups:
        groups[key] = []
    groups[key].append(row['body_mass_g'])

# Calculate mean and standard deviation for each group
results = []
for (species, island), values in groups.items():
    n = len(values)
    
    # Calculate mean
    mean = sum(values) / n
    
    # Calculate standard deviation
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    std_dev = math.sqrt(variance)
    
    results.append({
        'species': species,
        'island': island,
        'body_weight_mean': mean,
        'body_weight_sd': std_dev
    })

# Sort results to match order used by pandas
results.sort(key=lambda x: (x['species'], x['island']))

# Print results
for result in results:
    print(f"{result['species']:10} {result['island']:10} "
          f"Mean: {result['body_weight_mean']:7.2f} g, "
          f"SD: {result['body_weight_sd']:6.2f} g")</code></pre><p>This code is much longer, it contains numerous loops, and it explicitly pulls the dataset apart and then puts it back together again. Regardless of language choice, I hope you can see that the version without logistics is superior to the version that gets bogged down in logistical details.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a></p><p>I will end things here for now. This post is long enough. In future installments, I&#8217;ll go over specific issues that make data analysis more complicated in Python than in R. In brief, I believe there are several reasons why Python code often devolves into dealing with data logistics. As much as the programmer may try to avoid logistics and stick to high-level conceptual programming patterns, either the language itself or the available libraries get in the way and tend to thwart those efforts. I will go into details soon. Stay tuned.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>In terms of languages that are commonly used for data science, I&#8217;m only 
familiar with R and Python, so those are the languages I&#8217;ll compare here. There may be some other language you are familiar with that solves all the issues I&#8217;m raising. Maybe it&#8217;s Julia, or Ruby, or Haskell. Great. If you like it, use it.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>At least in the way that deep learning is practiced today. In my opinion, the fact that PyTorch (or TensorFlow) code requires us to explicitly manipulate tensors and think about dimensions and what data is stored where suggests that there&#8217;s a level of abstraction we haven&#8217;t figured out yet. In other data analysis tasks, we no longer have to do these things.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>The plotting examples I list here are non-issues for students who use <a href="https://plotnine.org/">plotnine,</a> which I&#8217;m now encouraging everybody in my lab to do. But among students who use matplotlib or seaborn, which seem to be much more common choices in the Python community, I&#8217;ve never seen one who could actually, on the fly, modify a plot in a meaningful manner.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>I&#8217;m writing &#8220;researchers&#8221; in addition to &#8220;scientists&#8221; because people such as economists or journalists also often do data science, and I don&#8217;t think we&#8217;d call either type of person a scientist. 
I think &#8220;researcher&#8221; is a more general term that can apply to anybody who researches something, regardless of whether it&#8217;s science or not.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>Once upon a time there was Perl, but thankfully everybody agreed Perl was not a great language for anything. Python&#8217;s success is in no small part due to being better than Perl at most everything that Perl was good at.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>This is my main criticism of current deep-learning code that I alluded to in Footnote 2. It&#8217;s all logistics. Where is the deep-learning framework that abstracts away all the logistics and allows me to express only the logic of the information flow through the network?</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>Doing the same experiment with only base-R functionality feels like cheating. We can express the entire operation in a single function call:<br><code>aggregate(body_mass_g ~ species + island, penguins, \(x) c(mean = mean(x), sd = sd(x)))<br></code>This example highlights how powerful R is for data analysis. 
It also explains one of the main criticisms leveled at the tidyverse by the base-R community, that the tidyverse is overly verbose and is just reinventing concepts that have been available in R since the dawn of time.</p></div></div>]]></content:encoded></item><item><title><![CDATA[LLMs excel at programming—how can they be so bad at it?]]></title><description><![CDATA[My explanation for the mystery of why LLMs can be both exceptionally good and quite terrible at programming.]]></description><link>https://blog.genesmindsmachines.com/p/llms-excel-at-programminghow-can</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/llms-excel-at-programminghow-can</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Thu, 06 Nov 2025 15:41:16 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!XsWg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e1ffb0c-455c-4eec-bdb5-370a1efab98f_6240x4160.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Despite the overall hype in all things AI, in particular among the tech crowd, we have not yet seen much in terms of product&#8211;market fit and genuine commercial success for AIs&#8212;or more specifically, LLMs&#8212;outside a fairly narrow range of application areas. Other than sycophantic chatbots, AI girlfriends, and maybe efficient document search, the main application of LLMs seems to be computer programming. LLMs can be really good at programming. And yet, also, they are awful. Andrej Karpathy, the inventor of the term &#8220;vibe coding,&#8221; expressed in a recent interview that there <a href="https://www.youtube.com/watch?v=lXUZvyajciY&amp;t=1833s">continue to be major limitations in what kind of programming problems LLMs can tackle.</a> So what&#8217;s going on here? How can LLMs be both great at programming and terrible? 
How can vibe coding sometimes succeed beyond our wildest imagination and at other times fail entirely?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!XsWg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e1ffb0c-455c-4eec-bdb5-370a1efab98f_6240x4160.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!XsWg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e1ffb0c-455c-4eec-bdb5-370a1efab98f_6240x4160.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XsWg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e1ffb0c-455c-4eec-bdb5-370a1efab98f_6240x4160.jpeg 848w, https://substackcdn.com/image/fetch/$s_!XsWg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e1ffb0c-455c-4eec-bdb5-370a1efab98f_6240x4160.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XsWg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e1ffb0c-455c-4eec-bdb5-370a1efab98f_6240x4160.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!XsWg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e1ffb0c-455c-4eec-bdb5-370a1efab98f_6240x4160.jpeg" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4e1ffb0c-455c-4eec-bdb5-370a1efab98f_6240x4160.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1972902,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/177950065?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e1ffb0c-455c-4eec-bdb5-370a1efab98f_6240x4160.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!XsWg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e1ffb0c-455c-4eec-bdb5-370a1efab98f_6240x4160.jpeg 424w, https://substackcdn.com/image/fetch/$s_!XsWg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e1ffb0c-455c-4eec-bdb5-370a1efab98f_6240x4160.jpeg 848w, https://substackcdn.com/image/fetch/$s_!XsWg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e1ffb0c-455c-4eec-bdb5-370a1efab98f_6240x4160.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!XsWg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4e1ffb0c-455c-4eec-bdb5-370a1efab98f_6240x4160.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@hdbernd?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Bernd &#128247; Dittrich</a> on <a href="https://unsplash.com/photos/a-laptop-computer-sitting-on-top-of-a-desk-jG-jFEyKnqY?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></figcaption></figure></div><p>I think there is a simple explanation for this seemingly paradoxical observation. And if you listen carefully to Andrej Karpathy&#8217;s interview, you will notice that he is aware of the explanation. Here is what I think is happening: There are two entirely distinct skillsets that both exist under the umbrella of being &#8220;good at programming.&#8221; Most people don&#8217;t distinguish between them. 
That&#8217;s because most people don&#8217;t have either skillset. They&#8217;re not even aware of the distinction. And the people who have exceptional command of one skillset typically are also at least comfortable with the other and consequently don&#8217;t think much about the distinction either. But LLMs only have one of the two skillsets. And for the one that they have, they by far exceed even the best human programmers. This can make them appear remarkably good at programming, in particular to less experienced developers. But whenever the other skillset is required, the one they lack, LLMs fail miserably.</p><p>So what are these two skillsets? The first is being able to reason deeply and innovatively about algorithms, data structures, or software architecture. This is the one LLMs lack. The second is being able to read, process, and memorize large amounts of API documentation, tutorial materials, and other existing code examples. This is the one LLMs excel at. For humans, it tends to be the reverse. 
Good programmers tend to be exceptional at conceptual thought, whereas reading large amounts of documentation is hard for anyone. However, experienced programmers can make up for their relative lack of ability to absorb massive amounts of text by memorizing the relevant parts (due to repeat use), and also by searching on Stack Overflow<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> or reading the relevant documentation on the fly.</p><p>When Karpathy talks about <a href="https://www.youtube.com/watch?v=lXUZvyajciY&amp;t=1899s">LLMs being good at &#8220;boilerplate,&#8221;</a> this is exactly what he means. LLMs excel at copying basic setup code from the documentation or from introductory tutorials. But LLMs can go beyond just boilerplate. They are definitely able to string API calls together, or to take the logic for a common problem and adapt it to a different programming language, or a different library, or even a somewhat modified use case. To people with little programming experience, this can appear magical, and it can convince them that an LLM can program anything a user may want. And to experienced programmers, this can save huge amounts of time and effort, in particular when working with a language or library or codebase they are not that familiar with.</p><p>But, as useful as this skill is, there comes a time in any programming project where deep conceptual thought is more important. Sometimes you do need to develop a novel algorithm that solves a tricky problem. Or you have to hunt down that weird bug that somehow, for no obvious reason, seems to involve three unrelated components in a large software project. Or you have to architect a new project and there are complex tradeoffs that need to be balanced carefully to arrive at a working solution. 
In 2025, no LLM can reliably tackle these types of problems.</p><p>Maybe eventually LLMs or some other form of AI will achieve proficiency in both skillsets. At that point, AI will be able to program truly autonomously. But we are not there today. Nevertheless, LLMs can be tremendously useful. They just need to be understood as a more sophisticated version of Stack Overflow, not as an autonomous junior software developer.</p><div id="youtube2-lXUZvyajciY" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;lXUZvyajciY&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/lXUZvyajciY?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>I have had a personal experience recently where I was lacking exactly the knowledge that LLMs can provide. As a consequence, I got huge time savings and increased efficiency out of LLM use.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> It was in the context of the graduate class I am teaching this fall, about AI models in molecular biology. The class covers both (i) the conceptual underpinnings of widely used models and (ii) practical, hands-on experience with building, training, and modifying various AI models, as well as analyzing and visualizing model outputs. I know a lot conceptually about how AI models work. I can explain attention and feed-forward layers and linear projections and activation functions till the cows come home. 
But I never actually code myself in PyTorch.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> And similarly, I know a lot about data analysis and data visualization, but I only have experience doing these kinds of things in R, not in Python.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a></p><p>So with my deep conceptual knowledge about how things work in principle and complete ignorance about how any of this is done in practice, I&#8217;d ordinarily have to buckle down, read a ton of documentation and tutorials, and then painstakingly put together my demonstrations and hands-on experiences. It could easily take me two full work days for every one hour of practical in-class material. However, LLMs are exceptional at writing little code examples for a class. All I had to do was ask the AI for code that did what I wanted to do, and the AI would generally deliver useful results within one or sometimes a few tries. You can see an example of the type of prompts I would use <a href="https://github.com/clauswilke/Claude-zero-shot/blob/main/Claude-zero-shot.ipynb">here.</a> This made preparing my in-class materials so much simpler and faster. I read every line of code the AI produced and I verified it did what I wanted it to do, but I didn&#8217;t have to also read hundreds of pages of documentation to find the exact function calls that would solve my specific problems.</p><p>Also, I had various existing code examples that used pandas and matplotlib, and I think both libraries have major conceptual flaws. I didn&#8217;t want to teach these libraries. So I needed to convert all these code examples into polars and plotnine. This is a perfect application area for LLMs. 
Paste the existing pandas/matplotlib code into the prompt box, ask the LLM to translate it to polars/plotnine, and it&#8217;ll zero-shot the answer every time.</p><p>Results were a bit more mixed when it came to fixing bugs. For simple bugs, things often worked out very well. I just pasted the error message into the prompt box and the model corrected the code. Typical use cases were situations where the model had hallucinated an API call or a function parameter or a return value, and when it saw the error message it recognized the problem and often came up with the right way to fix the issue. But sometimes this process could go haywire. Just the other day I asked for a fairly simple (I thought) function that could load two protein structures and align them. And the model just couldn&#8217;t figure out how to correctly call the <code>superimpose()</code> function from the biotite package. We went through six or seven iterations where the model would give me code, the code wouldn&#8217;t run, I&#8217;d paste in the error message, the model would respond with new code, which again wouldn&#8217;t run, and so on. At some point it felt like we were going in circles, where I got the exact same error messages I had seen in earlier iterations. Eventually we solved the issue and arrived at ten simple lines of working code. But the process felt painful, and in this particular case I suspect that if I had just read the documentation and coded this by hand it would have been faster.</p><p>This last example shows how quickly I reached the limits of what even state-of-the-art coding models can do today. Things work great when the task consists of reproducing or slightly modifying existing code examples, but when things go wrong and we need to find a subtle bug, the models clearly don&#8217;t think. They end up flailing around like a beginner programmer, just trying things out until hopefully something works. 
In those moments it doesn&#8217;t feel like there&#8217;s a deep intellect on the other side that is carefully reasoning through the problem and systematically homing in on the root cause of the bug. This task is still on the human user. And more generally, it&#8217;s on the human user to realize when the model has gotten stuck, is going in circles, is hallucinating, or otherwise is no longer making useful suggestions.</p><p>I believe programming is a niche where LLMs can find product&#8211;market fit exactly because so much of programming is reading the documentation and tutorials and code examples. It is an application domain where for specific tasks LLMs are definitely better than humans, and therefore humans who know how to use LLMs appropriately in this context can derive great value. However, I think it is dangerous to get bamboozled by an LLM&#8217;s ability to spit out massive amounts of lightly transformed example code and think the model can reason deeply about complex algorithmic or architectural issues. A human who could write straightforward code examples at the speed of an LLM would likely be a superstar programmer, with the associated other qualities superstar programmers have, but LLMs work differently. They don&#8217;t have those other qualities. They can generate code, but they can&#8217;t program.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div 
class="footnote-content"><p>Yeah, I know, that is quickly fading into irrelevance. Let&#8217;s just memorialize, for the younger generations for whom this will be completely alien, that during the 2010s the number one skill a programmer needed to have was the ability to search Stack Overflow for the specific problems they needed to solve.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>I will use the generic term LLM throughout. But if you&#8217;re wondering, the specific model I used for programming assistance was Claude Sonnet 4.5.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>In my lab, the actual coding is mostly done by my graduate students.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>In my AI class, because we&#8217;re already programming in PyTorch, all data analysis and data visualization are done in Python, to simplify things for the students. I continue to maintain that Python is not a good language for data analysis. 
But that&#8217;s a topic for another post.</p></div></div>]]></content:encoded></item><item><title><![CDATA[1000 subscribers feedback and AMA thread]]></title><description><![CDATA[A few days ago I broke 1000 subscribers here on Substack.]]></description><link>https://blog.genesmindsmachines.com/p/1000-subscribers-feedback-and-ama</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/1000-subscribers-feedback-and-ama</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Sun, 26 Oct 2025 20:18:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Wxl6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9963fc99-38f6-4a73-914c-fe87bea473ee_1690x764.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few days ago I broke 1000 subscribers here on Substack. I&#8217;d like to thank everybody who has subscribed and who supports my writing. It took me four months to get to a thousand subscribers. (I posted <a href="https://blog.genesmindsmachines.com/p/are-we-overproducing-phd-students">my first article here</a> on June 23, 2025.) At this rate, it will take only 333 years to reach a million. 
&#128521; </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wxl6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9963fc99-38f6-4a73-914c-fe87bea473ee_1690x764.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wxl6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9963fc99-38f6-4a73-914c-fe87bea473ee_1690x764.png 424w, https://substackcdn.com/image/fetch/$s_!Wxl6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9963fc99-38f6-4a73-914c-fe87bea473ee_1690x764.png 848w, https://substackcdn.com/image/fetch/$s_!Wxl6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9963fc99-38f6-4a73-914c-fe87bea473ee_1690x764.png 1272w, https://substackcdn.com/image/fetch/$s_!Wxl6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9963fc99-38f6-4a73-914c-fe87bea473ee_1690x764.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wxl6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9963fc99-38f6-4a73-914c-fe87bea473ee_1690x764.png" width="1456" height="658" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9963fc99-38f6-4a73-914c-fe87bea473ee_1690x764.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:658,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:222479,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/177146425?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9963fc99-38f6-4a73-914c-fe87bea473ee_1690x764.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Wxl6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9963fc99-38f6-4a73-914c-fe87bea473ee_1690x764.png 424w, https://substackcdn.com/image/fetch/$s_!Wxl6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9963fc99-38f6-4a73-914c-fe87bea473ee_1690x764.png 848w, https://substackcdn.com/image/fetch/$s_!Wxl6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9963fc99-38f6-4a73-914c-fe87bea473ee_1690x764.png 1272w, https://substackcdn.com/image/fetch/$s_!Wxl6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9963fc99-38f6-4a73-914c-fe87bea473ee_1690x764.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Half of my subscribers are from the US, and the other half from the rest of the world, with the UK, India, Germany, and Canada being the leading countries outside the US.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!-z7Z!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f5b4d36-e41c-4256-9ada-7fcad655a3b6_1700x590.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!-z7Z!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f5b4d36-e41c-4256-9ada-7fcad655a3b6_1700x590.png 424w, https://substackcdn.com/image/fetch/$s_!-z7Z!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f5b4d36-e41c-4256-9ada-7fcad655a3b6_1700x590.png 848w, https://substackcdn.com/image/fetch/$s_!-z7Z!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f5b4d36-e41c-4256-9ada-7fcad655a3b6_1700x590.png 1272w, https://substackcdn.com/image/fetch/$s_!-z7Z!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f5b4d36-e41c-4256-9ada-7fcad655a3b6_1700x590.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!-z7Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f5b4d36-e41c-4256-9ada-7fcad655a3b6_1700x590.png" width="1456" height="505" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f5b4d36-e41c-4256-9ada-7fcad655a3b6_1700x590.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:505,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:125626,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/177146425?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f5b4d36-e41c-4256-9ada-7fcad655a3b6_1700x590.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" 
class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!-z7Z!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f5b4d36-e41c-4256-9ada-7fcad655a3b6_1700x590.png 424w, https://substackcdn.com/image/fetch/$s_!-z7Z!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f5b4d36-e41c-4256-9ada-7fcad655a3b6_1700x590.png 848w, https://substackcdn.com/image/fetch/$s_!-z7Z!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f5b4d36-e41c-4256-9ada-7fcad655a3b6_1700x590.png 1272w, https://substackcdn.com/image/fetch/$s_!-z7Z!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f5b4d36-e41c-4256-9ada-7fcad655a3b6_1700x590.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>Inside the US, it may be expected that California, New York, and Texas provide the largest subscriber base. They are three of the four most populous states. The remaining state in the top four is Florida. For some reason, people in Florida are not that interested in my posts.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sxod!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec95172-153f-4a26-bb2f-5d9d5521f33d_1672x578.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sxod!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec95172-153f-4a26-bb2f-5d9d5521f33d_1672x578.png 424w, https://substackcdn.com/image/fetch/$s_!sxod!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec95172-153f-4a26-bb2f-5d9d5521f33d_1672x578.png 848w, https://substackcdn.com/image/fetch/$s_!sxod!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec95172-153f-4a26-bb2f-5d9d5521f33d_1672x578.png 1272w, 
https://substackcdn.com/image/fetch/$s_!sxod!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec95172-153f-4a26-bb2f-5d9d5521f33d_1672x578.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sxod!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec95172-153f-4a26-bb2f-5d9d5521f33d_1672x578.png" width="1456" height="503" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1ec95172-153f-4a26-bb2f-5d9d5521f33d_1672x578.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:503,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:114996,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/177146425?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec95172-153f-4a26-bb2f-5d9d5521f33d_1672x578.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!sxod!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec95172-153f-4a26-bb2f-5d9d5521f33d_1672x578.png 424w, https://substackcdn.com/image/fetch/$s_!sxod!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec95172-153f-4a26-bb2f-5d9d5521f33d_1672x578.png 848w, 
https://substackcdn.com/image/fetch/$s_!sxod!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec95172-153f-4a26-bb2f-5d9d5521f33d_1672x578.png 1272w, https://substackcdn.com/image/fetch/$s_!sxod!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ec95172-153f-4a26-bb2f-5d9d5521f33d_1672x578.png 1456w" sizes="100vw"></picture></div></a></figure></div><p>I was surprised to see Massachusetts and Maryland rank highly in subscriber numbers. 
They are only the 16th and 18th most populous states, respectively. I suspect subscriber numbers in these states are driven by the large number of people working in higher ed and/or biological research.</p><p>Here are the three most popular posts since the creation of this blog:</p><ul><li><p><a href="https://blog.genesmindsmachines.com/p/phd-level-intelligence-or-the-graduate">PhD-level intelligence or the graduate student from hell</a></p></li><li><p><a href="https://blog.genesmindsmachines.com/p/no-alphafold-has-not-completely-solved">No, AlphaFold has not completely solved protein folding</a></p></li><li><p><a href="https://blog.genesmindsmachines.com/p/we-still-cant-predict-much-of-anything">We still can&#8217;t predict much of anything in biology</a></p></li></ul><p>If you subscribed because of one of these posts, I&#8217;d like to point out that I&#8217;m not very good at writing the same types of articles over and over. Going forward, I will likely write about different topics, unless I have a new point I want to make about a topic I&#8217;ve already covered. In general, I&#8217;m writing about whatever captures my attention at the moment.</p><p>Now I&#8217;d like to invite you to provide feedback in the comments. What do you like so far? What could I improve? You&#8217;re also welcome to ask me anything. I&#8217;ll do my best to answer all questions in the comments.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Let&#8217;s pause for a moment and reflect on how terrible the projection is that Substack uses for their map of the world. It looks like a Mercator projection to me, which makes Greenland appear to be as large as Africa, and a few times larger than Western Europe. 
Is it really so difficult to use an appropriate projection, such as <a href="https://en.wikipedia.org/wiki/Winkel_tripel_projection">Winkel tripel?</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>As of today, I have ten subscribers in Florida. There are at least eleven states with more than ten subscribers, in alphabetical order: California, Illinois, Maryland, Massachusetts, Michigan, Minnesota, New York, Oregon, Texas, Virginia, Washington.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Random seeds and brown M&Ms]]></title><description><![CDATA[Your first mistake was assuming people actually understand how random numbers work.]]></description><link>https://blog.genesmindsmachines.com/p/random-seeds-and-brown-m-and-ms</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/random-seeds-and-brown-m-and-ms</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Thu, 23 Oct 2025 16:46:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!RXz5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45664dcc-2533-425e-aca0-b70ebecfd810_5548x3470.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>My <a href="https://blog.genesmindsmachines.com/p/if-your-random-seed-is-42-i-will">recent post about random seeds</a> generated extensive discussions about best practices in random number generation. This is great. The more people are aware of the unexpected pitfalls the better. However, I received some pushback I found rather surprising. More than one person, and mostly people with extensive training in statistics, strongly argued that the random seed is arbitrary, and therefore 42 is fine. 
If your results depend on the random seed, they said, you have a bigger problem.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RXz5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45664dcc-2533-425e-aca0-b70ebecfd810_5548x3470.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RXz5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45664dcc-2533-425e-aca0-b70ebecfd810_5548x3470.jpeg 424w, https://substackcdn.com/image/fetch/$s_!RXz5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45664dcc-2533-425e-aca0-b70ebecfd810_5548x3470.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RXz5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45664dcc-2533-425e-aca0-b70ebecfd810_5548x3470.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!RXz5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45664dcc-2533-425e-aca0-b70ebecfd810_5548x3470.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RXz5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45664dcc-2533-425e-aca0-b70ebecfd810_5548x3470.jpeg" width="1456" height="911" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/45664dcc-2533-425e-aca0-b70ebecfd810_5548x3470.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:911,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:279022,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/176897380?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45664dcc-2533-425e-aca0-b70ebecfd810_5548x3470.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RXz5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45664dcc-2533-425e-aca0-b70ebecfd810_5548x3470.jpeg 424w, https://substackcdn.com/image/fetch/$s_!RXz5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45664dcc-2533-425e-aca0-b70ebecfd810_5548x3470.jpeg 848w, https://substackcdn.com/image/fetch/$s_!RXz5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45664dcc-2533-425e-aca0-b70ebecfd810_5548x3470.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!RXz5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F45664dcc-2533-425e-aca0-b70ebecfd810_5548x3470.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@stumpie10?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Robert Stump</a> on <a href="https://unsplash.com/photos/red-and-white-dice-lot-pQyTChJwEDI?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText">Unsplash</a></figcaption></figure></div><p>Let&#8217;s dissect this statement carefully. &#8220;If the results depend on the random seed you have a bigger problem.&#8221; I agree. But here&#8217;s the issue. How do you know? 
If you always use the same random seed, you&#8217;ll not realize your results depend on the random seed, because you&#8217;ll always get the same results.</p><p>A trained statistician might say, &#8220;That&#8217;s silly, why would anybody do this?&#8221; but that&#8217;s exactly my point. People who unquestioningly set their random seed to 42 may mess up their analyses in other ways. A random seed of 42 is the brown M&amp;Ms of machine learning and computational modeling.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> The intersection between people who always use random number generators appropriately and those who routinely set their random seed to 42 is extremely small.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><p>Let&#8217;s imagine this conversation between John, a student, and Martin, an experienced machine-learning expert.</p><p>&#8220;Hey Martin, I&#8217;ve run my model five times. I get an accuracy of 98% on the test data every time. My model performs great,&#8221; says John.</p><p>&#8220;That seems too good to be true,&#8221; Martin responds. &#8220;What&#8217;s your performance on the training data?&#8221;</p><p>&#8220;Oh, it&#8217;s only about 70%. The model seems to generalize really well.&#8221;</p><p>Now Martin is getting worried. &#8220;You ran the model multiple times, and you got 70% on the training data and 98% on the test data? Did you use the same training&#8211;test split each time by any chance?&#8221;</p><p>&#8220;Absolutely not,&#8221; John retorts. &#8220;I generated a new random training&#8211;test split each time, as described in the scikit-learn documentation, using their exact example code.&#8221;</p><p>Martin is increasingly confused. He hasn&#8217;t read the scikit-learn documentation in a while, and so has no idea what it says. He asks John to pull up the documentation.</p><p>John pulls out his laptop and says, &#8220;Here it is, the example code from scikit-learn:&#8221;</p><pre><code>X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)</code></pre><p>Martin looks at the example code and says, &#8220;But you did change the random state, right?&#8221;</p><p>&#8220;I did not,&#8221; says John, now starting to wonder whether he should feel embarrassed about having made a stupid mistake or proud about having thought things through really well. &#8220;I didn&#8217;t want to give the impression that I cherry-picked my analysis by choosing a specific random seed, so I stuck with the default provided in the documentation. It&#8217;s from the Hitchhiker&#8217;s Guide. Many people use it.&#8221;</p><p>If you think this dialog is completely unrealistic then I&#8217;m sorry, you need to spend less time with your computer and more time with people.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> Random number generation is an obscure technical topic that most people don&#8217;t know much about. Even people who routinely do data analysis or machine learning are not necessarily well informed about how exactly random number generation works. That&#8217;s why I wrote <a href="https://blog.genesmindsmachines.com/p/if-your-random-seed-is-42-i-will">my previous post</a> in the first place.</p><p>Similarly, I gave my recommendation of not explicitly setting a seed at all because I know how people operate. Yes, this choice sacrifices some reproducibility, and doing something like generating a true random seed and writing it into a log file would be better, but all of this is additional mental overhead that for a good fraction of people will simply be too much. Anybody who has taught students knows that if you provide example code with a seed, some fraction of people will use your code as written and not change the seed. 
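</p><p>The &#8220;generate a true random seed and write it into a log file&#8221; alternative mentioned above could look roughly like the following minimal sketch. It uses only the Python standard library. The function name <code>make_logged_rng()</code> and the log file name <code>seeds.log</code> are made up for illustration, and a real analysis would likely hand the seed to whatever library it actually uses.</p><pre><code>import logging
import random
import secrets

# Hypothetical log file name; use any location you will still have later.
logging.basicConfig(filename="seeds.log", level=logging.INFO)


def make_logged_rng():
    """Create a generator from a fresh seed and record that seed."""
    # 128 bits drawn from the operating system's entropy source.
    seed = secrets.randbits(128)
    logging.info("random seed for this run: %d", seed)
    return seed, random.Random(seed)


seed, rng = make_logged_rng()
shuffled = rng.sample(range(10), k=10)  # use rng for every random step

# Re-creating a generator from the logged seed replays the same values.
replay = random.Random(seed).sample(range(10), k=10)
</code></pre><p>Because the seed is drawn fresh on every run, there is no hard-coded number for anybody to copy, and the log still lets you replay one particular run if you need to.</p><p>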
It doesn&#8217;t matter how often you say &#8220;change the seed.&#8221; If your code contains a seed, people will end up using that exact seed, every time.</p><p>There&#8217;s one more issue. Particularly when you&#8217;re coding with scikit-learn, you end up setting random seeds all over the place. Every single function that has random behavior is seeded separately, through its own <code>random_state</code> argument. So you can quickly face the situation where you need many random seeds. Want to do a train/test split? Please provide a random seed. Want to do a t-SNE? Please provide a random seed. Want to do a PCA? Please provide a random seed. Want to fit a random forest model? Please provide a random seed. You will quickly run into decision fatigue where you won&#8217;t have the energy to come up with new random seeds everywhere. You could set up an elaborate scheme where you have a master random number generator which you use to generate random seeds for each step of your analysis, but come on, nobody is going to do this. So I still think it is better to get into the habit of not setting a random seed at all and instead relying on the system-provided randomness the library uses by default.</p><p>Let me end with <a href="https://bsky.app/profile/ehudk.bsky.social/post/3m3tnfu65w224">this post on Bluesky</a> by Ehud Karavani. It captures the right attitude. 
This is what you should be doing.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Nl2g!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4098b90-563a-4a94-8e0f-005d2d9c0134_952x1202.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Nl2g!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4098b90-563a-4a94-8e0f-005d2d9c0134_952x1202.png 424w, https://substackcdn.com/image/fetch/$s_!Nl2g!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4098b90-563a-4a94-8e0f-005d2d9c0134_952x1202.png 848w, https://substackcdn.com/image/fetch/$s_!Nl2g!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4098b90-563a-4a94-8e0f-005d2d9c0134_952x1202.png 1272w, https://substackcdn.com/image/fetch/$s_!Nl2g!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4098b90-563a-4a94-8e0f-005d2d9c0134_952x1202.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Nl2g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4098b90-563a-4a94-8e0f-005d2d9c0134_952x1202.png" width="420" height="530.2941176470588" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d4098b90-563a-4a94-8e0f-005d2d9c0134_952x1202.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1202,&quot;width&quot;:952,&quot;resizeWidth&quot;:420,&quot;bytes&quot;:537943,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/176897380?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4098b90-563a-4a94-8e0f-005d2d9c0134_952x1202.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Nl2g!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4098b90-563a-4a94-8e0f-005d2d9c0134_952x1202.png 424w, https://substackcdn.com/image/fetch/$s_!Nl2g!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4098b90-563a-4a94-8e0f-005d2d9c0134_952x1202.png 848w, https://substackcdn.com/image/fetch/$s_!Nl2g!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4098b90-563a-4a94-8e0f-005d2d9c0134_952x1202.png 1272w, https://substackcdn.com/image/fetch/$s_!Nl2g!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd4098b90-563a-4a94-8e0f-005d2d9c0134_952x1202.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>If brown M&amp;Ms don&#8217;t mean anything to you, read <a href="https://www.compliancebuilding.com/2009/08/03/compliance-van-halen-and-brown-mms/">this (true) story</a> about how the rock band Van Halen would demand no brown M&amp;M&#8217;s in the backstage area. This demand was meant purely as a test to see whether the production company had actually read the entire contract and could be considered reliable.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>I&#8217;m not discounting that there are a handful of people who routinely use a seed of 42 when it won&#8217;t cause any issues and yet appropriately use a range of different seeds when it matters. They&#8217;re probably the people writing the scikit-learn documentation.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>And let me just emphasize that if you see yourself in John, I don&#8217;t think John is a bad student. He&#8217;s probably a very good student. He just doesn&#8217;t know much about how random number generation works. That&#8217;s fine. 
There are more things to know than any one person can ever absorb. Everybody has knowledge gaps somewhere.</p></div></div>]]></content:encoded></item><item><title><![CDATA[If your random seed is 42 I will come to your office and set your computer on fire🔥]]></title><description><![CDATA[Figuratively. More likely you'll get a stern talking to.]]></description><link>https://blog.genesmindsmachines.com/p/if-your-random-seed-is-42-i-will</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/if-your-random-seed-is-42-i-will</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Wed, 22 Oct 2025 12:29:26 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!e0-K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bebca8-b5bb-43ce-afef-078bcb795395_1200x993.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When you&#8217;re as old as I am, old enough to remember that there was a time before the internet, when you had to go to the library to read a book or drop coins into a metal box to make a phone call, you have absorbed a lot of geek lore. So, when you read some tutorial about machine learning or data analysis and you see <code>random.seed(42)</code>, you go &#8220;haha, that&#8217;s funny&#8221; and you move on. Until you talk to your much younger students and you realize they all think this is an important line of code that ensures their programs run correctly. They set random seeds to 42 everywhere. They have read the documentation, they know about the random seed option, and they dutifully follow the best practices as laid out everywhere on the internet. The random seed is 42.</p><p>I cannot emphasize enough how bad a choice this is. 42 was a joke, guys. Don&#8217;t use 42. Ever. 
If your random seed is 42 I will come to your office and set your computer on fire.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e0-K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bebca8-b5bb-43ce-afef-078bcb795395_1200x993.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e0-K!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bebca8-b5bb-43ce-afef-078bcb795395_1200x993.jpeg 424w, https://substackcdn.com/image/fetch/$s_!e0-K!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bebca8-b5bb-43ce-afef-078bcb795395_1200x993.jpeg 848w, https://substackcdn.com/image/fetch/$s_!e0-K!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bebca8-b5bb-43ce-afef-078bcb795395_1200x993.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!e0-K!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bebca8-b5bb-43ce-afef-078bcb795395_1200x993.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!e0-K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bebca8-b5bb-43ce-afef-078bcb795395_1200x993.jpeg" width="1200" height="993" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/23bebca8-b5bb-43ce-afef-078bcb795395_1200x993.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:993,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:128101,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/175157766?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bebca8-b5bb-43ce-afef-078bcb795395_1200x993.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!e0-K!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bebca8-b5bb-43ce-afef-078bcb795395_1200x993.jpeg 424w, https://substackcdn.com/image/fetch/$s_!e0-K!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bebca8-b5bb-43ce-afef-078bcb795395_1200x993.jpeg 848w, https://substackcdn.com/image/fetch/$s_!e0-K!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bebca8-b5bb-43ce-afef-078bcb795395_1200x993.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!e0-K!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bebca8-b5bb-43ce-afef-078bcb795395_1200x993.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Famous slide by <a href="https://jennybryan.org/">Jenny Bryan.</a> I probably don&#8217;t have to set your computer on fire because she beat me to it. See also her blog post about <a href="https://www.tidyverse.org/blog/2017/12/workflow-vs-script/">project-oriented workflow.</a></figcaption></figure></div><p>But is this actually a problem? Do people really use 42 that commonly as their random seed? Yes, absolutely. Google &#8220;random seed&#8221; or &#8220;random_state&#8221; and the number 42 will pop up among your top search hits. And people may explain where the number 42 comes from (we&#8217;ll get to this below), but then they don&#8217;t talk much about whether or not this choice is a good idea. 
In fact, frequently you see statements along the lines of &#8220;the random seed is arbitrary, you can use any number you want, so 42 is a fine choice.&#8221; This sentence is 100% correct, <strong>assuming you&#8217;re the only one who uses 42 and you also use it only once in your entire life.</strong> Obviously this assumption is not valid. But I rarely see people point this out.</p><p>For example, the documentation for the Python machine-learning framework <a href="https://scikit-learn.org/">scikit-learn</a> contains a lot of material about random states and various options for controlling them. Everything the documentation says is technically correct, and yet it never discourages you from using 42 as the seed. In fact, the glossary <a href="https://scikit-learn.org/stable/glossary.html#term-random_state">contains this gem:</a></p><blockquote><p>Popular integer random seeds are 0 and 42.</p></blockquote><p>(I hope I won&#8217;t have to explain why 0 is just as bad a choice as 42.)</p><p>The number 42 also shows up throughout the documentation, such as in code examples for the <code>train_test_split()</code> function. 
Using a fixed random seed when splitting data into training and test sets is uniquely bad, as you&#8217;re always going to be sampling the same split when you&#8217;re re-training your classifier.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9svV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee076e83-90b7-460a-b0f4-80e03b090a6a_1870x984.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9svV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee076e83-90b7-460a-b0f4-80e03b090a6a_1870x984.png 424w, https://substackcdn.com/image/fetch/$s_!9svV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee076e83-90b7-460a-b0f4-80e03b090a6a_1870x984.png 848w, https://substackcdn.com/image/fetch/$s_!9svV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee076e83-90b7-460a-b0f4-80e03b090a6a_1870x984.png 1272w, https://substackcdn.com/image/fetch/$s_!9svV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee076e83-90b7-460a-b0f4-80e03b090a6a_1870x984.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9svV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee076e83-90b7-460a-b0f4-80e03b090a6a_1870x984.png" width="1456" height="766" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ee076e83-90b7-460a-b0f4-80e03b090a6a_1870x984.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:766,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:151186,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/175157766?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee076e83-90b7-460a-b0f4-80e03b090a6a_1870x984.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9svV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee076e83-90b7-460a-b0f4-80e03b090a6a_1870x984.png 424w, https://substackcdn.com/image/fetch/$s_!9svV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee076e83-90b7-460a-b0f4-80e03b090a6a_1870x984.png 848w, https://substackcdn.com/image/fetch/$s_!9svV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee076e83-90b7-460a-b0f4-80e03b090a6a_1870x984.png 1272w, https://substackcdn.com/image/fetch/$s_!9svV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fee076e83-90b7-460a-b0f4-80e03b090a6a_1870x984.png 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption">Example use of the <code>train_test_split()</code> function from scikit-learn, prominently setting <code>random_state=42</code>. Taken from <a href="https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html">the official documentation</a> for version 1.7.2, the latest stable release as of this writing.</figcaption></figure></div><p>But it gets worse. LLMs have learned about 42 and will happily put it into the code they generate. <a href="https://github.com/clauswilke/Claude-zero-shot/blob/main/Claude-zero-shot.ipynb">Here is some zero-shot data-analysis code</a> I recently generated. 
And there it is, <code>random_state=42</code> (&#8220;for reproducibility&#8221;).</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Sx1A!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc24d4646-2ebc-4807-9a84-7f71229be2d8_1856x542.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Sx1A!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc24d4646-2ebc-4807-9a84-7f71229be2d8_1856x542.png 424w, https://substackcdn.com/image/fetch/$s_!Sx1A!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc24d4646-2ebc-4807-9a84-7f71229be2d8_1856x542.png 848w, https://substackcdn.com/image/fetch/$s_!Sx1A!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc24d4646-2ebc-4807-9a84-7f71229be2d8_1856x542.png 1272w, https://substackcdn.com/image/fetch/$s_!Sx1A!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc24d4646-2ebc-4807-9a84-7f71229be2d8_1856x542.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Sx1A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc24d4646-2ebc-4807-9a84-7f71229be2d8_1856x542.png" width="1456" height="425" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c24d4646-2ebc-4807-9a84-7f71229be2d8_1856x542.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:425,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:173554,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/175157766?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc24d4646-2ebc-4807-9a84-7f71229be2d8_1856x542.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Sx1A!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc24d4646-2ebc-4807-9a84-7f71229be2d8_1856x542.png 424w, https://substackcdn.com/image/fetch/$s_!Sx1A!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc24d4646-2ebc-4807-9a84-7f71229be2d8_1856x542.png 848w, https://substackcdn.com/image/fetch/$s_!Sx1A!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc24d4646-2ebc-4807-9a84-7f71229be2d8_1856x542.png 1272w, https://substackcdn.com/image/fetch/$s_!Sx1A!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc24d4646-2ebc-4807-9a84-7f71229be2d8_1856x542.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Screenshot of code snippet featuring <code>random_state=42</code>. 
Taken from <a href="https://github.com/clauswilke/Claude-zero-shot/blob/main/Claude-zero-shot.ipynb">this zero-shot output</a> generated by Claude Sonnet 4.5.</figcaption></figure></div><p>It is not surprising that attentive students, who read the documentation, read the blog posts, read the code generated by LLMs, conclude that a random seed of 42 is a good choice, and maybe even a choice that is superior to other options.</p><p>To dig deeper into why 42 is not a good choice, and is in fact uniquely bad, we need to look into how random numbers are generated in a computer.</p><h2>What is a random seed?</h2><p>In modern computing, we need randomness everywhere. If you&#8217;re doing machine learning and you need to subdivide your data into training and test sets, that requires a source of randomness. 
If you&#8217;re writing a computer game and you want your NPCs<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> to show somewhat interesting and unpredictable behavior, you need randomness. If you&#8217;re simulating a physical system, you need randomness. The problem is that it&#8217;s rather complicated to generate true random numbers. The sources of true randomness we have available (for example <a href="https://en.wikipedia.org/wiki/Hardware_random_number_generator">from thermal fluctuations in specific electronic components</a>) are nowhere near fast or cheap enough to generate random numbers at the scale needed in modern computing environments.</p><p>The solution computer scientists have come up with is the pseudo-random number generator (PRNG). A PRNG is a mathematical algorithm that produces sequences of numbers statistically indistinguishable from random. Importantly, a PRNG will always produce the exact same sequence of numbers when run from the same starting point. The numbers aren&#8217;t random at all! But they look random.</p><p>So what is the random seed? It is a number that defines the initial state of the PRNG. How exactly we get from the seed to the initial state can be complicated, but the details don&#8217;t matter here. What matters is that the same seed will always give you the exact same sequence of random numbers.</p><p>A second important concept to be aware of is the period length of a PRNG. The period length is the number of random values a PRNG can generate before it starts repeating. All PRNGs repeat eventually. Therefore, it is critical that your PRNG has a period length large enough that it never causes you any trouble. Let&#8217;s say you write a large numerical simulation (maybe you&#8217;re simulating the weather, or the early universe, or all the atoms in a cell) where you need trillions of random numbers or more. 
You wouldn&#8217;t want the random numbers to repeat during any of your simulation runs. So you need a PRNG with a period length well in excess of the maximum number of random values you may ever need.</p><p>One of the most widely used PRNGs is the <a href="https://en.wikipedia.org/wiki/Mersenne_Twister">Mersenne Twister.</a> It has a period length of over 10<sup>6000</sup>. (The exact value is 2<sup>19937</sup> &#8722; 1.) This is an unimaginably large number. For comparison, there are approximately 10<sup>80</sup> atoms in the universe. This is tiny compared to 10<sup>6000</sup>. The period of the Mersenne Twister has space for an entire universe for every single atom in the universe, and then some. In fact, you could create an entire universe for every atom, and then create another entire universe for every atom in each of the universes you have created, and keep nesting 75 times, and still you wouldn&#8217;t run out of room in the period of the Mersenne Twister. If you used the Mersenne Twister to create nested universes 75 times deep, all these universes inside universes inside other universes would be different from each other.</p><h2>What is a good choice for your random seed?</h2><p>As I wrote above: The random seed is arbitrary. You can pick any seed you want. There are no better or worse seeds. (Unless you have a bad PRNG, but let&#8217;s ignore this complication.) In principle we could stop here. But in practice it&#8217;s a little more complicated.</p><p>While the seed is arbitrary, you don&#8217;t ever want to reuse a seed. The point of a PRNG is that its output is statistically indistinguishable from random. That&#8217;s going to be the case if you&#8217;re using a different seed every time. But if you&#8217;re reusing seeds, suddenly you have hidden correlation structures. And you may not even be aware of them.</p><p>The consequences of reusing random seeds could be benign or disastrous. 
It depends on the specifics of the situation. Let&#8217;s say you&#8217;re doing machine learning, and you&#8217;re using the train-test splitting code I quoted above, with a fixed random seed. In this case, you&#8217;re always splitting the data in exactly the same way. If you&#8217;re running this code five times, you&#8217;re not actually getting five independent splits; you&#8217;re getting the same split five times. At a minimum, that will lead you to wildly underestimate the variance in the performance of your fitted model. And worse consequences are possible if you&#8217;re unlucky.</p><p>If you&#8217;re following so far, and you&#8217;re starting to see that reusing random seeds can be bad, you may also realize that double-digit random seeds are bad. There are only 90 different options. If you&#8217;re in any way regularly working with random processes, you&#8217;ll easily use up all 90 options in just a few days.</p><p>So, let&#8217;s go wild. Let&#8217;s use an 8-digit random seed.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> Surely that&#8217;ll give us sufficiently many different possibilities for a lifetime. Well, once you ponder it a bit, you&#8217;ll see that this doesn&#8217;t even give us a separate random sequence for every person in the United States. (The US population is approximately 340 million.) 
If they were all doing data science, splitting data into training and test sets, many of them would be using the exact same &#8220;random&#8221; splits.</p><p>The space you can possibly explore with even quite a long random seed is tiny compared to the total number of sequences you would want to have available to you, and which a PRNG such as the Mersenne Twister would certainly support.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> And the random seed 42 is uniquely bad precisely because everybody is using it. The Mersenne Twister has a state space large enough for universes within universes, but every data scientist in the entire world is using the same 10,000 &#8220;random&#8221; numbers that you get when starting with seed 42.</p><p>So that we&#8217;re all on the same page, here are the first 50 of the &#8220;official&#8221; random numbers. They have been used millions of times. I encourage you to verify you get the same numbers on your computer.</p><pre><code>&gt;&gt;&gt; import random
&gt;&gt;&gt; random.seed(42)
&gt;&gt;&gt; [random.random() for i in range(50)]
[0.6394267984578837, 0.025010755222666936, 0.27502931836911926, 0.22321073814882275, 0.7364712141640124, 0.6766994874229113, 0.8921795677048454, 0.08693883262941615, 0.4219218196852704, 0.029797219438070344, 0.21863797480360336, 0.5053552881033624, 0.026535969683863625, 0.1988376506866485, 0.6498844377795232, 0.5449414806032167, 0.2204406220406967, 0.5892656838759087, 0.8094304566778266, 0.006498759678061017, 0.8058192518328079, 0.6981393949882269, 0.3402505165179919, 0.15547949981178155, 0.9572130722067812, 0.33659454511262676, 0.09274584338014791, 0.09671637683346401, 0.8474943663474598, 0.6037260313668911, 0.8071282732743802, 0.7297317866938179, 0.5362280914547007, 0.9731157639793706, 0.3785343772083535, 0.552040631273227, 0.8294046642529949, 0.6185197523642461, 0.8617069003107772, 0.577352145256762, 0.7045718362149235, 0.045824383655662215, 0.22789827565154686, 0.28938796360210717, 0.0797919769236275, 0.23279088636103018, 0.10100142940972912, 0.2779736031100921, 0.6356844442644002, 0.36483217897008424]</code></pre><h2>Should you explicitly set your random seed?</h2><p>Why choose a specific random seed at all? Is this actually a good idea? In general, I think the answer is no. In my opinion, you&#8217;re typically better off using a random<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> random seed and sampling a broader space of possibilities than picking your own random seed and risking that you&#8217;re drawing invalid conclusions from your analysis. However, there are of course specific situations in which picking a random seed is appropriate, or even required. Let&#8217;s discuss those.</p><p>Most importantly, in simulation studies, it can be helpful to start all simulations with a different but defined random seed, so that every single simulation run can be reproduced if necessary. 
In these types of situations, a good strategy is to take some arbitrary large integer, say 9427385,<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a> and then add to it the number of the replicate you&#8217;re running. So, simulation <em>i</em> would have seed 9427385 + <em>i</em>. This requires a bit of work to get right, as you should never reuse any random seeds among simulations, but it can be useful for tracking down weird behaviors that may occur only occasionally.</p><p>Related to this point, when you&#8217;re coding a complex stochastic simulation, you may encounter bugs that happen only very rarely. Your simulation may work just fine most of the time, but every few thousand runs or so it crashes. This type of bug can be difficult to investigate. The first step is usually to identify a random seed that reliably triggers the bug. If you can find a seed that triggers it early in the simulation run, that&#8217;s even better.</p><p>Another scenario in which you might want to use defined random seeds is when you have random choices that you explicitly want to reuse multiple times. For example, if you&#8217;re doing machine learning, and you&#8217;re comparing two different models, you may want to fit them to the exact same collection of training/test splits. In this situation, you could, for example, pick ten random seeds, use each to generate one training/test split, and fit each model to each of the ten splits.</p><p>Finally, when making visualizations that contain random scatter or other elements that are randomly chosen, it can be helpful to play around with the random seed until the scatter looks pleasing. 
When I wrote my book on data visualization I used this technique quite frequently, for example in <a href="https://clauswilke.com/dataviz/boxplots-violins.html#boxplots-violins-vertical">this chapter.</a></p><h2>Where does 42 come from anyways?</h2><p>Now that you know everything there is to know about random seeds, let&#8217;s go back to the number 42. Where does it come from? Why do people use this number in particular? The culprit is Douglas Adams&#8217; <em>The Hitchhiker&#8217;s Guide to the Galaxy</em>, a humorous, quirky science fiction novel. The book was very popular in the 1980s and 1990s, was adapted several times for radio and TV, and reached broad audiences around the world. If you&#8217;ve never read it or seen any of the adaptations, I would encourage you to check it out. Just be prepared for rather strange humor.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a></p><p>In the book, we learn about some hyper-intelligent, pan-dimensional beings that built a massive computer called Deep Thought, with the specific purpose of discovering the answer to &#8220;the ultimate question of life, the universe, and everything.&#8221; After several million years of computing, Deep Thought reveals that the answer is 42. When the beings don&#8217;t understand what to do with this answer, Deep Thought tells them that to understand the answer they have to figure out what exactly the question is.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a> You can see a film adaptation of this scene <a href="https://www.youtube.com/watch?v=aboZctrHfK8">here.</a></p><p>The number 42 has since turned into a meme. When somebody asks an extremely broad question, or a question that goes after deep philosophical topics such as the meaning of life, people like to respond with 42. 
This meme is so popular it has <a href="https://en.wikipedia.org/wiki/Phrases_from_The_Hitchhiker%27s_Guide_to_the_Galaxy#The_Answer_to_the_Ultimate_Question_of_Life,_the_Universe,_and_Everything_is_42">its own Wikipedia article.</a> This is all very funny,<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a> but none of this makes 42 a good random seed.</p><h2>Updates</h2><p>This post generated a number of good responses. I will collect the most helpful or interesting ones here.</p><p>First, Nick Bailey pointed out a good solution to the randomness/reproducibility conundrum (where reusing seeds ruins randomness but using true random starting points ruins reproducibility): Generate a genuinely random random seed when you start up your code, write it into a log file, and then seed your random number generator. This gives you the best of both worlds.</p><div class="comment" data-attrs="{&quot;url&quot;:&quot;https://open.substack.com/home&quot;,&quot;commentId&quot;:168991197,&quot;comment&quot;:{&quot;id&quot;:168991197,&quot;date&quot;:&quot;2025-10-22T13:36:49.281Z&quot;,&quot;edited_at&quot;:null,&quot;body&quot;:&quot;This is a great article and raises multiple points I have had to think a lot about in various SLiM (population genetic simulator) simulations I&#8217;ve run. SLiM particularly will generate a random seed itself and report what it is in the standard output. I think this is ideal, not to generate your own random seed but to keep track of the ones that were generated before. This is for reproducibility (e.g. 
I explicitly reported random seeds used for a PCA here https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.10571) and the rare bug-fixing Wilke mentions in this article.&quot;,&quot;body_json&quot;:{&quot;type&quot;:&quot;doc&quot;,&quot;attrs&quot;:{&quot;schemaVersion&quot;:&quot;v1&quot;},&quot;content&quot;:[{&quot;type&quot;:&quot;paragraph&quot;,&quot;content&quot;:[{&quot;type&quot;:&quot;text&quot;,&quot;text&quot;:&quot;This is a great article and raises multiple points I have had to think a lot about in various SLiM (population genetic simulator) simulations I&#8217;ve run. SLiM particularly will generate a random seed itself and report what it is in the standard output. I think this is ideal, not to generate your own random seed but to keep track of the ones that were generated before. This is for reproducibility (e.g. I explicitly reported random seeds used for a PCA here &quot;},{&quot;type&quot;:&quot;text&quot;,&quot;marks&quot;:[{&quot;type&quot;:&quot;link&quot;,&quot;attrs&quot;:{&quot;href&quot;:&quot;https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.10571&quot;,&quot;target&quot;:&quot;_blank&quot;,&quot;rel&quot;:&quot;nofollow ugc noopener&quot;,&quot;class&quot;:&quot;note-link&quot;}}],&quot;text&quot;:&quot;https://onlinelibrary.wiley.com/doi/full/10.1002/ece3.10571&quot;},{&quot;type&quot;:&quot;text&quot;,&quot;text&quot;:&quot;) and the rare bug-fixing Wilke mentions in this article.&quot;}]}]},&quot;restacks&quot;:1,&quot;reaction_count&quot;:1,&quot;attachments&quot;:[{&quot;id&quot;:&quot;b92c094c-023c-42d9-9aa9-6f0019e15627&quot;,&quot;type&quot;:&quot;post&quot;,&quot;publication&quot;:{&quot;apple_pay_disabled&quot;:false,&quot;apex_domain&quot;:&quot;genesmindsmachines.com&quot;,&quot;author_id&quot;:64064132,&quot;byline_images_enabled&quot;:false,&quot;bylines_enabled&quot;:true,&quot;chartable_token&quot;:null,&quot;community_enabled&quot;:true,&quot;copyright&quot;:&quot;Claus 
Wilke&quot;,&quot;cover_photo_url&quot;:null,&quot;created_at&quot;:&quot;2025-06-22T21:27:02.264Z&quot;,&quot;custom_domain_optional&quot;:false,&quot;custom_domain&quot;:&quot;blog.genesmindsmachines.com&quot;,&quot;default_comment_sort&quot;:&quot;best_first&quot;,&quot;default_coupon&quot;:null,&quot;default_group_coupon&quot;:null,&quot;default_show_guest_bios&quot;:true,&quot;email_banner_url&quot;:null,&quot;email_from_name&quot;:&quot;Claus Wilke&quot;,&quot;email_from&quot;:null,&quot;embed_tracking_disabled&quot;:false,&quot;explicit&quot;:false,&quot;expose_paywall_content_to_search_engines&quot;:true,&quot;fb_pixel_id&quot;:null,&quot;fb_site_verification_token&quot;:null,&quot;flagged_as_spam&quot;:false,&quot;founding_subscription_benefits&quot;:null,&quot;free_subscription_benefits&quot;:null,&quot;ga_pixel_id&quot;:null,&quot;google_site_verification_token&quot;:null,&quot;google_tag_manager_token&quot;:null,&quot;hero_image&quot;:null,&quot;hero_text&quot;:&quot;Genes, Minds, Machines: Thoughts about Science, Communication, and AI. 
A newsletter covering topics in biology, data visualization, effective communication, AI, and higher education.&quot;,&quot;hide_intro_subtitle&quot;:null,&quot;hide_intro_title&quot;:null,&quot;hide_podcast_feed_link&quot;:false,&quot;homepage_type&quot;:&quot;magaziney&quot;,&quot;id&quot;:5419410,&quot;image_thumbnails_always_enabled&quot;:false,&quot;invite_only&quot;:false,&quot;hide_podcast_from_pub_listings&quot;:false,&quot;language&quot;:&quot;en&quot;,&quot;logo_url_wide&quot;:null,&quot;logo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!3tvK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b85fecd-da20-4614-b9b3-54f277cfa6bd_982x982.png&quot;,&quot;minimum_group_size&quot;:2,&quot;moderation_enabled&quot;:true,&quot;name&quot;:&quot;Genes, Minds, Machines&quot;,&quot;paid_subscription_benefits&quot;:null,&quot;parsely_pixel_id&quot;:null,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;paywall_free_trial_enabled&quot;:false,&quot;podcast_art_url&quot;:null,&quot;paid_podcast_episode_art_url&quot;:null,&quot;podcast_byline&quot;:null,&quot;podcast_description&quot;:null,&quot;podcast_enabled&quot;:false,&quot;podcast_feed_url&quot;:null,&quot;podcast_title&quot;:null,&quot;post_preview_limit&quot;:null,&quot;primary_user_id&quot;:null,&quot;require_clickthrough&quot;:false,&quot;show_pub_podcast_tab&quot;:false,&quot;show_recs_on_homepage&quot;:true,&quot;subdomain&quot;:&quot;clauswilke&quot;,&quot;subscriber_invites&quot;:0,&quot;support_email&quot;:null,&quot;theme_var_background_pop&quot;:&quot;#FF6719&quot;,&quot;theme_var_color_links&quot;:true,&quot;theme_var_cover_bg_color&quot;:null,&quot;trial_end_override&quot;:null,&quot;twitter_pixel_id&quot;:null,&quot;type&quot;:&quot;newsletter&quot;,&quot;post_reaction_faces_enabled&quot;:true,&quot;is_personal_mode&quot;:false,&quot;plans&quot;:null,&quot;stripe_user_id&quot;:null,&quot;stripe_country&quot;:null,&quot;stripe_pu
blishable_key&quot;:null,&quot;stripe_platform_account&quot;:null,&quot;automatic_tax_enabled&quot;:null,&quot;author_name&quot;:&quot;Claus Wilke&quot;,&quot;author_handle&quot;:&quot;clauswilke&quot;,&quot;author_photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!rnVc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff86ed0b8-faec-478f-9afa-6a59f2c148fc_2000x2000.png&quot;,&quot;author_bio&quot;:&quot;Science, Communication, AI&quot;,&quot;has_custom_tos&quot;:false,&quot;has_custom_privacy&quot;:false,&quot;theme&quot;:{&quot;background_pop_color&quot;:&quot;#2e629e&quot;,&quot;web_bg_color&quot;:&quot;#fdfdfd&quot;,&quot;cover_bg_color&quot;:&quot;#fdfdfd&quot;,&quot;publication_id&quot;:5419410,&quot;color_links&quot;:null,&quot;font_preset_heading&quot;:null,&quot;font_preset_body&quot;:&quot;slab&quot;,&quot;font_family_headings&quot;:null,&quot;font_family_body&quot;:null,&quot;font_family_ui&quot;:null,&quot;font_size_body_desktop&quot;:null,&quot;print_secondary&quot;:null,&quot;custom_css_web&quot;:null,&quot;custom_css_email&quot;:null,&quot;home_hero&quot;:&quot;magaziney&quot;,&quot;home_posts&quot;:&quot;list&quot;,&quot;home_show_top_posts&quot;:false,&quot;hide_images_from_list&quot;:false,&quot;home_hero_alignment&quot;:&quot;left&quot;,&quot;home_hero_show_podcast_links&quot;:true,&quot;default_post_header_variant&quot;:null},&quot;threads_v2_settings&quot;:null,&quot;default_group_coupon_percent_off&quot;:null,&quot;pause_return_date&quot;:null,&quot;has_posts&quot;:true,&quot;has_recommendations&quot;:true,&quot;first_post_date&quot;:&quot;2025-06-24T02:20:02.740Z&quot;,&quot;has_podcast&quot;:false,&quot;has_free_podcast&quot;:false,&quot;has_subscriber_only_podcast&quot;:false,&quot;has_community_content&quot;:true,&quot;rankingDetail&quot;:&quot;Launched 4 months ago&quot;,&quot;rankingDetailFreeIncluded&quot;:&quot;Hundreds of 
subscribers&quot;,&quot;rankingDetailOrderOfMagnitude&quot;:0,&quot;rankingDetailFreeIncludedOrderOfMagnitude&quot;:100,&quot;rankingDetailFreeSubscriberCount&quot;:null,&quot;rankingDetailByLanguage&quot;:{&quot;de&quot;:{&quot;rankingDetail&quot;:&quot;Vor vor 4 Monaten gelauncht&quot;,&quot;rankingDetailFreeIncluded&quot;:&quot;Hunderte von Abonnenten&quot;,&quot;rankingDetailOrderOfMagnitude&quot;:0,&quot;rankingDetailFreeIncludedOrderOfMagnitude&quot;:100,&quot;rankingDetailFreeSubscriberCount&quot;:null,&quot;freeSubscriberCount&quot;:null,&quot;freeSubscriberCountOrderOfMagnitude&quot;:&quot;987&quot;},&quot;es&quot;:{&quot;rankingDetail&quot;:&quot;Lanzado hace 4 meses&quot;,&quot;rankingDetailFreeIncluded&quot;:&quot;Cientos de suscriptores&quot;,&quot;rankingDetailOrderOfMagnitude&quot;:0,&quot;rankingDetailFreeIncludedOrderOfMagnitude&quot;:100,&quot;rankingDetailFreeSubscriberCount&quot;:null,&quot;freeSubscriberCount&quot;:null,&quot;freeSubscriberCountOrderOfMagnitude&quot;:&quot;987&quot;},&quot;fr&quot;:{&quot;rankingDetail&quot;:&quot;Lanc&#233; il y a 4 mois&quot;,&quot;rankingDetailFreeIncluded&quot;:&quot;Des centaines d'abonn&#233;s&quot;,&quot;rankingDetailOrderOfMagnitude&quot;:0,&quot;rankingDetailFreeIncludedOrderOfMagnitude&quot;:100,&quot;rankingDetailFreeSubscriberCount&quot;:null,&quot;freeSubscriberCount&quot;:null,&quot;freeSubscriberCountOrderOfMagnitude&quot;:&quot;987&quot;},&quot;pt&quot;:{&quot;rankingDetail&quot;:&quot;Lan&#231;ado 4 meses&quot;,&quot;rankingDetailFreeIncluded&quot;:&quot;Centenas de subscritores&quot;,&quot;rankingDetailOrderOfMagnitude&quot;:0,&quot;rankingDetailFreeIncludedOrderOfMagnitude&quot;:100,&quot;rankingDetailFreeSubscriberCount&quot;:null,&quot;freeSubscriberCount&quot;:null,&quot;freeSubscriberCountOrderOfMagnitude&quot;:&quot;987&quot;},&quot;pt-br&quot;:{&quot;rankingDetail&quot;:&quot;Lan&#231;ado 4 meses&quot;,&quot;rankingDetailFreeIncluded&quot;:&quot;Centenas de 
assinantes&quot;,&quot;rankingDetailOrderOfMagnitude&quot;:0,&quot;rankingDetailFreeIncludedOrderOfMagnitude&quot;:100,&quot;rankingDetailFreeSubscriberCount&quot;:null,&quot;freeSubscriberCount&quot;:null,&quot;freeSubscriberCountOrderOfMagnitude&quot;:&quot;987&quot;},&quot;it&quot;:{&quot;rankingDetail&quot;:&quot;Lanciato 4 mesi&quot;,&quot;rankingDetailFreeIncluded&quot;:&quot;Centinaia di abbonati&quot;,&quot;rankingDetailOrderOfMagnitude&quot;:0,&quot;rankingDetailFreeIncludedOrderOfMagnitude&quot;:100,&quot;rankingDetailFreeSubscriberCount&quot;:null,&quot;freeSubscriberCount&quot;:null,&quot;freeSubscriberCountOrderOfMagnitude&quot;:&quot;987&quot;},&quot;en&quot;:{&quot;rankingDetail&quot;:&quot;Launched 4 months ago&quot;,&quot;rankingDetailFreeIncluded&quot;:&quot;Hundreds of subscribers&quot;,&quot;rankingDetailOrderOfMagnitude&quot;:0,&quot;rankingDetailFreeIncludedOrderOfMagnitude&quot;:100,&quot;rankingDetailFreeSubscriberCount&quot;:null,&quot;freeSubscriberCount&quot;:null,&quot;freeSubscriberCountOrderOfMagnitude&quot;:&quot;987&quot;}},&quot;freeSubscriberCount&quot;:null,&quot;freeSubscriberCountOrderOfMagnitude&quot;:&quot;987&quot;,&quot;author_bestseller_tier&quot;:0,&quot;disable_monthly_subscriptions&quot;:false,&quot;disable_annual_subscriptions&quot;:false,&quot;hide_post_restacks&quot;:false,&quot;notes_feed_enabled&quot;:true,&quot;showIntroModule&quot;:false,&quot;last_chat_post_at&quot;:null,&quot;no_follow&quot;:false,&quot;paywall_chat&quot;:&quot;free&quot;,&quot;sections&quot;:[],&quot;multipub_migration&quot;:null,&quot;navigationBarItems&quot;:[{&quot;id&quot;:&quot;a50bf034-0aa9-4458-86c8-ea8fe0360901&quot;,&quot;publication_id&quot;:5419410,&quot;sibling_rank&quot;:0,&quot;link_title&quot;:null,&quot;link_url&quot;:null,&quot;section_id&quot;:null,&quot;post_id&quot;:null,&quot;is_hidden&quot;:false,&quot;standard_key&quot;:&quot;archive&quot;,&quot;post_tag_id&quot;:null,&quot;post&quot;:null,&quot;section&quot;:null,&quot;po
stTag&quot;:null},{&quot;id&quot;:&quot;8c1fc7a7-a465-494c-a161-2a591560aa48&quot;,&quot;publication_id&quot;:5419410,&quot;sibling_rank&quot;:1,&quot;link_title&quot;:null,&quot;link_url&quot;:null,&quot;section_id&quot;:null,&quot;post_id&quot;:null,&quot;is_hidden&quot;:true,&quot;standard_key&quot;:&quot;notes&quot;,&quot;post_tag_id&quot;:null,&quot;post&quot;:null,&quot;section&quot;:null,&quot;postTag&quot;:null},{&quot;id&quot;:&quot;909dac53-ce84-4242-8451-54f5679e2c8d&quot;,&quot;publication_id&quot;:5419410,&quot;sibling_rank&quot;:2,&quot;link_title&quot;:&quot;Public Speaking&quot;,&quot;link_url&quot;:&quot;&quot;,&quot;section_id&quot;:null,&quot;post_id&quot;:null,&quot;is_hidden&quot;:null,&quot;standard_key&quot;:null,&quot;post_tag_id&quot;:&quot;7dd44329-4f26-4558-9eca-c08696d76a81&quot;,&quot;post&quot;:null,&quot;section&quot;:null,&quot;postTag&quot;:{&quot;id&quot;:&quot;7dd44329-4f26-4558-9eca-c08696d76a81&quot;,&quot;publication_id&quot;:5419410,&quot;name&quot;:&quot;Public Speaking&quot;,&quot;slug&quot;:&quot;public-speaking&quot;,&quot;hidden&quot;:false}},{&quot;id&quot;:&quot;30f2dd3a-38b4-4f02-b3f1-a2d52d460915&quot;,&quot;publication_id&quot;:5419410,&quot;sibling_rank&quot;:3,&quot;link_title&quot;:&quot;Writing&quot;,&quot;link_url&quot;:&quot;&quot;,&quot;section_id&quot;:null,&quot;post_id&quot;:null,&quot;is_hidden&quot;:null,&quot;standard_key&quot;:null,&quot;post_tag_id&quot;:&quot;c31429aa-6a44-45b6-837a-5ff74595c2c9&quot;,&quot;post&quot;:null,&quot;section&quot;:null,&quot;postTag&quot;:{&quot;id&quot;:&quot;c31429aa-6a44-45b6-837a-5ff74595c2c9&quot;,&quot;publication_id&quot;:5419410,&quot;name&quot;:&quot;Writing&quot;,&quot;slug&quot;:&quot;writing&quot;,&quot;hidden&quot;:false}},{&quot;id&quot;:&quot;25ee3a2e-a59a-4bc3-b2c0-642e9eb962ae&quot;,&quot;publication_id&quot;:5419410,&quot;sibling_rank&quot;:4,&quot;link_title&quot;:&quot;Graduate 
Education&quot;,&quot;link_url&quot;:&quot;&quot;,&quot;section_id&quot;:null,&quot;post_id&quot;:null,&quot;is_hidden&quot;:null,&quot;standard_key&quot;:null,&quot;post_tag_id&quot;:&quot;968f80b6-a9d4-4712-bda0-0492d884b2a5&quot;,&quot;post&quot;:null,&quot;section&quot;:null,&quot;postTag&quot;:{&quot;id&quot;:&quot;968f80b6-a9d4-4712-bda0-0492d884b2a5&quot;,&quot;publication_id&quot;:5419410,&quot;name&quot;:&quot;Graduate Education&quot;,&quot;slug&quot;:&quot;graduate-education&quot;,&quot;hidden&quot;:false}},{&quot;id&quot;:&quot;b4ef6efb-a589-45a8-9bbd-cc14bc4fb5b3&quot;,&quot;publication_id&quot;:5419410,&quot;sibling_rank&quot;:5,&quot;link_title&quot;:&quot;AI&quot;,&quot;link_url&quot;:&quot;&quot;,&quot;section_id&quot;:null,&quot;post_id&quot;:null,&quot;is_hidden&quot;:null,&quot;standard_key&quot;:null,&quot;post_tag_id&quot;:&quot;37ac7265-e0db-492f-8d78-df1acf50731d&quot;,&quot;post&quot;:null,&quot;section&quot;:null,&quot;postTag&quot;:{&quot;id&quot;:&quot;37ac7265-e0db-492f-8d78-df1acf50731d&quot;,&quot;publication_id&quot;:5419410,&quot;name&quot;:&quot;AI&quot;,&quot;slug&quot;:&quot;ai&quot;,&quot;hidden&quot;:false}}],&quot;contributors&quot;:[{&quot;name&quot;:&quot;Claus Wilke&quot;,&quot;handle&quot;:&quot;clauswilke&quot;,&quot;role&quot;:&quot;admin&quot;,&quot;owner&quot;:true,&quot;user_id&quot;:64064132,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f86ed0b8-faec-478f-9afa-6a59f2c148fc_2000x2000.png&quot;,&quot;bio&quot;:&quot;Science, Communication, AI&quot;}],&quot;threads_v2_enabled&quot;:false,&quot;viralGiftsConfig&quot;:null,&quot;tier&quot;:2,&quot;no_index&quot;:false,&quot;can_set_google_site_verification&quot;:true,&quot;can_have_sitemap&quot;:true,&quot;founding_plan_name_english&quot;:&quot;Founding 
Member&quot;,&quot;draft_plans&quot;:null,&quot;base_url&quot;:&quot;https://blog.genesmindsmachines.com&quot;,&quot;hostname&quot;:&quot;blog.genesmindsmachines.com&quot;,&quot;is_on_substack&quot;:false,&quot;spotify_podcast_settings&quot;:null,&quot;podcastPalette&quot;:{&quot;DarkMuted&quot;:{&quot;population&quot;:72,&quot;rgb&quot;:[73,153,137]},&quot;DarkVibrant&quot;:{&quot;population&quot;:6013,&quot;rgb&quot;:[4,100,84]},&quot;LightMuted&quot;:{&quot;population&quot;:7,&quot;rgb&quot;:[142,198,186]},&quot;LightVibrant&quot;:{&quot;population&quot;:3,&quot;rgb&quot;:[166,214,206]},&quot;Muted&quot;:{&quot;population&quot;:6,&quot;rgb&quot;:[92,164,156]},&quot;Vibrant&quot;:{&quot;population&quot;:5,&quot;rgb&quot;:[76,164,146]}},&quot;pageThemes&quot;:{&quot;podcast&quot;:null},&quot;appTheme&quot;:{&quot;colors&quot;:{&quot;accent&quot;:{&quot;name&quot;:&quot;#2e629e&quot;,&quot;primary&quot;:{&quot;r&quot;:46,&quot;g&quot;:98,&quot;b&quot;:158,&quot;a&quot;:1},&quot;primary_hover&quot;:{&quot;r&quot;:20,&quot;g&quot;:81,&quot;b&quot;:139,&quot;a&quot;:1},&quot;primary_elevated&quot;:{&quot;r&quot;:20,&quot;g&quot;:81,&quot;b&quot;:139,&quot;a&quot;:1},&quot;secondary&quot;:{&quot;r&quot;:46,&quot;g&quot;:98,&quot;b&quot;:158,&quot;a&quot;:0.2},&quot;contrast&quot;:{&quot;r&quot;:255,&quot;g&quot;:255,&quot;b&quot;:255,&quot;a&quot;:1},&quot;bg&quot;:{&quot;r&quot;:46,&quot;g&quot;:98,&quot;b&quot;:158,&quot;a&quot;:0.2},&quot;bg_hover&quot;:{&quot;r&quot;:46,&quot;g&quot;:98,&quot;b&quot;:158,&quot;a&quot;:0.3},&quot;dark&quot;:{&quot;primary&quot;:{&quot;r&quot;:46,&quot;g&quot;:98,&quot;b&quot;:158,&quot;a&quot;:1},&quot;primary_hover&quot;:{&quot;r&quot;:67,&quot;g&quot;:115,&quot;b&quot;:177,&quot;a&quot;:1},&quot;primary_elevated&quot;:{&quot;r&quot;:67,&quot;g&quot;:115,&quot;b&quot;:177,&quot;a&quot;:1},&quot;secondary&quot;:{&quot;r&quot;:46,&quot;g&quot;:98,&quot;b&quot;:158,&quot;a&quot;:0.2},&quot;contrast&quot;:{&quot;r&quot;:255,&quot;g&quot
;:255,&quot;b&quot;:255,&quot;a&quot;:1},&quot;bg&quot;:{&quot;r&quot;:46,&quot;g&quot;:98,&quot;b&quot;:158,&quot;a&quot;:0.2},&quot;bg_hover&quot;:{&quot;r&quot;:46,&quot;g&quot;:98,&quot;b&quot;:158,&quot;a&quot;:0.3}}},&quot;fg&quot;:{&quot;primary&quot;:{&quot;r&quot;:0,&quot;g&quot;:0,&quot;b&quot;:0,&quot;a&quot;:0.8},&quot;secondary&quot;:{&quot;r&quot;:0,&quot;g&quot;:0,&quot;b&quot;:0,&quot;a&quot;:0.6},&quot;tertiary&quot;:{&quot;r&quot;:0,&quot;g&quot;:0,&quot;b&quot;:0,&quot;a&quot;:0.4},&quot;accent&quot;:{&quot;r&quot;:46,&quot;g&quot;:98,&quot;b&quot;:158,&quot;a&quot;:1},&quot;dark&quot;:{&quot;primary&quot;:{&quot;r&quot;:255,&quot;g&quot;:255,&quot;b&quot;:255,&quot;a&quot;:0.9},&quot;secondary&quot;:{&quot;r&quot;:255,&quot;g&quot;:255,&quot;b&quot;:255,&quot;a&quot;:0.6},&quot;tertiary&quot;:{&quot;r&quot;:255,&quot;g&quot;:255,&quot;b&quot;:255,&quot;a&quot;:0.4},&quot;accent&quot;:{&quot;r&quot;:84,&quot;g&quot;:130,&quot;b&quot;:193,&quot;a&quot;:1}}},&quot;bg&quot;:{&quot;name&quot;:&quot;#ffffff&quot;,&quot;hue&quot;:{&quot;r&quot;:255,&quot;g&quot;:255,&quot;b&quot;:255,&quot;a&quot;:0},&quot;tint&quot;:{&quot;r&quot;:255,&quot;g&quot;:255,&quot;b&quot;:255,&quot;a&quot;:0},&quot;primary&quot;:{&quot;r&quot;:255,&quot;g&quot;:255,&quot;b&quot;:255,&quot;a&quot;:1},&quot;primary_hover&quot;:{&quot;r&quot;:250,&quot;g&quot;:250,&quot;b&quot;:250,&quot;a&quot;:1},&quot;primary_elevated&quot;:{&quot;r&quot;:250,&quot;g&quot;:250,&quot;b&quot;:250,&quot;a&quot;:1},&quot;secondary&quot;:{&quot;r&quot;:238,&quot;g&quot;:238,&quot;b&quot;:238,&quot;a&quot;:1},&quot;secondary_elevated&quot;:{&quot;r&quot;:206.90096477355226,&quot;g&quot;:206.90096477355175,&quot;b&quot;:206.9009647735519,&quot;a&quot;:1},&quot;tertiary&quot;:{&quot;r&quot;:219,&quot;g&quot;:219,&quot;b&quot;:219,&quot;a&quot;:1},&quot;quaternary&quot;:{&quot;r&quot;:182,&quot;g&quot;:182,&quot;b&quot;:182,&quot;a&quot;:1},&quot;dark&quot;:{&quot;primary&quot;:{&quot;r&quot;:22,&quo
t;g&quot;:23,&quot;b&quot;:24,&quot;a&quot;:1},&quot;primary_hover&quot;:{&quot;r&quot;:27,&quot;g&quot;:28,&quot;b&quot;:29,&quot;a&quot;:1},&quot;primary_elevated&quot;:{&quot;r&quot;:27,&quot;g&quot;:28,&quot;b&quot;:29,&quot;a&quot;:1},&quot;secondary&quot;:{&quot;r&quot;:35,&quot;g&quot;:37,&quot;b&quot;:37,&quot;a&quot;:1},&quot;secondary_elevated&quot;:{&quot;r&quot;:41.35899397549579,&quot;g&quot;:43.405356429195315,&quot;b&quot;:43.40489285041963,&quot;a&quot;:1},&quot;tertiary&quot;:{&quot;r&quot;:54,&quot;g&quot;:55,&quot;b&quot;:55,&quot;a&quot;:1},&quot;quaternary&quot;:{&quot;r&quot;:90,&quot;g&quot;:91,&quot;b&quot;:91,&quot;a&quot;:1}}}},&quot;cover_image&quot;:{&quot;url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!3tvK!,w_1200,h_400,c_pad,f_auto,q_auto:best,fl_progressive:steep,b_auto:border,b_rgb:fdfdfd/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b85fecd-da20-4614-b9b3-54f277cfa6bd_982x982.png&quot;,&quot;height&quot;:982,&quot;width&quot;:2946}},&quot;multiple_pins&quot;:true,&quot;live_subscriber_counts&quot;:false,&quot;supports_ip_content_unlock&quot;:false,&quot;logoPalette&quot;:{&quot;Vibrant&quot;:{&quot;rgb&quot;:[251,191,16],&quot;population&quot;:370},&quot;DarkVibrant&quot;:{&quot;rgb&quot;:[120,52,12],&quot;population&quot;:386},&quot;LightVibrant&quot;:{&quot;rgb&quot;:[251,243,132],&quot;population&quot;:165},&quot;Muted&quot;:{&quot;rgb&quot;:[103,126,146],&quot;population&quot;:312},&quot;DarkMuted&quot;:{&quot;rgb&quot;:[63,77,93],&quot;population&quot;:326},&quot;LightMuted&quot;:{&quot;rgb&quot;:[156,180,190],&quot;population&quot;:302}}},&quot;post&quot;:{&quot;id&quot;:175157766,&quot;publication_id&quot;:5419410,&quot;title&quot;:&quot;If your random seed is 42 I will come to your office and set your computer on fire&#128293;&quot;,&quot;social_title&quot;:&quot;If your random seed is 42 I will come to your office and set your computer on 
fire&#128293;&quot;,&quot;search_engine_title&quot;:null,&quot;search_engine_description&quot;:null,&quot;type&quot;:&quot;newsletter&quot;,&quot;slug&quot;:&quot;if-your-random-seed-is-42-i-will&quot;,&quot;post_date&quot;:&quot;2025-10-22T12:29:26.455Z&quot;,&quot;audience&quot;:&quot;everyone&quot;,&quot;podcast_duration&quot;:null,&quot;video_upload_id&quot;:null,&quot;write_comment_permissions&quot;:&quot;everyone&quot;,&quot;should_send_free_preview&quot;:false,&quot;free_unlock_required&quot;:false,&quot;default_comment_sort&quot;:null,&quot;canonical_url&quot;:&quot;https://blog.genesmindsmachines.com/p/if-your-random-seed-is-42-i-will&quot;,&quot;section_id&quot;:null,&quot;podcast_art_url&quot;:null,&quot;is_published&quot;:true,&quot;live_stream_id&quot;:null,&quot;restacks&quot;:4,&quot;top_exclusions&quot;:[],&quot;pins&quot;:[],&quot;is_section_pinned&quot;:false,&quot;has_shareable_clips&quot;:false,&quot;section_slug&quot;:null,&quot;section_name&quot;:null,&quot;reactions&quot;:{&quot;&#10084;&quot;:15},&quot;subtitle&quot;:&quot;Figuratively. 
More likely you'll get a stern talking to.&quot;,&quot;cover_image&quot;:&quot;https://substackcdn.com/image/fetch/$s_!e0-K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bebca8-b5bb-43ce-afef-078bcb795395_1200x993.jpeg&quot;,&quot;cover_image_is_square&quot;:false,&quot;cover_image_is_explicit&quot;:false,&quot;podcast_url&quot;:&quot;&quot;,&quot;videoUpload&quot;:null,&quot;podcastFields&quot;:{&quot;post_id&quot;:175157766,&quot;podcast_episode_number&quot;:null,&quot;podcast_season_number&quot;:null,&quot;podcast_episode_type&quot;:null,&quot;should_syndicate_to_other_feed&quot;:null,&quot;syndicate_to_section_id&quot;:null,&quot;hide_from_feed&quot;:false,&quot;free_podcast_url&quot;:null,&quot;free_podcast_duration&quot;:null},&quot;podcast_upload_id&quot;:null,&quot;podcast_preview_upload_id&quot;:null,&quot;podcastUpload&quot;:null,&quot;podcastPreviewUpload&quot;:null,&quot;voiceover_upload_id&quot;:null,&quot;voiceoverUpload&quot;:null,&quot;has_voiceover&quot;:false,&quot;description&quot;:&quot;Figuratively. More likely you'll get a stern talking to.&quot;,&quot;body_json&quot;:null,&quot;body_html&quot;:null,&quot;truncated_body_text&quot;:&quot;When you&#8217;re as old as I am, old enough to remember that there was a time before the internet, when you had to go to the library to read a book or drop coins into a metal box to make a phone call, you have absorbed a lot of geek lore. 
So, when you read some tutorial about machine learning or data analysis and you see&quot;,&quot;wordcount&quot;:2280,&quot;postTags&quot;:[{&quot;id&quot;:&quot;8e7489d4-0069-473b-862d-47f401a1507d&quot;,&quot;publication_id&quot;:5419410,&quot;name&quot;:&quot;Science&quot;,&quot;slug&quot;:&quot;science&quot;,&quot;hidden&quot;:false},{&quot;id&quot;:&quot;ba1c348d-6d19-44a0-9af9-a1a82d6bb46a&quot;,&quot;publication_id&quot;:5419410,&quot;name&quot;:&quot;Technology&quot;,&quot;slug&quot;:&quot;technology&quot;,&quot;hidden&quot;:false}],&quot;teaser_post_eligible&quot;:true,&quot;postCountryBlocks&quot;:[],&quot;headlineTest&quot;:null,&quot;coverImagePalette&quot;:{&quot;Vibrant&quot;:{&quot;rgb&quot;:[236,180,74],&quot;population&quot;:9},&quot;DarkVibrant&quot;:{&quot;rgb&quot;:[120.003,82.875,12.597],&quot;population&quot;:0},&quot;LightVibrant&quot;:{&quot;rgb&quot;:[242.403,205.27499999999998,134.99699999999999],&quot;population&quot;:0},&quot;Muted&quot;:{&quot;rgb&quot;:[132,116,92],&quot;population&quot;:544},&quot;DarkMuted&quot;:{&quot;rgb&quot;:[66,48,37],&quot;population&quot;:348},&quot;LightMuted&quot;:{&quot;rgb&quot;:[193,193,188],&quot;population&quot;:145}},&quot;publishedBylines&quot;:[{&quot;id&quot;:64064132,&quot;name&quot;:&quot;Claus Wilke&quot;,&quot;handle&quot;:&quot;clauswilke&quot;,&quot;previous_name&quot;:null,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f86ed0b8-faec-478f-9afa-6a59f2c148fc_2000x2000.png&quot;,&quot;bio&quot;:&quot;Science, Communication, AI&quot;,&quot;profile_set_up_at&quot;:&quot;2021-12-28T18:33:35.718Z&quot;,&quot;reader_installed_at&quot;:&quot;2025-06-28T21:59:11.394Z&quot;,&quot;publicationUsers&quot;:[{&quot;id&quot;:5527897,&quot;user_id&quot;:64064132,&quot;publication_id&quot;:5419410,&quot;role&quot;:&quot;admin&quot;,&quot;public&quot;:true,&quot;is_primary&quot;:false,&quot;publication&quot;:{&quot;id&quot;:5419410,&quot;name&quot;:&quot;Genes, Minds, 
Machines&quot;,&quot;subdomain&quot;:&quot;clauswilke&quot;,&quot;custom_domain&quot;:&quot;blog.genesmindsmachines.com&quot;,&quot;custom_domain_optional&quot;:false,&quot;hero_text&quot;:&quot;Genes, Minds, Machines: Thoughts about Science, Communication, and AI. A newsletter covering topics in biology, data visualization, effective communication, AI, and higher education.&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b85fecd-da20-4614-b9b3-54f277cfa6bd_982x982.png&quot;,&quot;author_id&quot;:64064132,&quot;primary_user_id&quot;:null,&quot;theme_var_background_pop&quot;:&quot;#FF6719&quot;,&quot;created_at&quot;:&quot;2025-06-22T21:27:02.264Z&quot;,&quot;email_from_name&quot;:&quot;Claus Wilke&quot;,&quot;copyright&quot;:&quot;Claus Wilke&quot;,&quot;founding_plan_name&quot;:null,&quot;community_enabled&quot;:true,&quot;invite_only&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;language&quot;:null,&quot;explicit&quot;:false,&quot;homepage_type&quot;:&quot;magaziney&quot;,&quot;is_personal_mode&quot;:false}}],&quot;is_guest&quot;:false,&quot;bestseller_tier&quot;:null,&quot;status&quot;:{&quot;bestsellerTier&quot;:null,&quot;subscriberTier&quot;:5,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:{&quot;type&quot;:&quot;subscriber&quot;,&quot;tier&quot;:5,&quot;accent_colors&quot;:null},&quot;paidPublicationIds&quot;:[1017072,332996,1176440,922948,1875267],&quot;subscriber&quot;:null},&quot;primary_publication&quot;:{&quot;id&quot;:5419410,&quot;subdomain&quot;:&quot;clauswilke&quot;,&quot;custom_domain&quot;:&quot;blog.genesmindsmachines.com&quot;,&quot;custom_domain_optional&quot;:false,&quot;name&quot;:&quot;Genes, Minds, 
Machines&quot;,&quot;logo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b85fecd-da20-4614-b9b3-54f277cfa6bd_982x982.png&quot;,&quot;author_id&quot;:64064132,&quot;user_id&quot;:64064132,&quot;handles_enabled&quot;:false,&quot;explicit&quot;:false,&quot;is_personal_mode&quot;:false,&quot;payments_state&quot;:&quot;disabled&quot;,&quot;pledges_enabled&quot;:false}}],&quot;reaction&quot;:false,&quot;reaction_count&quot;:15,&quot;comment_count&quot;:7,&quot;child_comment_count&quot;:3,&quot;audio_items&quot;:[{&quot;post_id&quot;:175157766,&quot;voice_id&quot;:&quot;en-US-OnyxTurboMultilingualNeural&quot;,&quot;audio_url&quot;:&quot;https://substack-video.s3.amazonaws.com/video_upload/post/175157766/tts/04147939-75ee-4030-b823-eef0e24eebb9/en-US-OnyxTurboMultilingualNeural.mp3&quot;,&quot;type&quot;:&quot;tts&quot;,&quot;status&quot;:&quot;completed&quot;}],&quot;is_geoblocked&quot;:false,&quot;hasCashtag&quot;:false,&quot;inboxItem&quot;:{&quot;content_key&quot;:&quot;post:175157766&quot;,&quot;updated_at&quot;:&quot;2025-10-22T13:07:26.054Z&quot;,&quot;content_date&quot;:&quot;2025-10-22T12:29:26.455Z&quot;,&quot;inbox_date&quot;:&quot;2025-10-22T12:29:26.455Z&quot;,&quot;seen_at&quot;:&quot;2025-10-22T13:07:26.054Z&quot;,&quot;saved_at&quot;:null,&quot;archived_at&quot;:null,&quot;skip_inbox&quot;:false,&quot;type&quot;:&quot;post&quot;,&quot;post_id&quot;:175157766,&quot;extra_views&quot;:[],&quot;read_progress&quot;:0,&quot;max_read_progress&quot;:1,&quot;audio_progress&quot;:0,&quot;max_audio_progress&quot;:0,&quot;video_progress&quot;:0,&quot;max_video_progress&quot;:0,&quot;postType&quot;:&quot;newsletter&quot;,&quot;title&quot;:&quot;If your random seed is 42 I will come to your office and set your computer on fire&#128293;&quot;,&quot;subtitle&quot;:&quot;Figuratively. More likely you'll get a stern talking to.&quot;,&quot;detail_view_subtitle&quot;:&quot;Figuratively. 
More likely you'll get a stern talking to.&quot;,&quot;cover_photo_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!e0-K!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F23bebca8-b5bb-43ce-afef-078bcb795395_1200x993.jpeg&quot;,&quot;audience&quot;:&quot;everyone&quot;,&quot;is_preview&quot;:false,&quot;audio_url&quot;:&quot;https://substack-video.s3.amazonaws.com/video_upload/post/175157766/tts/04147939-75ee-4030-b823-eef0e24eebb9/en-US-OnyxTurboMultilingualNeural.mp3&quot;,&quot;audio_type&quot;:&quot;tts&quot;,&quot;web_url&quot;:&quot;https://blog.genesmindsmachines.com/p/if-your-random-seed-is-42-i-will&quot;,&quot;duration_metadata&quot;:{&quot;word_count&quot;:2280},&quot;authors&quot;:[&quot;Claus Wilke&quot;],&quot;published_bylines&quot;:[{&quot;id&quot;:64064132,&quot;name&quot;:&quot;Claus Wilke&quot;,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f86ed0b8-faec-478f-9afa-6a59f2c148fc_2000x2000.png&quot;}],&quot;coverImagePalette&quot;:{&quot;Vibrant&quot;:{&quot;rgb&quot;:[236,180,74],&quot;population&quot;:9},&quot;DarkVibrant&quot;:{&quot;rgb&quot;:[120.003,82.875,12.597],&quot;population&quot;:0},&quot;LightVibrant&quot;:{&quot;rgb&quot;:[242.403,205.27499999999998,134.99699999999999],&quot;population&quot;:0},&quot;Muted&quot;:{&quot;rgb&quot;:[132,116,92],&quot;population&quot;:544},&quot;DarkMuted&quot;:{&quot;rgb&quot;:[66,48,37],&quot;population&quot;:348},&quot;LightMuted&quot;:{&quot;rgb&quot;:[193,193,188],&quot;population&quot;:145}},&quot;publication_id&quot;:5419410,&quot;publisher_image_url&quot;:&quot;https://substackcdn.com/image/fetch/$s_!3tvK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b85fecd-da20-4614-b9b3-54f277cfa6bd_982x982.png&quot;,&quot;publisher_name&quot;:&quot;Genes, Minds, 
Machines&quot;,&quot;is_personal_mode&quot;:false,&quot;like_count&quot;:15,&quot;comment_count&quot;:7,&quot;tracking_parameters&quot;:{&quot;is_saved&quot;:false,&quot;is_seen&quot;:true,&quot;post_id&quot;:175157766,&quot;post_type&quot;:&quot;newsletter&quot;,&quot;publication_id&quot;:5419410,&quot;tabId&quot;:&quot;home&quot;,&quot;tabType&quot;:&quot;base&quot;,&quot;max_read_progress&quot;:1,&quot;max_audio_progress&quot;:0,&quot;max_video_progress&quot;:0,&quot;last_seen_at&quot;:&quot;2025-10-22T13:07:26.054Z&quot;,&quot;impression_id&quot;:&quot;5e94ed9a-83e4-4a85-bd31-03450313fa32&quot;}},&quot;is_saved&quot;:false,&quot;saved_at&quot;:null,&quot;is_viewed&quot;:true,&quot;read_progress&quot;:0,&quot;max_read_progress&quot;:1,&quot;audio_progress&quot;:0,&quot;max_audio_progress&quot;:0,&quot;video_progress&quot;:0,&quot;max_video_progress&quot;:0,&quot;restacked&quot;:false},&quot;postSelection&quot;:{&quot;id&quot;:&quot;29632fac-e8e9-453e-a2ca-9ca645d5ba59&quot;,&quot;created_at&quot;:&quot;2025-10-22T13:32:20.038Z&quot;,&quot;post_id&quot;:175157766,&quot;start_paragraph&quot;:25,&quot;end_paragraph&quot;:25,&quot;start_offset&quot;:0,&quot;end_offset&quot;:109,&quot;text&quot;:&quot;Why choose a specific random seed at all? Is this actually a good idea? 
In general, I think the answer is no.&quot;,&quot;is_auto_selection&quot;:false},&quot;postSelectionTheme&quot;:{&quot;name&quot;:&quot;DarkMuted&quot;,&quot;alignment&quot;:&quot;left&quot;},&quot;postImageSelection&quot;:null,&quot;clipInfo&quot;:null,&quot;mediaClip&quot;:null}],&quot;name&quot;:&quot;Nick Bailey&quot;,&quot;user_id&quot;:325256094,&quot;photo_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3fbaba86-da9b-4436-80c7-dd2df6eb116f_2316x2316.jpeg&quot;,&quot;user_bestseller_tier&quot;:null,&quot;userStatus&quot;:{&quot;bestsellerTier&quot;:null,&quot;subscriberTier&quot;:1,&quot;leaderboard&quot;:null,&quot;vip&quot;:false,&quot;badge&quot;:{&quot;type&quot;:&quot;subscriber&quot;,&quot;tier&quot;:1,&quot;accent_colors&quot;:null},&quot;paidPublicationIds&quot;:[250377],&quot;subscriber&quot;:null}}}" data-component-name="CommentPlaceholder"></div><p>A somewhat similar idea: Use the current date as the random seed. At a minimum, this gives you a different seed each day, while also avoiding the appearance of having fished for the seed that gives you the desired results. The devil is in the details, though, of how exactly you convert the date into a seed. 
See this post by Stephen Turner and the subsequent replies:</p><div class="bluesky-wrap outer" style="height: auto; display: flex; margin-bottom: 24px;" data-attrs="{&quot;postId&quot;:&quot;3m3rxz2ea3k2u&quot;,&quot;authorDid&quot;:&quot;did:plc:ppvxhapnptcy5v6cih3ynmzg&quot;,&quot;authorName&quot;:&quot;Stephen Turner&quot;,&quot;authorHandle&quot;:&quot;stephenturner.us&quot;,&quot;authorAvatarUrl&quot;:&quot;https://cdn.bsky.app/img/avatar/plain/did:plc:ppvxhapnptcy5v6cih3ynmzg/bafkreif6sokzuisvfmv6hd3rzfhraijpk3o7236wiuydhz7bfaxvac62wm@jpeg&quot;,&quot;text&quot;:&quot;I forget where I first saw this trick, but this is valid #Rstats code:\n\nset.seed(2025-10-22)\n\nSetting the random seed to today's ISO 8601 avoids using the same seed, and gives you a quick reference for the day you started the project without digging through git logs.&quot;,&quot;createdAt&quot;:&quot;2025-10-22T13:42:49.086Z&quot;,&quot;uri&quot;:&quot;at://did:plc:ppvxhapnptcy5v6cih3ynmzg/app.bsky.feed.post/3m3rxz2ea3k2u&quot;,&quot;imageUrls&quot;:[]}" data-component-name="BlueskyCreateBlueskyEmbed"><iframe id="bluesky-3m3rxz2ea3k2u" data-bluesky-id="36457979562939213" src="https://embed.bsky.app/embed/did:plc:ppvxhapnptcy5v6cih3ynmzg/app.bsky.feed.post/3m3rxz2ea3k2u?id=36457979562939213" width="100%" style="display: block; flex-grow: 1;" frameborder="0" scrolling="no"></iframe></div><p>Thanks to <a href="https://bsky.app/profile/csgillespie.bsky.social/post/3m3sgvsfzb22t">Colin Gillespie over on BlueSky,</a> I have learned how to do code searches on GitHub. 
So now I can report that, as of this writing, there are <a href="https://github.com/search?q=%22random_state%3D42%22+language%3Apython&amp;type=code">496k cases of </a><code>random_state=42</code><a href="https://github.com/search?q=%22random_state%3D42%22+language%3Apython&amp;type=code"> on GitHub.</a> </p><p>Finally, an issue to be aware of if you&#8217;re using Matlab: It uses a fixed random seed every time, so the generated random numbers are always the same in a fresh Matlab session:</p><div class="bluesky-wrap outer" style="height: auto; display: flex; margin-bottom: 24px;" data-attrs="{&quot;postId&quot;:&quot;3m3rvlh3f3k2s&quot;,&quot;authorDid&quot;:&quot;did:plc:g2tat6psvgnnu7gpogyqktwf&quot;,&quot;authorName&quot;:&quot;Charlotte Reese Marshall used to be Tom Rhys Marshall&quot;,&quot;authorHandle&quot;:&quot;tomrhysmarshall.bsky.social&quot;,&quot;authorAvatarUrl&quot;:&quot;https://cdn.bsky.app/img/avatar/plain/did:plc:g2tat6psvgnnu7gpogyqktwf/bafkreie32qzya4hebfioztdxcubd7uyvlxowu74uy3m5mws5slymmlfpiu@jpeg&quot;,&quot;text&quot;:&quot;Even worse, don't rely on default-on-initialisation values for your random seed &#128561;&#128561;&#128561;\n\nI did some drama about this on the birdsite once: blogs.mathworks.com/matlab/2022/...&quot;,&quot;createdAt&quot;:&quot;2025-10-22T12:59:25.181Z&quot;,&quot;uri&quot;:&quot;at://did:plc:g2tat6psvgnnu7gpogyqktwf/app.bsky.feed.post/3m3rvlh3f3k2s&quot;,&quot;imageUrls&quot;:[]}" data-component-name="BlueskyCreateBlueskyEmbed"><iframe id="bluesky-3m3rvlh3f3k2s" data-bluesky-id="22457967640076548" src="https://embed.bsky.app/embed/did:plc:g2tat6psvgnnu7gpogyqktwf/app.bsky.feed.post/3m3rvlh3f3k2s?id=22457967640076548" width="100%" style="display: block; flex-grow: 1;" frameborder="0" scrolling="no"></iframe></div><p>You can read more about this <a href="https://blogs.mathworks.com/matlab/2022/06/07/6-3-7-8-5-1-2-4-9-10-or-a-story-of-surprise-about-randomness/">on the Matlab blog.</a></p><div><hr></div><div 
class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I am channeling <a href="https://jennybryan.org/">Jenny Bryan</a> here, who made similar statements about some <a href="https://tidyverse.org/blog/2017/12/workflow-vs-script/">widely popular, bad practices in R coding.</a> </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>NPC = Non-player character. NPCs are all the elements of a game that do something autonomously, not directed by a human playing the game. For example, the monsters are usually NPCs.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>I.e., any integer between 10,000,000 and 99,999,999. There are 90 million possible choices in this range of numbers. 
</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>Most programming languages limit you to 32-bit integers as seed values. That&#8217;s equivalent to 4.3 billion different options, not even enough to give every living person on the planet their own sequence of random numbers.  </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>For most programming languages, if you don&#8217;t specify a random seed, the language uses true randomness to set the initial state. The random initial state will be derived from a hardware random number generator (if available) or from the current time otherwise. This initial state will be different every time you start up your programming environment.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>Do I have to say it? Don&#8217;t pick 9427385. There&#8217;s nothing special about it. I just pressed some number keys and this is what came out.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>I have always preferred Adams&#8217; novels <em>Dirk Gently&#8217;s Holistic Detective Agency</em> and <em>The Long Dark Tea-Time of the Soul</em> over the <em>Hitchhiker&#8217;s Guide</em> series. But the <em>Hitchhiker&#8217;s Guide</em> series is good, in particular the first two books. 
The strange humor is present in all of Adams&#8217; writing, though.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p>This then leads to them building another, even bigger computer, which turns out to be all of Earth, and that&#8217;s an important component of the storyline in the book. But that&#8217;s not relevant to our discussion here, which is about random seeds.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><p>Or not. Again, Douglas Adams&#8217; humor was a bit weird, and always trending towards rather silly.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Most graduate students propose to do too much]]></title><description><![CDATA[No thesis proposal has ever been criticized for lack of ambition]]></description><link>https://blog.genesmindsmachines.com/p/most-graduate-students-propose-to</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/most-graduate-students-propose-to</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Thu, 16 Oct 2025 16:43:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7uZJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa73567-8e38-4b70-b029-4496a6643d93_2768x3922.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><em>Every time I post anything about PhD education, somebody stops by and claims all that professors care about is squeezing as much work as humanly possible out of PhD students. 
And also, of course, that professors want to keep their students around for as long as possible, definitely much longer than the customary five years, again to maximize cheap labor. While such professors do exist, I don&#8217;t think they are representative. And many graduate programs keep a watchful eye on time to graduation and make a concerted effort to get students out on time.</em></p><p><em>To provide an alternative perspective, here I&#8217;m re-publishing a lightly edited version of <a href="https://clauswilke.com/blog/2013/12/07/excess-ambitionthe-eternal-flaw-of-all-phd-thesis-proposals/">a blog post from 2013,</a> about how most graduate students propose to do too much and need to be reined in in their ambition. The post is primarily about the PhD thesis proposal, where students need to present a plan for their PhD work to a panel of professors. However, much of its content applies more broadly. Even after having successfully defended a thesis proposal many graduate students want to do too much.</em></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7uZJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa73567-8e38-4b70-b029-4496a6643d93_2768x3922.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7uZJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa73567-8e38-4b70-b029-4496a6643d93_2768x3922.jpeg 424w, https://substackcdn.com/image/fetch/$s_!7uZJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa73567-8e38-4b70-b029-4496a6643d93_2768x3922.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!7uZJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa73567-8e38-4b70-b029-4496a6643d93_2768x3922.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7uZJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa73567-8e38-4b70-b029-4496a6643d93_2768x3922.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7uZJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa73567-8e38-4b70-b029-4496a6643d93_2768x3922.jpeg" width="1456" height="2063" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8aa73567-8e38-4b70-b029-4496a6643d93_2768x3922.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:2063,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1674123,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/172791405?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa73567-8e38-4b70-b029-4496a6643d93_2768x3922.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7uZJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa73567-8e38-4b70-b029-4496a6643d93_2768x3922.jpeg 424w, 
https://substackcdn.com/image/fetch/$s_!7uZJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa73567-8e38-4b70-b029-4496a6643d93_2768x3922.jpeg 848w, https://substackcdn.com/image/fetch/$s_!7uZJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa73567-8e38-4b70-b029-4496a6643d93_2768x3922.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!7uZJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8aa73567-8e38-4b70-b029-4496a6643d93_2768x3922.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture>
</div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@armand_khoury?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Armand Khoury</a> on <a href="https://unsplash.com/photos/boy-on-ladder-under-blue-sky-Ba6IlmAzl-k?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Unsplash</a></figcaption></figure></div><p>I cannot remember ever having seen a graduate student present a PhD thesis proposal and be criticized for lack of ambition. It never happens. Even the weakest students&#8212;especially the weakest students&#8212;present proposals that are overly ambitious and that won&#8217;t ever get done, and certainly not in the 3&#8211;4 years remaining until graduation. In fact, in my experience it is exceedingly rare that a student presents a reasonable proposal, one that is actually doable during the remainder of their time in graduate school. Usually, those only happen when students &#8220;forget&#8221; to have their qualifying exams and end up presenting their &#8220;proposal&#8221; six months before the intended graduation date. In those cases, the students know that they won&#8217;t accomplish much new between proposal day and defense day, and therefore they present a proposal that consists entirely of completed work.</p>
<p>In biology PhD programs in the US, most professors expect graduate students to complete about three projects, corresponding to the magical three specific aims in a typical grant proposal.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> It follows that a graduate student who is defending their proposal, 2&#8211;3 years into their program, should have one project completed, one well under way, and one in the early planning stages. Students doing complicated experimental work might have progressed less, but at a minimum they should have one project well under way when they defend their thesis proposal. This leads to a pretty good rule of thumb for the amount of work the proposal should encompass: Aim 1 should be the work that is in the bag, and Aims 2 and 3 together should not require more than twice the amount of work already accomplished.</p><p>I rarely see PhD proposals that meet this rule of thumb. Instead, the already completed work is frequently only a small component of the proposed Aim 1, which by itself is going to take another two years to complete. Proposed Aim 2 will need four years on top of that, and Aim 3 another ten. Many graduate students propose to carry out a lifetime of research during their graduate studies.</p><p>I don&#8217;t quite know why PhD proposals tend to be overly ambitious. 
Maybe it&#8217;s youthful optimism or naivet&#233;. I suspect, though, that there is a component of worry, the eternal graduate student concern of not being sufficiently productive, of not doing enough. Ironically, this concern often causes students to overlook the successes that are within reach and instead try to reach for the stars. In general, doing a successful thesis is a fine balancing act between being overly ambitious and playing it too safe, a topic for another post. However, there&#8217;s a difference between an actual PhD thesis and a thesis proposal: The ideal thesis will contain some exciting, risky work, but for the proposal most professors want to see a plan that is doable, not one that might be doable if the stars align correctly. As a smart graduate student, you have two alternative plans, one safe and one daring, you work on both of them at the same time, and you only present the safe one during the committee meeting.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><p>A second, related issue I frequently notice is that students display poor judgment in how much work they can realistically accomplish in their remaining time. Estimates are consistently too optimistic. If you are in year three, and you have completed 50% of your first project, it is unlikely that you&#8217;ll complete this and two entirely different projects in the remaining 2&#8211;3 years of your PhD. Further, unless you&#8217;re a paper-writing machine, it&#8217;s unlikely that you can write a paper in less than three months.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> So, if you still have three manuscripts to complete, plus a thesis, the writing time alone is going to be about a year. 
If you&#8217;re already halfway into year three, you&#8217;ll have about another 18 months of actual research work you can do, because the other 12 you&#8217;ll spend writing. (Of course I&#8217;d recommend that you <a href="https://blog.genesmindsmachines.com/p/from-the-archives-when-should-you">don&#8217;t wait all the way till the end before you start writing</a>, but the math comes out the same.) My personal rule of thumb is that things take about three times longer than students estimate. So if a student says a particular project needs another three months in the lab plus a month to be written up, I expect that project to be done around the same time next year.</p><p>In conclusion, when you prepare your thesis proposal, realistically assess how much work you can complete during the remainder of your graduate years. Don&#8217;t assume that your productivity will double or triple over the next two years, because it won&#8217;t. Budget at least three months for every paper you have to write, and triple the time you think it takes to complete the remaining lab work. If you have papers in review, consider that responding to reviewer comments and revising a paper frequently takes another two to three months, during which nothing else gets done. If you end up with a plan that will require another five years of work or more, then you&#8217;ll have to change your aims. See whether your current Aim 1 can be broken down into reasonable sub-aims that can serve as the separate chapters of your thesis. It&#8217;s quite common for me to conclude a PhD proposal defense by telling the student it&#8217;d be best to scrap Aims 2 and 3 altogether and instead expand Aim 1 into the entire thesis. 
If you come to this realization before the committee meeting, I won&#8217;t have to tell you so during the meeting, and everybody is happier.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>While proposals with either two or four aims can also be viable, two can appear unimaginative (he really couldn&#8217;t think of anything else?) 
and four is getting dangerously close to being overly ambitious, so three it is.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>And you abandon the daring plan the moment you realize it won&#8217;t be possible to bring it to completion.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Also, if you&#8217;re a paper-writing machine, why haven&#8217;t you written a bunch of papers already by the time you&#8217;re defending your proposal?</p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[We still can’t predict much of anything in biology]]></title><description><![CDATA[Biology is hard. Yes, even for AI.]]></description><link>https://blog.genesmindsmachines.com/p/we-still-cant-predict-much-of-anything</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/we-still-cant-predict-much-of-anything</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Tue, 07 Oct 2025 12:27:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!02U1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F976b3f4b-b2b5-4389-8634-fb2d0227207b_5168x3448.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>AI has gotten amazingly good for programming. Claude Sonnet will zero- or one-shot small programming tasks without mistakes. And while I don&#8217;t think AI is ready to replace software engineers outright, or that vibe coding a fully featured app is a good idea, for simple tasks AI is outstanding. 
For example, I can perform basic data analysis, maybe visualize a dataset with a PCA or run a classifier, by sketching out what I want in a prompt and Claude will reliably write code that can do the task.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!02U1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F976b3f4b-b2b5-4389-8634-fb2d0227207b_5168x3448.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!02U1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F976b3f4b-b2b5-4389-8634-fb2d0227207b_5168x3448.jpeg 424w, https://substackcdn.com/image/fetch/$s_!02U1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F976b3f4b-b2b5-4389-8634-fb2d0227207b_5168x3448.jpeg 848w, https://substackcdn.com/image/fetch/$s_!02U1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F976b3f4b-b2b5-4389-8634-fb2d0227207b_5168x3448.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!02U1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F976b3f4b-b2b5-4389-8634-fb2d0227207b_5168x3448.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!02U1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F976b3f4b-b2b5-4389-8634-fb2d0227207b_5168x3448.jpeg" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/976b3f4b-b2b5-4389-8634-fb2d0227207b_5168x3448.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2117390,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/175321052?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F976b3f4b-b2b5-4389-8634-fb2d0227207b_5168x3448.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!02U1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F976b3f4b-b2b5-4389-8634-fb2d0227207b_5168x3448.jpeg 424w, https://substackcdn.com/image/fetch/$s_!02U1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F976b3f4b-b2b5-4389-8634-fb2d0227207b_5168x3448.jpeg 848w, https://substackcdn.com/image/fetch/$s_!02U1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F976b3f4b-b2b5-4389-8634-fb2d0227207b_5168x3448.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!02U1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F976b3f4b-b2b5-4389-8634-fb2d0227207b_5168x3448.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@enginakyurt?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">engin akyurt</a> on <a href="https://unsplash.com/photos/red-and-yellow-hand-tool-bPiuY2ZSlvU?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Unsplash</a></figcaption></figure></div><p>It&#8217;s very tempting, in particular to the tech crowd, to look at this AI success in programming and extrapolate to other application areas. One popular area of extrapolation is biology. If we can teach an AI programming by feeding it millions of examples of code snippets, so the thinking goes, surely we can also teach it biology by feeding it millions of examples of biological data. And yes, to some extent this works. 
AlphaFold is pretty good.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> But in practice, what happens more often than not is not what I would call the AlphaFold experience.</p><p>This is my usual experience: I read about some new computational method that appears to work exceptionally well, I get excited because it&#8217;s exactly what I need for one of my projects, I try the method, and the results are disappointing. Most of the time things don&#8217;t work, or at least not as well as expected. I have seen this play out so many times that my default assumption is that nothing is going to work. And anything that does actually work is a bit of a miracle. The only successful strategy is volume. If you try sufficiently many things, some do in fact work, and those you can publish.</p><p>My latest example is with the software <a href="https://www.nature.com/articles/s41586-025-09429-6">BindCraft,</a> a tool built on top of AlphaFold to design peptide binders. Before I continue, let me emphasize that in my opinion BindCraft is really good. I have the utmost respect for its creators. It&#8217;s well documented, it&#8217;s easy to install, it does what it claims to do. Without doubt, it&#8217;s the best tool for designing peptide binders currently on the market. And yet, even though the authors write that they achieve &#8220;de novo protein binder design with experimental success rates of 10&#8211;100%,&#8221; in our own hands only maybe one out of about a hundred designs actually works. We can design binders with BindCraft, and we can design them more efficiently than with any other available method, but we still have to experimentally test hundreds of designs to get a handful of working inhibitors.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p><p>This story is not unique to AI methods. 
I&#8217;ve been in computational biology for nearly three decades. During this entire time, nothing has fundamentally changed in the gap between how papers describe the success of new computational methods and how the methods actually perform in practice when you use them on your own system of interest. I remember, in the early 2000s, David Baker was revolutionizing computational protein design with his Rosetta software suite, winning CASP competitions left and right, and writing papers that gave the impression computational protein design was solved. For example, computational design of novel folds <a href="https://www.science.org/doi/abs/10.1126/science.1089427">was solved by 2003,</a> protein docking <a href="https://www.sciencedirect.com/science/article/abs/pii/S0022283603006703">was solved by 2003,</a> enzyme design <a href="https://www.nature.com/articles/nature06879">was solved by 2008,</a> and atom-level co-folding of multiple peptide chains <a href="https://www.pnas.org/doi/abs/10.1073/pnas.0904407106">was solved by 2009.</a> And yet, here we are, twenty years later: all of these topics are still active areas of research, and if you have any particular system of interest you may find that none of the available methods perform that well.</p><p>Now let me be absolutely clear: I&#8217;m not accusing anybody of faking data, doing sloppy science, or misrepresenting their results. All the papers I have cited here are examples of outstanding science practiced at the highest level. Instead, I think there are several things going on. First, biology is really difficult. Just because something works in one system doesn&#8217;t mean it&#8217;ll work in another. Second, there is publication bias. 
The examples that work get published, the ones that don&#8217;t work do not.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> Third, experienced PIs develop an instinctual understanding of which specific problems are amenable to their methodology. They may subconsciously or deliberately choose problems where the likelihood of success is high. I&#8217;m sure if you went to David Baker with a protein-design problem he could tell you right away whether you&#8217;re likely to succeed or not and which software and parameter settings would maximize your chances. But you&#8217;re not David Baker, you don&#8217;t have three decades of experience designing proteins, and so you&#8217;ll naively pick the wrong method or attempt a problem that is simply too hard. And the experience will be frustrating and it&#8217;ll feel like the available methods don&#8217;t perform anywhere near as well as the published papers suggest.</p><p>Going back to our experience with BindCraft, we&#8217;re starting to see patterns emerge of when it succeeds and when it fails. BindCraft appears to be much better at targeting certain binding pockets than others. You wouldn&#8217;t easily find this out from reading the paper, but generate a few thousand designs and you start to get a feel for how the algorithm behaves. Similarly, there are non-obvious aspects to how you prepare your target structure that matter. In general, you have to make hundreds of seemingly inconsequential choices that can increase or decrease your chances of obtaining a successful binder design. Make all the right choices&#8212;and sprinkle in a bit of luck&#8212;and your design will work. Make a couple of wrong choices and it&#8217;ll fail.</p><p>I want to emphasize that this post is not meant as a complaint about BindCraft. What I describe is a general phenomenon. 
For another example, look at my recent post about <a href="https://blog.genesmindsmachines.com/p/limitations-of-protein-language-models">how protein language models fail on viral data.</a><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> Specifically, look at the graph I included in that post. Even for the non-viral proteins, for at least half the datasets predictions mostly or entirely fail. An <em>R</em><sup>2</sup> below 0.5 is not a good prediction.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!GtvC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F762ed6eb-13cf-4ac3-a1fa-fa3986a7ea75_1800x1350.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!GtvC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F762ed6eb-13cf-4ac3-a1fa-fa3986a7ea75_1800x1350.png 424w, https://substackcdn.com/image/fetch/$s_!GtvC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F762ed6eb-13cf-4ac3-a1fa-fa3986a7ea75_1800x1350.png 848w, https://substackcdn.com/image/fetch/$s_!GtvC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F762ed6eb-13cf-4ac3-a1fa-fa3986a7ea75_1800x1350.png 1272w, https://substackcdn.com/image/fetch/$s_!GtvC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F762ed6eb-13cf-4ac3-a1fa-fa3986a7ea75_1800x1350.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!GtvC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F762ed6eb-13cf-4ac3-a1fa-fa3986a7ea75_1800x1350.png" width="1456" height="1092" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/762ed6eb-13cf-4ac3-a1fa-fa3986a7ea75_1800x1350.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1092,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:103381,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/175321052?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F762ed6eb-13cf-4ac3-a1fa-fa3986a7ea75_1800x1350.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!GtvC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F762ed6eb-13cf-4ac3-a1fa-fa3986a7ea75_1800x1350.png 424w, https://substackcdn.com/image/fetch/$s_!GtvC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F762ed6eb-13cf-4ac3-a1fa-fa3986a7ea75_1800x1350.png 848w, https://substackcdn.com/image/fetch/$s_!GtvC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F762ed6eb-13cf-4ac3-a1fa-fa3986a7ea75_1800x1350.png 1272w, https://substackcdn.com/image/fetch/$s_!GtvC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F762ed6eb-13cf-4ac3-a1fa-fa3986a7ea75_1800x1350.png 
1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Transfer learning performance varies widely among different datasets, and it performs particularly poorly on viral proteins. Figure from <a href="https://www.pnas.org/doi/10.1073/pnas.2513608122">Wilke and Vieira (2025),</a> representing results reproduced from <a href="https://doi.org/10.1038/s41598-025-05674-x">Vieira et al. (2025).</a></figcaption></figure></div><p>Why are predictions so poor? How can biology be so difficult? 
Biology is just physics and chemistry, and in principle we should be able to write down the equations of motion of any biological system and solve them numerically. In practice, however, any realistic biological system is way too large for this approach and we don&#8217;t have the required compute. It&#8217;s not feasible to rent an entire supercomputer for a year just to calculate the fitness effect of a single mutation. So we have to rely on shortcuts. The shortcuts that work are database lookups and interpolation. AlphaFold is a gigantic lookup and interpolation machine. It uses known structures in concert with covariation derived from multiple sequence alignments to link sequences to likely structures.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a> Ask it to predict structures of novel folds or of sequences with no known homologs and its performance often craters. <a href="https://blog.genesmindsmachines.com/p/no-alphafold-has-not-completely-solved">I&#8217;ve written previously about the limitations of AlphaFold.</a> And software such as BindCraft, built on AlphaFold, inherits all of AlphaFold&#8217;s limitations. To improve binder design, we first need to improve protein folding predictions.</p><p>And up to this point we&#8217;ve mostly talked about structure prediction. Move up one level to protein function and things get exponentially more difficult. Any given protein can have hundreds of different functions, when you consider all the different environments in which the protein can be expressed, different interaction partners it may come in contact with, and&#8212;for enzymes&#8212;different substrates it may act upon. If you wanted to construct a model that could reliably predict effects of mutations, you&#8217;d need it to achieve this task for every possible mutation in every single context in which the protein could occur. 
I hope you can see how big a task this is, and, for AI approaches, how much data you&#8217;d need to collect to cover this enormous space evenly in your training set.</p><p>And once you&#8217;ve mastered protein function, you&#8217;re still only at the base of a gigantic mountain. First, there are molecules beyond proteins that also play important roles, such as RNA, DNA, lipids, glycans, various other small molecules, the list goes on and on. And second, you have to assemble individual molecules and their functions into pathways, and then into cells, and organ systems, and organisms, and from there on into populations and eventually ecosystems. You will encounter new challenges and complications at every new level of organization. You may hope that you can abstract away the lower levels as you move up in the hierarchy, but this will only get you so far. Individual genetic changes can have visible consequences at much higher levels of organization. As but one example, consider <a href="https://www.nature.com/articles/nature11816">genetic variation linked to burrowing behavior in mice.</a> In the end, all of biology is steered by what happens at the molecular level, and you can never quite get rid of the complexity and the unexpected effects or interactions.</p><p>I want to close on a more positive note. Even though computational predictions in biology are still fraught with failure, with every passing year science grinds forward and we&#8217;re collectively accumulating knowledge. Our ability to manipulate biological systems is improving one step at a time. For example, we are much better today at designing peptide binders than we were twenty years ago. And at the same time, we still have many decades of work ahead of us. The world of biology is so large that complete mastery remains out of reach. Today we can design a car or an airplane entirely <em>in silico,</em> and when we build it, it works on the first try. 
Maybe, in a few decades, we&#8217;ll be able to do something comparable with a biological system.</p><p>Now I&#8217;d like to hear from you in the comments. Do you have similar experiences? Do your modeling or prediction efforts frequently fail? Or do you think I&#8217;m too negative and in your hands things generally work? In your mind, what&#8217;s the state of computational prediction in biology in 2025?</p><p><em>This post, or at least something like it, was requested by <a href="https://substack.com/@eurydicelives">Eurydice.</a></em></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>In case you&#8217;re living under a rock and have no idea what I&#8217;m talking about, you can check out <a href="https://github.com/clauswilke/Claude-zero-shot/blob/main/Claude-zero-shot.ipynb">an example I have prepared here.</a> In response to a simple prompt, Claude prepared a non-trivial piece of code that does exactly what I wanted it to do.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" 
href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Just don&#8217;t go around saying AlphaFold has solved the protein-folding problem. It has not done this. There are many caveats and limitations. <a href="https://blog.genesmindsmachines.com/p/no-alphafold-has-not-completely-solved">I have written about this previously.</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>We&#8217;re designing peptide binders as inhibitors of enzyme activity, so our actual problem is a bit more difficult than just binding, but that&#8217;s what real biology is like. You never actually want the exact same thing that was demonstrated in a paper. Your real-world use case is always a bit different, and typically more complicated. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>There is little we can do about publication bias. Yes, negative results should be published, but you can&#8217;t publish every failed experiment. Most experiments fail for reasons that don&#8217;t warrant publication. Negative results are publishable when you are certain you have made a serious effort at producing the effect and it&#8217;s just not there. Negative results due to negligence, incompetence, or insufficient effort should not be published.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>I know, I promised we&#8217;d write a paper on this. The paper is in the works. 
As always, things are more complicated than originally envisioned and take longer than expected.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>And David Baker&#8217;s Rosetta software, which is not built on AI, is also fundamentally a lookup machine that stitches together motifs from known protein structures. All successful protein-folding methods rely on structure lookup and covariation analysis.</p></div></div>]]></content:encoded></item><item><title><![CDATA[How to write an NSF GRFP research plan]]></title><description><![CDATA[This is my second post about the NSF GRFP.]]></description><link>https://blog.genesmindsmachines.com/p/how-to-write-an-nsf-grfp-research</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/how-to-write-an-nsf-grfp-research</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Thu, 02 Oct 2025 14:02:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Rt9s!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dfed32e-ef65-4366-b012-31f2629bd9da_3354x1884.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This is my second post about the NSF GRFP. My previous one was <a href="https://blog.genesmindsmachines.com/p/how-to-write-an-nsf-grfp-personal">about the personal statement.</a> If you plan to apply for an NSF GRFP, you should definitely read that one first, and then read this post about the research plan. If you don&#8217;t plan to apply to the NSF GRFP, there is no need to read either post.</p><p>After the personal statement, the research plan is the next-most important component of a GRFP application. It is where you describe what research you plan to carry out. And you only have two pages to do so. 
This means you can&#8217;t actually write a detailed research plan. What you have to write is an outline of a plan. I think of the research plan as an extended summary page. If you&#8217;re familiar with the standard NIH proposal structure,<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> think of the Specific Aims page, and just expand it a little bit. There you go. You&#8217;ve got your NSF GRFP research plan. (But don&#8217;t forget the Broader Impacts section.)</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.youtube.com/watch?v=WMoGdIFgy5o" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Rt9s!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dfed32e-ef65-4366-b012-31f2629bd9da_3354x1884.png 424w, https://substackcdn.com/image/fetch/$s_!Rt9s!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dfed32e-ef65-4366-b012-31f2629bd9da_3354x1884.png 848w, https://substackcdn.com/image/fetch/$s_!Rt9s!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dfed32e-ef65-4366-b012-31f2629bd9da_3354x1884.png 1272w, https://substackcdn.com/image/fetch/$s_!Rt9s!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dfed32e-ef65-4366-b012-31f2629bd9da_3354x1884.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Rt9s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dfed32e-ef65-4366-b012-31f2629bd9da_3354x1884.png" width="1456" 
height="818" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5dfed32e-ef65-4366-b012-31f2629bd9da_3354x1884.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:818,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4715507,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.youtube.com/watch?v=WMoGdIFgy5o&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/175054970?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dfed32e-ef65-4366-b012-31f2629bd9da_3354x1884.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Rt9s!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dfed32e-ef65-4366-b012-31f2629bd9da_3354x1884.png 424w, https://substackcdn.com/image/fetch/$s_!Rt9s!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dfed32e-ef65-4366-b012-31f2629bd9da_3354x1884.png 848w, https://substackcdn.com/image/fetch/$s_!Rt9s!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dfed32e-ef65-4366-b012-31f2629bd9da_3354x1884.png 1272w, https://substackcdn.com/image/fetch/$s_!Rt9s!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5dfed32e-ef65-4366-b012-31f2629bd9da_3354x1884.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption"><a href="https://www.youtube.com/watch?v=WMoGdIFgy5o">A video describing the NSF merit review process.</a> The video is not specific to the GRFP but nevertheless provides good insight into how the review and selection process works.</figcaption></figure></div><p>There&#8217;s a lot to be said about how to write a research plan, and I can&#8217;t possibly fit everything into a single post. So I&#8217;ll start by linking to a few other proposal-writing resources, and then I&#8217;ll provide suggestions that are specific to the NSF GRFP. 
First, I suggest you go and read <a href="https://clauswilke.com/blog/2013/10/28/use-fine-grained-sectioning-in-your-grant-proposals/">this old blog post of mine about fine-grained sectioning in proposals.</a> Then, you may find <a href="https://clauswilke.com/blog/2013/10/17/the-critical-need-in-a-grant-application/">this blog post about the critical need</a> helpful. Finally, <a href="https://wilkelab.org/classes/BIO384C/fall_2015/class08_ProposalWritingCheatSheet.pdf">here is a handout</a> from a class I taught some years ago. The handout has various pointers about writing grant proposals. Most importantly though, on the second page, it has a sentence-by-sentence outline of a proposal summary page. Just follow the outline and your research plan will write itself.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><h2>Write for a broad audience</h2><p>A common mistake I frequently see students make is to assume everybody knows about their research area. 
Students may have spent years in a lab where everybody talks about the same problems day in, day out, and now they think anybody who is broadly in the same field will be intimately familiar with the specific questions and issues they deal with every day. Nothing could be further from the truth. It&#8217;s best you assume nobody knows anything about your research. Yes, the reviewers are all professors in your area. And no, they have never heard of the system you&#8217;re studying.</p><p>To give a concrete example, in biology, students working on computation frequently misjudge what computational methods the average experimental biologist is familiar with. As a consequence, they write proposals that most reviewers can&#8217;t understand. And the reverse is also true. Experimental students may assume certain approaches are well known when maybe only ten labs in the world do these particular experiments.</p><p>So, do your best to write for a broad audience. But also, be aware you may still misjudge what is and isn&#8217;t broadly known. As a simple test to assess how comprehensible your writing is, show your proposal to a fellow student in your cohort but not in your lab. If they don&#8217;t understand your research plan, chances are the reviewers won&#8217;t understand it either.</p><h2>Stay away from biomedical research</h2><p>There&#8217;s an unwritten rule that the NSF doesn&#8217;t like to fund anything that is clearly in NIH territory. The NIH supports research related to human disease. Therefore, the NSF generally stays away from human disease. The reasoning is that the NIH has a budget that is five times bigger than the NSF budget, and yet the NSF is tasked with funding all of science whereas the NIH only funds research into human disease. 
So that&#8217;s one area the NSF doesn&#8217;t see the need to spend its scarce resources on.</p><p>If you&#8217;re a physicist, a chemist, a computer scientist, or an engineer, there is little risk that you&#8217;ll accidentally propose something that looks like NIH material. But, if you&#8217;re a biologist, this is a realistic concern, since the majority of biological research in the US is funded by the NIH. If you&#8217;re an undergraduate or first-year graduate student doing research in a biological lab in the US, there is a good chance that your lab is NIH funded, and so you&#8217;ll just instinctively absorb NIH language and priorities. If this is the case for you, I encourage you to deliberately stay away from NIH language and from human disease.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a></p><h2>Don&#8217;t worry about committing to a specific research program</h2><p>Receiving a GRFP award does not obligate you to do the exact research you have proposed. It&#8217;s easy to see why this should be the case: The vast majority of students who are applying are either not yet in graduate school or may be in graduate school but are in a rotation system and haven&#8217;t yet chosen a permanent lab. In both of these cases, students may end up in a situation where they literally cannot do the research they have proposed. For example, if you propose to do research on penguins, and then you don&#8217;t get into the penguin lab or even the school where the penguin lab is and instead end up in a butterfly lab, you&#8217;ll have to do butterfly research. But even if you do end up in the penguin lab, by the time you&#8217;re situated and the GRFP has started, almost a year will have passed from when you applied, and the specific research you suggested may no longer be relevant, or interesting, or the most impactful way for you to spend your time. 
Or, you end up having a long conversation with your advisor and they explain to you why maybe you were a bit naive when you wrote your research plan and some parts of it are infeasible, too costly, or uninteresting. In all of these cases, the right thing to do is go after the research that makes the most sense, not the exact research you proposed.</p><p>What you are writing in the research plan is an outline of what a reasonable PhD project could look like for you. You&#8217;re not writing a contract in blood that you&#8217;ll be tied to for the entire three years of GRFP support. Changes in research plans are normal and occur all the time. Changes in advisors are normal and do occur. Even changes in graduate program or university are normal and occur from time to time. You can be awarded an NSF GRFP and still make any of these changes in your PhD research if you want to.</p><h2>Write a research plan about the research you know best</h2><p>It is a good idea to be somewhat strategic about the research you propose, in particular if you haven&#8217;t yet joined a permanent lab. Write about the research you are most familiar with, even if it&#8217;s not necessarily exactly the project you want to do for your PhD.</p><p>As an example, let&#8217;s assume there are two labs you might want to join: one that excites you but is thematically a little further from research you have already done, and another that is more aligned with your prior research experience but less exciting to you. In terms of the proposal writing, you will likely be better off writing a proposal related to the second option, even if you&#8217;re confident you&#8217;d join the first lab if both options were available to you. In proposal writing, lack of specificity or detail is the kiss of death. 
It&#8217;s unlikely you&#8217;ll be able to write a strong proposal about research you haven&#8217;t done previously and have no experience with.</p><h2>Structure your research plan into well-defined objectives</h2><p>Your research plan needs an overarching theme or goal or research question, and then two or three well-defined objectives you will pursue to achieve the overarching goal. The right number of objectives is critical. Three is ideal, and two is fine. More than three is too ambitious, and fewer than two<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> means you haven&#8217;t actually developed a proper plan.</p><p>Each objective should have a descriptive title in bold and then about a paragraph of text describing what you will do. It can also help to add a hypothesis after the title and before the paragraph of text. To get an idea of what I mean, take a look at my <a href="https://clauswilke.com/blog/2013/10/28/use-fine-grained-sectioning-in-your-grant-proposals/">old blog post on fine-grained sectioning in research proposals.</a> You can also look at the first page <a href="https://wilkelab.org/classes/BIO384C/fall_2015/class08_NIH_R01_example.pdf">of this old grant proposal</a> I&#8217;ve sometimes used as an example in classes I&#8217;ve taught. But, note that both the blog post and the grant use the term &#8220;Aim.&#8221; Write &#8220;Objective&#8221; instead. Remember, you&#8217;re not applying to the NIH.</p><h2>Address the broader impacts and the intellectual merit</h2><p>In your research plan, you need a separate section called &#8220;Broader Impacts.&#8221; Realistically though, you don&#8217;t have sufficient space in your research plan to write much about broader impacts. My recommendation is to limit this section to a single paragraph, in which you summarize the broader impacts plan that you have spelled out in more detail in your personal statement. 
See <a href="https://blog.genesmindsmachines.com/p/how-to-write-an-nsf-grfp-personal">my post on the personal statement</a> for details.</p><p>And you also need a section on &#8220;Intellectual Merit.&#8221; My post on the personal statement addresses this also, so I&#8217;m not going to repeat myself here. Go read the prior post if you haven&#8217;t done so yet.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I realize there is probably no overlap between people familiar with the standard NIH proposal structure and applicants to the NSF GRFP. But, if you&#8217;re an applicant, maybe your adviser is familiar with how NIH proposals are structured. If they are, ask them about the Specific Aims page. Importantly, never use the word &#8220;Aim&#8221; in your NSF proposal. NIH funds Aims. NSF funds Objectives. 
DOD funds Tasks.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>Assuming you have a good idea of what research to do. I can&#8217;t help with that. At least not in this post.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Have I already mentioned that you shouldn&#8217;t use the term &#8220;Specific Aims&#8221;? You&#8217;ll be proposing &#8220;Objectives.&#8221;</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>In case it&#8217;s not obvious, fewer than two means one. Because obviously you can&#8217;t have zero objectives. 
You need to do something.</p></div></div>]]></content:encoded></item><item><title><![CDATA[How to write an NSF GRFP personal statement]]></title><description><![CDATA[The personal statement can make or break your application.]]></description><link>https://blog.genesmindsmachines.com/p/how-to-write-an-nsf-grfp-personal</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/how-to-write-an-nsf-grfp-personal</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Mon, 29 Sep 2025 12:09:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Dsop!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18a4fb32-26f1-48ef-85fd-a2d62aa95eb7_2098x1270.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The NSF just released their call for applications for the <a href="https://www.nsf.gov/funding/opportunities/grfp-nsf-graduate-research-fellowship-program">Graduate Research Fellowship Program (GRFP).</a> Applications are due between November 10 and November 14, 2025, depending on the field of study. So, students have a little over five weeks remaining to prepare their applications. This is not a lot of time. To make things a little easier, I&#8217;m providing here a couple of tips and considerations for writing a personal statement. I note that in addition to the personal statement, students also need to submit a research plan. If there is interest, I may write a follow-up post on how to prepare a research plan.</p><p>The intended audience for my post is undergraduates and first-year graduate students in the US applying for an NSF GRFP. If this is not you, there&#8217;s no need to continue reading. 
However, some parts of the post touch upon general thoughts about personal statements and grant applications, so even if you&#8217;re not in the US or not a student you may find some of the remainder useful.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://www.nsf.gov/funding/opportunities/grfp-nsf-graduate-research-fellowship-program" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Dsop!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18a4fb32-26f1-48ef-85fd-a2d62aa95eb7_2098x1270.png 424w, https://substackcdn.com/image/fetch/$s_!Dsop!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18a4fb32-26f1-48ef-85fd-a2d62aa95eb7_2098x1270.png 848w, https://substackcdn.com/image/fetch/$s_!Dsop!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18a4fb32-26f1-48ef-85fd-a2d62aa95eb7_2098x1270.png 1272w, https://substackcdn.com/image/fetch/$s_!Dsop!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18a4fb32-26f1-48ef-85fd-a2d62aa95eb7_2098x1270.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Dsop!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18a4fb32-26f1-48ef-85fd-a2d62aa95eb7_2098x1270.png" width="1456" height="881" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/18a4fb32-26f1-48ef-85fd-a2d62aa95eb7_2098x1270.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:881,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2814245,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:&quot;https://www.nsf.gov/funding/opportunities/grfp-nsf-graduate-research-fellowship-program&quot;,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/174739427?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18a4fb32-26f1-48ef-85fd-a2d62aa95eb7_2098x1270.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Dsop!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18a4fb32-26f1-48ef-85fd-a2d62aa95eb7_2098x1270.png 424w, https://substackcdn.com/image/fetch/$s_!Dsop!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18a4fb32-26f1-48ef-85fd-a2d62aa95eb7_2098x1270.png 848w, https://substackcdn.com/image/fetch/$s_!Dsop!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18a4fb32-26f1-48ef-85fd-a2d62aa95eb7_2098x1270.png 1272w, https://substackcdn.com/image/fetch/$s_!Dsop!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F18a4fb32-26f1-48ef-85fd-a2d62aa95eb7_2098x1270.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The instructions for this year&#8217;s application are available <a href="https://www.nsf.gov/funding/opportunities/grfp-nsf-graduate-research-fellowship-program/nsf25-547/solicitation">on the NSF website.</a> If you&#8217;re planning to apply, read these instructions very carefully, from beginning to end. No matter what I say here, if at any point my advice disagrees with what the NSF writes the NSF is correct. 
It is ultimately your responsibility to adhere to all the requirements spelled out in the application guidelines.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><p>When you read the guidelines, you will see that the two main documents you have to prepare are a personal statement and a research plan. Of those two, the personal statement is the more important one, for two reasons. First, it is substantially longer: three pages instead of two. That in itself tells you it is more important. There is more space to say something interesting or to make a fool of yourself. Second, I believe<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> that reviewers put more stock in the personal statement. Here is my reasoning: Most students receive more help and feedback for their research plan than for their personal statement. As a consequence, many of the submitted research plans are pretty good, and reviewers can&#8217;t use them to separate the absolute best students from the merely very good students. And also, if a research plan is truly excellent, reviewers will worry that the ideas came from the student&#8217;s supervisor rather than from the student themselves. So, an excellent research plan with a mediocre personal statement suggests the student got a lot of help but isn&#8217;t that strong when on their own. By contrast, an OK research plan with a strong personal statement suggests the student may be able to punch above their weight but didn&#8217;t receive as much coaching when preparing the application or may not yet have a lot of research experience. Therefore, all else being equal, I expect most reviewers will weight the personal statement more heavily than the research plan. 
Of course, the absolute best students will have both a strong research plan and a strong personal statement, and you should try to achieve this level of excellence.</p><p>Now that you know why the personal statement is important, let&#8217;s discuss specific strategies for making it good.</p><h2>Approach the personal statement with the right attitude</h2><p>Let&#8217;s begin with the overall attitude with which to approach your personal statement. Your application will be reviewed by established scientists. What they&#8217;re looking for is a peer, somebody they can treat as a colleague from day one&#8212;not a student who will need a lot of hand-holding.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> So your personal statement needs to convey you&#8217;re this type of student. Think about what a scientist would tell other scientists about themselves. Would they start their personal statement with something like the following?</p><blockquote><p>As a teenager, I was fascinated with the natural world, and in particular with insects. I would observe ants as they were building nests, and I would collect butterflies. My high school science teacher Ms. Smith was a beekeeper and one spring she showed us how she maintained the hive and harvested honey. It made me wonder why the bees always stayed with the hive instead of flying away. 
And also, when bees leave to forage, how do they manage to find their way back home?</p></blockquote><p>The answer is most likely they would not. They would not because they all have a similar origin story. In high school, they were all weird nerds nobody wanted to hang out with and who could geek out for hours over obscure topics.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> None of your experiences in high school differentiate you from the other applicants with those same experiences, and those experiences are also not relevant for your likely success or failure in graduate school.</p><p>An established scientist would simply lead with their science. They would lay out a grand vision of what they&#8217;re trying to accomplish, and then describe in more detail what they have done so far along this route, what successes they have had, what stumbling blocks they may have encountered, and so on. For example, if I had to write a personal statement for the NSF GRFP, I might write something like the following. (But note that for the NSF GRFP, you probably want to stay away from an emphasis on biomedical research. I emphasized biomedical research here so if you&#8217;re working in an area similar to mine you aren&#8217;t tempted to just copy my text and paste it into your statement.)</p><blockquote><p>I am a computational biologist and data scientist. I use mathematical modeling, computer simulation, and machine learning to address questions of biomedical relevance. Much of my work focuses on disease-causing agents in humans, and my overarching goal is to use computational tools to speed up the discovery of disease mechanisms and design of therapeutic agents. 
I currently have active research projects developing antibacterial agents, identifying novel molecular or genetic targets for antiviral treatments, and designing inhibitors of gene-editing enzymes.</p></blockquote><p>You may not feel comfortable presenting yourself in this way, as an established scientist who has already accomplished various things and knows what they want to do going forward. You may feel that this is not you, at least not yet. But I can assure you that there will be applicants who have this down, so find your inner scientist and try to bring it to the fore in your own application. Importantly, note that how professional you come across is not determined by how much you brag about yourself (you shouldn&#8217;t) but by your selection of topics. Write about your research. Don&#8217;t write about your experiences in high school science class.</p><h2>Provide your readers with an abstract</h2><p>Write the first paragraph of your personal statement as an abstract. Write six to eight sentences that cover all relevant components: Who you are, what you currently do, what your research is about, what your future goals are. This is a general writing trick when preparing any sort of proposal or application. Even if (or in particular if) the instructions don&#8217;t ask for an abstract, write one. But don&#8217;t label it &#8220;abstract.&#8221; Just turn the first paragraph of your document into a summary of the whole.</p><p>To understand why I&#8217;m giving this advice, put yourself into the mind of your reviewers. They have ten or more applications to review, and they don&#8217;t want it to take all weekend.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> So they would like to figure out quickly which applications to rank at the top and which at the bottom. 
If you can give them a single paragraph that allows them to envisage your entire application, you&#8217;re already on their good side. The worst thing for a reviewer is when they&#8217;re reading page after page and have no idea what any of this is about. You won&#8217;t believe how many personal statements I&#8217;ve seen where I&#8217;m two pages in, we&#8217;ve finally made it out of high school and into freshman year in college, and I still have no idea where any of this is going and what the applicant wants to tell me.</p><h2>Your statement doesn&#8217;t have to be in chronological order</h2><p>Continuing on from the abstract, you want to lay out your specific argument for who you are and why you deserve a fellowship. This argument does not have to be arranged in chronological order, starting with grade school and ending in the present. While the majority of personal statements are arranged in chronological order, this arrangement often makes them long-winded and tedious to read.</p><p>For example, when you describe your prior research experiences, don&#8217;t arrange them in the order in which they occurred unless there&#8217;s a good reason to do so. Instead, organize the material such that it most supports your overarching story. If that results in a chronological ordering, fine. But if not, also fine.</p><p>Think about the various research experiences you&#8217;ve had, and for every one identify one particular achievement, insight, or observation you want to highlight. Then arrange them such that the sequence of events maximally engages your audience. 
For example, maybe you first present a success you&#8217;ve had, then you present a failure where you&#8217;ve learned an important lesson, and then you talk about how the lesson you learned will shape your research approach going forward.</p><h2>Separately address intellectual merit and broader impacts</h2><p>You are required to write separate sections for &#8220;intellectual merit&#8221; and &#8220;broader impacts&#8221; in both the personal statement and the research plan. This is stated in bold in <a href="https://www.nsf.gov/funding/opportunities/grfp-nsf-graduate-research-fellowship-program/nsf25-547/solicitation#prep">the application preparation instructions.</a> <strong>If you don&#8217;t have those sections your application will not get reviewed.</strong> As an undergrad or first-year graduate student, this may be the first time you&#8217;re seeing these two terms, and you may have no idea what they mean. They are NSF-specific terms and it&#8217;s best to start with <a href="https://www.nsf.gov/funding/merit-review">the definitions the NSF provides:</a></p><blockquote><p><strong>Intellectual merit: </strong>The potential for the proposed project to advance knowledge and understanding within its own field or across different fields.</p></blockquote><blockquote><p><strong>Broader impacts: </strong>The potential for the proposed project to benefit society and contribute to the achievement of specific, desired societal outcomes.</p></blockquote><p>Intellectual merit is the significance of your project, the <em>why. </em>Why should your project be done, and why should anybody care? Ideally you have some knowledge gap or unsolved scientific problem that your research is going to address, and closing the gap or solving the problem is the intellectual merit of the project.</p><p>Broader impacts at first glance may seem similar. Isn&#8217;t it also the <em>why</em> of your project, only a bit broader? This is a common misreading of what broader impacts are. 
Experienced NSF investigators know that broader impacts will generally describe a distinct activity that is carried out separately from the actual research and that often involves some form of education or outreach (see also my next section). The NSF has a page dedicated to <a href="https://www.nsf.gov/funding/learn/broader-impacts">explaining broader impacts,</a> though even that page has to be taken with a grain of salt. What the NSF says broader impacts are and what reviewers score highly in the broader impacts category are not necessarily perfectly aligned. In case of doubt, ask somebody with extensive experience of receiving NSF funding and sitting on NSF panels to evaluate your broader impacts component.</p><h2>Develop a broader impacts activity</h2><p>A competitive application will typically propose a specific activity that results in broader impacts. It is not sufficient to just talk about how the proposed research is important and may have far-reaching societal consequences. Instead, you need to be actively doing something that is not research and that can be clearly described as broader impacts.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a></p><p>Here are some examples of what broader impacts activities could be: (1) You&#8217;ll be organizing a club for students with similar research interests to yours. Part of the club&#8217;s goals will be to help interested undergraduates find their way into your research field. (2) You&#8217;ll be reaching out to high schools, visiting classrooms, and telling students about your research or research in general. (3) You&#8217;ll be giving public lectures. Or, you&#8217;ll be organizing a public lecture series where you connect an interested public with accomplished scientists. 
(4) You&#8217;ll be developing or maintaining some widely used scientific or educational software, or developing some other resource that will benefit society broadly.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a> In the past, activities that would increase diversity of the student body or help underrepresented minorities were highly valued in the review process, but in 2025 I would stay away from any activities that require you to write these words to describe them.</p><p>Whatever you decide to propose, make sure it is credible and feasible. You increase credibility by providing specific details. For example, simply saying &#8220;I will reach out to local high schools&#8221; is not very credible. If instead you say something like the following, the added detail conveys that you&#8217;ve thought this through and know what you&#8217;ll be doing: &#8220;I will work with high schools XXX, YYY, and ZZZ. I have already talked to Mr. Smith who teaches science classes at XXX and Ms. Miller who teaches at YYY. Ms. Miller will also connect me with teachers at ZZZ.&#8221; If you&#8217;ve already done the same type of activity in the past, even better. Include some concrete examples in your statement.</p><p>In principle, you can describe your broader impacts activity either in the research plan or in the personal statement. The NSF instructions don&#8217;t provide guidance on this choice. However, I believe the personal statement is the better option. For one, you have an additional page, and the research plan is going to be tight already. Also, it fits better thematically. Placing the broader impacts activity into the personal statement underlines that it&#8217;s separate from the proposed research.</p><h2>In conclusion</h2><p>Take your personal statement seriously. Chances are it will make or break your application. Ask mentors or friends to read it and give you feedback. 
Do they find what you wrote compelling? Are they confused by anything? Do they have concerns that something you&#8217;re proposing is not feasible? It is OK to ask family members who are not academics to give you feedback, but realize they will likely not know what a personal statement should look like and may give bad advice. Their feedback can be helpful in terms of clarity of writing, grammar, sentence structure, and so on, but they&#8217;ll most likely not be able to judge whether you&#8217;re saying the right things in the right order. If you can have an experienced academic review your personal statement, that&#8217;s always a better option.</p><p><strong>Update:</strong> Part II of this series is now <a href="https://blog.genesmindsmachines.com/p/how-to-write-an-nsf-grfp-research">available here.</a></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I&#8217;ve had grant proposals rejected without review for overlooking a minor point in the instructions. 
Grant-proposal guides have to be taken seriously and read cover-to-cover.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p> But of course I can&#8217;t know this for a fact.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>The reviewers will not work with you personally, so why does it matter? It matters because that&#8217;s the thought process they apply. They are looking for the types of students they would want to have in their own labs.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>I&#8217;m writing in the third person here but obviously this includes me. I was that kid.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>This type of work always happens on the weekend, or in the evening, because reviewers are busy people and have regular day jobs during the week.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>And even better if you already have a track record of doing exactly this activity going into the application process. 
Yes, it&#8217;s a tall order, and many students will struggle to come up with a convincing plan for broader impacts.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>But make sure what you&#8217;re proposing is separate from the intellectual-merit component of your project. There can be thematic overlap, but I would expect to see a distinct activity that I can label as &#8220;broader impacts&#8221; and that is not already included in the proposed research program. </p><p></p></div></div>]]></content:encoded></item><item><title><![CDATA[Cancel your paid subscriptions right away]]></title><description><![CDATA[I'm not calling for a boycott. By all means, take out paid subscriptions, but also, cancel them. Do it right away, so they don't renew. Do it while the credit card is still warm.]]></description><link>https://blog.genesmindsmachines.com/p/cancel-your-paid-subscriptions-right</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/cancel-your-paid-subscriptions-right</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Fri, 26 Sep 2025 13:59:22 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Nlv0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F873f905c-32f7-42c0-964a-ad96d5a3a975_2888x600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I am frustrated with the fact that everything these days requires a subscription. 
No matter which aspect of your life, somebody wants you to commit to monthly recurring payments to provide you with a service that would make your life so much more comfortable.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> Eight-sleep wants you to sign up <a href="https://www.eightsleep.com/blog/understanding-the-eight-sleep-membership/">for a subscription for a bed!</a> If there&#8217;s one thing I don&#8217;t need a subscription for it&#8217;s my bed, thank you very much.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Nlv0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F873f905c-32f7-42c0-964a-ad96d5a3a975_2888x600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Nlv0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F873f905c-32f7-42c0-964a-ad96d5a3a975_2888x600.png 424w, https://substackcdn.com/image/fetch/$s_!Nlv0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F873f905c-32f7-42c0-964a-ad96d5a3a975_2888x600.png 848w, https://substackcdn.com/image/fetch/$s_!Nlv0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F873f905c-32f7-42c0-964a-ad96d5a3a975_2888x600.png 1272w, https://substackcdn.com/image/fetch/$s_!Nlv0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F873f905c-32f7-42c0-964a-ad96d5a3a975_2888x600.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!Nlv0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F873f905c-32f7-42c0-964a-ad96d5a3a975_2888x600.png" width="1456" height="302" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/873f905c-32f7-42c0-964a-ad96d5a3a975_2888x600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:302,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:650723,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/174585365?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F873f905c-32f7-42c0-964a-ad96d5a3a975_2888x600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Nlv0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F873f905c-32f7-42c0-964a-ad96d5a3a975_2888x600.png 424w, https://substackcdn.com/image/fetch/$s_!Nlv0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F873f905c-32f7-42c0-964a-ad96d5a3a975_2888x600.png 848w, https://substackcdn.com/image/fetch/$s_!Nlv0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F873f905c-32f7-42c0-964a-ad96d5a3a975_2888x600.png 1272w, https://substackcdn.com/image/fetch/$s_!Nlv0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F873f905c-32f7-42c0-964a-ad96d5a3a975_2888x600.png 1456w" 
sizes="100vw" fetchpriority="high"></picture><div></div></div></a></figure></div><p>I realize the irony of writing this on a site that is all built on paid subscriptions. It frustrates me, because I want to support writers and I think Substack gets a lot of things right, such as giving writers control over their distribution lists or providing an easy platform for discovery and promotion.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> But Substack is also full of dark patterns.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> Examples include trying to get you to subscribe before even reading a single article; trying to get you to subscribe to other publications as part of the standard onboarding process; not showing you the details of the paid options unless you click on the subscribe button; replacing articles with a subscribe button if you stop reading but don&#8217;t close the tab; the list goes on and on. One could teach a college course about dark patterns and exclusively use Substack examples for illustration.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a></p>
<p>More importantly, however, I think subscription in itself is a dark pattern. Imagine you went to a store to buy bananas, and they said &#8220;sorry, we don&#8217;t do one-time sales; you have to sign up for monthly delivery; how many bananas do you need a month?&#8221; Or you went to a restaurant to have dinner, and they said &#8220;we only serve members; subscribe to the one, two, or four meals a month plan and we&#8217;ll seat you right away.&#8221; But of course these days all of these things exist as subscription services, because vendors have realized that if they can get you to subscribe you will generally buy more from them than you otherwise would have, and probably more than you needed. As a case in point, consider Amazon. If you try to order anything from them that you might need more than once, such as toilet paper or pet food or cooking oil, they try very hard to sell you a subscription service instead. On occasion I&#8217;ve found it quite confusing to figure out how to place a one-time order. I&#8217;m sure people have taken out subscriptions on Amazon just because they were tired of fighting over making a one-time purchase.</p><p>But Amazon at least gives you the option of a one-time sale. Substack does not. You either subscribe, or there&#8217;s nothing here for you to spend your money on. 
So, if you do like the platform in principle, and if you want to support some of the writers you encounter here, you will end up with an ever-growing collection of paid subscriptions.</p><p>To counteract this trend, I recently went through all of my paid subscriptions on Substack and cancelled them one by one.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> As I did so, I discovered one I didn&#8217;t even know I had. This demonstrates the dark pattern at work. Had I not systematically checked, I would have continued paying for who knows how long.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a></p><p>I&#8217;m not convinced that the dark pattern of trying to trick people into forgetting their subscriptions is worth it. I know that for myself, because I&#8217;m so aware of this pattern, I tend to be extremely reluctant to commit to a paid subscription. I&#8217;d be much less worried about making a one-time payment, as if I were buying an author&#8217;s book or attending a live event.</p><p>For this reason, I have decided that going forward, when I want to support an author on Substack, I take out an annual subscription, which I then cancel immediately. This is in effect a one-year non-renewing subscription, and it provides the author with a meaningful level of support. Once the year is up and the subscription ends, if I wish it had continued, I&#8217;ll just take out another one-year subscription.</p><p>After just a few days of trying this out, I have found that this mindset makes it much easier for me to take out a paid subscription for an author I like. I&#8217;m probably willing to do about one yearly subscription per month, given current Substack prices. 
The feeling of no longer having to worry about all these future recurring payments is rather freeing.</p><p>I realize that as I&#8217;m doing this and saying this I&#8217;m in a way validating Substack&#8217;s dark patterns. By developing strategies to work around their machinations, I&#8217;m removing pressure on Substack to enact any changes.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a> However, I&#8217;m under no illusion that whatever I say could have any influence on what Substack does. I&#8217;m far too unimportant and invisible. So, what I offer here is akin to a lecture on defense against the dark arts, for the handful of people that may read this. If I can help even a few people feel better about paid subscriptions, that seems a worthwhile achievement to me.</p><p>I have seen people argue that Substack should offer one-time payments for specific articles. I understand why Substack doesn&#8217;t want to do this. People claim they would buy articles one-off, but realistically the volume would be so low that the vast majority of writers would generate virtually no income from this option. However, I do think Substack should offer yearly non-recurring subscription payments.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a> Give me the option of buying a one-year subscription without having to worry about renewal. Maybe even give me the option of buying multiple years at once.</p><p>But I don&#8217;t have high hopes that Substack will implement non-recurring subscriptions. And therefore, for now, I encourage everybody to just cancel their (annual) subscriptions right after paying for them. There&#8217;s nothing wrong with doing this. Nothing bad will happen. 
You will have access to the paid content for a full year.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a> And also, you&#8217;re not a bad person if you cancel the same day you subscribe. You paid for a year. Most people don&#8217;t do this. And you can pay again. If you continue to like an author chances are you will want to pay again. And if in a year&#8217;s time you don&#8217;t even remember why you ever paid a certain author, it&#8217;s probably for the best the subscription is cancelled. So go ahead, cancel your subscriptions today.</p>
<div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>At least, so they claim.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>I&#8217;m talking about Notes here, in case this wasn&#8217;t clear.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>If the term <em>dark pattern</em> doesn&#8217;t mean anything to you, I suggest you <a href="https://en.wikipedia.org/wiki/Dark_pattern">start reading here.</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>Here is another great one: The recent unexpected disclosure of everybody&#8217;s paid subscriptions is a case of <a href="https://en.wikipedia.org/wiki/Dark_pattern#Privacy_Zuckering">Privacy Zuckering.</a></p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>As you have to do. 
Substack doesn&#8217;t make it easy to cancel multiple subscriptions. </p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>Because, let&#8217;s face it, what usually happens is you forget about the subscription until you get the annual renewal notice&#8212;I&#8217;m assuming annual subscriptions here, as I believe most people take out annual subscriptions to get the discount&#8212;and then you go &#8220;I should cancel before it gets renewed next year,&#8221; and then you forget, and the same thing happens again the following year and so on.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>I bet that if Substack noted an increase in immediate cancellations of paid subscriptions, their first response would not be &#8220;let&#8217;s offer non-renewing subscriptions.&#8221; Instead, more likely than not, they would think about additional dark patterns to discourage people from canceling.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p>Substack could leave it to the authors to decide whether to enable this option or not.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><p>I even got the flowery badge that says &#8220;Claus Wilke <em>subscribes</em>&#8221; even though I cancelled all my subscriptions.</p></div></div>]]></content:encoded></item><item><title><![CDATA[Have you ever seen a man with 
whiskers?]]></title><description><![CDATA[A 100-year trend can't possibly be caused by technology that is less than 20 years old.]]></description><link>https://blog.genesmindsmachines.com/p/have-you-ever-seen-a-man-with-whiskers</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/have-you-ever-seen-a-man-with-whiskers</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Tue, 23 Sep 2025 12:57:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ubn1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85d5a0f1-2770-408d-ae13-a701f2899b92_4602x2706.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>An article is making the rounds on Substack about <a href="https://jmarriott.substack.com/p/the-dawn-of-the-post-literate-society-aa1">how civilization is in decline and nobody can read or think anymore and it&#8217;s all due to screens and cell phones.</a> The article is long, with numerous charts and figures, and I&#8217;m sure things are exactly as bad as it proclaims. After all, I myself recently said that <a href="https://blog.genesmindsmachines.com/p/how-do-bicycles-work">&#8220;I learned mathematical concepts in high school that STEM PhD students in the US don&#8217;t necessarily know.&#8221;</a> And yet, several aspects of the article bother me. Maybe it&#8217;s just my reflexive contrarian response to posts that declare the sky is falling. The world is typically more complex and nuanced, and this article in particular doesn&#8217;t excel at drawing out those nuances.</p><p>First, just because things are changing doesn&#8217;t mean everything is getting worse or changing for the reasons claimed. 
The article glosses over such complications, for example by focusing on averages: On average, math scores are down, reading scores are down, books contain shorter sentences.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> But the average can go down even as the variance increases and the top performers get better.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> It&#8217;s possible that we&#8217;re past peak literacy on a broad population basis&#8212;since many people can now obtain entertainment without having to read much&#8212;and yet civilization as a whole may not be in decline and reading can continue to be critically important to a meaningful subset of the population. In fact, modern information technologies, including cell phones and the internet, provide nearly limitless access to information, and people take advantage of it in ways that would never have been possible even twenty years ago.</p><p>As a case in point, why are you reading this? Didn&#8217;t you get the memo? You should be on TikTok or Instagram or YouTube (but only in the shorts section, please). 
If reading is declining, how come a site like Substack, which is so centered around long-form and fairly intellectual content, keeps growing while much easier to consume content such as Fox News or CNN is declining?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9Nxv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2273b305-9908-458c-864d-09b3c0085fa2_1000x903.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9Nxv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2273b305-9908-458c-864d-09b3c0085fa2_1000x903.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9Nxv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2273b305-9908-458c-864d-09b3c0085fa2_1000x903.jpeg 848w, https://substackcdn.com/image/fetch/$s_!9Nxv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2273b305-9908-458c-864d-09b3c0085fa2_1000x903.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9Nxv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2273b305-9908-458c-864d-09b3c0085fa2_1000x903.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!9Nxv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2273b305-9908-458c-864d-09b3c0085fa2_1000x903.jpeg" width="476" height="429.828" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2273b305-9908-458c-864d-09b3c0085fa2_1000x903.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:903,&quot;width&quot;:1000,&quot;resizeWidth&quot;:476,&quot;bytes&quot;:84783,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/174290323?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2273b305-9908-458c-864d-09b3c0085fa2_1000x903.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!9Nxv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2273b305-9908-458c-864d-09b3c0085fa2_1000x903.jpeg 424w, https://substackcdn.com/image/fetch/$s_!9Nxv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2273b305-9908-458c-864d-09b3c0085fa2_1000x903.jpeg 848w, https://substackcdn.com/image/fetch/$s_!9Nxv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2273b305-9908-458c-864d-09b3c0085fa2_1000x903.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!9Nxv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2273b305-9908-458c-864d-09b3c0085fa2_1000x903.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">I don&#8217;t know who made this graph. I found it <a href="https://bsky.app/profile/ruoshuiresearch.bsky.social/post/3lyxfqvmkxs2i">on Bluesky.</a></figcaption></figure></div><p>In the same vein, who would have predicted the rise of the 3 hour podcast? It takes serious focus and effort to sit through one of those. 
Yet the genre is more popular than ever.</p><p>The article also relies heavily on associations, presented as if they necessarily reflected causal relationships. And it has issues with the time scales on which different trends can be seen. In particular, sentence complexity in books has been steadily declining for nearly a century, not something we can attribute solely to cell phones or even the internet.</p><p>I could go on about the various statistical flaws and logical fallacies in the article. (Don&#8217;t get me started on the regression of vote share versus TikTok interest in Romania, shown towards the end of the article.) But instead, for the remainder of my post here, I want to focus on one specific comment, regarding a study<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> about the decline of reading comprehension in English literature students in the US. I quote:</p><blockquote><p>A study of English literature students at American universities found that they were unable to understand the first paragraph of Charles Dickens&#8217;s novel <em>Bleak House</em> &#8212; a book that was once regularly read by children.</p></blockquote><p>There are so many issues. First, these were students at two &#8220;regional Kansas Universities.&#8221; So please don&#8217;t imagine Harvard or Stanford undergraduates here. Second, the study was done in 2015. 
That doesn&#8217;t align well with the author&#8217;s thesis that cell phones and social media are to blame, does it? Third, I feel there&#8217;s a major confounding effect here. By using <em>Bleak House</em>, the study did not just assess students&#8217; reading comprehension but also, implicitly, their knowledge of and interest in 19th century English culture and way of life.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a></p><p>In case you&#8217;re not familiar with <em>Bleak House</em>, this is the first paragraph of the novel:<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a></p><blockquote><p>LONDON. Michaelmas term lately over, and the Lord Chancellor sitting in Lincoln&#8217;s Inn Hall. Implacable November weather. As much mud in the streets, as if the waters had but newly retired from the face of the earth, and it would not be wonderful to meet a Megalosaurus, forty feet long or so, waddling like an elephantine lizard up Holborn Hill. Smoke lowering down from chimney-pots, making a soft black drizzle with flakes of soot in it as big as full-grown snowflakes&#8212;gone into mourning, one might imagine, for the death of the sun. Dogs, undistinguishable in mire. Horses, scarcely better; splashed to their very blinkers. 
Foot passengers, jostling one another&#8217;s umbrellas, in a general infection of ill-temper, and losing their foot-hold at street-corners, where tens of thousands of other foot passengers have been slipping and sliding since the day broke (if this day ever broke), adding new deposits to the crust upon crust of mud, sticking at those points tenaciously to the pavement, and accumulating at compound interest.</p></blockquote><p>I don&#8217;t know about you, but for me the first sentence, a mere 13 words, contains 7 that I&#8217;d have to look up (&#8220;Michaelmas term&#8221;, &#8220;Lord Chancellor&#8221;, &#8220;Lincoln&#8217;s Inn Hall&#8221;). As a consequence, upon reading the first sentence I immediately lose interest.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-6" href="#footnote-6" target="_self">6</a> Can I read this text if I have to? Yes. Do I want to? No.</p><p>Anyways, I&#8217;m sure American college students are as challenged at reading comprehension as the study claims, but nevertheless the study approach bothers me. 
When we make students read prose describing a world that is entirely alien to them, using numerous words and concepts they cannot relate to, is it so weird that they might not understand, and more importantly, might not be motivated to make a real effort?<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-7" href="#footnote-7" target="_self">7</a> I&#8217;d very much prefer an assessment of reading comprehension where the material is more contemporary, so the question of whether students can comprehend the writing is separate from the question of whether they have intimate knowledge about living in 19th century England.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Ubn1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85d5a0f1-2770-408d-ae13-a701f2899b92_4602x2706.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Ubn1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85d5a0f1-2770-408d-ae13-a701f2899b92_4602x2706.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ubn1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85d5a0f1-2770-408d-ae13-a701f2899b92_4602x2706.jpeg 848w, https://substackcdn.com/image/fetch/$s_!Ubn1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85d5a0f1-2770-408d-ae13-a701f2899b92_4602x2706.jpeg 1272w, 
https://substackcdn.com/image/fetch/$s_!Ubn1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85d5a0f1-2770-408d-ae13-a701f2899b92_4602x2706.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Ubn1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85d5a0f1-2770-408d-ae13-a701f2899b92_4602x2706.jpeg" width="1456" height="856" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/85d5a0f1-2770-408d-ae13-a701f2899b92_4602x2706.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:856,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1229709,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/174290323?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85d5a0f1-2770-408d-ae13-a701f2899b92_4602x2706.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Ubn1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85d5a0f1-2770-408d-ae13-a701f2899b92_4602x2706.jpeg 424w, https://substackcdn.com/image/fetch/$s_!Ubn1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85d5a0f1-2770-408d-ae13-a701f2899b92_4602x2706.jpeg 848w, 
https://substackcdn.com/image/fetch/$s_!Ubn1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85d5a0f1-2770-408d-ae13-a701f2899b92_4602x2706.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!Ubn1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F85d5a0f1-2770-408d-ae13-a701f2899b92_4602x2706.jpeg 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Photo by <a 
href="https://unsplash.com/@v1d?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Dmytro Vynohradov</a> on <a href="https://unsplash.com/photos/an-orange-and-white-cat-sitting-in-front-of-a-white-door-LLLZiqwG1Y0?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Unsplash</a></figcaption></figure></div><p>There&#8217;s one part of the study that truly irritates me. It recounts how a study participant describes a brief excerpt from <em>Bleak House</em>. A quote of the relevant part of the study follows below. (You can read the study in its entirety <a href="https://muse.jhu.edu/article/922346">here.</a>) In the quote, the excerpt is labeled &#8220;Original Text,&#8221; the study participant is labeled &#8220;Subject,&#8221; and the person interviewing the subject is labeled &#8220;Facilitator.&#8221; The paragraph at the end of the quote is the take-away of the study authors.</p><blockquote><p><em><strong>Original Text: </strong>On such an afternoon, if ever, the Lord High Chancellor ought to be sitting here&#8212;as here he is&#8212;with a foggy glory round his head, softly fenced in with crimson cloth and curtains, addressed by a large advocate with great whiskers, a little voice, and an interminable brief, and outwardly directing his contemplation to the lantern in the roof, where he can see nothing but fog.</em></p><p><em><strong>Subject: </strong>Describing him in a room with an animal I think? Great whiskers?</em></p><p><em><strong>Facilitator: </strong>[Laughs.]</em></p><p><em><strong>Subject: </strong>A cat?</em></p><p>Note that the subject, who is not accessing any of the concrete details in the passage, finds a subject (the Lord Chancellor) and one recognizable word, [End Page 9] &#8220;whiskers,&#8221; and concludes that the character is in a room with a cat. 
At this point, she does not seem to understand what she is reading, and so she links a few words together to form some kind of response.</p></blockquote><p>I find this interpretation of the subject&#8217;s response condescending. Assume you know nothing about <em>Bleak House</em> but otherwise have good reading comprehension. Now re-read the quoted text. Is it so weird to read &#8220;whiskers&#8221; and think about a cat? In particular, imagine you&#8217;re a contemporary student, who may have grown up with material such as The Wizard of Oz, Wicked, or Harry Potter, where animals take on major roles in the story development. Really put yourself into the position of the subject. You&#8217;re reading a weird book, about a weird world that doesn&#8217;t seem anything like your own, but that maybe reminds you of fantasy movies or similar material you have been exposed to. Now you read &#8220;addressed by a large advocate with great whiskers&#8221; and maybe you go &#8220;well, I don&#8217;t really understand this, but let&#8217;s assume in this world animals can talk and work as lawyers, and so maybe this advocate here is a cat.&#8221; I think this is a perfectly reasonable conclusion to draw.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-8" href="#footnote-8" target="_self">8</a></p><p>I&#8217;m sure most English majors consider reading something like <em>Bleak House</em> purely as a study assignment. Not something anyone does for fun. And how can you blame them? Just because it&#8217;s a classic doesn&#8217;t mean it&#8217;s enjoyable. 
Who decided that if you&#8217;re interested in the English language you also must like the culture and way of thinking of 19th century England?<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-9" href="#footnote-9" target="_self">9</a></p><p>In my opinion, if we want students to read more, we need to introduce them to texts they can actually relate to and enjoy. I don&#8217;t know what those would be, that&#8217;s outside my area of expertise. But I personally have read plenty of books by different authors and there have been many that grabbed me, and yet I&#8217;ve never been able to make it even half-way through one of the 19th century classics. I&#8217;m just not interested in the worlds they describe. I can&#8217;t blame US high school or college students for feeling similarly.</p>
<div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Hemingway, I have to assume, was simply ahead of the curve when it comes to writing simplistic modern English for an audience that can no longer handle complex sentence structures.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>To be fair to the author, the figure about the number of words per sentence shows the full distribution and variance is not increasing in this case. Though as a statistician I&#8217;d immediately want to know whether Simpson&#8217;s paradox may be at play. Are the types of books that are included in the analysis similar over time or are there major shifts we would want to investigate?</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Carlson et al. (2024) They Don&#8217;t Read Very Well: A Study of the Reading Comprehension Skills of English Majors at Two Midwestern Universities. <em>CEA Critic</em> 86:1&#8211;17. 
<a href="https://dx.doi.org/10.1353/cea.2024.a922346">doi:10.1353/cea.2024.a922346</a>.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>The study acknowledges this: &#8216;According to Wolfgang Iser in <em>The Act of Reading</em>, one&#8217;s ability to read complex literature is partly dependent on one&#8217;s knowledge of what he calls the &#8220;repertoire&#8221; of the text, &#8220;the form of references to earlier works, or to social and historical norms, or to the whole culture from which the text has emerged&#8221;. [&#8230;] With <em>Bleak House</em>, this knowledge is crucial.&#8217; (Carlson et al., 2024)</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>You can read <a href="https://www.gutenberg.org/ebooks/1023">the entire book here right now,</a> on whatever screen you&#8217;re consuming this post. Such is the power of the modern internet.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-6" href="#footnote-anchor-6" class="footnote-number" contenteditable="false" target="_self">6</a><div class="footnote-content"><p>I&#8217;ve never read the book, but I have watched the fifteen-part BBC television drama serial adaptation. It was dreary. 
And as you can see, I know fancy English words such as &#8220;dreary.&#8221; I should probably have remembered the Lord Chancellor but have to confess I didn&#8217;t.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-7" href="#footnote-anchor-7" class="footnote-number" contenteditable="false" target="_self">7</a><div class="footnote-content"><p>Keep in mind that students may be doing poorly in the study either because they genuinely can&#8217;t read well or because they think the entire thing is ridiculous and can&#8217;t be bothered to make an effort. It&#8217;s difficult to tell these two causes apart.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-8" href="#footnote-anchor-8" class="footnote-number" contenteditable="false" target="_self">8</a><div class="footnote-content"><p>I found an illuminating blog post <a href="https://dralun.wordpress.com/2017/06/21/what-about-whiskers-the-forgotten-facial-hair-fashion-of-19th-century-britain/">about whiskers on humans.</a> Apparently it was all the rage in 19th century England. 
However, I believe US English majors can be forgiven for not having read this post.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-9" href="#footnote-anchor-9" class="footnote-number" contenteditable="false" target="_self">9</a><div class="footnote-content"><p>I genuinely would rather read a book on English grammar or copy editing than <em>Bleak House.</em> This is a good one: <a href="https://www.amazon.com/Line-How-Edit-Your-Writing/dp/0395393914">Claire Kehrwald Cook (1985) </a><em><a href="https://www.amazon.com/Line-How-Edit-Your-Writing/dp/0395393914">Line by Line: How to Edit Your Own Writing.</a> </em>So is this one: Lyn Dupre (1998) <em><a href="https://www.amazon.com/BUGS-Writing-Revised-Guide-Debugging/dp/020137921X">Bugs in Writing: A Guide to Debugging Your Prose.</a></em></p></div></div>]]></content:encoded></item><item><title><![CDATA[Executive function revisited]]></title><description><![CDATA[Readers weigh in.]]></description><link>https://blog.genesmindsmachines.com/p/executive-function-revisited</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/executive-function-revisited</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Fri, 19 Sep 2025 19:30:11 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!SLA5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf81627-73b6-450f-9e85-f9a1d76fde63_4896x3264.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In response to my recent post <a href="https://blog.genesmindsmachines.com/p/what-is-executive-function">about executive function,</a> several people have reached out to me to provide feedback. I thought it might be useful to have this feedback in one place, so I have collected it here. 
If you haven&#8217;t read my earlier post, I suggest you <a href="https://blog.genesmindsmachines.com/p/what-is-executive-function">read it first.</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!SLA5!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf81627-73b6-450f-9e85-f9a1d76fde63_4896x3264.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!SLA5!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf81627-73b6-450f-9e85-f9a1d76fde63_4896x3264.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SLA5!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf81627-73b6-450f-9e85-f9a1d76fde63_4896x3264.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SLA5!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf81627-73b6-450f-9e85-f9a1d76fde63_4896x3264.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SLA5!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf81627-73b6-450f-9e85-f9a1d76fde63_4896x3264.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!SLA5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf81627-73b6-450f-9e85-f9a1d76fde63_4896x3264.jpeg" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7cf81627-73b6-450f-9e85-f9a1d76fde63_4896x3264.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1141546,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/173946340?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf81627-73b6-450f-9e85-f9a1d76fde63_4896x3264.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!SLA5!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf81627-73b6-450f-9e85-f9a1d76fde63_4896x3264.jpeg 424w, https://substackcdn.com/image/fetch/$s_!SLA5!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf81627-73b6-450f-9e85-f9a1d76fde63_4896x3264.jpeg 848w, https://substackcdn.com/image/fetch/$s_!SLA5!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf81627-73b6-450f-9e85-f9a1d76fde63_4896x3264.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!SLA5!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7cf81627-73b6-450f-9e85-f9a1d76fde63_4896x3264.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg 
role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@megjenson?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Meg Jenson</a> on <a href="https://unsplash.com/photos/a-table-with-a-book-and-a-pen-on-it-ia3b0O996D0?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Unsplash</a></figcaption></figure></div><p>Evan Dorn writes <a href="https://www.linkedin.com/feed/update/urn:li:activity:7370893494179184640?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A7370893494179184640%2C7374086623325642752%29&amp;dashCommentUrn=urn%3Ali%3Afsd_comment%3A%287374086623325642752%2Curn%3Ali%3Aactivity%3A7370893494179184640%29">on LinkedIn:</a></p><blockquote><p>I suffer from intense ADHD, which wasn&#8217;t diagnosed until just a couple of years ago. 
And even then only because the diagnostic criteria have changed &amp; matured substantially since I was first tested ~2002. </p><p>I battle with executive function every minute of every day, and until I understood the cause and received treatment, I lived with decades of self-loathing thinking I was just a failure because I struggled with &#8220;simple&#8221; tasks that almost everyone takes for granted. In retrospect, it&#8217;s a bit of a miracle that I completed any of my degrees, much less my PhD. But compensating mechanisms learned through trial and error got me over the line, even though many of them are quite unpleasant when experienced from the inside. </p><p>Today I&#8217;ve got a much better understanding of how my own brain works, as well as greatly improved therapeutic help, and both make a huge difference. But I do wish more people had your level of empathy for conditions that include executive dysfunction. In a world that judges people&#8217;s moral worth by their productivity &#8211; even to the point of whether they deserve to have a decent standard of living &#8211; it can be an extremely unpleasant way to live.</p></blockquote><p>One person has reached out to me by email. I&#8217;m quoting them here anonymously:</p><blockquote><p>ADHDers in grad school need things not to be boring. You&#8217;re right about too big of a task being difficult to start, so coaching them on how to find the first step is good. But the small tasks might sometimes be boring and ADHD brains can&#8217;t focus and engage in a boring task. Sometimes boring things have to get done still, so strategies like music, color and body doubling help.</p><p>Smart ADHDers are also really afraid of failure. They don&#8217;t trust themselves to be good enough and are always right on the edge of failing (in their minds) so they&#8217;d rather procrastinate a task than risk failing at it. 
Forcing them to confront that fear and overcome it is important.</p></blockquote><p>Finally, Tom Devitt <a href="https://open.substack.com/pub/clauswilke/p/what-is-executive-function?utm_campaign=comment-list-share-cta&amp;utm_medium=web&amp;comments=true&amp;commentId=156455623">comments under the article:</a></p><blockquote><p>Claus, fellow professor here. I know you&#8217;re trying to help, and I&#8217;m saying this with all due respect: parts of your post land as demeaning and shaming to the very students you want to support. I suggest a different playbook.</p><p>Improving executive function doesn't boil down to &#8220;doing what you&#8217;re told, but smaller.&#8221; For many neurodivergent students (PhD candidates and undergraduates alike), the bottlenecks are time perception, salience, and working memory. Time blindness means the world collapses into two buckets: now and not-now. &#8220;Read one paper and write a paragraph&#8221; feels trivial, so it slides into not-now until a hard trigger makes it suddenly now, usually too late. Working-memory limits add an object-permanence effect: if the task isn&#8217;t literally in front of me&#8212;almost always better in analog rather than digital form&#8212;it may as well not exist. On top of that, ADHD is an interest-based nervous system: activation turns on with novelty, challenge, degree of interest, or immediate social consequence. That&#8217;s why students often jump to the most challenging part first; paradoxically, it&#8217;s the one that finally flips the &#8220;on&#8221; switch.</p><p>This is also why &#8220;break it into simpler tasks&#8221; often backfires. Shrinking scope without changing time, salience, or visibility lowers meaning (&#8220;trivial &#8594; not-now&#8221;), multiplies context switches, and creates shame loops (&#8220;I couldn&#8217;t even do the small thing&#8221;). 
If you want different results, change the environment, not just the size of the chores.</p><p>There&#8217;s also a real upside to neurodiversity that I think you&#8217;re missing. Many neurodivergent individuals are big-picture thinkers, capable of rapid association, systems-level synthesis, and deep hyperfocus when their level of interest or novelty is high. There are plenty of highly successful, neurodivergent people out there who just have different wiring, which has real advantages in the proper context.</p><p>Also, no need to reinvent the wheel. You&#8217;re not a psychologist, and you don&#8217;t need to be. Neurodivergent students benefit from licensed coaching and evidence-based supports. At UT, Longhorn TIES provides neurodiversity training for faculty and staff, as well as direct student support. Take the training and point students there: <a href="https://longhornties.utexas.edu/">https://longhornties.utexas.edu/</a></p><p>Some minor changes in mentoring neurodivergent students can go a long way. Instead of &#8220;I simplified the tasks and they still failed,&#8221; try &#8220;Let&#8217;s co-design conditions that make time visible, tasks present, and outcomes meaningful.&#8221; That framing maintains dignity and enhances performance.</p></blockquote><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.genesmindsmachines.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Genes, Minds, Machines! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[How do bicycles work?]]></title><description><![CDATA[Conservation of angular momentum plays a major role, but why is angular momentum conserved?]]></description><link>https://blog.genesmindsmachines.com/p/how-do-bicycles-work</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/how-do-bicycles-work</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Tue, 16 Sep 2025 12:36:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!KuMQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c9165b-6f16-4760-b8f6-77cffff16e48_2216x1440.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I grew up in Germany, where I received an excellent education in math and physics. In fact, I routinely observe that I learned mathematical concepts in high school that STEM PhD students in the US don&#8217;t necessarily know.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a> I went on to pursue both an undergraduate degree and a PhD in theoretical physics, where I learned even more math and physics. I have since worked for many years as a scientist and professor in the US, and I have found that my German education has served me well. It&#8217;s rare that I encounter a topic in math or physics for which I lack a basic understanding that I can trace back to my high school and college education. 
I even learned about multi-layer perceptrons and backpropagation back in the late 1990s, topics that in 2025 I teach and use daily.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KuMQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c9165b-6f16-4760-b8f6-77cffff16e48_2216x1440.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KuMQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c9165b-6f16-4760-b8f6-77cffff16e48_2216x1440.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KuMQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c9165b-6f16-4760-b8f6-77cffff16e48_2216x1440.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KuMQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c9165b-6f16-4760-b8f6-77cffff16e48_2216x1440.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KuMQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c9165b-6f16-4760-b8f6-77cffff16e48_2216x1440.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KuMQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c9165b-6f16-4760-b8f6-77cffff16e48_2216x1440.jpeg" width="2216" height="1440" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/14c9165b-6f16-4760-b8f6-77cffff16e48_2216x1440.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1440,&quot;width&quot;:2216,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:594502,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/173631156?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa4518aad-8f5b-4a25-be32-a51363a6d766_2560x4888.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KuMQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c9165b-6f16-4760-b8f6-77cffff16e48_2216x1440.jpeg 424w, https://substackcdn.com/image/fetch/$s_!KuMQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c9165b-6f16-4760-b8f6-77cffff16e48_2216x1440.jpeg 848w, https://substackcdn.com/image/fetch/$s_!KuMQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c9165b-6f16-4760-b8f6-77cffff16e48_2216x1440.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!KuMQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F14c9165b-6f16-4760-b8f6-77cffff16e48_2216x1440.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@simonbro16?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Simon Breau</a> on <a href="https://unsplash.com/photos/a-man-riding-a-bike-down-a-curvy-road-fM-ie28Bb_g?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Unsplash</a></figcaption></figure></div><p>And yet, there is one important concept that I was never taught, and that I only learned in my fifties. 
This concept is related to how bicycles work, so let&#8217;s talk about this briefly.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-3" href="#footnote-3" target="_self">3</a> The common explanation for why bicycles stay upright is conservation of angular momentum: The wheels rotate, and the angular momentum associated with this rotation confers stability to the wheels and in turn to the bicycle as a whole.</p><p>If you try to topple a rotating wheel, the wheel instead turns about an axis at a 90-degree angle to the axis you are trying to topple it around. Therefore, instead of falling over, bicycles will turn into the direction of the fall. And so, balancing a bicycle becomes a simple task of keeping the bike pointed in a forward direction. As long as you keep the handlebars straight, the bike won&#8217;t topple. I don&#8217;t want to go into more detail of how exactly all of this works, but suffice it to say the root cause is conservation of angular momentum. 
If you&#8217;d like to know more, read up on the concept of <em><a href="https://en.wikipedia.org/wiki/Precession#Torque-induced">precession.</a></em><a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-4" href="#footnote-4" target="_self">4</a> You can see a demonstration of this effect in this short video:</p><div id="youtube2-jhwvCKrUq9U" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;jhwvCKrUq9U&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/jhwvCKrUq9U?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>There&#8217;s more to bicycle stability than angular momentum. If you search the scientific literature, you find all sorts of research articles on this topic.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-5" href="#footnote-5" target="_self">5</a> But here, I&#8217;m interested in a deeper, more fundamental question: Why is angular momentum conserved? In my entire high school and college physics education, I was never provided with a reason. It was just a given. Angular momentum is conserved. Just like energy, linear momentum, mass, electric charge, etc. All sorts of things are conserved, and nobody can tell us why.</p><p>Well, that&#8217;s not quite true. First of all, not all of these quantities are truly conserved. And second, physicists have known about the origin of these conservation laws for more than a century. But for some reason, it was absent from my education. I never learned about it. 
That is, until a couple of months ago, when I watched a video by the YouTube science channel Veritasium.</p><div id="youtube2-lcjdwSY2AzM" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;lcjdwSY2AzM&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/lcjdwSY2AzM?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p>The video is titled &#8220;The Biggest Misconception in Physics,&#8221; and the thumbnail states in large font that &#8220;Energy is NOT conserved.&#8221; Both the title and the hook are a bit click-baity, even if technically true. And yet, the video is very much worth watching. We learn that conservation laws are due to symmetries. Conservation of angular momentum follows directly from the assumption (or should I say observation?) that the laws of physics are the same in every direction. Similarly, conservation of energy follows from time invariance. If the laws of physics are unchanged as time passes, then energy has to be conserved. Curiously, these invariances do not necessarily hold under general relativity, which therefore implies that energy, momentum, etc. don&#8217;t have to be conserved at sufficiently large spatial scales and/or sufficiently long times.</p><p>We understand today that symmetries are related to conservation laws, but this was not known when Einstein was developing his theory of general relativity. He knew that in his theory energy was not conserved, and he considered this to be a problem. He thought he had to find a modification of the equations so that energy would be conserved. 
But he didn&#8217;t know how.</p><p>The person who sorted all this out was the mathematician <a href="https://en.wikipedia.org/wiki/Emmy_Noether">Emmy Noether.</a> She realized that conservation laws were due to symmetries, and that therefore, if the symmetries were broken, conservation laws didn&#8217;t have to apply. Energy is conserved in an empty universe, but the moment you put stuff into the universe, and the stuff starts deforming space-time, symmetry is broken and energy is no longer fully conserved. The problem was not with Einstein&#8217;s equations, but instead with the expectation that energy should be conserved.</p><p>Noether&#8217;s influence on modern physics is difficult to overstate. Modern physics, in particular quantum field theory, is all about symmetries. The symmetries in the field equations determine what elementary particles are possible and how they interact. And yet, Noether is not that well known. I had heard of the name Emmy Noether before I watched the Veritasium video, but if you had asked me what her contribution was, I wouldn&#8217;t have been able to tell you. And similarly, I knew about the importance of symmetries in field theories, but I didn&#8217;t know that similar concepts lead to basic conservation laws of energy and momentum, and that Noether had first pointed out this connection.</p><p>Of course, all of this reflects my own ignorance. Many theoretical physicists are fully aware of Noether&#8217;s contributions to mathematical physics and her explanation of the relationship between symmetries and conservation laws. But I think I should have learned this as an undergraduate, and I didn&#8217;t. If somebody had ever said &#8220;energy is conserved because the laws of physics are invariant in time,&#8221; I am certain I would remember. After some reflection, I have a sense of what may have caused this gap in my education. 
Noether&#8217;s contributions are considered advanced topics in mathematical physics, and so they are often covered only in specialized graduate classes. And even when they are covered, they are frequently described in abstract mathematical terms that obscure their importance for basic physical concepts. I may even have encountered Noether&#8217;s theorem at some point in my education and not realized its importance, and thus not remembered it. As a case in point, read the <a href="https://en.wikipedia.org/wiki/Noether%27s_second_theorem">Wikipedia page on Noether&#8217;s second theorem</a> and tell me whether the page conveys how this theorem has shaped our understanding of modern physics. To be fair, <a href="https://en.wikipedia.org/wiki/Noether%27s_theorem">the page on Noether&#8217;s first theorem</a> is a bit better.</p><p>Since watching the Veritasium video, I have discovered a few other videos where Noether&#8217;s work has been mentioned. This is the typical pattern where once you&#8217;re aware of something you see it everywhere. As one example, here is a video by Angela Collier where she mentions Noether&#8217;s work. 
(Go to <a href="https://www.youtube.com/watch?v=9FiMCdEQhMI&amp;t=3857s">1:04:17 in the video.</a>) Clearly Angela Collier&#8217;s physics education was better than mine.</p><div id="youtube2-9FiMCdEQhMI" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;9FiMCdEQhMI&quot;,&quot;startTime&quot;:&quot;3857s&quot;,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/9FiMCdEQhMI?start=3857s&amp;rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p><em>This post was inspired by <a href="https://substack.com/@clauswilke/note/c-155632941">a conversation on Substack Notes,</a> though that conversation is only tangentially related to this post. I&#8217;d like to acknowledge Billt for <a href="https://substack.com/@wctetley/note/c-155719629?utm_source=notes-share-action&amp;r=125478">pointing me to the research conducted by Kooijman et al. on what makes bicycles stable.</a></em></p>
<div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Among my US students, I see huge gaps in <a href="https://en.wikipedia.org/wiki/Linear_algebra">linear algebra,</a> a topic we covered at great length in my high school math classes. I also rarely encounter students who know any <a href="https://en.wikipedia.org/wiki/Complex_analysis">complex analysis.</a> (However, that&#8217;s a topic I learned mostly in my first year as an undergraduate, if I recall correctly.) 
Maybe things would be different if I worked more with students with a physics or engineering background as opposed to a biology or biochemistry background.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>If these terms don&#8217;t mean anything to you, suffice it to say ChatGPT wouldn&#8217;t exist without multi-layer perceptrons and backpropagation.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-3" href="#footnote-anchor-3" class="footnote-number" contenteditable="false" target="_self">3</a><div class="footnote-content"><p>Despite the title, this post is not actually about how bicycles work.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-4" href="#footnote-anchor-4" class="footnote-number" contenteditable="false" target="_self">4</a><div class="footnote-content"><p>Things get even more interesting in motorcycles, which, due to their higher speed and weight, carry much more angular momentum than bicycles do. If you want to steer a motorcycle, you have to&#8212;somewhat unintuitively&#8212;push the handlebars in the opposite direction. This is called push steering or countersteering. <a href="https://www.youtube.com/watch?v=xNvdB6pMdx0">The push initiates a lean in the motorcycle,</a> which then causes the motorcycle to turn into the lean. It&#8217;s all an elaborate manipulation of the laws of conservation of angular momentum.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-5" href="#footnote-anchor-5" class="footnote-number" contenteditable="false" target="_self">5</a><div class="footnote-content"><p>This is an interesting one: Kooijman et al., A bicycle can be self-stable without gyroscopic or caster effects, <em>Science</em> 332:339&#8211;342, 2011. 
You can <a href="https://arendschwab.com/assets/pdf/kooijman2011bicycle.pdf">read the PDF here.</a></p></div></div>]]></content:encoded></item><item><title><![CDATA[Teaching data visualization in the time of generative AI]]></title><description><![CDATA[If students want to cheat with AI, the least I can do is make them own up to it.]]></description><link>https://blog.genesmindsmachines.com/p/teaching-data-visualization-in-the</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/teaching-data-visualization-in-the</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Fri, 12 Sep 2025 13:00:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!QEYO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cd98187-f58a-4f17-a15c-47a67726c655_4000x2256.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There are dark clouds on the horizon for <a href="https://wilkelab.org/SDS366/">my class on data visualization.</a> I enjoy teaching this topic, but the entire setup for my class is being upended by generative AI. I no longer know how to create meaningful assignments. How can I assess whether students have learned anything when they can complete entire data analysis projects with a quick request to ChatGPT? 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QEYO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cd98187-f58a-4f17-a15c-47a67726c655_4000x2256.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QEYO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cd98187-f58a-4f17-a15c-47a67726c655_4000x2256.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QEYO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cd98187-f58a-4f17-a15c-47a67726c655_4000x2256.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QEYO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cd98187-f58a-4f17-a15c-47a67726c655_4000x2256.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QEYO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cd98187-f58a-4f17-a15c-47a67726c655_4000x2256.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QEYO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cd98187-f58a-4f17-a15c-47a67726c655_4000x2256.jpeg" width="1456" height="821" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4cd98187-f58a-4f17-a15c-47a67726c655_4000x2256.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:821,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:573149,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/172827783?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cd98187-f58a-4f17-a15c-47a67726c655_4000x2256.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QEYO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cd98187-f58a-4f17-a15c-47a67726c655_4000x2256.jpeg 424w, https://substackcdn.com/image/fetch/$s_!QEYO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cd98187-f58a-4f17-a15c-47a67726c655_4000x2256.jpeg 848w, https://substackcdn.com/image/fetch/$s_!QEYO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cd98187-f58a-4f17-a15c-47a67726c655_4000x2256.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!QEYO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4cd98187-f58a-4f17-a15c-47a67726c655_4000x2256.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption">Photo by <a href="https://unsplash.com/@nahrizuladib?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Nahrizul Kadri</a> on <a href="https://unsplash.com/photos/a-sign-with-a-question-mark-and-a-question-mark-drawn-on-it-OAsF0QMRWlA?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Unsplash</a></figcaption></figure></div><p>My class is about working hands-on with data and visualizing it. It inherently requires that students work on assignments at the scale of multi-day projects. Students need to familiarize themselves with a dataset, perform some exploratory investigations, create visualizations, and then document what they have done and what they can conclude from their analysis. 
This type of work can only be carried out in the format of a take-home assignment. And it is also something that ChatGPT or Claude Sonnet could crank out in minutes in response to a simple prompt.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><p>In reaction to the rise of generative AI, some teachers have transitioned their assessments to either in-class exams written by hand or oral assessments conducted in person. Neither approach works for me. The main learning outcome for my class is the ability to produce a compelling report that combines written text, computer code, and visualizations of the dataset analyzed. This skill cannot be tested with a hand-written, hour-long exam. An oral exam could make sense&#8212;I could ask students to walk me through their analysis and explain step by step what they did and why&#8212;but oral exams don&#8217;t scale. I have a hundred students in my class. 
I cannot possibly examine every single one of them in person.</p><p>Until recently, I felt like giving up. The only way forward I saw was to ignore the issue, teach the class as I always have, and let the dice fall where they may. Ideally, students would work on their assignments with minimal reliance on generative AI, because I would ask them to do just that. But of course I would not have any means of enforcing this behavior. And I don&#8217;t believe in AI detectors. The last thing I want to do is argue with students over whether or not they have employed AI to complete a given assignment.</p><p>However, last week I had an idea that hopefully will make the AI situation a bit more bearable. At a minimum, it will require the students to reflect on their AI use. Starting next spring,<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> I will ask the students to declare, at the end of each assignment, how they used AI models in the preparation of their work. This idea is inspired by similar policies now put in place by machine learning conferences; see for example <a href="https://neurips.cc/Conferences/2025/LLM">here for NeurIPS.</a> I can&#8217;t prevent students from turning off their brain and letting the AI do all the &#8220;thinking,&#8221; but I can make them document what they did. This should prompt them to reflect on how they&#8217;re engaging with AI. And because in my class students peer-grade each other&#8217;s assignments, I hope it may make them feel uncomfortable having to admit to their peers that their entire assignment was written by AI. Maybe this will convince them to use AI a little less.</p><p>Of course all of this is based on the honor system. A student who intends to lie and deliberately obscure that their assignment was AI generated has that option. There&#8217;s not much I can do about that. 
But then, even without AI, students can ask their best friend or brother or hired help to write their essays and pretend it was their own work. Students who want to cheat will always find a way. There&#8217;s little I can do to prevent every such possibility. I see my job as teaching the students who actually want to learn. My primary goal is to create a learning environment where students are motivated and inspired to make an honest effort. I hope this new AI policy will help me achieve this goal.</p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>I don&#8217;t actually think these AI-generated analyses and visualizations are very good, but they&#8217;re usually good enough to get a decent grade in my class. 
At a minimum, they tend to be better than what the weakest students in the class produce.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>That&#8217;s when I&#8217;ll teach the class again.</p></div></div>]]></content:encoded></item><item><title><![CDATA[What is executive function?]]></title><description><![CDATA[In a recent post about the abilities and personality traits required to complete a PhD program, I mentioned the concept of executive function. From some of the responses I received, I gather that it&#8217;s an obscure and poorly understood concept that probably deserves its own post.]]></description><link>https://blog.genesmindsmachines.com/p/what-is-executive-function</link><guid isPermaLink="false">https://blog.genesmindsmachines.com/p/what-is-executive-function</guid><dc:creator><![CDATA[Claus Wilke]]></dc:creator><pubDate>Mon, 08 Sep 2025 18:57:59 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!bhZ2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F903955cb-295a-4ccb-84da-0e30b3e2a235_3552x2368.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In a recent post about the <a href="https://clauswilke.substack.com/p/phd-level-abilities-and-character">abilities and personality traits required to complete a PhD program,</a> I mentioned the concept of <em>executive function.</em> From some of the responses I received, I gather that it&#8217;s an obscure and poorly understood concept that probably deserves its own post. 
So here we go.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bhZ2!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F903955cb-295a-4ccb-84da-0e30b3e2a235_3552x2368.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bhZ2!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F903955cb-295a-4ccb-84da-0e30b3e2a235_3552x2368.jpeg 424w, https://substackcdn.com/image/fetch/$s_!bhZ2!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F903955cb-295a-4ccb-84da-0e30b3e2a235_3552x2368.jpeg 848w, https://substackcdn.com/image/fetch/$s_!bhZ2!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F903955cb-295a-4ccb-84da-0e30b3e2a235_3552x2368.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!bhZ2!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F903955cb-295a-4ccb-84da-0e30b3e2a235_3552x2368.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bhZ2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F903955cb-295a-4ccb-84da-0e30b3e2a235_3552x2368.jpeg" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/903955cb-295a-4ccb-84da-0e30b3e2a235_3552x2368.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:542051,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.genesmindsmachines.com/i/172745050?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F903955cb-295a-4ccb-84da-0e30b3e2a235_3552x2368.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bhZ2!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F903955cb-295a-4ccb-84da-0e30b3e2a235_3552x2368.jpeg 424w, https://substackcdn.com/image/fetch/$s_!bhZ2!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F903955cb-295a-4ccb-84da-0e30b3e2a235_3552x2368.jpeg 848w, https://substackcdn.com/image/fetch/$s_!bhZ2!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F903955cb-295a-4ccb-84da-0e30b3e2a235_3552x2368.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!bhZ2!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F903955cb-295a-4ccb-84da-0e30b3e2a235_3552x2368.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture></div></a>
<figcaption class="image-caption">Photo by <a href="https://unsplash.com/@megjenson?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Meg Jenson</a> on <a href="https://unsplash.com/photos/a-kitchen-counter-with-a-basket-of-bananas-and-a-basket-of-bread-1zra6CxbQpM?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Unsplash</a></figcaption></figure></div><p>To get the most important issue out of the way: <em>Executive function</em> does not mean <em>ability to perform like an executive. </em>Instead, it&#8217;s the ability, broadly speaking, to organize everyday tasks and to function in the world; the capacity to get done what you need to get done when you need to do so. 
I could write this out in detail, but instead I&#8217;ll quote you this AI overview written by Google Gemini, which is very much on point:<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a></p><blockquote><p>Executive function refers to <strong>the mental skills and cognitive processes that allow people to plan, organize, manage time, control impulses, and achieve goals</strong>, essentially acting as the &#8220;management system of the brain.&#8221; Key components include working memory, cognitive flexibility, and inhibition. These skills are crucial for daily tasks, work, and social interactions, and are located in the brain's frontal lobe.</p><p><strong>What Executive Function Does</strong></p><ul><li><p><strong>Manages everyday tasks:</strong> It helps you plan and execute plans to get things done.</p></li><li><p><strong>Problem-solving:</strong> It guides your ability to figure out and solve problems.</p></li><li><p><strong>Attention:</strong> It allows you to focus and shift attention when needed.</p></li><li><p><strong>Emotional control:</strong> It helps in managing and regulating emotions.</p></li></ul><p><strong>Key Components of Executive Function</strong></p><ul><li><p><strong>Working Memory: </strong>The ability to hold and manipulate information in your mind.</p></li><li><p><strong>Cognitive Flexibility: </strong>The ability to adapt your thinking and behavior to new situations and demands.</p></li><li><p><strong>Inhibition: </strong>The ability to control impulses and stop inappropriate responses.</p></li></ul><p><strong>Why It Matters</strong></p><ul><li><p><strong>Daily living: </strong>Good executive function helps you manage your daily life, including organizing, prioritizing, and staying on task.</p></li><li><p><strong>Success in school and work: </strong>These skills are vital for learning, working independently, and completing large 
projects.</p></li><li><p><strong>Mental well-being: </strong>Strong executive functioning contributes to a better quality of life and can help prevent issues like anxiety and depression.</p></li></ul><p><strong>When Executive Function is Challenged</strong></p><ul><li><p><strong>ADHD: </strong>Attention-Deficit/Hyperactivity Disorder (ADHD) is often considered a disorder of executive functioning.</p></li><li><p><strong>Daily struggles: </strong>People with impaired executive function may have difficulty focusing, following directions, managing time, and controlling impulses.</p></li><li><p><strong>Supports: </strong>Strategies like using lists, creating routines, and organizing the environment can help individuals manage these challenges.</p></li></ul></blockquote><p>If you don&#8217;t have any issues with executive function, you may have read this and come to the conclusion that nearly any functioning adult must have good executive function. However, this is not the case. It is surprisingly common to see adults with some form of executive-function deficit. In particular, people on the autism spectrum, who may be very intelligent and capable otherwise, frequently struggle with executive function.</p>
<p>In my work educating PhD students, I have worked with several students who struggled with executive function. This experience has led me to identify certain behavior patterns that are common among such students. One is not preparing sufficiently or appropriately for meetings or presentations. For instance, when such students give lab meeting presentations, their slides often look hastily put together, some parts are missing, slide quality varies widely throughout the slide deck, and there&#8217;s the inevitable sudden ending where the student ran out of time to prepare the final part of the talk. Another is not completing simple tasks I ask them to do. For instance, I may ask a student to read a specific paper and the next time I meet with them they haven&#8217;t read the paper but may have done all sorts of other things instead. I then ask them to please read the paper for our next meeting, and when that meeting comes around they still haven&#8217;t read the paper. And it goes on like this for weeks until eventually neither the student nor I remember why it would have made sense to read the paper in the first place.</p><p>There&#8217;s a curious observation I&#8217;ve made working with people who struggle with executive function. My natural instinct is to request increasingly simpler tasks, but it often doesn&#8217;t help. 
For example, I may ask a student to do a literature review on a certain topic, and the next time we meet they haven&#8217;t done it. So then I ask them to start by reading five specific papers. The next time we meet they haven&#8217;t done it. Then I ask them to read just one paper and write a one-paragraph summary on it. The next time we meet they still haven&#8217;t done any of it. Every week, instead, they have done something else entirely. And the simpler and smaller I make the requested task, the less inclined the student seems to be to actually complete it. It&#8217;s a difficult position to be in as an advisor, as the student doesn&#8217;t do well performing large, complex tasks, but also doesn&#8217;t respond well to me trying to simplify things and reduce scope. </p><p>I have a vague notion of what may be happening in these situations: Students look at the entirety of the tasks they need to work on and discount all the items that are easy, that they know they could do relatively quickly if they actually worked on them. In their minds, because those tasks are easy, they are essentially done already. So, instead of doing those tasks, the students focus on the most difficult possible thing they could be working on, and try to make progress on that one.<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> Of course, the most difficult possible thing to work on is usually too difficult for the students to actually accomplish, while the easier tasks don&#8217;t get done because they aren&#8217;t given sufficient attention or started at all. And the simpler I make the tasks I ask students to do, the lower my requests rank in the students&#8217; internal priority ranking of things to tackle.</p><p>If you&#8217;re a student and reading this rings true to you, I&#8217;d like to ask you to consider that even simple tasks need doing and require a finite amount of time for completion. 
So please be honest with yourself about how long simple tasks will take and don&#8217;t wait until the last minute as you will run out of time. If you&#8217;re an advisor working with students who have some executive-function deficit, I&#8217;m not sure what to recommend. I have not found the ideal approach to managing such students. I still think breaking larger projects into increasingly simpler tasks has to be part of the solution, but maybe it needs to be accompanied by honest discussions about how long individual tasks should take, how to schedule tasks when we can roughly estimate how much time they will require, what the specific reasons are that keep students from tackling certain tasks, and so on.</p><p>To end on a positive note, I&#8217;d like to emphasize that it&#8217;s absolutely possible to be successful in life even with impaired executive function. At least four of my past PhD students have struggled with executive function and all but one of them managed to complete their PhD. Today they are successful in careers appropriate for their training, be it academia or industry. To any of my readers who may have difficulty with executive function, know that recognizing your limitations is the first step towards working around them. You can develop strategies and routines that help you get things done on time and in an organized manner, even if it doesn&#8217;t come naturally. We all have various strengths and weaknesses, and we need to learn how to work around our weaknesses while taking advantage of our own unique strengths.</p>
<div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>Yes, the text in the quote is AI generated. That&#8217;s why it&#8217;s in a quote. I did not write it myself.</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>The reasoning is, presumably, that the most difficult thing the student could work on is the primary barrier that keeps the student from completing the entire project. Once this issue is resolved, the thinking goes, the rest of the project will easily fall into place.</p></div></div>]]></content:encoded></item></channel></rss>