The Google Scholar preprint bug strikes again
Google is never going to fix this bug, are they?
For the last couple of weeks, Google Scholar has been complaining to me that one of my articles is not publicly available, in violation of a funder-imposed public access mandate. When I go to my Google Scholar page, there is a big notification box at the top of the page that asks me to review the situation. This is rather annoying, because (as you will see in a moment) I have done nothing wrong. I have done everything the NIH, my funder, wants me to do. The entity that is wrong is Google. In fact, I believe what I’m seeing is a version of the Google Scholar preprint bug, which I’ve reported on for over a decade; see, for example, here or here.
When I go to the page where I can review the situation, Google Scholar shows me the offending article. It is a preprint from 2026, published on bioRxiv. You can read it here. Yes, Google Scholar complains that a preprint on bioRxiv is not publicly available. But it gets worse.
Technically, the NIH doesn’t just require papers to be available. It wants them to be deposited in PubMed Central. So maybe that’s Google’s beef? That the paper is available on bioRxiv but not on PubMed Central? Well, that’s a neat theory, but it falls flat. It falls flat because the paper is actually on PubMed Central. You can check for yourself here. The NIH has a pilot program where they scan bioRxiv for NIH-funded research and automatically pull any preprints that match their criteria into PubMed Central. This has worked beautifully for all recent preprints my lab has published, and I never think about it because it works so smoothly. Everybody is happy. The NIH, the public, me. Except Google Scholar. They have taken it upon themselves to become open access warriors, and in the process they are now falsely accusing honest researchers of violating open-access mandates.
So what’s going on? Digging a bit deeper, I have a pretty good idea about what the issue is. We’ll get to that in a second. Let’s collect a bit more evidence first.
Do you know how, when you click on the Google Scholar record for an article, it gives you the option to review all the alternative versions of the article?[1] Well, for this particular preprint, that’s missing. Google Scholar is not aware of any alternative versions. And, even worse, Google Scholar doesn’t even point to the correct article. Instead of pointing to bioRxiv, it points to Europe PMC. Google Scholar has completely messed up. It doesn’t know that my bioRxiv preprint is on bioRxiv, it doesn’t know that it is on PubMed Central, and it sends people on a wild goose chase to Europe PMC, which then points to bioRxiv.
So what we’re dealing with here is an outdated or less authoritative link that is causing more authoritative links to disappear from the Google Scholar database. Have we ever seen anything like this? I’m glad you asked. Yes we have. It’s the Google Scholar preprint bug, which I have been documenting since 2014. Hundreds of scientists (that I know of) have complained about it, because it can have the unfortunate consequence of removing your published paper from the Google Scholar database. This is particularly frustrating for junior scientists on the job market, because it matters whether Google Scholar is showing your recent Nature paper or just the corresponding bioRxiv preprint.
In 2015, I even discussed it with Anurag Acharya, co-founder of Google Scholar.[2] This discussion left me with the impression that the Google Scholar team does not understand the issue, or its severity, and will never fix the problem. And here we are, a decade later: the problem still exists, and now it’s causing downstream consequences, such as Google Scholar accusing me of violating the NIH open-access policy.
For completeness, I am reproducing here my 2015 conversation with Anurag Acharya, as it is as relevant today as it was back then.
[1] The link usually says “All n versions,” with n being the number of different versions Google Scholar has found. See here for an example, at the very bottom of the page. As of this writing, it says “All 18 versions.”
[2] My discussion with Anurag Acharya can be found in the comments section of this 2015 article on the Scholarly Kitchen. I’m impressed that the Scholarly Kitchen is still hosting the comments on a decade-old article.





