If your random seed is 42 I will come to your office and set your computer on fiređ„
Figuratively. More likely you'll get a stern talking to.
When youâre as old as I am, old enough to remember that there was a time before the internet, when you had to go to the library to read a book or drop coins into a metal box to make a phone call, you have absorbed a lot of geek lore. So, when you read some tutorial about machine learning or data analysis and you see random.seed(42)
you go âhaha, thatâs funnyâ and you move on. Until you talk to your much younger students and you realize they all think this is an important line of code that ensures their programs run correctly. They set random seeds to 42 everywhere. They have read the documentation, they know about the random seed option, and they dutifully follow the best practices as laid out everywhere on the internet. The random seed is 42.
I cannot emphasize how bad of a choice this is. 42 was a joke guys. Donât use 42. Ever. If your random seed is 42 I will come to your office and set your computer on fire.1

But is this actually a problem? Do people really use 42 that commonly as their random seed? Yes, absolutely. Google ârandom seedâ or ârandom_stateâ and the number 42 will pop up among your top search hits. And people may explain where the number 42 comes from (weâll get to this below), but then they donât talk much about whether or not this choice is a good idea. In fact, frequently you see statements along the lines of âthe random seed is arbitrary, you can use any number you want, so 42 is a fine choice.â This sentence is 100% correct, assuming youâre the only one who uses 42 and you also use it only once in your entire life. Obviously this assumption is not valid. But I rarely see people point this out.
For example, the documentation for the Python machine-learning framework scikit-learn contains a lot of material about random states and various options of controlling them. Everything the documentation says is technically correct, and yet it never discourages you from using 42 as the seed. In fact, the glossary contains this gem:
Popular integer random seeds are 0 and 42.
(I hope I wonât have to explain why 0 is just as bad a choice as 42.)
The number 42 also shows up throughout the documentation, such as in code examples for the train_test_split()
function. Using a fixed random seed when splitting data into training and test sets is uniquely bad, as youâre always going to be sampling the same split when youâre re-training your classifier.

train_test_split()
function from scikit-learn, prominently setting random_state=42
. Taken from the official documentation for version 1.7.2, the latest stable release as of this writing.But it gets worse. LLMs have learned about 42 and will happily put it into the code they generate. Here is some zero-shot data-analysis code I recently generated. And there it is, random_state=42
(âfor reproducibilityâ).

random_state=42
. Taken from this zero-shot output generated by Claude Sonnet 4.5.It is not surprising that attentive students, who read the documentation, read the blog posts, read the code generated by LLMs, conclude that a random seed of 42 is a good choice, and maybe even a choice that is superior to other options.
To dig deeper into why 42 is not a good choice, and is in fact uniquely bad, we need to look into how random numbers are generated in a computer.
What is a random seed?
In modern computing, we need randomness everywhere. If youâre doing machine learning and you need to subdivide your data into training and test sets, that requires a source of randomness. If youâre writing a computer game and you want your NPCs2 to show somewhat interesting and unpredictable behavior, you need randomness. If youâre simulating a physical system, you need randomness. The problem is that itâs rather complicated to generate true random numbers. The sources of true randomness we have available (for example from thermal fluctuations in specific electronics components) are nowhere fast or cheap enough to generate random numbers at the scale needed in modern computing environments.
The solution computer scientists have come up with is the pseudo-random number generator (PRNG). A PRNG is a mathematical algorithm that produces sequences of numbers statistically indistinguishable from random. Importantly, a PRNG will always produce the exact same sequence of numbers when run from the same starting point. The numbers arenât random at all! But they look random.
So what is the random seed? It is a number that defines the initial state of the PRNG. How exactly we get from the seed to the initial state can be complicated, but the details donât matter here. What matters is the same seed will always give you the exact same sequence of random numbers.
A second important concept to be aware of is the period length of a PRNG. The period length is the number of random values a PRNG can generate before it starts repeating. All PRNGs repeat eventually. Therefore, it is critical that your PRNG has a period length large enough that it never causes you any trouble. Letâs say you write a large numerical simulation (maybe youâre simulating the weather, or the early universe, or all the atoms in a cell) where you need trillions or more of random numbers. You wouldnât want the random numbers to repeat during any of your simulation runs. So you need a PRNG with a period length well in excess of the maximum number of random values you may ever need.
One of the most widely used PRNGs is the Mersenne twister. It has a period length of over 106000. (The exact value is 219937 â 1.) This is an unimaginably large number. To give you a sense of how large it is, for comparison, there are approximately 1080 atoms in the universe. This is tiny compared to 106000. The period of the Mersenne twister has space for entire universes for every single atom in the universe, and then some. In fact, you could create an entire universe for every atom, and then create another entire universe for every atom in every of the universes you have created, and keep nesting 75 times, and still you wouldnât run out of room in the period of the Mersenne twister. If you used the Mersenne twister to create nested universes 75 times deep, all these universes inside universes inside other universes would be different from each other.
What is a good choice for your random seed?
As I wrote above: The random seed is arbitrary. You can pick any seed you want. There are no better or worse seeds. (Unless you have a bad PRNG, but letâs ignore this complication.) In principle we could stop here. But in practice itâs a little more complicated.
While the seed is arbitrary, you donât ever want to reuse a seed. The point of a PRNG is that its output is statistically indistinguishable from random. Thatâs going to be the case if youâre using a different seed every time. But if youâre reusing seeds, suddenly you have hidden correlation structures. And you may not even be aware of them.
The consequences of reusing random seeds could be benign or disastrous. It depends on the specifics of the situation. Letâs say youâre doing machine learning, and youâre using the train-test splitting code I quoted above, with a fixed random seed. In this case, youâre always splitting the data in exactly the same way. If youâre running this code five times, youâre not actually getting five independent splits, youâre getting the same split five times. At a minimum, thatâs going to be wildly underestimating the variance in the performance of your fitted model. And worse consequences are possible if youâre unlucky.
If youâre following so far, and youâre starting to see that reusing random seeds can be bad, you may also realize that double-digit random seeds are bad. There are only 90 different options. If youâre in any way regularly working with random processes, youâll easily need 90 different options in just a few days.
So, letâs go wild. Letâs use an 8-digit random seed.3 Surely thatâll give us sufficiently many different possibilities for a lifetime. Well, once you ponder it a bit, youâll see that this doesnât even give us a separate random sequence for every person in the United States. (The US population is approximately 340 million.) If they were all doing data science, splitting data into training and test, many of them would be using the exact same ârandomâ splits.
The space you can possibly explore with even quite a long random seed is tiny compared to the total number of sequences you would want to have available to you, and which a PRNG such as the Mersenne Twister would certainly support.4 And the random seed 42 is uniquely bad precisely because everybody is using it. The Mersenne Twister has a state space large enough for universes within universes, but every data scientist in the entire world is using the same 10,000 ârandomâ numbers that you get when starting with seed 42.
So weâre all on the same page, here are the first 50 of the âofficialâ random numbers. They have been used millions of times. I encourage you to verify you get the same numbers on your computer.
>>> import random
>>> random.seed(42)
>>> [random.random() for i in range(50)]
[0.6394267984578837, 0.025010755222666936, 0.27502931836911926, 0.22321073814882275, 0.7364712141640124, 0.6766994874229113, 0.8921795677048454, 0.08693883262941615, 0.4219218196852704, 0.029797219438070344, 0.21863797480360336, 0.5053552881033624, 0.026535969683863625, 0.1988376506866485, 0.6498844377795232, 0.5449414806032167, 0.2204406220406967, 0.5892656838759087, 0.8094304566778266, 0.006498759678061017, 0.8058192518328079, 0.6981393949882269, 0.3402505165179919, 0.15547949981178155, 0.9572130722067812, 0.33659454511262676, 0.09274584338014791, 0.09671637683346401, 0.8474943663474598, 0.6037260313668911, 0.8071282732743802, 0.7297317866938179, 0.5362280914547007, 0.9731157639793706, 0.3785343772083535, 0.552040631273227, 0.8294046642529949, 0.6185197523642461, 0.8617069003107772, 0.577352145256762, 0.7045718362149235, 0.045824383655662215, 0.22789827565154686, 0.28938796360210717, 0.0797919769236275, 0.23279088636103018, 0.10100142940972912, 0.2779736031100921, 0.6356844442644002, 0.36483217897008424]
Should you explicitly set your random seed?
Why choose a specific random seed at all? Is this actually a good idea? In general, I think the answer is no. In my opinion, youâre typically better off using a random5 random seed and sampling a broader space of possibilities than picking your own random seed and risking that youâre drawing invalid conclusions from your analysis. However, there are of course specific situations in which picking a random seed is appropriate, or even required. Letâs discuss those.
Most importantly, in simulation studies, it can be helpful to start all simulations with a different but defined random seed, so that every single simulation run can be reproduced if necessary. In these types of situations, a good strategy is to take some arbitrary large integer number, say 9427385,6 and then add to it the number of the replicate youâre running. So, simulation i would have seed 9427385 + i. This requires a bit of work to get right, as you should never reuse any random seeds among simulations, but it can be useful for tracking down weird behaviors that may occur only occasionally.
Related to this point, when youâre coding a complex stochastic simulation, you may encounter bugs that happen only very rarely. Your simulation may work just fine most of the time, but every few thousand runs or so it crashes. This type of bug can be difficult to investigate. The first step is usually to identify a random seeds that reliably triggers the bug. If you can find a seed that triggers it early in the simulation run thatâs even better.
Another scenario in which you might want to use defined random seeds is when you have random choices that you explicitly want to reuse multiple times. For example, if youâre doing machine learning, and youâre comparing two different models, you may want to fit them to the exact same collection of training/test splits. In this situation, you could, for example, pick ten random seeds, use each to generate one training/test split, and fit each model to each of the ten splits.
Finally, when making visualizations that contain random scatter or other elements that are randomly chosen, it can be helpful to play around with the random seed until the scatter looks pleasing. When I wrote my book on data visualization I used this technique quite frequently, for example in this chapter.
Where does 42 come from anyways?
Now that you know everything there is to know about random seeds, letâs go back to the number 42. Where does it come from? Why do people use this number in particular? The culprit is Douglas Adamsâ The Hitchhikerâs Guide to the Galaxy, a humorous, quirky science fiction novel. The book was very popular in the 1980s and 1990s, was adapted several times for radio and TV, and reached broad audiences around the world. If youâve never read it or seen any of the adaptations, I would encourage you to check it out. Just be prepared for rather strange humor.7
In the book, we learn about some hyper-intelligent, pan-dimensional beings that built a massive computer called Deep Thought, with the specific purpose to discover the answer to âthe ultimate question of life, the universe, and everything.â After several million years of computing, Deep Thought reveals that the answer is 42. When the beings donât understand what to do with this answer, Deep Thought tells them that to understand the answer they have to figure out what exactly the question is.8 You can see a film adaptation of this scene here.
The number 42 has since turned into a meme. When somebody asks an extremely broad question, or a question that goes after deep philosophical topics such as the meaning of life, people like to respond with 42. This meme is so popular it has its own Wikipedia article. This is all very funny,9 but none of this makes 42 a good random seed.
Updates
This post generated a number of good responses. I will collect the most helpful or interesting ones here.
First, Nick Bailey pointed out a good solution to the randomness/reproducibility conundrum (where reusing seeds ruins randomness but using true random starting points ruins reproducibility): Generate a genuinely random random seed when you start up your code, write it into a log file, and then seed your random number generator. This gives you the best of both worlds.
A somewhat similar idea: Use the current date as random seed. This at a minimum gives you a different seed each day, while also avoiding the problem of appearing to have fished for the seed that gives you desired results. The devil is in the details though of how exactly you convert the date into a seed. See this post by Stephen Turner and the subsequent replies:
Thanks to Colin Gillespie over on BlueSky, I have learned how to do code searches on GitHub. So now I can report that, as of this writing, there are 496k cases of random_state=42
on GitHub.
Finally, an issue to be aware of if youâre using Matlab: It uses a fixed random seed every time, so the generated random numbers are always the same in a fresh Matlab session:
You can read more about this on the Matlab blog.
I am channeling Jenny Bryan here, who made similar statements about some widely popular, bad practices in R coding.
NPC = Non-player character. NPCs are all the elements of a game that do something autonomously, not directed by a human playing the game. For example, the monsters are usually NPCs.
I.e., any integer between 10,000,000 and 99,999,999. There 90 million possible choices in this range of numbers.
Most programming languages limit you to 32-bit integers as seed values. Thatâs equivalent to 4.3 billion different options, not even enough to give every living person on the planet their own sequence of random numbers.
For most programming languages, if you donât specify a random seed, the language uses true randomness to set the initial state. The random initial state will be derived from a hardware random number generator (if available) or from the current time otherwise. This initial state will be different every time you start up your programming environment.
Do I have to say it? Donât pick 9427385. Thereâs nothing special about it. I just pressed some number keys and this is what came out.
I have always preferred Adamsâ novels Dirk Gentlyâs Holistic Detective Agency and The Long Dark Tea-Time of the Soul over the Hitchhikerâs Guide series. But the Hitchhikerâs Guide series is good, in particular the first two books. The strange humor is present in all of Adamsâ writing, though.
This then leads to them building another, even bigger computer, which turns out to be all of Earth, and thatâs an important component of the storyline in the book. But thatâs not relevant to our discussion here, which is about random seeds.
Or not. Again, Douglas Adamsâ humor was a bit weird, and always trending towards rather silly.
Please let me know if you ever write a programming for biologists textbook; I learned more from this blog post than I have in multiple days of vibe coding.
Come on, it is well known that you have to change your random seed until your results become significant, so obviously it canât always be 42. /s