9 Comments
Mario Pasquato:

Come on, it is well known that you have to change your random seed until your results become significant, so obviously it can’t always be 42. /s

Claus Wilke:

Ah, I didn't consider that. Good point!

le raz:

Honestly, this aspect is why it can be good to see a 'standard' random seed (e.g., 0, 1, 42): it seems less likely to have been cherry-picked than 17582956172.

Eurydice:

Please let me know if you ever write a programming for biologists textbook; I learned more from this blog post than I have in multiple days of vibe coding.

Claus Wilke:

I will probably never write a programming book. The problem with programming books is that they are outdated the moment they go into print, as the field moves so fast. I deliberately didn't put any code into my dataviz book for that exact reason.

Eurydice:

A very good point on the programming book; however, I'm immediately hearing that there is a dataviz book.

Claus Wilke:

I'm pretty sure it was mentioned in the post. ("When I wrote my book on data visualization I used this technique quite frequently, for example in this chapter.")

Either way, here it is: https://clauswilke.com/dataviz/

There's also a class based on this book, and it has code examples and exercises: https://wilkelab.org/SDS366/

Craig:

This is excellent mathematics.

Thank you.

Nathan Walker:

"Using a fixed random seed when splitting data into training and test sets is uniquely bad, as you’re always going to be sampling the same split when you’re re-training your classifier."

There are certainly use cases where changing your seed is beneficial, but the main use case -- comparing runs across different experiments -- requires that you not completely re-randomize where your train/test data come from between runs!

Of course, if you want more robust metrics, you can always choose several different train/test splits and average your results... or, even better, use cross-validation. But for comparing single runs, you should never change the seed between runs.
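A minimal sketch of that distinction, assuming scikit-learn (the iris data, logistic regression model, and specific seed values are placeholders I chose for illustration, not anything from the post): fix random_state when you want every run to score the exact same split, and average over several splits or use cross-validation when you want a more robust accuracy estimate.

```python
# Sketch (placeholder data/model): contrasting a fixed train/test split
# with averaging over several splits and with k-fold cross-validation.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# For comparing runs: fix the seed so every run scores the same split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
fixed_score = model.fit(X_train, y_train).score(X_test, y_test)

# For a more robust estimate: average over several different splits...
scores = []
for seed in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    scores.append(model.fit(X_tr, y_tr).score(X_te, y_te))

# ...or, better yet, use k-fold cross-validation.
cv_scores = cross_val_score(model, X, y, cv=5)

print(f"fixed split:          {fixed_score:.3f}")
print(f"mean over 10 splits:  {np.mean(scores):.3f}")
print(f"5-fold CV mean:       {cv_scores.mean():.3f}")
```

The cross-validation route has the added advantage that every observation lands in a test fold exactly once, rather than depending on which rows a particular seed happens to hold out.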
