9 Comments
Mario Pasquato:

Come on, it is well known that you have to change your random seed until your results become significant, so obviously it can’t always be 42. /s

Claus Wilke:

Ah, I didn't consider that. Good point!

le raz:

Honestly, this aspect is why it can be good to see a 'standard' random seed (e.g., 0, 1, 42): it seems less likely to have been cherry-picked than 17582956172.

Eurydice:

Please let me know if you ever write a programming for biologists textbook; I learned more from this blog post than I have in multiple days of vibe coding.

Claus Wilke:

I will probably never write a programming book. The problem with programming books is that they are outdated the moment they go into print, as the field moves so fast. I deliberately didn't put any code into my dataviz book for that exact reason.

Eurydice:

A very good point on the programming book; however, I'm immediately hearing that there is a dataviz book.

Claus Wilke:

I'm pretty sure it was mentioned in the post. ("When I wrote my book on data visualization I used this technique quite frequently, for example in this chapter.")

Either way, here it is: https://clauswilke.com/dataviz/

There's also a class based on this book, and it has code examples and exercises: https://wilkelab.org/SDS366/

Craig:

This is excellent mathematics.

Thank you.

Nathan Walker:

"Using a fixed random seed when splitting data into training and test sets is uniquely bad, as you’re always going to be sampling the same split when you’re re-training your classifier."

There are certainly use cases where changing your seed is beneficial, but the main use case -- comparing runs across different experiments -- requires that you not completely re-randomize where your train/test data come from between runs!

Of course, if you want more robust metrics, you can always choose several different train/test splits and average your results... or, even better, use cross-validation. But for comparing single runs, you should never change the seed between runs.
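A minimal sketch of that distinction, assuming scikit-learn (the iris data, logistic regression model, and specific seed values are placeholders I chose for illustration, not anything from the post): fix random_state when you want every run to score the exact same split, and average over several splits or use cross-validation when you want a more robust accuracy estimate.

```python
# Sketch (placeholder data/model): contrasting a fixed train/test split
# with averaging over several splits and with k-fold cross-validation.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# For comparing runs: fix the seed so every run scores the same split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
fixed_score = model.fit(X_train, y_train).score(X_test, y_test)

# For a more robust estimate: average over several different splits...
scores = []
for seed in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    scores.append(model.fit(X_tr, y_tr).score(X_te, y_te))

# ...or, better yet, use k-fold cross-validation.
cv_scores = cross_val_score(model, X, y, cv=5)

print(f"fixed split:          {fixed_score:.3f}")
print(f"mean over 10 splits:  {np.mean(scores):.3f}")
print(f"5-fold CV mean:       {cv_scores.mean():.3f}")
```

The cross-validation route has the added advantage that every observation lands in a test fold exactly once, rather than depending on which rows a particular seed happens to hold out.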
