Discussion about this post

Josh:

Quick comment about notebook reproducibility: I am a very recent convert to Marimo notebooks, which encode dependencies between cells and avoid this problem. Also very useful for beautiful and interactive results that can be viewed as a slide deck -- a hybrid between a notebook and a dashboard. Highly recommend you check it out!

ScienceGrump:

import pandas as pd

X = my_expensive_computation()        # the slow step
X.to_csv("data/X.csv", index=False)   # cache the result to disk

X = pd.read_csv("data/X.csv")         # on later runs, load the cache instead

Trivial to do after anything expensive. Comment out the compute-and-save lines and go on your merry way. I don't see what the argument against this would even be; it could only make sense to someone with extremely limited experience.
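The same idea can be wrapped in a small helper so nothing has to be commented out by hand: check for the cache file, load it if present, otherwise compute and save. A minimal sketch using the stdlib `pickle` instead of CSV (the helper name `cached` and the toy computation are hypothetical, standing in for `my_expensive_computation()`):

import pickle
from pathlib import Path

def cached(path, compute):
    """Load a pickled result from `path` if it exists;
    otherwise run `compute()`, save the result, and return it."""
    p = Path(path)
    if p.exists():
        with p.open("rb") as f:
            return pickle.load(f)
    result = compute()
    p.parent.mkdir(parents=True, exist_ok=True)
    with p.open("wb") as f:
        pickle.dump(result, f)
    return result

# hypothetical expensive step; re-running the script hits the cache
X = cached("data/X.pkl", lambda: [i * i for i in range(10)])

Pickle round-trips arbitrary Python objects exactly; CSV is still the better choice when the cache should be human-readable or shared with other tools.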

What novices fail to understand is how critical speed is to analysis -- really, to any technical skill. You need to be able to look at the data again and again, in many different ways, so your workflow has to support frictionless data manipulation. This is why the people using AI for analysis are fools, IMO. It is intrinsically calcifying to have a machine write your code: it will make many dumb decisions and never review them, and the very fact that it wrote the code for you creates a barrier to your reviewing them. And bluntly, if it is faster than you, you are just too slow.
