Discussion about this post

Trevor Freeman

I think R+tidyverse is marginally nicer for basic data cleaning and plotting, although for me Polars+Plotnine has made the gap pretty insignificant. I also have well over 5 more years of experience with R+tidyverse than with Polars+Plotnine, so it's hard to compare them directly.

The problem with R is that doing anything besides chaining together a bunch of tidyverse functions in a notebook absolutely sucks. Every time I go to write a little module that I want to be reusable, portable, and tested, and that handles errors, I want to pull my hair out. Little things like implementing a command line interface for a script or two are so much nicer in Python that I think it's well worth giving up the marginal benefits of the tidyverse. Don't even get me started on basic utilities like logging, implementing custom classes, built-in support for modules and virtual environments, native assertions for zero-effort sanity checks in the code, not having the entire developer experience revolve around RStudio/Positron, and so many more.

I don't want to come down too hard on R. I still use it almost daily, and as a language for statistical analysis I think it's completely unparalleled. But as someone who also frequently has to write "real software" to get some data-science-related parts of my job done, my chest gets tight when I realize a core part of what I need to write has to be done in R.

Daniel Morton

I switched from R to Python a decade ago and I've never really regretted it. R's memory issues are something of a dealbreaker for me. The fact that it doesn't exist outside the stats/DS world is something of a problem as well.

That said, matplotlib is terrible. So much so that I'm not sure it's even using Python properly. There's really no reason Python can't have a data vis package more like ggplot. Except that matplotlib already exists and no one has the time to reinvent that particular wheel.

Sklearn has its problems (I'm pretty sure the backend design violates several software architecture principles), but they don't affect daily life.

Pandas syntax is eccentric, and not in a good way. Numpy is okay, as is PyTorch, but they're their own ecosystems, so different rules apply.

And really I don't like Python that much. Something about it just feels cheap. It makes it far too easy to write bad, unstable code, and has spawned far too many bad coders. It's fine for mucking about, but for making something that lasts give me a good statically typed language any day.
