Hacker News

Python is not a great language for data science

346 points by speckx

by aorist

8 subcomments

> Examples include converting boxplots into violins or vice versa, turning a line plot into a heatmap, plotting a density estimate instead of a histogram, performing a computation on ranked data values instead of raw data values, and so on.
Most of this is not about Python, it’s about matplotlib. If you want the admittedly very thoughtful design of ggplot in Python, use plotnine
> I would consider the R code to be slightly easier to read (notice how many quotes and brackets the Python code needs)
This isn’t about Python, it’s about the tidyverse. The reason you can use this simpler syntax in R is that its non-standard evaluation allows packages to extend the syntax in a way Python does not expose: http://adv-r.had.co.nz/Computing-on-the-language.html

by RobinL

31 subcomments

I think a lot of this comes down to the question: Why aren't tables first class citizens in programming languages?
If you step back, it's kind of weird that there's no mainstream programming language that has tables as first class citizens. Instead, we're stuck learning multiple APIs (polars, pandas) which are effectively programming languages for tables.
R is perhaps the closest, because it has data.frame as a 'first class citizen', but most people don't seem to use it, and use e.g. tibbles from dplyr instead.
The root cause seems to be that we still haven't figured out the best language to use to manipulate tabular data yet (i.e. the way of expressing this). It feels like there's been some convergence on some common ideas. Polars is kind of similar to dplyr. But no standard, except perhaps SQL.
FWIW, I agree that Python is not great, but I think it's also true R is not great. I don't agree with the specific comparisons in the piece.

by jakobnissen

3 subcomments

Excellent article - except that the author probably should have gated their substantiation of the claim behind a cliffhanger, as other commenters have mentioned.
The author's priorities are sensible, and indeed with that set of priorities, it makes sense to end up near R. However, they're not universal among data scientists. I've been a data scientist for eight years, and have found that this kind of plotting and dataframe wrangling is only part of the work. I find there is usually also some file juggling, parsing, and what the author calls "logistics". And R is terrible at logistics. It's also bad at writing maintainable software.
If you care more about logistics and maintenance, your conclusion is pushed towards Python - which still does okay in the dataframes department. If you're ALSO frequently concerned about speed, you're pushed towards Julia.
None of these are wrong priorities. I wish Julia was better at being R, but it isn't, and it's very hard to be both R and useful for general programming.
Edit: Oh, and I should mention: I also teach and supervise students, and I KEEP seeing students use pandas to solve non-table problems, like trying to represent a graph as a dataframe. Apparently some people are heavily drawn to use dataframes for everything - if you're one of those people, reevaluate your tools, but also, R is probably for you.

by rossdavidh

1 subcomments

Speaking as a python programmer who has occasionally done work in R: yes, of course. Python is not a great language for anything; it's a pretty good language for just about anything. That is, and always has been, its strength.
If you're doing data science all day, you should learn R, even if it's so weird at first (for somebody coming from a C-style language) that it seems way harder; R is made for the way statisticians work and think, not the way computer programmers work and think. If you're doing data science all day, you should start thinking and working like a statistician and working in R, and the fact that it seems to bend your mind is probably at least in part good, because a statistician needs to think differently than a programmer.
I work in python, though, almost all of the time.

by progval

3 subcomments

The pure Python code in the last example is more verbose than it needs to be.

    groups = {}
    for row in filtered:
        key = (row['species'], row['island'])
        if key not in groups:
            groups[key] = []
        groups[key].append(row['body_mass_g'])

can be rewritten as:

    groups = collections.defaultdict(list)
    for row in filtered:
        groups[(row['species'], row['island'])].append(row['body_mass_g'])

and

    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    std_dev = math.sqrt(variance)

as:

    std_dev = statistics.stdev(values)
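
Put together, the two simplifications might look like the sketch below. The rows in `filtered` are made-up sample values standing in for the article's data, not the real penguins dataset; the column names follow the snippets above.

```python
import collections
import statistics

# Hypothetical sample rows standing in for the article's `filtered` list.
filtered = [
    {"species": "Adelie", "island": "Torgersen", "body_mass_g": 3750.0},
    {"species": "Adelie", "island": "Torgersen", "body_mass_g": 3800.0},
    {"species": "Adelie", "island": "Torgersen", "body_mass_g": 3250.0},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 4500.0},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5700.0},
]

# Group body masses by (species, island); missing keys start as empty lists.
groups = collections.defaultdict(list)
for row in filtered:
    groups[(row["species"], row["island"])].append(row["body_mass_g"])

# statistics.stdev is the sample standard deviation (n - 1 denominator),
# so every group needs at least two values.
summary = {
    key: (statistics.fmean(values), statistics.stdev(values))
    for key, values in groups.items()
}

for (species, island), (mean, std_dev) in summary.items():
    print(f"{species:8} {island:10} mean={mean:.1f} sd={std_dev:.1f}")
```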

by willvarfar

2 subcomments

My experience was that data science was doable but clunky and ugly with pandas. It got slightly better with polars. Only really slightly better. Then, for me at least, it jumped lightyears ahead with duckdb.
These days I run some big query on an OLAP database and download the results to parquet stored on the local disk of a cloud notebook VM and then mine it to bits with duckdb reading straight from these parquet files.
The notebooks end up with very clear SQL queries and results (most notebook servers support SQL cells with highlighting and completion etc), and small pockets of python cells for doing those corner case things that an imperative language makes easier.
So when I get to the bottom of the article where it shows the difference between Python and R, I'm screaming "wouldn't that look better in SQL?!" :)
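A rough sketch of what the SQL version of the article's group-by example could look like follows. It deliberately uses Python's built-in sqlite3 instead of duckdb so that it runs without extra packages. SQLite has no standard deviation aggregate, so the sketch registers one by hand (the `SampleStdev` class and the sample rows are illustrative assumptions); with duckdb, which has such aggregates built in, that part should not be needed.

```python
import sqlite3
import statistics


class SampleStdev:
    """Hand-rolled aggregate, needed only because SQLite lacks one."""

    def __init__(self):
        self.values = []

    def step(self, value):
        # SQLite calls step() once per row in the group.
        if value is not None:
            self.values.append(value)

    def finalize(self):
        # Sample standard deviation is undefined for fewer than two values.
        return statistics.stdev(self.values) if len(self.values) > 1 else None


conn = sqlite3.connect(":memory:")
conn.create_aggregate("stddev_samp", 1, SampleStdev)
conn.execute("CREATE TABLE penguins (species TEXT, island TEXT, body_mass_g REAL)")
conn.executemany(
    "INSERT INTO penguins VALUES (?, ?, ?)",
    [
        # Made-up sample rows, including one missing measurement.
        ("Adelie", "Torgersen", 3750.0),
        ("Adelie", "Torgersen", 3800.0),
        ("Adelie", "Torgersen", 3250.0),
        ("Adelie", "Torgersen", None),
        ("Gentoo", "Biscoe", 4500.0),
        ("Gentoo", "Biscoe", 5700.0),
    ],
)

rows = conn.execute(
    """
    SELECT species, island,
           AVG(body_mass_g)         AS body_weight_mean,
           stddev_samp(body_mass_g) AS body_weight_sd
    FROM penguins
    WHERE body_mass_g IS NOT NULL
    GROUP BY species, island
    ORDER BY species, island
    """
).fetchall()

for row in rows:
    print(row)
```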

by psunavy03

1 subcomments

I'm not sure what that last example is meant to be other than an anti-Python caricature. If you're implementing calculating things like standard deviations by hand, that's not real-world coding, that's the undergraduate harassment package which should end with a STEM bachelor's.
Of course there's a bunch of loops and things; you're exposing what has to happen in both R and Python under the hood of all those packages.

by whyenot

5 subcomments

What makes Python a great language for data science, is that so many people are familiar with it, and that it is an easy language to read. If you use a more obscure language like Clojure, Common Lisp, Julia, etc., many people will not be familiar with the language and unable to read or review your code. Peer review is fundamental to the scientific endeavor. If you only optimize on what is the best language for the task, there are clearly better languages than Python. If you optimize on what is best for science then I think it is hard not to argue that Python (and R) are the best choices. In science, just getting things done is not enough. Other people need to be able to read and understand what you are doing.
BTW AI is not helping and in fact is leading to a generation of scientists who know how to write prompts, but do not understand the code those prompts generate or have the ability to peer review it.

by roadside_picnic

0 subcomment

So I've been writing Python for around 20 years now, and doing data science/ML work for around 15. Despite being a Python programmer first I spent a good 5 years using R exclusively. There's a lot of things I genuinely love about R and I strongly believe that R is unfairly maligned by devs... but there's a good reason I have written exclusively Python for DS work for the last 5 years.
> Python is pretty good for deep learning. There’s a reason PyTorch is the industry standard. When I’m talking about data science here, I’m specifically excluding deep learning.
I've written very little deep learning code over my career, but made very frequent use of the GPU and differentiable programming for non-deep learning specific tasks. In general Python is much easier to write quantitative programs that make use of the hardware, and you have a lot more options when your problem doesn't fit into RAM.
> I have been running a research lab in computational biology for over two decades.
I've been working nearly exclusively in industry for these two decades and a major reason I find Python just better is it's much, much easier to interface with other parts of engineering when you're using a truly general purpose PL. I've actually never worked for a pure Python shop, but it's generally much easier to get production ML/DS solutions into prod when working with Python.
> Data science as I define it here involves a lot of interactive exploration of data and quick one-off analyses or experiments
This re-iterates the previous difference. In my experience I would call this "step one" in all my DS related work. The first step is to understand the problem and de-risk. But the vast majority of code and work is related to delivering a scalable product.
You can say that's not part of "data science", but if you did you'd have a hard time finding a job on most of the teams I've worked on.
All that said, my R vs Python experience has boiled down to: If your end result is a PDF report, R is superior. If your end result is shipping a product, then Python is superior. And my experience has been that, outside of university labs, there aren't a lot of jobs out there for DS folks who only want to deliver PDFs.

by sheepscreek

0 subcomment

I really didn’t understand the author’s grievances. The only concrete example they illustrated was one where they concluded that Python without Pandas is verbose and ugly to achieve the same outcome, hence Python is not great for Data Science.
That’s a bad argument or a naive and obvious one; depending on how you look at it.
Python wasn’t designed for Data Science. It is not a DSL for it. MATLAB was arguably designed for scientific computing, and yet it’s the most disliked language in the StackOverflow liked/disliked index.
Here’s a different way to look at it. A good programming language is like the weather in a city. I would love to live somewhere where it’s 72F/23C all year round. But if it’s in the middle of nowhere and I’ve got no friends to hang out with, would I? I don’t think so.
FWIW, Python is like Sweden or Finland, with shitty weather for 6 months of the year yet thriving against all odds.
PS: I think the article’s topic is a bit click-baity (not a particularly useful discussion) because it’s polarizing and no one will be 100% right about it. It’s perhaps best thought of as an opinion piece.

by jakubmazanec

1 subcomments

I wish people used Julia more. A few years ago I reimplemented some MATLAB code for a novel algorithm [1] I wanted to use in my dissertation about psychometrics, and Julia was a great language to work with - and also the code ran for 20 minutes instead of 60.
[1] https://link.springer.com/article/10.1007/s11336-017-9581-x

by markkitti

0 subcomment

I tried this in Julia with TidierData.jl, and it looks quite similar to the R version.

  using TidierData, DataFrames
  using PalmerPenguins: load

  penguins = load()

  @chain penguins begin
    DataFrame
    @drop_missing(body_mass_g)
    @group_by(species, island)
    @summarize(
      body_weight_mean =
        mean(body_mass_g),
      body_weight_std =
        std(body_mass_g)
    )
    show(_, allrows=true)
  end

by forgotpwd16

2 subcomments

The article is well written but fails to address its own thesis, postponing it to a sequel article. In its current state it only suggests that Python is not great because it requires specialized packages. (And the counterexample is R, for which the author also used a package.)

by pacbard

0 subcomment

When you think about a data science pipeline, you really have three separate steps:
[Data Preparation] --> [Data Analysis] --> [Result Preparation]
Neither Python nor R does a good job at all of these.
The original article seems to focus on challenges in using Python for data preparation/processing, mostly pointing out challenges with Pandas and "raw" Python code for data processing.
This could be solved by switching to something like duckdb and SQL to process data.
As far as data analysis, both Python and R have their own niches, depending on field. Similarly, there are other specialized languages (e.g., SAS, Matlab) that are still used for domain-specific applications.
I personally find result preparation somewhat difficult in both Python and R. Stargazer is ok for exporting regression tables but it's not really that great. Graphing is probably better in R within the ggplot universe (I'm aware of the python port).

by lenerdenator

1 subcomments

> I think people way over-index Python as the language for data science. It has limitations that I think are quite noteworthy. There are many data-science tasks I’d much rather do in R than in Python. I believe the reason Python is so widely used in data science is a historical accident, plus it being sort-of Ok at most things, rather than an expression of its inherent suitability for data-science work.
Python doesn't need to be the best at any one thing; it just has to be serviceable for a lot of things. You can take someone who has expertise in a completely different domain in software (web dev, devops, sysadmin, etc.) and introduce them to the data science domain without making them learn an entirely new language and toolchain.

by orochimaaru

2 subcomments

It’s not. Julia is better, much better. But Julia came too late.
A lot of data science code is already in Python. That’s where it’s going to stay because rewriting code is time consuming. My guess is we will continue to improve Python gradually and keep refactoring the code.

by Surac

3 subcomments

At the moment I'm trying to learn Python as a hobby language. I use C, C++ and C# to earn my money. My biggest problem is finding good examples that are up to date. I spent a whole day learning that there are four (I think) ways to format strings. This „bloat“ in syntax makes even a simple print very heavy to digest. I don’t even bother with Python v2, only v3. Also, using whitespace to block things together sounds appealing, but in reality you need an editor that can indent and unindent whole blocks or I never get it right.
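For reference, the four string-formatting styles usually meant here can be compared side by side. This is a minimal sketch with made-up values; in current Python 3 code, f-strings are the form most examples use.

```python
from string import Template

name, version = "Python", 3

# 1. printf-style formatting with the % operator (the oldest form).
percent_style = "Hello from %s %d" % (name, version)

# 2. The str.format method.
format_method = "Hello from {} {}".format(name, version)

# 3. f-strings (formatted string literals), available since Python 3.6.
f_string = f"Hello from {name} {version}"

# 4. string.Template, mostly used for simple user-supplied templates.
template = Template("Hello from $name $version").substitute(
    name=name, version=version
)

# All four produce the same text.
assert percent_style == format_method == f_string == template
print(f_string)
```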

by amai

0 subcomment

The example would better be written in SQL. So according to the author that would make SQL a great language for data science. SQL also supports tables natively. This conclusion is of course ridiculous and shows the shallow reasoning in this article.

by drnick1

0 subcomment

I suppose it depends on what exactly is meant by "data science." I find that for stochastic simulations, C++ and the Eigen library are unbeatable. You get the readability of high-level code with the performance of low-level code thanks to the "zero-cost abstractions" of Eigen.
If by data science you mean loading data to memory and running canned routines for regression, classification and other problems, then Python is great and mostly calls C/FORTRAN binaries under the hood, so Python itself has relatively little overhead.

by plaidfuji

3 subcomments

Python is a pretty bad language for tabular data analysis and plotting, which seems to be the actual topic of this post. R is certainly better, hell Tableau, Matlab, JMP, Prism and even Excel are all better in many cases. Pandas+seaborn has done a lot, but seaborn still has frustrating limits. And pandas is essentially a separate programming language.
If your data is already in a table, and you’re using Python, you’re doing it because you want to learn Python for your next job. Not because it’s the best tool for your current job. The one thing Python has on all those other options is $$$. You will be far more employable than if you stick to R.
And the reason for that is because Python is one of the best languages for data and ML engineering, which is about 80% of what a data science job actually entails.

by niemandhier

1 subcomments

Python is just a language that:
1. Is easy to read
2. Was easy to extend in languages that people who work with scientific data happen to like.
When I did my masters we hacked around in the numpy source and contributed here and there while doing astrophysics.
Stuff existed in Java and R, but we had learned C in the first semester and python was easier to read and contrary to MATLAB numpy did not need a license.
When data science came into the picture, the field was full of physicists that had done similar things. They brought their tools as did others.

by neuropacabra

0 subcomment

I expected the author will complain rightfully about the tooling, including linters, formatters and package managers. Things improved drastically over the years with Astral’s ruff, uv and alpha stage ty.
But the article says that very exotic syntax is more readable. I think this is mostly about the libraries, where honestly I equally don’t like matplotlib and R’s ggplot. But I would not think it’s language problem.
I was hoping to find some performance benchmarks or something more than feelings about a certain block of code. Don’t get me wrong, I am also not a die-hard fan of Python, although I have written a lot of production code in it. As for bloated, boilerplate code… I am afraid the author should look at Java or any modern JavaScript project.

by taeric

0 subcomment

I'm heavily inclined to agree with the general thought, but I balk at the low level code showing why a language is bad at something. In this specific case, without the tidyverse, R isn't exactly peaches and cream.
As annoying as it is to admit it, python is a great language for data science almost strictly because it has so many people doing data science with it. The popularity is, itself, a benefit.

by culebron21

1 subcomments

This was underwhelming. I work with Python and Pandas, and I can show examples of much clumsier workflows I run into. Most often, you get dataframe[(dataframe.column1 == something) & ~dataframe.column2.isna()] constructs, which show that Python syntax falls short here and isn't suitable for such manipulations. Unfortunately, there's no alternative, and I don't see R as much easier; there are plenty of ugly things there as well.
There's Julia -- it has serious drawbacks, like slow cold start if you launch a Julia script from the shell, which makes it unsuitable for CLI workflows.
Otherwise you have to switch to compiled languages, with their tradeoffs.

by keeeba

1 subcomments

As a fairly extensive user of both Python and R, I net out similarly.
If I want to wrangle, explore, or visualise data I’ll always reach for R.
If I want to build ML/DL models or work with LLM’s I will usually reach for Python.
Often in the same document - nowadays this is very easy with Quarto.

by Vaslo

0 subcomment

My team has all moved slowly from R to Python. There was no pressure to do so. R has a clunky feel with a bunch of modules that can be a challenge to automate. Python’s general purpose use beats whatever superior modules R has all day. If someone wants the same package on Python from R it’s probably out there.
While plotting may be clunky, I just don’t see R as much better. Plus in 2025 I can just provide a sample of data and what plot I want to an LLM and I get zero-shot code for the plot I want.
Author sounds very academic to me.

by analog31

0 subcomment

>>> Without fail, from the students that use Python, the response is: “This will take me a bit. Let me sit down at my desk and figure it out and then I’ll be back.”
This is completely aside, but I wouldn't hold this against the students or Python. The students may be following an age-old rule of office politics: "Never troubleshoot in front of an audience." And why this is more prevalent among the students who use Python, well... sample size of 30.

by UniverseHacker

2 subcomments

Doing computational biology for several decades in about a dozen languages, I do think R is a much better language for data science, but in practice I end up using Python almost every time because it has more libraries, and it’s easier to find software engineers and collaborators to work on Python. However, R makes for much simpler cleaner code, less silent errors, and the 1 indexing makes dealing with biological sequences much less hassle.

by mushufasa

0 subcomment

Languages inherently have network effects; most people around the world learn English so they can talk with other professionals who also know English, not because they are passionate about Charles Dickens.
My take (and my own experience) is that python won because the rest of the team knows it. I prefer R but our web developers don't know it, and it's way better for me to write code that the rest of our team can review, extend, and maintain.

by iLemming

1 subcomments

From many practical points, Clojure is great for data. And you can even leverage python libs via clj-python.

by sarusso

0 subcomment

The main flaw of this article is comparing a general-purpose language built with production systems in mind (Python) with a domain-specific language designed for interactive analysis (R)... Beware of comparing apples and oranges, because productizing R code typically requires rewriting it in another language.

by janalsncm

0 subcomment

Python is versatile which is what makes it popular. You can load back and forth from a GPU using well-tested libraries. You can memmap things if you need to. If your loops are too slow you can rewrite the hot loops in rust or C. You can read and write from most file formats in a couple of lines.

by gyulai

1 subcomments

I think, the lesson learned from › Python v. R ‹ is that people prefer doing data science in a general purpose language that is also okay-ish for data science over a language that's purpose-built for data science but suffers from diseconomies. Specifically: Imagine a new database or something like that has just come out. Now, the audience that wants to wire it into applications and the audience that wants to tap it to extract data for analytics put their weight together to create the demand for the Python library. The economies for that work out better than if you had to create two different libraries in two different languages to satisfy those two groups of demand.

by actuallyalys

0 subcomment

As much as I like Python and personally prefer it to R, I don’t really disagree. But I’m not sure R is a great language for data science either—it has its own weaknesses, e.g., writing custom loops (or functional equivalents with map or reduce) was pretty clunky last I tried it.
The other thing is that a lot of R’s strengths are really the tidyverse’s. Some of that is to R’s credit as an extensible language that enables a skilled API designer to really shine of course, but I think there’s no reason Python the language couldn’t have similar libraries. In fact it has, in plotnine. (I haven’t tried Polars yet but it does at least seem to have a more consistent API.)

by dragonwriter

0 subcomment

The bare python/stdlib example used (as well as bare python and avoiding add-on data science oriented libraries not being the way most people would use python for data science) is just...bad? (And, by bad here I mean showing signs of deliberately avoiding stdlib features in order to increase the appearance of the things the author then complains about.)

A better stdlib-only version would be:

    from palmerpenguins import load_penguins
    import math
    from itertools import groupby
    from statistics import fmean, stdev

    penguins = load_penguins()

    # Convert DataFrame to list of dictionaries
    penguins_list = penguins.to_dict('records')

    # create key function for grouping/sorting by species/island
    def key_func(x):
        return x['species'], x['island']

    # Filter out rows where body_mass_g is missing and sort by species and island
    filtered = sorted((row for row in penguins_list if not math.isnan(row['body_mass_g'])), key=key_func)

    # Group by species and island
    groups = groupby(filtered, key=key_func)

    # Calculate mean and standard deviation for each group
    results = []
    for (species, island), group in groups:
        values = [row['body_mass_g'] for row in group]
        mean_value = fmean(values)
        sd_value = stdev(values, xbar=mean_value)
        results.append({
            'species': species,
            'island': island,
            'body_weight_mean': mean_value,
            'body_weight_sd': sd_value
        })
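
The `sorted(...)` call in that version matters: `itertools.groupby` only merges consecutive items with equal keys, so unsorted input yields the same key more than once. A small sketch with made-up rows (only `key_func` is taken from the code above):

```python
from itertools import groupby

# Made-up rows where the same (species, island) pair is not adjacent.
rows = [
    {"species": "Adelie", "island": "Biscoe"},
    {"species": "Gentoo", "island": "Biscoe"},
    {"species": "Adelie", "island": "Biscoe"},
]


def key_func(row):
    return row["species"], row["island"]


# Without sorting, groupby starts a new group whenever the key changes.
unsorted_keys = [key for key, _ in groupby(rows, key=key_func)]

# Sorting by the same key first makes equal keys adjacent.
sorted_keys = [
    key for key, _ in groupby(sorted(rows, key=key_func), key=key_func)
]

print(unsorted_keys)  # the Adelie/Biscoe key shows up twice
print(sorted_keys)    # each key shows up once
```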

by Decabytes

0 subcomment

Python pays the bills. If it was up to me I'd use a different language, but there is no denying that its got a strong story in just about every field now. As I've gotten older, I've come to realize that programming languages are vehicles for solving computer based problems, and I've learned to find joy in solving those problems in whatever language my company/project is using.
But in my personal projects, my favorite language to use is Dart.

by spicybbq

0 subcomment

Part 2 is here:
https://blog.genesmindsmachines.com/p/python-is-not-a-great-...

by paulfharrison

1 subcomments

R is so good in part because of the efforts of people like Di Cook, Hadley Wickham, and Yihui Xie to create a software environment that they like working in.
It also helps that in R any function can completely change how its arguments are evaluated, allowing the tidyverse packages to do things like evaluate arguments in the context of a data frame or add a pipe operator as a new language feature. This is a very dangerous feature to put in the hands of statisticians, but it allows more syntactic innovation than is possible in Python.
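To make the contrast concrete: Python evaluates arguments before the callee sees them, so a library cannot receive a bare column name like body_mass_g. Libraries typically work around this with strings plus operator overloading that builds a deferred expression, which is roughly the idea behind expression APIs such as Polars' `col(...)`. The toy below is only a sketch of that technique; `Col`, `Comparison`, and `filter_rows` are hypothetical names, not any real library's API.

```python
import operator


class Comparison:
    """A deferred comparison between a column and a constant."""

    def __init__(self, op, column, constant):
        self.op = op
        self.column = column
        self.constant = constant

    def evaluate(self, row):
        return self.op(self.column.evaluate(row), self.constant)


class Col:
    """A deferred reference to a column, looked up later in each row."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, row):
        return row[self.name]

    # Overloaded operators return expression objects instead of booleans.
    def __gt__(self, other):
        return Comparison(operator.gt, self, other)

    def __lt__(self, other):
        return Comparison(operator.lt, self, other)


def filter_rows(rows, predicate):
    # The predicate is only evaluated here, once per row.
    return [row for row in rows if predicate.evaluate(row)]


rows = [{"body_mass_g": 3750}, {"body_mass_g": 5700}]
heavy = filter_rows(rows, Col("body_mass_g") > 4000)
print(heavy)
```

The quotes and the `Col(...)` wrapper are exactly the extra syntax the article complains about; in R, the equivalent call can name the column directly because the function controls how its argument is evaluated.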

by programmertote

0 subcomment

Disclaimer: I have nothing against R or Python and I'm not partial to either.
Python, the language itself, might not be a great language for data science. BUT the author can use Pandas or Polars or another data-science-related library/framework in Python to get the job done that s/he was trying to write in R. I could read both her R and Pandas code snippets and understand them equally.
This article reads just like, "Hey, I'm cooking everything by making all ingredients from scratch and see how difficult it is!".

by drtc

0 subcomment

I work as a data scientist where I do lots of exploratory work using Python+Pandas/Polars+Jupyter notebooks, I have to say that I agree with the title of the article.
I feel like I'm using Python more and more in a way that is just not native to it. Strict typing is simply necessary at some point in time, but I don't get any of the performance benefits or compile-time warnings that other languages provide. But more than Pandas/Polars/DuckDB, I think it is the plotting ecosystem that keeps me in the Python universe. Seaborn, altair, plotnine all take ggplot's GoG and bring it to Python and I'm really grateful for that.
I don't see an alternative to Python for someone who needs to: 1) Work with data that fits in memory and is mostly tabular (thus Pandas/Python), 2) needs to visualize this data often, 3) does exploratory work (Jupyter notebooks).

by serjester

0 subcomment

Seems like their critique boils down to two areas - pandas limitations and fewer built ins to lean on.
Personally I've found polars has solved most of the "ugly" problems that I had with pandas. It's way faster, has an ergonomic API, seamless pandas interop and amazing support for custom extensions. We have to keep in mind Pandas is almost 20 years old now.
I will agree that Shiny is an amazing package, but I would argue it's less important now that LLMs will write most of your code.

by NuSkooler

0 subcomment

You could end it with "Python is not a great language".
Now, is Python a SUCCESSFUL language? Very.

by Havoc

0 subcomment

Realistically it’s winning because it’s accessible rather than perfectly suited

by yeahwhatever10

1 subcomments

A little late for this

by rdtsc

0 subcomment

They basically advocate using R. I think it depends what they mean by "data science" and if the person will be doing just data science. If that's the case then R may be better. As in, their whole career is going to be built on that domain. But let's say they are on a general computer science track, now they'll probably benefit from learning Python more than R, simply because they can use it for other purposes.
> Either way, I’ll not discuss it further here. I’ll also not consider proprietary languages such as Matlab or Mathematica, or fairly obscure languages lacking a wide ecosystem of useful packages, such as Octave.
I feel that, to most programming folks, R is in the same category. R is to them what Octave is to the author. R is nice, but do they really want to learn a "niche" language, even if it has some better features than Python? Is holding a whole new paradigm, syntax, and library ecosystem in your head worth it?

by ebonnafoux

0 subcomment

In the article
> Contrast this with equivalent code that is full of logistics, where I’m using only basic Python language features and no special data wrangling package:
```
   n = len(values)
   # Calculate mean
   mean = sum(values) / n
   # Calculate standard deviation
   variance = sum((x - mean) ** 2 for x in values) / (n - 1)
   std_dev = math.sqrt(variance)
```
He doesn't know about the statistics package in the standard library of Python (https://docs.python.org/3/library/statistics.html). Of course, if you do not know how to use Python, you will have a lot of boilerplate.

by huherto

0 subcomment

For what it's worth, the Kotlin folks have been adding some cool features and tools for data analysis. https://kotlinlang.org/docs/data-analysis-overview.html

by nyrikki

0 subcomment

> Contrast this with equivalent code that is full of logistics, where I’m using only basic Python language features and no special data wrangling package
While I am not a python cheerleader, but a user because the reality is that it is a pretty good glue language, the above is a bit of a problem.
Duckdb, pandas, numpy etc.. is what makes python nice.
About a decade ago I worked at a major BI software company and ran into another silly problem when trying to evangelize R: wikis, KBs and search engines don’t like single-letter search terms.
So it didn’t matter how much better R was at the time, people found learning it more difficult than it should have been.

by Pinegulf

0 subcomment

Once the data is clean and neatly in standard format this becomes a matter of preference.
Work experience says that 90% of work is gathering, cleaning and transforming data from different sources. In this capacity Python has more options available.

by northlondoner

0 subcomment

Has anybody else noticed how much Python took from Scala for type hints? I was using Scala around 2015 and when I see type hints, immediately recognise its similarity to Scala's approach.

by LelouBil

0 subcomment

Kotlin is trying to be one with notebooks[0], I even heard they have fancy code generation so that your dynamic data can still have typed properties (after the first evaluations, members corresponding to your field names are generated, or something to that extent I never used it)
[0] https://kotlinlang.org/docs/kotlin-notebook-overview.html

by skeeter2020

0 subcomment

This is not about "Python is not a great language for data science" but the author's expertise and affection for R. I guess that title wouldn't get as many clicks.

by yodsanklai

0 subcomment

Maybe R is fine for people who use it all the time? but as SWE that occasionally needs to do some data analysis, I find it much easier to rely on tools I know rather than R. R is pretty convoluted as a language.

by egecant

0 subcomment

This article reminds me of another great article comparing R to Python: "Why pandas feels clunky when coming from R" (https://www.sumsar.net/blog/pandas-feels-clunky-when-coming-...)

by constantcrying

1 subcomments

Python is also an embarrassingly bad language for numerics. It comes without support for different floating point types, does not have an n-D array data type, and is extremely slow.
At the same time it is an absolute necessity to know if you are doing numerics. What this shows, at least to me, is that it is "good enough" and that the million integrations, examples and pieces of documentation matter more than whether the peculiarities of the language work in favor of its given use case, as long as the shortcomings can be mostly addressed.

by kasperset

0 subcomment

R data science people generally come to the data science field from the life sciences or statistics. Python data science people generally originate from other fields that are mostly engineering focused. Again, this may not apply to all cases, but that is my general observation.
Recently I am seeing that Python is heavily pushed for all data science related things. Sometimes objectively Python may not be the best option especially for stats. It is hard to change something after it becomes the "norm" regardless of its usability.

by coolThingsFirst

1 subcomments

Python just has poor aesthetics. __init__(self) is unacceptable in a language in 2025. Ruby would've been a much better choice. Sloppiness in language design is just a bad idea.

by BiteCode_dev

0 subcomment

Notice how the article's load_penguins() example starts neatly after all the messy parts of data science are done and stops right before the next pain starts.
It lives in a sterile, idealized world.
Python is a great language for data science in practice because it turns out data science is also:
```
   - gluing a lot of data sources

   - cleaning up a ton of terribly shaped data

   - validation and error handling

   - I/O, networking, and format conversion

   - onboarding non-programmers into programming

   - wrapping a lot of compiled languages' libs or plugging system

   - prototyping stuff and exposing that prototype to some people

   - turning prototypes into more permanent projects
```
And it turns out Python and its ecosystem are good at those while remaining decent at the other things.
There are other languages excellent at some of those, or some of the other things, but rarely good at most. And because humanity is vast, diverse, and constantly renewing, being the second best at those is eventually always winning.
Because whoever you are, you will be annoyed at not having the best experience at task X. But you would be mortified if you had the worst experience at doing task Y and Z. And task X, Y, and Z change depending on who you ask.
And you want to get things done, while days have 24 hours.
As usual, to understand the Python phenomenon, you have to see the whole picture. Not your little corner of the bubble. Not the ideal world in your head either. Life is not a maths problem with a clearly laid out premise and an elegant answer.
That's the same debate about why PHP won the web in 2000 no matter the size of the spaghetti plate, why Windows stayed in use for so long despite being terrible, why people keep using iPhones after all the abuses, etc. There is more to it than the use case you have every day. People have needs you haven't thought about.
So it's not "let the language war begin". It's, "dude, get more experience, go work with accountants, ngos, govs and logistic chains, go work in china, africa and south america, go from a startup to schools to corporate, satisfy the geeks, the artists and the business people, then we'll talk".

by HelloNurse

0 subcomment

Guess what, doing a relatively complex but standard task (filtering and aggregating example penguins) with a specialized and ossified library (Pandas) is better than doing it "bare-handed" with basic lists and dicts.
More terse, more efficient, less error prone, hopefully more numerically accurate, as if Python had an ecosystem of well designed libraries on par with R.
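For comparison, here is roughly what the "bare-handed" version looks like: drop rows with a missing body mass, group by species and island, and average. This is only a sketch. The rows are made-up toy values (not the real penguins data) and the function name `mean_body_mass` is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Toy rows for illustration only; these are not values from the real dataset.
penguins = [
    {"species": "Adelie", "island": "Torgersen", "body_mass_g": 3700.0},
    {"species": "Adelie", "island": "Torgersen", "body_mass_g": None},
    {"species": "Adelie", "island": "Torgersen", "body_mass_g": 3900.0},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5000.0},
    {"species": "Gentoo", "island": "Biscoe", "body_mass_g": 5200.0},
]


def mean_body_mass(rows):
    """Return {(species, island): mean body mass}, skipping missing masses."""
    groups = defaultdict(list)
    for row in rows:
        mass = row["body_mass_g"]
        if mass is None:  # the filter step
            continue
        groups[(row["species"], row["island"])].append(mass)  # the group-by step
    # The aggregation step: one mean per group.
    return {key: mean(values) for key, values in groups.items()}
```

With pandas the same three steps are usually a `dropna`, `groupby` and `mean` chain, which is where the terseness comes from.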

by IshKebab

0 subcomment

Python's not a great language for anything. Maybe for teaching programming I guess (except then you end up with people that only know Python).

by solatic

1 subcomments

Shell is the best language for data science. Pick the best tools for each of getting data, cleaning data, transforming data, and visualizing data, then stitch them together by sheer virtue of the fact that text is the universal interoperable protocol and files are the universal way of saving intermediate stages of data.
Best part is, write a --help, and you can load them into LLMs as tools to help the LLMs figure it out for you.
Fight me.
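A minimal sketch of one such pipeline stage, written in Python so it can sit between other shell tools: it reads CSV on stdin, drops rows where a given column is empty or `NA`, and writes CSV to stdout. The `--column` flag and the `drop_missing` helper are hypothetical names chosen for this example; argparse generates the `--help` text.

```python
import argparse
import csv
import sys


def drop_missing(lines, out, column):
    """Copy CSV rows from `lines` to `out`, skipping rows with no value in `column`.

    Returns the number of rows kept. Assumes the input has a header row.
    """
    reader = csv.DictReader(lines)
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    kept = 0
    for row in reader:
        if row.get(column) in (None, "", "NA"):
            continue
        writer.writerow(row)
        kept += 1
    return kept


def main(argv=None):
    parser = argparse.ArgumentParser(
        description="Drop CSV rows (stdin to stdout) where a column is empty or NA."
    )
    parser.add_argument("--column", required=True, help="name of the column to check")
    args = parser.parse_args(argv)
    drop_missing(sys.stdin, sys.stdout, args.column)


# Only act as a filter when arguments are given (including --help).
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Saved under a name of your choosing, it would be used as one stage of a pipe, with the output fed to whatever tool handles the next step.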

by aussieguy1234

0 subcomment

I felt forced to use python when I gave langgraph agents a go.
Worked quite well, but the TS/JS langgraph version is way behind. React agents are just a few lines of code in Python, compared to 50-odd lines for the same thing in JS/TS.
Better to use a different language, even one I'm not familiar with, to be able to maintain a few lines of code vs 50 lines.

by northlondoner

0 subcomment

There is a similar thread regarding the lifetime of projects, i.e. which ecosystem is better for long-term maintainability: https://news.ycombinator.com/item?id=46055463

by mfld

0 subcomment

This really calls for an A/B speed programming test of Python vs. R practitioners.

by huherto

1 subcomments

Isn't the author saying that Python + Pandas is almost as good as R, but Python without Pandas is less powerful than R?
I can't help but conclude that Python is as good as R because I still have the choice of using Pandas when I need it. What did I get wrong?

by drchaim

0 subcomment

Python was a great language for data science when data science became a mainstream thing.
It was easy to think about the structures (iterators), it was easy to extend, and it had a good community.
And because of that, people started extending it via libraries.
There are plenty more alternatives now.

by fnord77

1 subcomments

Python is not a great language

by johnea

0 subcomment

They could have just left the last three words off of that title 8-/
Python is not a great language
First, the white space requirements are a bad flashback to 1970s fortran.
Second, it is the language that is least compatible with itself.

by drtournier

0 subcomment

JavaScript is not a great language for web development either, yet…

by poulpy123

0 subcomment

But python is a great language for data science. As the anglos say: the proof is in the pudding, and the fact that it is massively used for data science proves it is great at data science.
You will say that not everything that is successful is great, and you will be right, but the success of python came organically, and not because of advertisement, de facto monopoly, politics, money, or first-mover advantage.
Although there is one cause that isn't intrinsic to python but comes from the people who built numpy: the fact that there is a single numerical library in the whole ecosystem, extremely easy to use, fast and extensive, was very, very huge.

by moi2388

0 subcomment

You had me at “Python is not a great language”

by exabrial

0 subcomment

The problem is there's so much momentum behind it that it's hard to course correct. PyTorch is now a goliath.

by dcreater

1 subcomments

Fixed title: Python is not a great language for data science if pandas/polars/ibis did not exist

by knorke

0 subcomment

okay, click bait worked on me. but the claims are weak. basically "Python is not a great language... because it's not as much of a domain language as R"
mediocre!

by slowhadoken

0 subcomment

Sounds like a skill issue

by another_twist

0 subcomment

I think TypeScript will shine here. Especially for data output pipelines so we can emit strongly typed datasets.
Add to that the fact that TS-based exploratory code can potentially plot SVG via d3 and maybe even be exported to a webpage.

by morshu9001

0 subcomment

Data science is the one thing I consider Python especially good at

by rob_c

0 subcomment

Refuses to learn tool so tool is broken... There is no problem with python for this. If you hate boilerplate, join the club, get llms to generate it for you and move on to doing real work (or get involved in improving the language or libraries directly)

by fithisux

0 subcomment

Personally I use R for the occasional script or some tidyverse quick processing.
But the language has many rough edges
1. non standard eval is very weird, rlang fixes these shortcomings
2. unintuitive names or functions not belonging to packages, base has a mix of functions
3. S3 mixes with naming, no problem personally with S3 and S7 is even better, but mixing S3 names with ordinary names is unintuitive, keep snake case
4. data.frames are unintuitive, tidyverse fixes this
5. f(a=) seriously? or working with unintuitive functions in body for discrete ranges of function arguments?
6. no imports per file in packages, I can live with this .. still ...
7. AST functions are unintuitive
R has some excellent parts:
non-standard evaluation, AST in the base language, lazy evaluation
but it is being killed by the bad parts
I think all the external fixes and sanity in names should go into base
but it will take a lot of time if it ever happens due to legacy.
Julia fixes many of these, not as elegantly as R, but its pragmatic approach is too attractive.

by CephalopodMD

1 subcomments

Python is the 2nd best language for almost everything

by prepend

0 subcomment

Doesn’t need to be great, just needs to be good enough.

by codeptualize

0 subcomment

Wait, so there is one example, which shows the R and Python equivalents are pretty much the same..
I was all hyped up, ready to see the amazing examples and arguments that would convince me to pick up R, and it gave me absolutely nothing (except quotes and brackets..).
Disappointing.

by _ZeD_

0 subcomment

Sooo... Is this a post about python envy?

by hmokiguess

0 subcomment

My issue with Python is that it makes it too easy to do things wrong, it accepts all and anyone. It’s too inclusive and permissive, which is great for expression and creativity but bad for exact sciences and rigid disciplines. In certain matters opinions and cargo cult programming are often a detriment for science. Unfortunately for high level abstractions it’s not that simple to do it right without sacrificing speed, so the industry forces the hand of the community in a lot of ways.

by thom

0 subcomment

I think this expectation that data science code is a thing you write basically top to bottom to get some answers out, put them in a graph and move on with your life is not a useful lens through which to evaluate two programming languages. R definitely is an efficient DSL for doing stats this way, but it’s a painful way to build a durable piece of software. Python is nowhere near perfect but I’ve seen fewer codebases that made my eyes bleed, however pretty the graphs might look.

by Lyngbakr

3 subcomments

I was a bit disappointed to discover that this was essentially an R vs. Python article, which is a data science trope. I've been in the field for 20+ years now and while I used to be firmly on team R, I now think that we don't really have a good language for data science. I had high hopes for Julia and even Clojure's data landscape looks interesting, but given the momentum of Python I don't see how it could be usurped at this point.

by hekkle

0 subcomment

For those who thought the article was TL;DR, the author argues:
- A General programming language like Python is good enough for data science but isn't specifically designed for it.
- A language that is specifically designed for Data Science like R is better at Data Science.
Who would have thought?

by jswelker

0 subcomment

Inherited Python code is a mixed bag. Inherited R code is a nightmare.

by semiinfinitely

0 subcomment

correct, it's only the best one that we have

by KaiserPro

0 subcomment

The observation I make here is in that first python example with the penguins, what the fuck is that?
It makes it look like perl, on a bad day, or worse autogenerated javascript.
Why on earth is it so many levels deep in objects?

by shevy-java

1 subcomments

> I think people way over-index Python as the language for data science. It has limitations that I think are quite noteworthy. There are many data-science tasks I’d much rather do in R than in Python.
R is kind of a super-specialized language. Python is much more general purpose.
R failed to evolve, let's be honest. Python won via jupyter - I see this used ALL the time in universities. R is used too, but mostly for statistics related courses only, give or take.
Perhaps R is better for its niche, but Python has more momentum and thus dominates over R. That's simply the reality of the situation. It is like a bulldozer moving forward at a fast speed.
> I say “This is great, but could you quickly plot the data in this other way?”
Ok so ... he would have to adjust R code too, right? And finding good info on that is simply harder. He says he has experience with universities. Well, I do too, and my experience is that people are WAY better with python than with R. You simply see that more students will drop out from R than from python. That's also simply the reality of the situation.
> They appear to be sufficiently cumbersome or confusing that requests that I think should be trivial frequently are not.
I am sure the reverse also applies. Pick some python library, do something awesome, then tell the R students to do the same. I bet he will have the same problems.
> So many times, I felt that things that would be just a few lines of simple R code turned out to be quite a bit longer and fairly convoluted.
Ok, so here he is trolling. Flat out - I said it.
I wrote a LOT of python and quite a bit of R. There is no way in life that the R code is more succinct than the python code for about 90% of the use cases out there. Sorry, that's simply not the case. R is more verbose.
> Here is the relevant code in R, using the tidyverse approach:
```
    penguins |>
      filter(!is.na(body_mass_g)) |>
      group_by(species, island) |>
      summarize(
```
This is like perl. They also don't adapt. R is going to lose ground.
This professor just hasn't realised that he is slowly becoming a fossil himself, by being unable to see that x is better than y.

by ineedasername

0 subcomment

TLDR: thinks R is better for DS
Of course, if your DS is mixed with ML & modern AI you just:
pip install rpy2
But then, why choose? No need to be dogmatic, if R is nice for you:
install.packages("reticulate")

by nicechianti

0 subcomment

[dead]
