Hack 1: Don't Use The Obviously Wrong Data Structure For Your Problem!
Hack 2: Don't Have The Computer Do Useless Stuff!
Hack 3: Don't Allocate Memory When You Don't Need To!
And now, a word from our sponsor: AI! Use AI to help AI build AI with AI, now with 15% more AI! Only with AI! Ask your doctor if AI is right for you!
It's worth pointing out that a few of them are Python-specific. Compilers can inline code, so in most languages there's no need to manually inline functions; that's Python being Python. Likewise, the fact that it matters which scope a function is looked up from is quintessentially Python being Python.
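That scope point can be sketched with a rough timing comparison (function names are mine, and the numbers will vary by machine): binding a builtin to a local name turns each lookup into a fast `LOAD_FAST` instead of a global/builtin dict lookup.

```python
import timeit

def use_global():
    total = 0.0
    for i in range(10_000):
        total += abs(i)  # abs is resolved via global/builtin lookup every iteration
    return total

def use_local(_abs=abs):
    # the default argument binds the builtin to a local name once,
    # so each call site is a local-variable load
    total = 0.0
    for i in range(10_000):
        total += _abs(i)
    return total

print(timeit.timeit(use_global, number=200))
print(timeit.timeit(use_local, number=200))
```

Both compute the same thing; only the name-resolution cost differs.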
The major gains in Python come from... not using Python. Essentially you have to rewrite your code around the fact that numpy and pandas are the ones really doing the work behind the curtain (e.g. aggressively vectorize, use algorithms that can use vectorization well rather than "normal" ones). Number 8 of the list hints at that.
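As a hedged sketch of that vectorization point (assuming numpy is installed; the names and sizes here are illustrative): the Python-level loop pays interpreter overhead per element, while the numpy version runs the multiply-and-sum as a single C-level operation.

```python
import timeit
import numpy as np

N = 100_000
xs = list(range(N))
arr = np.arange(N, dtype=np.float64)

def python_loop():
    # interpreter dispatch for every element
    return sum(x * x for x in xs)

def vectorized():
    # the multiply and the reduction both happen inside numpy's C code
    return float(np.dot(arr, arr))

print(timeit.timeit(python_loop, number=10))
print(timeit.timeit(vectorized, number=10))
```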
> modify[ing] objects in place […] improves performance by avoiding the overhead of allocating and populating new structures.
AFAIK the poor performance of list copies (demonstrated in the article by a million-element list taking 10ms) doesn’t come from memory allocation nor from copying the contents of the list itself (in this case, a million pointers).
Rather it comes from the need to chase all of those pointers, accessing a million disparate memory locations, in order to increment each element’s reference count.
I think maybe a more realistic example there would be people using splatting without realizing/internalizing that it performs a full copy, e.g.
xs = [1, *ys]
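A minimal sketch (variable names are mine) showing that the splat really does produce an independent copy, with a per-element cost that shows up at this size:

```python
import timeit

ys = list(range(1_000_000))

def splat():
    # allocates a new list and copies all million element pointers
    return [1, *ys]

def share():
    # keeping a reference copies nothing
    return ys

print(timeit.timeit(splat, number=20))
print(timeit.timeit(share, number=20))

xs = [1, *ys]
xs[1] = -1
print(ys[0])  # ys is untouched: the splat made a full copy
```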
Another one that stood out was (3). Slots are great, but >95% of the time I'd expect people would want to use `slots=True` with dataclasses instead of manually writing `__slots__` and a constructor like that. `slots=True` has worked since Python 3.10, so every non-EOL version of Python supports it.

In fact, creating a set takes longer than copying a list, since it requires hashing and inserting every element, so for a single lookup it's actually much faster to do the opposite of what they suggest for #1 (for this test case).
Here's the results with `big_set = set(big_list)` inside the timing block for the set case:
List lookup: 0.013985s
Set lookup: 0.052468s

In general I feel like these kinds of benchmarks might change with each Python version, so some caveats apply.
Some of these are pretty nice python tricks though.
For a list, the only way to implement `in` is to iterate through it (see the `list_contains` function in the CPython source).
But the special `range` object can implement `__contains__` efficiently by doing arithmetic on its start/stop/step (see the `range_contains` source code).
So although Hack 1 is for demonstration purposes, in most cases you can just do `999999 in range(1000000)`.
In my test, the same `999999 in foo` takes 59.1ns for the range object, 27.7ns for the set, and 6.7ms for the list. The set is the fastest, except that converting the range object to a set first takes 21ms.
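Those timings can be reproduced with a rough `timeit` sketch like this (numbers will vary by machine and Python version):

```python
import timeit

N = 1_000_000
target = N - 1  # worst case for the list: last element

big_list = list(range(N))
big_set = set(big_list)
big_range = range(N)

# range.__contains__ does O(1) arithmetic on start/stop/step for ints
print(timeit.timeit(lambda: target in big_range, number=1000))
# set membership is an O(1) hash lookup
print(timeit.timeit(lambda: target in big_set, number=1000))
# list membership scans all million elements
print(timeit.timeit(lambda: target in big_list, number=10))
```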
I remember the day I realized how much I dislike Python. It has never clicked for me, despite writing it on and off since Python 2.0. There are always new arbitrary places for it to bite you, and it always feels a little yucky. And then one day I saw something like this:
# b is not defined here
if blahblah():
    b = gronk()
# might raise an exception
do_stuff(b)
That tingles in all the wrong places. Ruby has other issues, but its core still feels elegant. I still prefer Scheme, though.

Edit: and maybe someone can explain to me why anyone would make it so that the simplest way to iterate through a collection is not the fastest? This is the case for most languages, but it still feels dumb. Just allow a slightly more obtuse syntax like
for a in b<list>
and make that case just as fast as doing it with a while loop. The iterator protocol is imho for when we frankly don't know/care what we are iterating over, when we want to be generic, or when the data structure can't expose the most efficient way of doing it (like a tree). There is no reason why `for a in b:` should be slower than `while` when b is a list or string.