Hack 1: Don't Use The Obviously Wrong Data Structure For Your Problem!
Hack 2: Don't Have The Computer Do Useless Stuff!
Hack 3: Don't Allocate Memory When You Don't Need To!
And now, a word from our sponsor: AI! Use AI to help AI build AI with AI, now with 15% more AI! Only with AI! Ask your doctor if AI is right for you!
It's worth pointing out that a few of them are Python-specific. Compilers can inline code, so in most languages there's no need to manually inline functions; that's Python being Python. Likewise, the fact that it matters which scope a function is looked up from is quintessentially Python being Python.
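That scope point can be sketched with a rough timing comparison (function names are mine, and the numbers will vary by machine): binding a builtin to a local name turns each lookup into a fast `LOAD_FAST` instead of a global/builtin dict lookup.

```python
import timeit

def use_global():
    total = 0.0
    for i in range(10_000):
        total += abs(i)  # abs is resolved via global/builtin lookup every iteration
    return total

def use_local(_abs=abs):
    # the default argument binds the builtin to a local name once,
    # so each call site is a local-variable load
    total = 0.0
    for i in range(10_000):
        total += _abs(i)
    return total

print(timeit.timeit(use_global, number=200))
print(timeit.timeit(use_local, number=200))
```

Both compute the same thing; only the name-resolution cost differs.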
The major gains in Python come from... not using Python. Essentially you have to rewrite your code around the fact that numpy and pandas are the ones really doing the work behind the curtain (e.g. aggressively vectorize, use algorithms that can use vectorization well rather than "normal" ones). Number 8 of the list hints at that.
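As a hedged sketch of that vectorization point (assuming numpy is installed; the names and sizes here are illustrative): the Python-level loop pays interpreter overhead per element, while the numpy version runs the multiply-and-sum as a single C-level operation.

```python
import timeit
import numpy as np

N = 100_000
xs = list(range(N))
arr = np.arange(N, dtype=np.float64)

def python_loop():
    # interpreter dispatch for every element
    return sum(x * x for x in xs)

def vectorized():
    # the multiply and the reduction both happen inside numpy's C code
    return float(np.dot(arr, arr))

print(timeit.timeit(python_loop, number=10))
print(timeit.timeit(vectorized, number=10))
```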
> modify[ing] objects in place […] improves performance by avoiding the overhead of allocating and populating new structures.
AFAIK the poor performance of list copies (demonstrated in the article by a million-element list taking 10ms) doesn’t come from memory allocation nor from copying the contents of the list itself (in this case, a million pointers).
Rather it comes from the need to chase all of those pointers, accessing a million disparate memory locations, in order to increment each element’s reference count.
I think maybe a more realistic example there would be people using splatting without realizing/internalizing that it performs a full copy, e.g.
xs = [1, *ys]
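A minimal sketch (variable names are mine) showing that the splat really does produce an independent copy, with a per-element cost that shows up at this size:

```python
import timeit

ys = list(range(1_000_000))

def splat():
    # allocates a new list and copies all million element pointers
    return [1, *ys]

def share():
    # keeping a reference copies nothing
    return ys

print(timeit.timeit(splat, number=20))
print(timeit.timeit(share, number=20))

xs = [1, *ys]
xs[1] = -1
print(ys[0])  # ys is untouched: the splat made a full copy
```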
Another one that stood out was (3). Slots are great, but >95% of the time I'd expect people would want to use `slots=True` with dataclasses instead of manually writing `__slots__` and a constructor like that. `slots=True` has worked since Python 3.10, so every non-EOL version of Python supports it.

In fact, creating a set takes longer than copying a list, since it requires hashing and inserting every element, so for a single lookup it's actually much faster to do the opposite of what they suggest for #1 (for this test case).
Here's the results with `big_set = set(big_list)` inside the timing block for the set case:
List lookup: 0.013985s
Set lookup: 0.052468s

In general I feel like these kinds of benchmarks might change with each Python version, so some caveats apply.
Some of these are pretty nice python tricks though.
For a list, the only way to implement `in` is to iterate through it (see the `list_contains` function in the CPython source).
But the special `range` object can implement `__contains__` efficiently by doing arithmetic on its start/stop/step (see the `range_contains` source code).
So although Hack 1 is for demonstration purposes, in most cases you can just do `999999 in range(1000000)`.
In my test, the same `999999 in foo` takes 59.1ns for the range object, 27.7ns for the set, and 6.7ms for the list. The set is the fastest, except that converting the range object to a set first takes 21ms.
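Those timings can be reproduced with a rough `timeit` sketch like this (numbers will vary by machine and Python version):

```python
import timeit

N = 1_000_000
target = N - 1  # worst case for the list: last element

big_list = list(range(N))
big_set = set(big_list)
big_range = range(N)

# range.__contains__ does O(1) arithmetic on start/stop/step for ints
print(timeit.timeit(lambda: target in big_range, number=1000))
# set membership is an O(1) hash lookup
print(timeit.timeit(lambda: target in big_set, number=1000))
# list membership scans all million elements
print(timeit.timeit(lambda: target in big_list, number=10))
```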
I remember the day I realized how much I dislike Python. It has never clicked for me, despite writing it on and off since Python 2.0. There are always new arbitrary places for it to bite you, and it always feels a little yucky. And then one day I saw something like this:
# b is not defined here
if blahblah():
    b = gronk()
# might raise an exception
do_stuff(b)
That tingles in all the wrong places. Ruby has other issues, but its core still feels elegant. I still prefer Scheme, though.

Edit: and maybe someone can explain to me why anyone would make it so that the simplest way to iterate through a collection is not the fastest? This is the case for most languages, but it still feels dumb. Just allow a slightly more obtuse syntax like
for a in b<list>
and make that case just as fast as doing it with a while loop. The iterator protocol is imho for when we frankly don't know/care what we are iterating over, when we want to be generic, or when the data structure can't expose the most efficient way of doing it (like a tree). There is no reason why `for a in b:` should be slower than `while` when b is a list or string.