Use the Go version of pprof: https://github.com/google/pprof
Run it like `pprof -http=: your_profile.out` and it will open a browser with a really nice interactive flamegraph (way better than the Perl version), plus a call graph, per-source-line profiling, top functions, and more.
It's so much better. Don't use the Perl version. I should probably write a post showing how to do this.
Another much-better alternative is Samply (https://github.com/mstange/samply), which uses the Firefox Profiler as a GUI. I don't like it quite as much as pprof, but it's still clearly much better than what's in this article:
This part seems a bit confused; I don't think `split_whitespace` does any allocations. I wish there were a few intermediary steps here, e.g. going from &str and split_whitespace to &[u8] and split.
The tokenizer at that point is a bit clunky; it is not really comparable to split_whitespace. The new tokenizer doesn't actually do any whitespace handling: it just assumes that every token is followed by exactly one whitespace character. That alone might explain some of the speedup.
Also, you don't get contention when you don't write to the memory.
The speedup may come from simply starting the work before the whole file is loaded, which lets the OS prefetch the rest in parallel.
You would probably get the same result if you loaded the file in smaller chunks.
However, avoiding creating the AST is not very realistic for most uses. It's usually needed to perform optimizations, or even just for more complicated languages that have interesting control flow.
Edited: In the optimized version the author uses bytes and generators and avoids using strings. I don't know whether Rust generators are optimized for speed or for memory; ideally you could set the length of the buffer according to the cache memory available.
Edited: I find it strange to use `input = read_input_file()?` and then `eval(&input)`: what happens when there is an error reading the file? Rust is supposed to be a highly safe language. In CL there are keywords like `:if-does-not-exist` to decide what to do, and `read` accepts additional parameters for end-of-file and for expressing that this read is inside a recursive procedure inside another `read`.
I should stop comparing Rust to CL and learn Rust first. I consider this kind of article a very good way of learning Rust for those interested in parsing and optimization. Rust seems to be a very nice language when you can afford the time to develop your program.
Right now, realistic can parse "(* (^ 40 0.5) (^ 90 0.5))" and it will tell you that's 60, because yeah, it's sixty, that's how real arithmetic works.
But it would be nice to write "(40^0.5) * (90^0.5)" or similar and have that work instead, or as well. The months of work on realistic meant I spent so long without a "natural" parser that I got used to this.
- CPU time (better CPU usage can mean shorter wall time but higher CPU time)
- memory usage
- but also, and maybe more interestingly, complexity of the code (not an absolute metric, but very complex or non-portable code for a 5% speedup may or may not be worth it)
EDIT: formatting
Neat write-up! Kudos on that.
grammar Arithmetic {
    rule TOP    { ^ <expr> $ }
    rule expr   { <term>+ % ['+' | '-'] }
    rule term   { <value> }
    rule value  { <number> | <parens> }
    rule number { \d+ }
    rule parens { '(' <expr> ')' }
}
Example:
- Optimization 1: Memory-mapped I/O
- Optimization 2: Do not use Peekable
- Optimization 3: Do not allocate a Vector when tokenizing
- Optimization 4: Zero allocations: parse directly from the input bytes
- Optimization 5: Multithreading and SIMD
- Conclusion
I may be guessing, but in this order you would probably already reach high enough throughput by Optimization 3 that you wouldn't bother with manual SIMD or multithreading. (This is the pragmatic way: in real life you try to minimize risk and reach the goal as fast as possible, and SIMD/multithreading carry a lot of risk for your average dev team.)
n => Token::Operand(n.parse().unwrap()),
How does the compiler derive the type of n?