Use the Go version of pprof: https://github.com/google/pprof
Run it like `pprof -http=: your_profile.out` and it will open a browser with a really nice interactive flamegraph (way better than the Perl version), plus a call graph, per-source-line profiling, top functions, and more.
It's so much better. Don't use the Perl version. I should probably write a post showing how to do this.
Another much-better alternative is Samply (https://github.com/mstange/samply), which uses the Firefox Profiler as a GUI. I don't like it quite as much as pprof, but it's still clearly much better than what's in this article:
This part seems a bit confused; I don't think `split_whitespace` does any allocations. I wish there were a few intermediary steps here, e.g. going from &str and split_whitespace to &[u8] and split.
The tokenizer at that point is a bit clunky; it is not really comparable to split_whitespace. The new tokenizer doesn't actually do any whitespace handling: it just assumes that every token is followed by exactly one whitespace character. That alone might explain some of the speedup.
Also, you don't get contention when you don't write to the memory.
The speedup may come from simply starting the work before the whole file is loaded, which lets the OS prefetch the rest in parallel.
You would probably get the same result if you loaded the file in smaller chunks.
However, avoiding creating the AST is not very realistic for most uses. It's usually needed to perform optimizations, or even just for more complicated languages that have interesting control flow.
Edited: In the optimized version the author uses bytes and generators and avoids using strings. I don't know whether Rust generators are optimized for speed or for memory; ideally you could set the length of the buffer according to the cache memory available.
Edited: I find it strange to use `input = read_input_file()?` and then `eval(&input)`: what happens when there is an error reading the file? Rust is supposed to be a highly safe language. In CL there are keywords like `:if-does-not-exist` to decide what to do, and `read` accepts additional parameters for end-of-file and for expressing that this read is inside a recursive procedure inside another `read`.
I should stop comparing Rust to CL and learn Rust first. I consider this kind of article a very good way of learning Rust for those interested in parsing and optimization. Rust seems to be a very nice language when you can afford the time to develop your program.
Right now, realistic can parse "(* (^ 40 0.5) (^ 90 0.5))" and it will tell you that's 60, because yeah, it's sixty, that's how real arithmetic works.
But it would be nice to write "(40^0.5) * (90^0.5)" or similar and have that work instead, or as well. The months of work on realistic meant I spent so long without a "natural" parser that I got used to this.
- CPU time (better CPU usage can mean shorter wall time but higher CPU time)
- memory usage
- but also, and maybe more interestingly, complexity of the code (not an absolute metric, but very complex or non-portable code for a 5% speedup may or may not be worth it)
EDIT: formatting
Neat write-up! Kudos on that.
grammar Arithmetic {
    rule TOP    { ^ <expr> $ }
    rule expr   { <term>+ % ['+' | '-'] }
    rule term   { <value> }
    rule value  { <number> | <parens> }
    rule number { \d+ }
    rule parens { '(' <expr> ')' }
}
Example:
- Optimization 1: Memory-mapped I/O
- Optimization 2: Do not use Peekable
- Optimization 3: Do not allocate a Vector when tokenizing
- Optimization 4: Zero allocations: parse directly from the input bytes
- Optimization 5: Multithreading and SIMD
- Conclusion
I may be guessing, but in this order you would probably already reach high enough throughput by Optimization 3 that you wouldn't bother with manual SIMD or multithreading. (This is the pragmatic way: in real life you try to minimize risk and reach the goal as fast as possible, and SIMD/multithreading carry a lot of risk for your average dev team.)
n => Token::Operand(n.parse().unwrap()),
How does the compiler derive the type of n?