The lexer/parser is never the bottleneck. In fact, you can write those two by hand over a single weekend for a largish language. With LLMs, it takes 15 minutes if you have an unambiguous spec.
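To show why the front end really is the cheap part, here is a minimal sketch of a hand-written regex tokenizer plus a recursive-descent parser. The toy expression grammar, the token names, and the tuple-based AST are all assumptions for illustration and do not belong to any particular language:

```python
import re
from typing import Iterator, List, NamedTuple, Union

class Token(NamedTuple):
    kind: str  # NUMBER, IDENT, OP or EOF
    text: str
    pos: int   # offset into the source, used in error messages

# Hypothetical token set for a toy expression language. In general order
# matters, because alternation picks the first pattern that matches.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(src: str) -> Iterator[Token]:
    pos = 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":  # drop whitespace
            yield Token(m.lastgroup, m.group(), pos)
        pos = m.end()
    yield Token("EOF", "", pos)

Node = Union[int, str, tuple]  # AST: ints, identifiers, or (op, lhs, rhs)

class Parser:
    """Recursive descent for:
         expr := term (('+' | '-') term)*
         term := atom (('*' | '/') atom)*
         atom := NUMBER | IDENT | '(' expr ')'
    """

    def __init__(self, src: str) -> None:
        self.tokens: List[Token] = list(tokenize(src))
        self.i = 0

    def peek(self) -> Token:
        return self.tokens[self.i]

    def advance(self) -> Token:
        tok = self.tokens[self.i]
        self.i += 1
        return tok

    def parse(self) -> Node:
        node = self.expr()
        if self.peek().kind != "EOF":
            raise SyntaxError(f"trailing input at {self.peek().pos}")
        return node

    def expr(self) -> Node:
        node = self.term()
        while self.peek().text in ("+", "-"):
            op = self.advance().text
            node = (op, node, self.term())  # left-associative
        return node

    def term(self) -> Node:
        node = self.atom()
        while self.peek().text in ("*", "/"):
            op = self.advance().text
            node = (op, node, self.atom())
        return node

    def atom(self) -> Node:
        tok = self.advance()
        if tok.kind == "NUMBER":
            return int(tok.text)
        if tok.kind == "IDENT":
            return tok.text
        if tok.text == "(":
            node = self.expr()
            close = self.advance()
            if close.text != ")":
                raise SyntaxError(f"expected ')' at {close.pos}")
            return node
        raise SyntaxError(f"unexpected {tok.kind} {tok.text!r} at {tok.pos}")
```

For example, `Parser("1 + 2 * (x - 3)").parse()` yields `('+', 1, ('*', 2, ('-', 'x', 3)))`. A real language needs more token kinds and grammar rules, but the shape stays the same, which is why this part rarely dominates the schedule.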
The biggest time sink, and the reason you will fail for sure, is the inability to restrict the scope of the project. You start with a limited feature set and produce the entire compiler/vm toolchain. Then you get greedy and fiddle with the type system, adding features that you have never used and probably never will. And now you have to change every single phase from start to end.
I mostly give up at this stage.
I wrote a compiler toolchain and debugger that takes a Turing machine description plus input string and emits an encoded tape runnable by a Universal Turing Machine [0]. I had some prior PL experience, but never did an end-to-end compiler pipeline, at least not this low level.
It started as a joke/experiment, but I couldn't believe how fast it pulled me into designing:
- a small low-level ASM for building the UTM
- an ABI for symbol widths and encoding grammar
- an interpreter used as the behavioral oracle
- raw TM transitions for each ASM instruction, generated by having an LLM iterate on candidate emissions and checked against the interpreter oracle
- a CFG-style IR to fix the LLM mess once direct ASM -> TM emission became too hard to keep sane (the LLM actually did a decent job; I don't think I would have done much better without the IR either)
- a gdb-style debugger for raw transitions, ASM routines, and blocks
- a trace visualizer
- a bootstrapping experiment where an L1 UTM/input pair was itself run through an L2 UTM
- optimisation experiments
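Checking emitted TM transitions "against the interpreter oracle" is essentially differential testing. A minimal sketch, assuming a hypothetical `INC` instruction on a binary register and a plain-Python function standing in for the ASM interpreter (none of these names or encodings come from the mtm repo): run the emitted transition table on a tiny TM simulator and compare it exhaustively with the reference semantics for all small inputs.

```python
from itertools import product
from typing import Dict, List, Tuple

BLANK = "_"
HALT = "halt"
# (state, symbol read) -> (symbol to write, head move: -1 = L, +1 = R, 0 = stay, next state)
Table = Dict[Tuple[str, str], Tuple[str, int, str]]

# Hypothetical emission for INC: the head starts on the least significant
# (rightmost) bit and propagates the carry to the left.
INC_TRANSITIONS: Table = {
    ("inc", "1"): ("0", -1, "inc"),   # 1 + carry = 0, carry moves left
    ("inc", "0"): ("1", 0, HALT),     # carry absorbed
    ("inc", BLANK): ("1", 0, HALT),   # overflow: number grows by one digit
}

def run_tm(table: Table, tape_str: str, start: str, head: int, max_steps: int = 10_000) -> str:
    """Simulate a single-tape TM and return the non-blank tape contents."""
    tape = dict(enumerate(tape_str))
    state, steps = start, 0
    while state != HALT:
        if steps >= max_steps:
            raise RuntimeError("step limit exceeded")
        sym = tape.get(head, BLANK)
        if (state, sym) not in table:
            raise RuntimeError(f"no transition for ({state!r}, {sym!r})")
        write, move, state = table[(state, sym)]
        tape[head] = write
        head += move
        steps += 1
    lo, hi = min(tape), max(tape)
    return "".join(tape.get(i, BLANK) for i in range(lo, hi + 1)).strip(BLANK)

def oracle_inc(bits: str) -> str:
    """Stand-in for the reference interpreter: n -> n + 1, keeping the width unless it overflows."""
    return format(int(bits, 2) + 1, f"0{len(bits)}b")

def check_against_oracle(table: Table, max_width: int = 8) -> List[Tuple[str, str, str]]:
    """Return every (input, tm_output, oracle_output) that disagrees."""
    mismatches = []
    for width in range(1, max_width + 1):
        for digits in product("01", repeat=width):
            bits = "".join(digits)
            got = run_tm(table, bits, start="inc", head=width - 1)
            want = oracle_inc(bits)
            if got != want:
                mismatches.append((bits, got, want))
    return mismatches
```

An empty list from `check_against_oracle(INC_TRANSITIONS)` means the emission agrees with the oracle on every input up to 8 bits. This is the kind of loop that lets an LLM iterate on candidate emissions: a wrong table shows up as a concrete counterexample rather than a vague failure.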
And every step came quite naturally and was easy to tie in with everything else. Each one was just the next local repair needed to make the previous layer tractable.
[0] Repo: https://github.com/ouatu-ro/mtm
The language, Sapphire, is Ruby-inspired, so the most interesting part is digging into the internals of the latter when I'm trying to figure out how something should work.
The team implementing the survey system wound up using the same language to implement the runtime portion, something I never expected or designed in.
I no longer recall anything about what it looked like. I do remember it was a lot of fun to write.
(Also, they might want to look into Lua userdata, since that would address their concern about the overhead of converting between native and Lua data structures. The language is designed to be embedded in C programs, after all.)
But my main point is that libriscv is one of the fastest RISC-V emulators, so something like C/C++/Lua could have been used, sandboxed inside it, for the game.
Am I missing something? That said, making a programming language is a project in its own right, and that's really cool as well :-D
But I would also love to hear the author's opinion on libriscv, as from my understanding it seems to tick off all the boxes.
The only thing going for C++ is that it has acceptable (though not straightforward) C interop.
I don't like C# and X++ because their language surface is huge, but if you stick to a limited subset, they are, needless to say, very useful and handy languages too.
Maybe AI is good enough now to help me with that...
The last time I tried, Claude couldn't even help me build a syntax highlighter for a hypothetical language.
Roughly 100%.
The tail ends of a language implementation (parsing and code generation) are a fixed cost; the "middle end" can grow unbounded as more production-quality items are added.
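To make the fixed-cost versus unbounded distinction concrete: in many compiler designs the middle end is an ordered list of IR-to-IR passes, and "adding production-quality items" largely means appending to that list. A minimal sketch, assuming a made-up straight-line SSA IR of tuples (every name assigned exactly once); the two passes are textbook constant folding and dead-code elimination, not taken from any particular compiler:

```python
from typing import Callable, Dict, List, Set

# Hypothetical IR: ("const", dst, value), ("add"/"mul", dst, lhs, rhs), ("ret", src)
Instr = tuple
Pass = Callable[[List[Instr]], List[Instr]]

def fold_constants(ir: List[Instr]) -> List[Instr]:
    """Replace add/mul with a const when both operands are known constants."""
    known: Dict[str, int] = {}
    out: List[Instr] = []
    for ins in ir:
        op = ins[0]
        if op == "const":
            known[ins[1]] = ins[2]
            out.append(ins)
        elif op in ("add", "mul") and ins[2] in known and ins[3] in known:
            a, b = known[ins[2]], known[ins[3]]
            value = a + b if op == "add" else a * b
            known[ins[1]] = value
            out.append(("const", ins[1], value))
        else:
            out.append(ins)
    return out

def eliminate_dead_code(ir: List[Instr]) -> List[Instr]:
    """Backward liveness scan: keep only instructions whose result is used."""
    live: Set[str] = set()
    out: List[Instr] = []
    for ins in reversed(ir):
        op = ins[0]
        if op == "ret":
            live.add(ins[1])
            out.append(ins)
        elif ins[1] in live:
            if op in ("add", "mul"):
                live.update(ins[2:])  # operands become live
            out.append(ins)
    out.reverse()
    return out

# The front and back ends stay put; this list is what keeps growing.
MIDDLE_END: List[Pass] = [fold_constants, eliminate_dead_code]

def optimize(ir: List[Instr]) -> List[Instr]:
    for p in MIDDLE_END:
        ir = p(ir)
    return ir
```

For instance, `[("const", "a", 2), ("const", "b", 3), ("add", "c", "a", "b"), ("mul", "unused", "c", "c"), ("ret", "c")]` optimizes to `[("const", "c", 5), ("ret", "c")]`. Each new pass (inlining, loop transforms, register-pressure heuristics, and so on) is one more entry in that list, with its own interactions with the existing passes.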
My language: https://www.empirical-soft.com
Coming up with new ideas is hard. Especially since you have to test them in the real world.
I'm kind of curious and want to try it for fun once I get some free time ^^
Whenever I think I want to make a statically typed procedural language, I inevitably end up with Odin. Odin is just so damn comfy I can't see another language in the same domain beating it.
Functional languages: it's hard to top Haskell, since it has basically everything ever invented in it.
Replacing C++: always seems like a good idea until you get into stuff like macros, templates, name mangling and compiler specific nonsense. So you either just build something that compiles TO C++ (and compiles slow AF) or give up.
Scripting language: I just love Ruby too much. Mruby is great for scripting, Ruby is great for system tools/scripts, it's comfy and has everything you could ever want.
Lisps: sounds fun, I like Lisp-y languages, but SBCL is just a marvel of engineering that's impossible to not use. Attempts at another (OSS) native Lisp runtime always seem to end up waaaaaay behind in performance. So a DSL on top of SBCL Common Lisp? Maybe...
Recently I got into stuff like Terra lang (basically a low-level, C-like language embedded in Lua and compiled via LLVM, which you can metaprogram with Lua) and Coalton so maybe a meta-language is the way... Also came back to Common Lisp so maybe I'll do a Coalton-esque language (but with a different set of compromises, not as "functional") on top of CL... Lisp productivity is insane and SBCL performance is too if you poke and prod it just right...
However, if you make your own programming language and want to post about it on HN, please add something new that makes the language interesting.
I don't think we need yet another look-mum-I-made-a-compiler-for-a-language-that-is-similar-to-N-other-languages-but-with-a-slightly-different-syntax-and-zero-libraries post on HN.
Just saying.