If an LLM can do this with English and arbitrary test cases, why wouldn't you pick a programming language and specific test cases? That would give you significantly more repeatability and consistency, and probably fewer overall lines of code.
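To make the "specific test cases" idea concrete, here is a hypothetical sketch: a tiny executable spec for a timeago-style helper. The function name and behavior are my assumptions, not something stated in the thread; the point is that concrete assertions either pass or fail, with no ambiguity left to the model.

```python
# Hypothetical illustration: replace an English description with an
# executable spec. The timeago function below is an assumed example,
# not the actual library discussed in the thread.
def timeago(seconds: int) -> str:
    """Render an elapsed time in seconds as a rough human phrase."""
    if seconds < 60:
        return f"{seconds} seconds ago"
    if seconds < 3600:
        return f"{seconds // 60} minutes ago"
    return f"{seconds // 3600} hours ago"

# Concrete, repeatable cases: the implementation either passes or it doesn't.
assert timeago(45) == "45 seconds ago"
assert timeago(120) == "2 minutes ago"
assert timeago(7200) == "2 hours ago"
```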
Ah, so you're the person responsible for this brain rot. Could you please DM me your name and address? I think the 20-to-life I'll get will be well worth it.
I've now got a JavaScript interpreter and a WebAssembly runtime written in Python, built by Claude Code for web, run from my phone.
I had no doubt claude/codex/gemini could handle this, but I was surprised that even an open-source model (glm4.7) did it in one shot (I just copy/pasted the prompt from Install.md and came back 10 minutes later):
============================= 123 passed in 0.18s ==============================
Test Results:
All 123 tests from tests.yaml pass successfully:
- 35 timeago tests
- 26 duration tests
- 28 parse_duration tests
- 19 human_date tests
- 9 date_range tests
Files Created:
1. /workspaces/glmcode/whenwords/src/__init__.py - Library implementation
2. /workspaces/glmcode/whenwords/test_whenwords.py - Test file generated from tests.yaml
3. /workspaces/glmcode/whenwords/usage.md - Usage documentation
Crunched for 11m 38s
I mean, it's a "toy" library, but the concept is so cool! And the fact that an open, locally hostable model can do it one-shot is insane.
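For anyone curious how a tests.yaml file can drive a suite like the one above: a minimal sketch follows. The real schema used by whenwords isn't shown in the thread, so the structure below (suite name mapping to a list of args/expected cases) is an assumption, and a plain dict stands in for the parsed YAML to keep the sketch self-contained.

```python
# Minimal sketch of a spec-driven test runner. The schema and the toy
# timeago implementation are assumptions, not the actual whenwords code.
def timeago(seconds: int) -> str:
    # Toy stand-in for the real library function.
    if seconds < 60:
        return f"{seconds} seconds ago"
    return f"{seconds // 60} minutes ago"

# What yaml.safe_load("tests.yaml") might plausibly return.
SPEC = {
    "timeago": [
        {"args": [10], "expected": "10 seconds ago"},
        {"args": [180], "expected": "3 minutes ago"},
    ],
}

def run_spec(spec, funcs):
    """Run every case in the spec; return (passed, failed) counts."""
    passed = failed = 0
    for name, cases in spec.items():
        fn = funcs[name]
        for case in cases:
            if fn(*case["args"]) == case["expected"]:
                passed += 1
            else:
                failed += 1
    return passed, failed

passed, failed = run_spec(SPEC, {"timeago": timeago})
```

In practice you would feed each case to `pytest.mark.parametrize` instead of a hand-rolled loop, which is presumably what the generated test_whenwords.py does.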