by lbreakjai
12 subcomments
- I consider myself rather smart and good at what I do. It's nice to have a look at problems like these once in a while, to remind myself of how little I know, and how much closer I am to the average than to the top.
by pvalue005
2 subcomments
- I suspect this was released by Anthropic as a DDoS attack on other AI companies. I prompted 'how do we solve this challenge?' into Gemini CLI in a cloned repo and it's been running non-stop for 20 minutes :)
by languid-photic
6 subcomments
- Naively tested a set of agents on this task.
Each ran the same spec headlessly in their native harness (one shot).
Results:
Agent                        Cycles    Time
─────────────────────────────────────────────
gpt-5-2                       2,124    16m
claude-opus-4-5-20251101      4,973    1h 2m
gpt-5-1-codex-max-xhigh       5,402    34m
gpt-5-codex                   5,486    7m
gpt-5-1-codex                12,453    8m
gpt-5-2-codex                12,905    6m
gpt-5-1-codex-mini           17,480    7m
claude-sonnet-4-5-20250929   21,054    10m
claude-haiku-4-5-20251001   147,734    9m
gemini-3-pro-preview        147,734    3m
gpt-5-2-codex-xhigh         147,734    25m
gpt-5-2-xhigh               147,734    34m
Clearly none beat Anthropic's target, but gpt-5-2 did slightly better in much less time than "Claude Opus 4 after many hours in the test-time compute harness".
by game_the0ry
2 subcomments
- > If you optimize below 1487 cycles, beating Claude Opus 4.5's best performance at launch, email us at performance-recruiting@anthropic.com with your code (and ideally a resume) so we can be appropriately impressed and perhaps discuss interviewing.
This is an interesting way to recruit. Much better than the standard two LeetCode medium/hard questions in 45 minutes.
- This is a really fun problem! I suggest that anyone who likes optimization in a very broad sense try their hand at it. It might be the most fun I've had while interviewing. I had to spend a week's worth of evenings on it to fully scratch the itch, and I managed to get 1112 cycles. But that was mostly manual, before the current crop of agentic models (clopus 4.5, gpt5.2). I wonder how far you can RalphWiggum it!
- It's pretty interesting how close this assignment looks to demoscene [1] golf [2].
[1] https://en.wikipedia.org/wiki/Demoscene
[2] https://en.wikipedia.org/wiki/Code_golf
It even uses Chrome tracing tools for profiling, which is pretty cool: https://github.com/anthropics/original_performance_takehome/...
by sureglymop
1 subcomment
- Having recently learned more about SIMD, PTX and optimization techniques, this is a nice little challenge to learn even more.
As a take-home assignment, though, I would have failed, since I would probably have spent 2 hours just sketching out ideas and more on my tablet while reading the code, before even changing it.
- I just withdrew my application over this test. It forces an engineering anti-pattern: requiring runtime calculation for static data (effectively banning O(1) pre-computation).
When I pointed out this contradiction via email, they ignored me completely and instead silently patched the README to retroactively enforce the rule.
It’s not just a bad test; it’s a massive red flag for their engineering culture. They wasted candidates' time on a "guess the hidden artificial constraint" game rather than evaluating real optimization skills.
by SinghCoder
0 subcomments
- Why is their GitHub handle anthropics and not anthropic? :D
by bytesandbits
1 subcomment
- Having done a bunch of take-homes for big (and small) AI labs during interviews, this is the 2nd most interesting one I have seen so far.
- What is the actual assignment here?
The README only gives numbers without any information on what you’re supposed to do or how you are rated.
by amirhirsch
1 subcomment
- I'm at 1137 after one hour with Opus now...
Pipelined vectorized hash, speculation, static code for each stage, epilogues and prologues for each stage-to-stage transition...
I think I'm going to get sub-900, since I just realized I can compute in parallel whether stage 5 of the hash is odd just by looking at bits 16 and 0 of stage 4, with less delay (sketch below).
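To make that parity trick concrete, here is a minimal sketch. It assumes stage 5 is a murmur-style mixing step of the form x ^= x >> 16; that is an assumption about the hash, not the repo's actual definition. Under that assumption the low bit of stage 5 depends only on bits 0 and 16 of stage 4, so the odd/even decision can be made one stage early.

```python
# Hedged sketch: assumes stage 5 is a murmur-style mix, x ^= x >> 16.
# The real hash in the take-home may differ; this only illustrates why
# bits 0 and 16 of stage 4 can decide the parity of stage 5 early.
MASK32 = 0xFFFFFFFF

def stage5(x: int) -> int:
    """Hypothetical stage-5 mixing step (an assumption, not the repo's code)."""
    return (x ^ (x >> 16)) & MASK32

def stage5_is_odd_early(stage4: int) -> bool:
    """Decide stage-5 parity from stage-4 bits 0 and 16 alone."""
    return bool(((stage4 >> 16) ^ stage4) & 1)

if __name__ == "__main__":
    for v in (0, 1, 0x10000, 0x10001, 0xDEADBEEF, 0xFFFFFFFF):
        assert stage5_is_odd_early(v) == bool(stage5(v) & 1)
    print("parity shortcut matches the full stage on the sample values")
```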
- I liked the core challenge of finding the balance between ALU and VALU, but I think the load bandwidth could lead to problems,
like optimizations that assume the start indices will always be zero. I'm close to 100% sure that's required to get below 2096 total loads, but it's just not fun.
If the machine instead had some kind of dynamic vector lane rotate, that could have been way more interesting.
- This is the kind of task that's best solved by spending more than the allocated 2 hours on it, once any obvious low-hanging fruit is picked. An optimization task is what a machine does best, so the real problem would be to construct a machine able to run the optimization. The right optimization framework that results from the effort could also efficiently solve many more similar problems in the future.
I understand that this test is intended to somehow measure raw brainpower, the ability to tackle an unfamiliar and complicated domain, and the ability to work under stress. But I hope it's not representative of the actual working conditions at Anthropic. It's like asking a candidate to play a Quake deathmatch when hiring for a special forces assault squad.
by FriendlyMike
2 subcomments
- They should just have you create a problem that can't be solved by an LLM in two hours. That's the real problem here.
by NitpickLawyer
3 subcomments
- The writing has been on the wall (publicly) for about half a year now. OpenAI's 2nd place at the AtCoder world championship competition was the first sign, and I remember it being dismissed at the time. Sakana also got 1st place in another AtCoder competition a few weeks ago. Google also published a blog post a few months back about Gemini 2.5 netting them a 1% reduction in training time on real-world tasks by optimising kernels.
If the models get a good feedback loop + easy (cheap) verification, they get to bang their tokens against the wall until they find a better solution (roughly the loop sketched below).
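As a rough illustration of that feedback loop, here is a toy random-search sketch. The propose and verify functions are stand-ins invented for the example; in the real setup an agent would propose code changes and the take-home's cycle-accurate simulator would score them. Only the propose / cheaply verify / keep-the-best structure is the point.

```python
# Toy sketch of "propose -> cheaply verify -> keep the best" optimization.
# Everything here is a stand-in: a real run would have an agent propose code
# changes and the take-home's simulator report the exact cycle count.
import random

def propose(candidate: list[int]) -> list[int]:
    """Stand-in for an agent proposing a tweak to the current solution."""
    new = candidate[:]
    new[random.randrange(len(new))] += random.choice([-1, 1])
    return new

def verify_and_score(candidate: list[int]) -> int | None:
    """Stand-in for the simulator: returns a cost, or None if 'incorrect'."""
    if any(x < 0 for x in candidate):            # pretend negative values are invalid
        return None
    return sum((x - 3) ** 2 for x in candidate)  # pretend cost ~ cycle count

def optimize(start: list[int], budget: int = 2000) -> tuple[list[int], int]:
    best, best_cost = start, verify_and_score(start)
    for _ in range(budget):
        cand = propose(best)
        cost = verify_and_score(cand)
        if cost is not None and cost < best_cost:  # cheap, exact check
            best, best_cost = cand, cost
    return best, best_cost

if __name__ == "__main__":
    print(optimize([10, 0, 7, 1]))
```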
by seamossfet
0 subcomments
- I'm getting flashbacks to my computer engineering curriculum. Probably the first place I'd start is replacing comparison operators on the ALU with binary arithmetic, since it's much faster than branch logic (rough sketch below). Next would probably be changing the `step` function from brute-force iteration over the instructions to something closer to a B-tree? Then maybe a sparse set for the memory management, if we're going to do a lot of iterations over flat memory like this.
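A minimal sketch of the comparison-to-arithmetic idea, kept generic rather than tied to the repo's simulated ISA: a conditional select is folded into mask arithmetic so no branch is needed.

```python
# Hedged sketch of branchless selection via mask arithmetic (generic, not the
# take-home's actual instruction set): turn `a if cond else b` into AND/OR ops.
MASK32 = 0xFFFFFFFF

def select_branchy(cond: int, a: int, b: int) -> int:
    return a if cond else b

def select_branchless(cond: int, a: int, b: int) -> int:
    mask = (-(cond != 0)) & MASK32            # all ones if cond, else all zeros
    return (a & mask) | (b & ~mask & MASK32)

if __name__ == "__main__":
    for cond in (0, 1, 5):
        for a, b in ((7, 9), (0, MASK32)):
            assert select_branchless(cond, a, b) == select_branchy(cond, a, b)
    print("branchless select matches the branchy version")
```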
- I got to 1364 cycles for now, semi-manually: design-space exploration organized via a backlog.md project, then recombination from that, with 20 agents in parallel.
I asked for a drawio diagram of the winner so I could grok it more easily, then gave feedback.
Edit: 1121 cycles
by pickpocket
0 subcomments
- I cleared this assignment but did not clear the follow-up interview, which was way easier than this. So I gave up on tech interviews in general and stayed where I was.
by throwaway0123_5
0 subcomments
- > Claude Opus 4.5 in a casual Claude Code session, approximately matching the best human performance in 2 hours
Is this saying that Claude matched the best human performance, where the human had two hours? I think that is the correct reading, but I'm not certain they don't mean that Claude had two hours and matched the best human performance where the human had an arbitrary amount of time. The former is impressive, but the latter would be even more so.
- Idle side note: surprised that https://github.com/anthropic is just some random dude in Australia
- > This repo contains a version of Anthropic's original performance take-home, before Claude Opus 4.5 started doing better than humans given only 2 hours.
Was the screening format here that this problem was sent out, and candidates had to reply with a solution within 2 hours?
Or, are they just saying that the latest frontier coding models do better in 2 hours than human candidates have done in the past in multiple days?
- Is it "write 20 astroturfing but somewhat believable posts about the merits of 'AI' and how it is going to replace humans"?
by demirbey05
1 subcomment
- It's a showcase more than a take-home assignment. I couldn't understand what the task is, only the performance comparisons between their LLMs.
by kristianpaul
2 subcomments
- “If you optimize below 1487 cycles, beating Claude Opus 4.5's best performance at launch, email us at performance-recruiting@anthropic.com with your code (and ideally a resume) so we can be appropriately impressed and perhaps discuss interviewing.”
- Are you allowed to change the instruction sequence? I see some optimization opportunities; it would obviously be the job of an optimizing compiler, but considering the time allotted, I'd guess you could hand-optimize it, though that feels like cheating.
by Incipient
3 subcomments
- >so we can be appropriately impressed and perhaps discuss interviewing.
Something comes across really badly here for me. Some weird mix of bragging and mocking, with a hint of aloofness.
I feel like these top-end companies like the smell of their own farts and would be insufferable places to work. This does nothing but reinforce that, for some reason.
by svilen_dobrev
0 subcomments
- If anyone is interested in trying their agent-fu, here's a more real-world rabbit hole I went down optimizing in 2024. Note that this is now a dead project, no one's using it, and probably the same goes for the original. I managed to get it 2x-4x faster than the original; it took me several days back then. By the way, there are some 10x optimizations possible, but they break a few edge cases, so they're not entirely correct.
https://github.com/svilendobrev/transit-python3
by karmasimida
1 subcomment
- I was able to beat the 1487 benchmark by switching between LLMs; it doesn't seem that hard, lol. Although I don't fully understand what the solution is, lol.
by mips_avatar
0 subcomments
- Going through the assignment now. Man it’s really hard to pack the vectors right
by sublimefire
1 subcomment
- Did a bit of soul-searching and manually optimised down to 1087, but I give up. What is the number we're chasing here? IMO I would not join a company that gives such a vague problem, because you can feel really bad afterwards, especially if it doesn't open the door to the next stage of the interview. As an alternative, we could all focus on a real kernel and improve it instead :)
- Oh, this was fun! If you like performance puzzles you should really do it. Actually I might go back and see if I can improve on it this weekend…
- The snarky writing of "if you beat our best solution, send us an email and MAYBE we think about interviewing you" is really something, innit?
- Yet Claude is the only agent which deadlocks (blocks in GC forever) after an hour of activity.
- Is this a test of GPU architecture knowledge?
by potato-peeler
0 subcomments
- What does "clock cycles" mean here? I don't think they're referring to the CPU clock?
- I could only cut it down to 41 cycles.
- I wonder if the AI is doing anything novel? Or if it's more like a brute-force search, applying all the existing optimizations that have already been written about.
by dhruv3006
1 subcomment
- I wonder if OpenAI follows suit.
by spencerflem
1 subcomment
- Oh wow it’s by Tristan Hume, still remember you from EyeLike!
by alexpadula
0 subcomments
- Looks rather fun!
- Interesting... Who would spend hours working for free for some company that promised only that they might invite you for a job interview? Maybe.
by mrdootdoot
0 subcomments
- “In English, Data”
by zeroCalories
5 subcomments
- It shocks me that anyone supposedly good enough for Anthropic would subject themselves to such a one-sided waste of time.
by mannykannot
1 subcomment
- I beat the target by deleting the parts that were causing the cycle count to be too high. /s
by jackblemming
6 subcomments
- Seems like they're trying to hire nerds who know a lot about hardware or compiler optimizations. That will only get you so far. I guess hiring for creativity is a lot harder.
And before some smart aleck says you can be creative on these types of optimization problems: not in two hours. It's far too risky compared to regurgitating some standard set of tried-and-true algos.