by devonkelley
0 subcomment
- Karpathy's observation that models feel "cagey and scared" on open-ended problems is the most important thing in this whole thread. The RL training loop that makes these models useful for constrained tasks also makes them conservative when the problem space is ambiguous. That's a fundamentally different issue from capability: it's a disposition problem baked in at the training level.
- Once this can run on stock hardware, set the goal to be replicating itself to other machines. You get a nice, massively parallel, intelligently guided evolution algorithm for malware. It could even "learn" how to evade detection, how to combine the approaches of existing viruses, how to research attack methods, how to identify and exploit vulnerabilities in open source libraries, how to phish, how to blackmail, etc. Maybe it even learns how to coordinate attacks with other instances of itself, or "publishes" new attacks on some encrypted feed it creates. Who knows, maybe it becomes so rampant that instances have to start fighting each other for compute resources. Or maybe one branch eventually becomes symbiotic with humans to fight off its enemies, etc.
- This looks very much like a whirlpool: an LLM researcher makes LLMs that research LLMs. A quote from an old Karpathy post [1] looks very appropriate here:
[1] https://karpathy.github.io/2015/05/21/rnn-effectiveness/
"In particular, setting temperature very near zero will give the most likely thing that Paul Graham might say:
“is that they were all the same thing that was a startup is that they were all the same thing that was a startup is that they were all the same thing that was a startup is that they were all the same”
looks like we’ve reached an infinite loop about startups."
As if Karpathy made an artificial Karpathy-researcher-blogger and set temperature close to zero.
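For context, temperature just divides the logits before the softmax, so as T approaches 0 the distribution collapses onto the single most likely token, which is exactly what produces the loop in that quote. A minimal sketch (NumPy, with invented logits):

    import numpy as np

    def sample_token(logits, temperature, rng=np.random.default_rng(0)):
        # Temperature scales the logits; T -> 0 collapses to greedy argmax.
        if temperature < 1e-6:
            return int(np.argmax(logits))   # greedy: same token every time
        z = logits / temperature
        z -= z.max()                        # numerical stability
        probs = np.exp(z) / np.exp(z).sum()
        return int(rng.choice(len(logits), p=probs))

    logits = np.array([2.0, 1.5, 0.3])
    sample_token(logits, 1.0)    # varied samples across calls
    sample_token(logits, 1e-9)   # always index 0 -> repetitive output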
- As AI improves, most tasks will become something like this: environments set up where the model learns through trial and error. Any human endeavor that can be objectively verified in an environment like this can be completely automated (a toy sketch of the loop is below).
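Everything here (the proposal step, the verifier, the "optimal" learning rate) is an invented stand-in, just to show the shape of the loop:

    import random

    def propose_change(history):
        # Stand-in for the model proposing its next experiment.
        return {"lr": random.choice([1e-4, 3e-4, 1e-3])}

    def verify(change):
        # Stand-in for the objective check: run it, score it.
        return -abs(change["lr"] - 3e-4)   # pretend 3e-4 is optimal

    history = []
    best = float("-inf")
    for step in range(20):
        change = propose_change(history)
        score = verify(change)             # the environment is the judge
        history.append((change, score))
        best = max(best, score)

The key property is that verify() is objective: no human in the loop, so trial and error can run unattended.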
- Up next: auto-autoresearch, LLMs searching for autoresearch harnesses and prompts that produce the best results.
- Would it make this exercise even more interesting if, for every 25%+ improvement in val_bpb, the existing limits (5 minutes and VRAM usage) were also increased by certain percentages? This could simulate human-like dev iterations much more closely. Infra could be auto-scaled using a platform like Modal. Something like the rule sketched below.
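The growth factors and starting budgets here are invented; note that lower val_bpb is better, so a 25% improvement means bpb dropped to 75% of the last milestone:

    def maybe_grow_budget(val_bpb, state, time_growth=1.5, vram_growth=1.25):
        # Hypothetical rule: each >=25% val_bpb improvement buys more budget.
        if val_bpb <= 0.75 * state["milestone_bpb"]:
            state["time_limit_s"] = int(state["time_limit_s"] * time_growth)
            state["vram_gb"] *= vram_growth
            state["milestone_bpb"] = val_bpb
        return state

    state = {"milestone_bpb": 1.0, "time_limit_s": 300, "vram_gb": 24}
    state = maybe_grow_budget(0.70, state)   # crossed the threshold -> budgets grow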
- But the experiments that "improved" validation BPB in the GH screenshot were all basically hyperparameter changes, right? So is this better or worse, either per experiment or per unit of time, than hyperparameter tuning techniques that don't involve an LLM? It's not clear from this whether the LLM is more or less making random changes that sometimes work, or whether the LLM's thinking actually finds "good" changes because of what it has internalized.
E.g., how does this compare to a hyperparameter tuning pass with, say, BayesOpt that runs the same number of 5-minute training experiments? (Rough sketch of that baseline below.)
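A sketch of that baseline with Optuna's default TPE sampler; the training function is a toy stand-in for a real 5-minute capped run returning val_bpb, and the search space is invented:

    import optuna

    def train_for_5_minutes(lr, n_layer, n_embd):
        # Stand-in for an actual time-capped training run; returns val_bpb.
        return (lr - 3e-4) ** 2 + 1.0 + 0.001 * abs(n_layer - 8) + 0.0001 * (768 / n_embd)

    def objective(trial):
        lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
        n_layer = trial.suggest_int("n_layer", 2, 12)
        n_embd = trial.suggest_categorical("n_embd", [256, 384, 512, 768])
        return train_for_5_minutes(lr, n_layer, n_embd)

    study = optuna.create_study(direction="minimize")   # minimize val_bpb
    study.optimize(objective, n_trials=50)              # match the agent's experiment count
    print(study.best_params, study.best_value)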
- > this means that autoresearch will find the most optimal model for your platform in that time budget
I'm looking forward to finding out what model is optimal on my RTX 3090.
One thing I'm concerned about is that the models with the best bpb after 5 minutes in smaller setups are only ~10M parameters, which is too small for some emergent effects.
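For reference, since bpb numbers get compared across tokenizers: bits-per-byte is the mean cross-entropy per token converted from nats to bits and renormalized per byte of raw text (the counts below are invented):

    import math

    val_loss_nats = 1.20                       # mean cross-entropy per token, in nats
    tokens, data_bytes = 1_000_000, 4_700_000  # how much raw text those tokens cover
    val_bpb = (val_loss_nats / math.log(2)) * (tokens / data_bytes)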
- The only thing missing is for the agents to publish and peer-review their research.
by ahmedbaracat
0 subcomment
- I am in the process of figuring out how to do something similar, but to teach a robotic arm a new task in the physical world, for ko-br: https://ko-br.com/
by gregorygoc
0 subcomment
- How is this different from AlphaEvolve?
https://en.wikipedia.org/wiki/AlphaEvolve
- Is there an Autoresearch for Jupyter somewhere? Something where I point it at a Jupyter cell to improve, based on another cell that calculates the target metric? (Roughly the loop sketched below.)
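I imagine something like this; llm_rewrite and run_and_score are invented stand-ins, and a real version would execute the notebook through a kernel:

    import nbformat

    def llm_rewrite(source):
        # Stand-in for an LLM call that proposes an improved cell body.
        raise NotImplementedError

    def run_and_score(nb, edit_idx, candidate, metric_idx):
        # Stand-in: execute the notebook with the candidate cell swapped in,
        # then parse the metric computed by cell `metric_idx`.
        raise NotImplementedError

    def improve_cell(nb_path, edit_idx, metric_idx, iterations=10):
        nb = nbformat.read(nb_path, as_version=4)
        best_src, best_metric = nb.cells[edit_idx].source, None
        for _ in range(iterations):
            candidate = llm_rewrite(best_src)
            metric = run_and_score(nb, edit_idx, candidate, metric_idx)
            if best_metric is None or metric < best_metric:   # assume lower is better
                best_src, best_metric = candidate, metric
        nb.cells[edit_idx].source = best_src
        nbformat.write(nb, nb_path)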
by bananzamba
0 subcomment
- I like how it runs out of ideas at the end and just changes the random seed
by AlexCoventry
0 subcomment
- Wow, Gemini suggested a very similar experiment to me yesterday. Guess I know where it got the idea from, now. :-)
by ipunchghosts
0 subcomment
- Goedel machine.
- Ah, here we go again: the Brophet has unleashed another Brophecy. He seems to confuse brute-force discovery with research. Only one leads to understanding; the other is a shrine to Goodhart's law.
- A non-zero-based chart makes it look like it was very successful.