This is the Fisher/Box feedback loop (https://www-sop.inria.fr/members/Ian.Jermyn/philosophy/writi...) implemented on a modern computational system. The LLM is just a component. I wish Sutton had commented on this fuller picture of what we have now instead of only on the LLM/backprop side of things. I am honestly curious whether such a loop can at least partially automate discovery.
There are more elements to discovery though. It is still not clear where the initial working model/hypothesis comes from or how the updates are selected (unless it is just parameter induction). I recently read about Hanson's Patterns of Discovery which aims in that direction. I have still not read it, but I am curious if it has any mechanistic clues.
I really like the way he frames this here. I think a lot of people in the Twitter comments (and maybe a few here) aren't reading past the introduction. He isn't saying AI systems are incapable of creativity and discovery. He is claiming that generative AI without a harness is not capable of creativity and discovery. There needs to be some other system that "recognizes the value" of the novel idea and remembers it. He gives examples where this value-recognition step is automated and where, by his definition, a fully automated system therefore achieves creativity and discovery.
AlphaGo is given a hard evaluation externally. It did not come up with that evaluation itself.
When GAI models are given an external hard evaluation, they can also succeed in many different domains (that is one of the remarkable features, succeeding in many domains) ranging from simple programming tasks to frontier mathematics (disproving conjectures recently) to writing more optimized kernel code than before.
And there is plenty of RL, especially in fields where the solution may be extremely complex but the evaluation is much simpler. The discovery and the "evolution-like" trace selection are happening as well.
For this reason it seems strange to compare it to AlphaGo, since AlphaGo is given a hard eval independent of itself, from an external source (humans), in a narrow domain. If GAI is given such an eval, it can also show some remarkable results.
But what I find stranger is that innovation and moving forward in a great many cases does not require truly novel ideas, but rather high-quality execution in layering different methods, tactics, and ideas on top of each other. Because in many domains our collective knowledge is incredibly sparse and complex, something able to recombine tools, models, and ideas in a high-quality way (being selective, as he mentions) is, I think, extraordinarily powerful. In such cases, with a finite exploration horizon (time, resources available), 1% "good choices" and 3% "good choices" are worlds apart, incomparable.
Most importantly: none of the above is about intelligence; it's barren solution-farming for important, valuable problems we have. Most of the AGI and intelligence-related debate seems to miss this simple fact. (Insert the usual point that a plane being unable to fly like a bird, or a submarine not swimming, is totally irrelevant to its being useful.)
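To put rough numbers on the 1% vs 3% gap, here is a back-of-the-envelope sketch. It assumes every choice is independent with the same success rate, and the budget of 20 tries and the 5 stacked layers are made-up values for illustration, not anything from the essay.

```python
def p_at_least_one_hit(p_good: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries is good."""
    return 1.0 - (1.0 - p_good) ** attempts


def p_all_steps_good(p_good: float, steps: int) -> float:
    """Chance that every one of `steps` independent layered choices is good."""
    return p_good ** steps


if __name__ == "__main__":
    budget = 20  # hypothetical exploration horizon (number of tries)
    layers = 5   # hypothetical number of methods stacked in one solution
    for p in (0.01, 0.03):
        print(
            f"p={p:.2f}  hit within {budget} tries: "
            f"{p_at_least_one_hit(p, budget):.3f}  "
            f"all {layers} layers good: {p_all_steps_good(p, layers):.2e}"
        )
    ratio = p_all_steps_good(0.03, layers) / p_all_steps_good(0.01, layers)
    print(f"advantage over {layers} layers: about {ratio:.0f}x")
```

Under these assumptions the single-shot gap of 3x compounds to 3^k when k good choices have to line up, which is one way to read "worlds apart".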
And then a final point: do we really think this thing is incapable of doing better on average on problems we average people face in our lifetime? What should we think, how should we define human intelligence when we give out degrees in science or medicine for 60-70% exam results on problems considered to be generic in the field?
If it's a), he doesn't propose such an algorithm, and I don't know how you'd do it at such a low level because how do you quantify abstract goals? Did he suggest such an algorithm and I misread? If it's b), that already exists, see AlphaEvolve or any number of things he said. Or, to be a bit of a smart-ass, just type /goal and let it rip ...
I also think he's just categorically wrong that LLMs cannot do good and novel things. And if it can, then you could just say "well that's not novel, that's derivative". A simple example, if I make up a programming language with an LLM and it works well for my purposes, then is that not novel and good? I mean, is any language other than FORTRAN not novel?
Everything is derivative and you can put an LLM in a loop to evaluate LLMs trying things. I must be misunderstanding because he's too smart to be this wrong.
Can AI create art? Well, it can create something that's pleasing to our senses, but art is ultimately about conveying human feelings and emotions. Even among humans, understanding art is not universal. "Feelings and emotions", and therefore art, can be deeply tied to a particular group's shared beliefs and experiences.
Can it be creative in non-subjective fields such as math or the sciences? Einstein derived GR from his creative thought experiments. If an AI popped out GR's field equations simply by testing different mathematical frameworks that resolve the issues discovered by experiments, is that creative? Perhaps, but certainly not in the same way.
But I think humans are better at it, while ML is better at algorithmic thinking. “Better” being more efficient and something we more enjoy doing; we can also more accurately rank what subjectively appeals to humans (i.e. taste), especially ourselves.
I think ML should be optimized for tasks that require more generalization than programming, but are still mostly logic. Like software development, translation, and tools for art and discovery.
Even among humans, the minds that managed a step change in thinking are so rare that we literally know them by name.
(Currently returning 502 "Bad Gateway" for me, but should be restored at some point.)
This doesn't seem true? You can be both random and based on training data.
That contradiction kind of says he doesn't know what he's talking about.
Still about ten million discussions to go.
> Claude-Code, which have brought true advances in ... programming. ... these systems have found things that are both novel and good.
I don't think I would attribute anything in that process that I would consider an AI to be incapable of.
The characterisation of variation like this would seem to rest on the same 'random but directed' crutch that some free will arguments rest upon.
There is no random but directed of course, there is random and there is caused, and there are things that use both as components, but the random remains wholly random, and the caused remains entirely deterministic.
I think there is a good case to say that, in many fields, AI is better than humans at evaluation.
As for finding avenues to consider, I'm not entirely convinced that human innovation is more than a heuristic that appears more chaotic by virtue of an inconsistent and opaque formulation.
Many ideas come from noting how two things differ and then applying that axis of difference to a third thing.
The possibilities thrown up by this extremely simple method are vast enough to require multiple layers of evaluation. Most could be dismissed out of hand by a quick 'This is nonsense' check, which I suspect people perform so often and so quickly that it doesn't even rise to the level of consciousness.
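A toy sketch of that method, to make it concrete: find the attributes on which two things differ, move a third thing along the same axis, then run a cheap filter. The three "things", their attributes, and the `is_nonsense` rule are all invented for illustration; a real first-pass check would be far richer.

```python
from itertools import combinations

# Hypothetical toy "things", each described by a few attributes.
THINGS = {
    "bicycle": {"power": "human", "wheels": 2, "medium": "road"},
    "motorbike": {"power": "engine", "wheels": 2, "medium": "road"},
    "rowboat": {"power": "human", "wheels": 0, "medium": "water"},
}


def axes_of_difference(a: dict, b: dict) -> dict:
    """Return {attribute: (value_in_a, value_in_b)} where a and b differ."""
    return {k: (a[k], b[k]) for k in a if k in b and a[k] != b[k]}


def apply_axis(target: dict, attr: str, old, new):
    """Move `target` along one axis, but only if it sits at the `old` end."""
    if target.get(attr) != old:
        return None
    return {**target, attr: new}


def is_nonsense(candidate: dict) -> bool:
    """Cheap first-pass filter (a made-up rule standing in for intuition)."""
    return candidate["medium"] == "water" and candidate["wheels"] > 0


def propose(things: dict) -> list:
    """Transfer each pairwise difference onto every third thing, then filter."""
    known = list(things.values())
    ideas = []
    for (name_a, a), (name_b, b) in combinations(things.items(), 2):
        for attr, (va, vb) in axes_of_difference(a, b).items():
            for name_c, c in things.items():
                if name_c in (name_a, name_b):
                    continue
                for old, new in ((va, vb), (vb, va)):
                    cand = apply_axis(c, attr, old, new)
                    if cand is None or cand in known or cand in ideas:
                        continue
                    if not is_nonsense(cand):
                        ideas.append(cand)
    return ideas


if __name__ == "__main__":
    for idea in propose(THINGS):
        print(idea)
```

With these three inputs the bicycle/motorbike difference applied to the rowboat yields an engine-powered boat, while the wheeled-boat candidates are dropped by the quick check. Candidates that survive would still need the later, more expensive layers of evaluation.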
Then it shifts to discovery.
These seem related but not exactly the same thing.
Best thing about nerds is watching them try and build frameworks and formulas for the creative act. Like a metronome trying to compose a symphony.
The point seems to be that generative AI just generates stuff, and that real discovery requires variation, evaluation and selective retention.
The call to arms seems based on the assumption that people only ever talk about generative AI systems as discovery machines in themselves. I think it's pretty widely accepted that that's not the case, by everyone apart from clichéd out-of-touch CEOs.
But the talk makes me realise that generative AI systems are incredible tools to run the discovery cycle with, and this is what I imagine professionally successful AI users are doing: variation, evaluation and selective retention of their inputs and outputs to generative AI.
A joke is just an "error" - your brain predicted something, and the butt of the joke goes in another direction, and it's the mismatch that makes it funny.
The same goes with creativity, and intelligence.
The problem is that, by design, while trying to make machines "reliable", we make it impossible for them to be intelligent and creative.
And the core point is not even true. They can definitely output novel things that are good - less so but they can and they do. Plenty of examples.
> Thus, the trajectory is either novel or good—based on randomness or based on data—but never both at the same time.
This assumes no possible unexplored path yields good results, or said another way, that none of the random results can be good, which is not true. The whole text seems to try to prove a point decided a-priori rather than make a case based on reality.
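A deliberately tiny sketch of the objection: draw candidates at random and count those that are both outside the "seen" set and pass an external check. The candidate space, the seen set, and the divisibility rule for "good" are arbitrary stand-ins, not a model of any real system.

```python
import random


def is_good(x: int) -> bool:
    """Stand-in external evaluation; 'good' here means divisible by 7 (arbitrary)."""
    return x % 7 == 0


def sample_novel_and_good(seen: set, space: range, draws: int, seed: int = 0) -> list:
    """Draw uniformly at random; keep distinct candidates that are unseen and good."""
    rng = random.Random(seed)
    hits = []
    for _ in range(draws):
        x = rng.choice(space)
        if x not in seen and is_good(x) and x not in hits:
            hits.append(x)
    return hits


if __name__ == "__main__":
    space = range(1000)
    seen = set(range(0, 1000, 10))  # pretend the "training data" covered these
    hits = sample_novel_and_good(seen, space, draws=200)
    print(f"{len(hits)} distinct novel-and-good candidates, e.g. {hits[:5]}")
```

Random draws land on novel-and-good candidates at roughly the rate such candidates exist in the unexplored part of the space. What the toy leaves open is who or what runs `is_good` and remembers the hits.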
Add this to the long list of names, like Terence Tao and others, who seem to be intellectually incontinent lately, in the sense that one cannot navigate this space anymore without encountering their thoughts.
Should we automate exercise and play as well? How about learning?
The machine didn't have a soul, so we donated ours.
Eureka! My AI found it!
His main point is that discoveries involve
1. Variation,
2. Evaluation, and
3. Selective retention.
He makes a jump saying AI is only capable of 1) and humans are capable of 1) 2) and 3). I don't know what makes humans special enough that they can do 2) and 3)?
In fact, the more you think about this, the stranger it is: in science, humans can only do "evaluation" because they have access to the real world. They can evaluate a new drug because they can test it on people, so it is not some inherent limitation of AI but rather a matter of access to the physical realm.
Finally I want to ask a specific thing: how do you mathematically falsify what this person is saying? How can you formally prove that no AI can "evaluate"? I ask because I have AI evaluate a lot of people's claims and it works for me.
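For what it's worth, the three steps listed above fit in a few lines once the evaluator is just a function you pass in. This is a minimal sketch, not the essay's algorithm: `mutate`, `score`, and the target string are toy assumptions. In practice `vary` might be an LLM call and `evaluate` a test suite, a proof checker, or a lab experiment.

```python
import random
from typing import Callable, List

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
TARGET = "eureka"  # toy stand-in for "the real world" the evaluator has access to


def mutate(s: str, rng: random.Random) -> str:
    """Variation: replace one randomly chosen character."""
    i = rng.randrange(len(s))
    return s[:i] + rng.choice(ALPHABET) + s[i + 1:]


def score(s: str) -> int:
    """Evaluation: number of positions that match the hidden target."""
    return sum(a == b for a, b in zip(s, TARGET))


def discovery_loop(
    seed_pool: List[str],
    vary: Callable[[str, random.Random], str],
    evaluate: Callable[[str], float],
    rounds: int = 200,
    keep: int = 5,
    seed: int = 0,
) -> List[str]:
    """Run variation, evaluation, and selective retention for `rounds` steps."""
    rng = random.Random(seed)
    pool = list(seed_pool)
    for _ in range(rounds):
        child = vary(rng.choice(pool), rng)    # 1. variation
        if child not in pool:
            pool.append(child)
        pool.sort(key=evaluate, reverse=True)  # 2. evaluation
        pool = pool[:keep]                     # 3. selective retention
    return pool


if __name__ == "__main__":
    best = discovery_loop(["aaaaaa"], mutate, score)
    print(best[0], score(best[0]))
```

Nothing in the loop cares whether `evaluate` is a human, an instrument, or a model, which is why the question of what makes humans special at steps 2 and 3 seems fair to me.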
I tell jokes and my group of friends gets them; my family doesn't get them anymore.
I do not like 'basic jokes', and despite that, German television is full of them.
Most things in our world are more a challenge of finding the answers than of 'creating' the answers.
Math: the answer is already there; you only need to find it in a vast space of possibilities. This is perfect for LLMs.
Creativity: an LLM can iterate over things a lot faster than a person, so we can iterate over this space too. We can also get feedback from people, TikTok, Instagram.
Stochastic: describes a process, model, or system that involves inherent randomness, meaning its future states or outcomes cannot be predicted with absolute certainty but can be described using probability distributions.