He's mostly in very-confident-but-not-even-wrong territory here.
One comment on his note:
> As an example, let’s say an LLM is correct 95% of the time (0.95) in predicting the “right” tokens to drive tools that power an “agent” to accomplish what you’ve asked of it. Each step the agent has to take therefore has a probability of being 95% correct. For a task that takes 2 steps, that’s a probability of 0.95^2 = 0.9025 (90.25%) that the agent will get the task right. For a task that takes 30 steps, we get 0.95^30 = 0.2146 (21.46%). Even if the LLMs were right 99% of the time, a 30-step task would only have a probability of about 74% of having been done correctly.
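For reference, the arithmetic in the quote comes from treating every step as an independent trial, so the overall success probability is just the per-step accuracy raised to the step count. A minimal sketch of that model, using the quote's own numbers:

```python
# Naive model from the quote: every step is an independent trial,
# so overall success = per-step accuracy ** number of steps.
def naive_success(per_step_accuracy: float, steps: int) -> float:
    return per_step_accuracy ** steps

for p in (0.95, 0.99):
    print(f"p={p}: 30-step success = {naive_success(p, 30):.4f}")
# p=0.95: 30-step success = 0.2146
# p=0.99: 30-step success = 0.7397
```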
The main point, that errors can accumulate across sequential steps and that this needs to be handled, is valid and pertinent, but the model used to "calculate" this is quite wrong: the steps don't fail probabilistically independently.
Because each action can depend on the outcomes of previous actions, and because we only care about the final outcome rather than intermediate failures, errors can be detected and corrected along the way. So even steps that "fail" can still lead to success.
(This is not a Bernoulli process.)
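As a toy illustration of why the geometric decay stops applying once errors can be caught and corrected, here's a small Monte Carlo sketch. It assumes a very simple recovery mechanism (a failed step is detected and retried a bounded number of times), and the retry budgets are made up purely to show the shape of the effect, not to model any real agent:

```python
import random

def run_task(steps: int, p_step: float, max_retries: int, rng: random.Random) -> bool:
    """One episode: each step may fail, but a failed step can be
    detected and retried a few times before the task is abandoned."""
    for _ in range(steps):
        for _attempt in range(max_retries + 1):
            if rng.random() < p_step:
                break  # step succeeded, move on to the next one
        else:
            return False  # exhausted retries on this step
    return True

def success_rate(steps: int, p_step: float, max_retries: int, trials: int = 100_000) -> float:
    rng = random.Random(0)
    return sum(run_task(steps, p_step, max_retries, rng) for _ in range(trials)) / trials

# With no retries this reproduces the ~21% figure from the quote;
# with a couple of retries per step the picture changes completely.
for retries in (0, 1, 2):
    print(f"retries={retries}: {success_rate(30, 0.95, retries):.3f}")
# retries=0: ~0.215
# retries=1: ~0.928  (per-step success becomes 1 - 0.05**2 = 0.9975)
# retries=2: ~0.996  (per-step success becomes 1 - 0.05**3 = 0.999875)
```

The point isn't the specific numbers; it's that once failures are observable and recoverable, the relevant quantity is the probability of recovering before the task is abandoned, not the probability of never failing at all.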
I think he's referencing some nice material and starting in a good direction by defining agency as goal-directed behaviour, but otherwise his confidence far outstrips the firmness of his conceptual foundations and the clarity of his deductions.