Speaking generically -- anywhere in your workflow where the task isn't hard, you can use a smaller, cheaper LM.
Smaller LMs come with an accuracy reduction, particularly on tail cases, so in the real world this often doesn't work out.
Also, is the Gumbel softmax usage intentional? It looks like a straightforward classifier that just needs a regular softmax.
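To make the distinction concrete, here's a toy NumPy sketch (the 3-class logits are made up for illustration): Gumbel softmax perturbs the logits with Gumbel noise to get a differentiable approximation of *sampling* a discrete category, which only matters when you need gradients through a stochastic choice. A deterministic classifier just needs plain softmax plus argmax.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def gumbel_softmax(logits, tau=1.0, rng=None):
    # Add Gumbel(0, 1) noise to the logits, then apply a temperature-
    # scaled softmax: a differentiable relaxation of categorical sampling.
    rng = rng or np.random.default_rng(0)
    u = rng.uniform(1e-9, 1.0, size=logits.shape)
    g = -np.log(-np.log(u))
    return softmax((logits + g) / tau)

logits = np.array([2.0, 0.5, -1.0])
p = softmax(logits)          # deterministic class probabilities
s = gumbel_softmax(logits)   # stochastic, differentiable "sample"
print(p.argmax())            # plain softmax: the prediction is just argmax
```

If you're only ever taking the argmax at inference time, the Gumbel noise buys you nothing except nondeterminism.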
You'd still need to figure out what payload to give to the tool based on your context.
But I guess depending on your business case it might be worth it. It's not something I'd do from the beginning, though.
If LLMs could handle determinism better, I'd say having a single chat-based entrypoint into a plethora of services makes sense. But as they stand, it doesn't. I think the way to go is simpler control flow, plus constraining the number and type of downstream services that sit behind a single interface.
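A minimal sketch of what "constraining the downstream services" might look like (service names and handlers are made up for illustration): the LLM is only allowed to emit an intent from a fixed, typed set, and anything outside it is rejected rather than improvised on.

```python
from typing import Callable

# A fixed whitelist of downstream services behind the single interface.
# The handlers here are stand-ins for real service calls.
SERVICES: dict[str, Callable[[str], str]] = {
    "billing": lambda q: f"billing handled: {q}",
    "support": lambda q: f"support handled: {q}",
}

def route(intent: str, query: str) -> str:
    # Deterministic control flow: reject unknown intents outright
    # instead of letting the model free-form its way to a service.
    if intent not in SERVICES:
        raise ValueError(f"unknown intent: {intent}")
    return SERVICES[intent](query)

print(route("billing", "refund order 42"))
```

The point is that the LLM's job shrinks to picking one of two labels, which is the kind of task where even a smaller model is hard to get wrong.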
That said, I agree we should keep the ambition of moving to a one-size-fits-all approach.
From the article:
Each LLM call incurs latency, cost, and token overhead. More subtly, it compounds context:
every step includes not only the original query, but intermediate outputs and scratchpad logic from earlier prompts.
This creates a growing burden on both inference and model performance.
I was working with agents over a year ago, before the common workflows had really been set in stone. At that time we were heavily doctoring the context to give the LLM a very streamlined representation of what had occurred during a given run. Is this not standard practice?

For complex real-world agent flows, though, tool use is often the only thing the LLM is expected to do. Like in a coding agent:
```
# each tool call returns (exit_code, output)
User:  Develop a program to ...
Agent: Bash("touch main.py")          -> 0, ""
Agent: Edit("main.py", initial_patch) -> 0, ""
Agent: Bash("python main.py")         -> 1, "SyntaxError: ..."
Agent: Edit("main.py", fix_patch)     -> 0, ""
Agent: Bash("python main.py")         -> 0, "OK"
Agent: FINISH
```
Here, tool selection (+ writing the arguments) is actually the whole job. It's also easy to see that if you omit even one of the tool use records in the middle, the agent wouldn't work at all.
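The context "doctoring" mentioned above can be sketched roughly like this (a hypothetical transcript shape, not any particular framework's API): shrink verbose tool outputs, but never drop a tool-use record, since omitting one breaks the agent's picture of the run.

```python
def compact(history, max_output_len=80):
    # Compact an agent transcript: truncate long tool outputs,
    # but keep every record so no step of the run disappears.
    compacted = []
    for msg in history:
        if msg["role"] == "tool" and len(msg["content"]) > max_output_len:
            msg = {**msg, "content": msg["content"][:max_output_len] + " ...[truncated]"}
        compacted.append(msg)  # never drop a record, only shrink it
    return compacted

history = [
    {"role": "user", "content": "Develop a program to ..."},
    {"role": "assistant", "content": 'Bash("python main.py")'},
    {"role": "tool", "content": "SyntaxError: " + "x" * 500},
]
slim = compact(history)
```

This keeps per-step context growth bounded while preserving the full chain of tool calls the agent needs to reason about what it already tried.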
https://gist.github.com/viksit/c67d1d960c4cec89488290496defb...
I guess that applies when you're not able to fine-tune the LLM you're using. Presumably Anthropic has a lot of data too.