by theknarf
12 subcomments
- We should just build more CLI tools; that way the agentic AI can just run `yourtool --help` to learn how to use them. Instead of needing an MCP server to access e.g. Jira, it should just call a CLI tool `jira`. Better CLI tools for everything would help AI and humans alike.
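A minimal sketch of what such a self-documenting CLI could look like; the `jira` subcommands here are hypothetical:

```python
#!/usr/bin/env python3
"""Hypothetical `jira` CLI: the --help output doubles as documentation an agent can read."""
import argparse

def main() -> None:
    parser = argparse.ArgumentParser(
        prog="jira",
        description="Query and update Jira issues from the command line.",
    )
    sub = parser.add_subparsers(dest="command", required=True)

    search = sub.add_parser("search", help="Search issues with a JQL query")
    search.add_argument("jql", help="JQL string, e.g. 'project = FOO AND status = Open'")

    comment = sub.add_parser("comment", help="Add a comment to an issue")
    comment.add_argument("issue", help="Issue key, e.g. FOO-123")
    comment.add_argument("body", help="Comment text")

    args = parser.parse_args()
    # ...dispatch to the Jira REST API here...

if __name__ == "__main__":
    main()
```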
by jmward01
10 subcomments
- Programmatic Tool Calling has been an obvious next step for a while. It is clear we are heading towards code as a language for LLMs, so defining that language is very important. But I'm not convinced by tool search. Good context engineering leaves only the tools you will need in context, so adding a search step when you are going to use all of them anyway is just more overhead. What is needed is a more compact tool definition language like, I don't know, every programming language ever in how they define functions. We also need objects (which hopefully Programmatic Tool Calling solves, or the next version will solve). In the end I want to drop objects into context with exposed methods, so the model knows the type and what is callable on that type.
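For illustration, the kind of compact definition this comment is gesturing at: a typed function signature carrying the same information as a verbose JSON schema (all names made up):

```python
from typing import Literal

# JSON-schema style tool definition (what gets loaded into context today):
# {"name": "get_weather", "description": "Get current weather for a city.",
#  "input_schema": {"type": "object",
#                   "properties": {"city": {"type": "string"},
#                                  "units": {"type": "string", "enum": ["C", "F"]}},
#                   "required": ["city"]}}

# The compact alternative: the signature *is* the tool definition.
def get_weather(city: str, units: Literal["C", "F"] = "C") -> dict:
    """Get current weather for a city."""
    ...
```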
- I'm starting to notice a pattern with these AI assistants.
Scenario: I realize that the recommended way to do something with the available tools is inefficient, so I implement it myself in a much more efficient way.
Then, 2-3 months later, new tools come out to make all my work moot.
I guess it's the price of living on the cutting edge.
- I never really understood why you have to stuff all the tools into the context. Is there something wrong with having all your tools in, say, a markdown file, and having a subagent read it with a description of the problem at hand and return just the tool needed at that moment? Is that what this tool search is?
- The 'tool use' framing is interesting but feels like a rebranding of what's essentially sophisticated prompt engineering with structured outputs. The real limitation isn't whether Claude can 'use' tools—it's the latency and token overhead. Has anyone benchmarked whether these tool calls are actually faster/cheaper than fine-tuning smaller models with deterministic output schemas? Curious if the 'advanced' framing here is product differentiation or genuine architectural improvement.
- I cannot believe all these months and years people have been loading all of the tool JSON schemas upfront. This is such a waste of context window and something that was already solved three years ago.
- Nice! Feature #2 here is basically an implementation of the “write code to call tools instead of calling them directly” that was a big topic of conversation recently.
It uses their Python sandbox, is available via API, and exposes the tool calls themselves as normal tool calls to the API client - should be really simple to use!
Batch tool calling has been a game-changer for the AI assistant we've built into our product recently, and this sounds like a further evolution of that, really (primarily, it's about speed; if you can accomplish 2x more tool calls in one turn, it will usually mean your agent is now 2x faster).
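A rough sketch of why code-mode batching helps, with hypothetical stubs standing in for sandbox-exposed tools: N calls fan out inside one generated script, and only the final result re-enters the model's context.

```python
import asyncio

# Illustrative stand-ins for tools the sandbox would expose; the real ones
# would be generated from your tool definitions.
async def get_team_members(team_id: str) -> list[dict]:
    return [{"id": "u1"}, {"id": "u2"}]

async def get_expenses(user_id: str, quarter: str) -> list[dict]:
    return [{"amount": 120.0}, {"amount": 80.0}]

async def main() -> None:
    members = await get_team_members("eng")
    # The generated script fans out what would otherwise be one model turn
    # per call; only the printed total re-enters the model's context.
    results = await asyncio.gather(
        *(get_expenses(m["id"], "Q3") for m in members)
    )
    total = sum(e["amount"] for expenses in results for e in expenses)
    print(f"Q3 total: {total}")

asyncio.run(main())
```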
- This is heading in the wrong direction.
> The future of AI agents is one where models work seamlessly across hundreds or thousands of tools.
Says who? I see it going the other way - fewer tools, better skills to apply those tools.
To take it to an extreme, you could get by with ShellTool.
by michaelanckaert
3 subcomments
- The "Tool Search Tool" is like a clever addition that could easily be added yourself to other models / providers. I did something similar with a couple of agents I wrote.
First LLM Call: only pass the "search tool" tool. The output of that tool is a list of suitable tools the LLM searched for.
Second LLM Call: pass the additional tools that were returned by the "search tool" tool.
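A hedged sketch of this two-call pattern against the Anthropic Messages API; the registry and the word-overlap matching are toy stand-ins (a real system might rank by embeddings):

```python
from anthropic import Anthropic

client = Anthropic()

# Toy registry of full tool definitions, keyed by name.
REGISTRY = {
    "jira_create_issue": {
        "name": "jira_create_issue",
        "description": "Create a Jira issue with a summary.",
        "input_schema": {
            "type": "object",
            "properties": {"summary": {"type": "string"}},
            "required": ["summary"],
        },
    },
}

search_tool = {
    "name": "search_tools",
    "description": "Search the tool registry. Returns names of matching tools.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

user_msg = {"role": "user", "content": "File a Jira ticket for this bug."}

# First call: only the search tool is visible.
first = client.messages.create(
    model="claude-sonnet-4-5",  # illustrative model choice
    max_tokens=1024,
    tools=[search_tool],
    messages=[user_msg],
)

# Resolve the search locally (assumes the model did call the tool).
tool_use = next(b for b in first.content if b.type == "tool_use")
query = tool_use.input["query"].lower()
hits = [name for name, t in REGISTRY.items()
        if any(w in t["description"].lower() for w in query.split())]

# Second call: pass only the tools the search surfaced, plus its result.
second = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    tools=[search_tool] + [REGISTRY[h] for h in hits],
    messages=[
        user_msg,
        {"role": "assistant", "content": first.content},
        {"role": "user", "content": [{
            "type": "tool_result",
            "tool_use_id": tool_use.id,
            "content": ", ".join(hits) or "no matches",
        }]},
    ],
)
```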
- It’s quite obvious that at some point the entire web will become a collection of billions of tools; Google will index them all, and Gemini will dynamically select them to perform actions in the world for you. Honestly, I expected this with Gemini 3.
- I am extremely excited to use programmatic tool use. This has, to date, been the most frustrating aspect of MCP-style tools for me: if some analysis requires the LLM to first fetch data and then write code to analyze it, the LLM is forced to manually copy a representation of the data into its interpreter.
Programmatic tool use feels like the way it always should have worked, and where agents seem to be going more broadly: acting within sandboxed VMs with a mix of custom code and programmatic interfaces to external services. This is a clear improvement over the LangChain-style Rube Goldberg machines that we dealt with last year.
- Our agentic builder has a single tool.
It is called graphql.
The agent writes a query and executes it. If the agent does not know how to do a particular type of query, it can use GraphQL introspection. The agent only receives the minimal amount of data as per the GraphQL query, saving valuable tokens.
It works better!
Not only do we not need to load 50+ tools (our entire SDK), it also solves the N+1 problem you hit with traditional REST APIs. Also, you don't need to fall back to writing code, especially for queries and mutations. But if you do need that, the SDK is always available, following the GraphQL typed schema - which helps agents write better code!
While I was never a big fan of GraphQL before, considering the state of MCP I strongly believe it is one of the best technologies for AI agents.
I wrote more about this here if you are interested: https://chatbotkit.com/reflections/why-graphql-beats-mcp-for...
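For illustration, the single-tool setup described above might look roughly like this; the endpoint is hypothetical:

```python
import requests

GRAPHQL_URL = "https://api.example.com/graphql"  # hypothetical endpoint

def graphql(query: str, variables: dict | None = None) -> dict:
    """The agent's single tool: run a GraphQL query, return only what it asked for."""
    resp = requests.post(
        GRAPHQL_URL,
        json={"query": query, "variables": variables or {}},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# When the agent doesn't know the schema, it introspects instead of needing
# 50+ tool definitions loaded up front:
INTROSPECTION = "{ __schema { queryType { fields { name description } } } }"
# graphql(INTROSPECTION)
```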
- Sounds good for tasks like the Excel example in the article, but I wonder how this approach will hold up in other multi-step agentic flows. Let me explain:
I try to be defensive in agent architectures to make it easy for AI models to recover/fix workflows if something unexpected happens.
If something goes wrong halfway through the code execution of multiple 'tools' with Programmatic Tool Calling, it's significantly more complex for the model to fix that code and try again than with a single tool call - you're in trouble, especially if the APIs/tools are not idempotent.
The sweet spot might be using this as a strategy for tasks that are idempotent/retryable (like a database 'transaction'), so they can safely be re-run if they fail halfway through execution.
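One possible way to get that retryability, sketched with a hypothetical local ledger keyed by idempotency keys, so a re-executed script skips steps that already succeeded:

```python
import json
from pathlib import Path

LEDGER = Path("tool_ledger.json")  # hypothetical ledger of completed calls

def call_once(tool, key: str, **kwargs):
    """Run a side-effecting tool at most once per idempotency key.

    If the generated script crashes and is re-executed, completed steps
    replay their recorded (JSON-serializable) results instead of re-running.
    """
    done = json.loads(LEDGER.read_text()) if LEDGER.exists() else {}
    if key in done:
        return done[key]
    result = tool(**kwargs)
    done[key] = result
    LEDGER.write_text(json.dumps(done))
    return result

# e.g. call_once(create_invoice, key="invoice-q3-u1", user="u1", amount=120.0)
```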
- > Tool Search Tool, which allows Claude to use search tools to access thousands of tools without consuming its context window
At some point, you run into the problem of having many tools that can accomplish the same task. Then you need a tool search engine, which helps you find the most relevant tool for your search keywords. But tool makers will start to abuse Tool Engine Optimization (TEO) techniques to push their tools to the top of the tool rankings.
by babyshake
1 subcomment
- A couple points from this I'm trying to understand:
- Is the idea that MCP servers will provide tool use examples in their tool definitions? I'm assuming this is the case, but the announcement doesn't seem explicit about it, presumably because Anthropic wants to at least maintain the appearance that the MCP steering committee is independent of Anthropic.
- If there are tool use examples and programmatic tool calling (code mode), it could also make sense for tools to provide example code so the codegen step can be skipped. I'm assuming the reason this isn't done is simply that it's a security disaster to instruct a model to run code supplied by a third party that may be malicious or compromised. I'm just curious if my reasoning here seems correct.
- I see the pendulum has finished its swing from
> I HAVE NO TOOLS BECAUSE I’VE DESTROYED MY TOOLS WITH MY TOOLS.[1]
to
> TOOL SEARCH TOOL, WHICH ALLOWS CLAUDE TO USE SEARCH TOOLS TO ACCESS THOUSANDS OF TOOLS
---
[1] https://www.usenix.org/system/files/1311_05-08_mickens.pdf
by mrinterweb
0 subcomments
- The whole time I was reading this, I was thinking how a small local orchestrator model might help with somewhat-known workflows. Programmatic orchestration is ideal, but it's impractical for every case. In the interest of reducing context pollution, improving speed, and providing a better experience, I would think the ideal hierarchy for orchestration would be programmatic > tiny local LLM > frontier LLM. (The tiny model doesn't strictly need to be local, since computers have varying resources.)
I would think there are some things a tiny model would be capable of managing competently, and faster. The tiny model's context could be regularly cleared, with only relevant outputs sent to the larger model's context.
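A toy sketch of that escalation ladder; every name here is an illustrative stand-in, not a real API:

```python
# All names are illustrative stand-ins, not real APIs.
KNOWN_WORKFLOWS = {"clear_cache": lambda: "cache cleared"}

def tiny_model(task: str) -> tuple[str, float]:
    return f"draft answer for {task}", 0.4  # (answer, self-reported confidence)

def frontier_model(task: str) -> str:
    return f"careful answer for {task}"

def route(task: str) -> str:
    """Escalation ladder: programmatic > tiny local LLM > frontier LLM."""
    if task in KNOWN_WORKFLOWS:           # fully known workflow: no LLM at all
        return KNOWN_WORKFLOWS[task]()
    draft, confidence = tiny_model(task)  # cheap pass; its context can be cleared often
    if confidence > 0.8:
        return draft
    return frontier_model(task)           # only hard cases pay full price

print(route("clear_cache"))
```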
- The criticisms here surprise me. "Programmatic Tool Calling" is a huge leap when you want AI to work with your app - like a human would.
I've been trying to get LLMs to work in our word processor's documents like a human collaborator following instructions. Writing a coding agent is far more straightforward (code is just plain strings) than getting an agent to work with rich text documents.
I imagined the only sane way was to expose a document SDK and expect the AI to write programs that call those SDK APIs. That was the only way to avoid MCPs and context explosion. Claude has now made this possible, and it's exciting!
Hope the other AI folks adopt this as well.
- What are the current ways to minimize context usage when streaming with multiple tool calls? I can offload some of it to the tools themselves, i.e. have them wrap an LLM that does the heavy lifting, like going through a 200k-token-long markdown file and returning only a structured distillation. However, even that can fill the main model's context quickly in some scenarios.
by seniorsassycat
2 subcomments
- Feels like the next step will be improving LLM/LSP integration, so tool discovery becomes LSP autocomplete calls.
This is a problem coding agents already need to solve to work effectively with your codebase and dependencies. So we don't have to keep solving problems introduced by odd tools like MCP.
- Programmatic tool invocation is a great idea, but it also increasingly raises the question of what the point of well-defined tools even is now.
Most MCP servers are just wrappers around existing, well-known APIs. If agents are now given an environment for arbitrary code execution, why not just let them call those APIs directly?
by morelandjs
0 subcomments
- Their code-based tool use makes a lot of sense, but I don't really get their tool search approach.
We originally had RAG as a form of search to discover potentially relevant information for the context. Then with MCP we moved away from that and instead dumped all the tool descriptions into the context and let the LLM decide, and it turned out this was way better and more accurate.
Now it seems like the basic MCP approach leads to the LLM running out of context because it's flooded with too many tool descriptions. And so now we are back to calling search (not RAG, but something else) to determine what's potentially relevant.
Seems like we traded scalability for accuracy, then accuracy for scalability… but I guess maybe we’ve come out on top because whatever they are using for tool search is better than RAG?
by RobertDeNiro
0 subcomments
- These meta features are nice, but I feel they create new issues, like debugging.
Since this tool search feature is completely opaque, the right tool might not get selected. Then you'll have to figure out whether it was the search, and if it was, how you can push the right tool to the top.
by emilsoman
1 subcomment
- > The script runs in the Code Execution tool (a sandboxed environment), pausing when it needs results from your tools. When you return tool results via the API, they're processed by the script rather than consumed by the model. The script continues executing, and Claude only sees the final output.
Does anyone know how they would have implemented the pause/resume functionality in the code execution sandbox? I can think of these: unikernels / Temporal / a custom implementation of serializable continuations. Anything else?
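One more candidate, purely speculative: a generator-based trampoline, where the generated script yields tool requests and the host resumes it with results. It at least shows the shape of the pause/resume contract:

```python
def generated_script():
    """Stands in for the model-written script; pauses at each yield."""
    members = yield ("get_team_members", {"team_id": "eng"})
    total = 0.0
    for m in members:
        expenses = yield ("get_expenses", {"user_id": m})
        total += sum(expenses)
    return f"total: {total}"

def run(script, handle_tool_call):
    """Trampoline: drive the script, pausing at each yield for a tool result."""
    gen = script()
    result = None
    try:
        while True:
            tool, args = gen.send(result)          # script pauses here
            result = handle_tool_call(tool, args)  # may block on the API client
    except StopIteration as stop:
        return stop.value                          # only this reaches the model

fake_tools = {"get_team_members": lambda a: ["u1", "u2"],
              "get_expenses": lambda a: [120.0, 80.0]}
print(run(generated_script, lambda t, a: fake_tools[t](a)))
```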
by arianvanp
2 subcomments
- Okay, so this is just the `apropos` and `whatis` commands to search through available man pages, then the `man` command to discover how the tools work, followed by tool execution?
Really, we should be treating Claude Code more like a shell session. No need for MCPs.
by JoshGlazebrook
2 subcomments
- Is there a good guide to all of these concepts in Claude Code for someone coming from Cursor? I just feel like the amount of configuration to accomplish the same things is overwhelming compared to Cursor.
- So essentially the "coding agent" is being surfaced to all Claude users, making it suitable even for general-purpose agents. That makes sense right after their blog post explaining the context bloat of MCPs.
I have been trying a similar idea that takes your MCP configs and runs WASM JavaScript in case you're building a browser-based agent: https://github.com/buremba/1mcp
- This seems to derive from the “skills” feature. A set of “meta tools” that supports granular discovery of tools, but whereas you write (optional) skills code yourself, a second meta tool can do it for you in conjunction with (optional) examples you can provide.
Am I missing something else?
- There is a huge difference between tools executed on the client and those that run on the server - I wish announcements like this one made clearer which kind they are referring to.
by baalimago
1 subcomment
- I thought the idea was to isolate concerns, so that you have a GitHub agent, a Linear agent, and a Slack agent independently, and these agents converse to solve the problem?
The monolith agent seems like a generalist that may fail to be good enough at anything. But what do I know.
by orliesaurus
0 subcomments
- This feels like anthropic just discovered fire and it can now boil water into hot water
by knowsuchagency
0 subcomments
- MCP really deserves its own language. This all feels like a hack around the hack that MCP sits on top of JSON. https://github.com/Orange-County-AI/MCP-DSL
- The MCP standard will, and has to, evolve to address this context issue. It's a no-brainer, and this is a perfect example of the direction MCP is going / will go.
There's fundamentally nothing wrong; it's just protocol updates that have to occur.
- It feels crazy to me that we are building "tool search" instead of building real tools with an interface, state, and available actions.
Think about how you would define a Calculator, a Browser, a Car...
I think, notably, one of the errors has been naming function calls "tools"...
- I'm confused about these tools - is this a decorator you can add to your MCP server tools so that they don't pollute the context? How else would I add a "tool" for Claude to use?
by dpacmittal
1 subcomment
- Why don't they just train their models on a tools directory/marketplace, and use search only for tools released after the training cutoff?
- Just use https://github.com/antl3x/Toolrag and avoid vendor lock-in.
by ripped_britches
0 subcomments
- Unless expertly engineered (like the Supabase MCP server is), CLI commands as skills are better most of the time. My skills are a script and an MD file on disk.
- So how close is this to “RAG for tools”? In the sense that RAG handles aspects of your task outside of the LLM, leaving the LLM to do what it does best.
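In that spirit, a minimal sketch of "RAG for tools": rank tool descriptions against the task and inject only the top-k definitions into context. Plain word overlap stands in for real embeddings here, and the tool names are hypothetical:

```python
# Hypothetical tool descriptions; real systems would embed these.
TOOLS = {
    "jira_create_issue": "Create a Jira issue with a summary and description.",
    "slack_post_message": "Post a message to a Slack channel.",
    "github_open_pr": "Open a GitHub pull request from a branch.",
}

def top_k_tools(task: str, k: int = 2) -> list[str]:
    """Rank tools by word overlap with the task; inject only the top k."""
    task_words = set(task.lower().split())
    return sorted(
        TOOLS,
        key=lambda name: -len(task_words & set(TOOLS[name].lower().split())),
    )[:k]

print(top_k_tools("open a pull request for this branch"))
```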
by visioninmyblood
1 subcomment
- I’ve taken a more opinionated stance on this. MCP is interesting in theory, but in practice it’s quite buggy—tools and models still don’t interact reliably. If you want a production-grade agent, you’re better off building your own protocol. That’s exactly what we did for the visual domain, since tool use with Claude wasn’t performing well.
Paper: https://arxiv.org/abs/2511.14210
- We seem to be on a cycle of complexity -> simplicity -> complexity with AI agent design. First we had agents like Manus or Devin that had massive scaffolding around them, then we had simple LLMs in loops, then MCP added capabilities at the cost of context consumption, then in the last month everything has been bash + filesystem, and now we're back to creating more complex tools.
I wonder if there will be another round of simplifications as models continue to improve, or if the scaffolding is here to stay.
by btbuildem
1 subcomment
- I like how the conceptual curve of this new frontier is starting to look more and more like a circle. Yes we have these amazing new tools. But hey, we also have decades of practices, honed by selflessly lazy intelligent people into relative efficiency.
It's starting to feel like this will come around to in the end become "self-writing code" -- any problem you pose in the fuzzy human language is gradually converted into hard crystal edges of machine code, but padded with soft escape hatches of natural language to deal with contingencies, surprise edge cases, etc.
Self-writing, self-healing, self-adapting code? Now that we can, perhaps we need to consider whether we should.
by machiaweliczny
0 subcomments
- I can see a Perl comeback.
- Wrapping tool calls in code, together with using the benefits of the MCP output schema, has been implemented in smolagents for some time.
I think that's even one step further conceptually.
https://huggingface.co/blog/llchahn/ai-agents-output-schema
- What’s the best way to prevent the input context from compounding with each tool call?
- Funny how they use "Traditional approach" for MCP tool usage, which was released just a year ago.
- This honestly feels like the logical next step for tool calling. Reminds me of the bitter lesson.
- Now there's this and "skills", and they're eating each other's lunch.
by cadamsdotcom
0 subcomments
- Very clever. Tool search and “code that can orchestrate tool calls” are features that make utter sense and should become opt-out for all tools - not opt-in.
How did the industry not think to do this in the first place :)
- So basically the idea of Claude Skills just for Tools.
- Kinda disappointed, doesn't seem all that advanced to me.
- the whole MCP thing is a mess, tbh
- Tools for tools. How about an LLM tool for tools?
by polyomino
2 subcomments
- Unfortunate that they chose Python instead of Bash as the wrapper. Bash would have wider interoperability across languages and workflows that don't touch Python. It would also expose more performant tools.