I love being able to type "make an iptables rule that opens 443" instead of having to dig out the man page and remember how to do that. IMO the next natural extension of this is giving the LLM more capability to generate user interfaces so I can interact with stuff exactly bespoke to my task.
This, on the other hand, seems to go the other way round: it's like bolting a static interface onto the LLM, which could defeat the purpose of the LLM interface layer in the first place, right?
I also think this is pretty big. I think a problem we collectively have right now is that getting MCP closer to real user flows is pretty hard and requires a lot of handholding. Ideally, most users of MCP wouldn't even know that MCP is a thing - the same way your average user of the web has no idea about DNS/HTTP/WebSockets. They just know that the browser helps them look at puppy pictures, connect with friends, or get some work done.
I think this is a meaningful step in the direction of getting more people who'll never know or care about MCP to get value out of MCP.
The whole surface of the MCP specification is already pretty big, and barely any server implements anything beyond the core parts.
With elicitation there was already a lightweight version of this in the standard, yet I'm not sure I've ever encountered a server or client implementation of it in the wild, and elicitation is an order of magnitude simpler to integrate on a conceptual level.
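For reference, an elicitation round-trip is roughly this small, as I read the current spec revision (the field names should be close, but treat the sketch as illustrative rather than authoritative):

    // Server -> client: ask the user for one small piece of structured input.
    const elicitationRequest = {
      jsonrpc: "2.0",
      id: 7,
      method: "elicitation/create",
      params: {
        message: "Which environment should this change apply to?",
        requestedSchema: {
          type: "object",
          properties: {
            environment: { type: "string", enum: ["staging", "production"] },
          },
          required: ["environment"],
        },
      },
    };

    // Client -> server: the user's answer (or a decline/cancel).
    const elicitationResponse = {
      jsonrpc: "2.0",
      id: 7,
      result: { action: "accept", content: { environment: "staging" } },
    };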
I fear this has a significant risk of splintering the MCP ecosystem further (it's already pretty strained due to the transport protocol iterations), and there isn't really a reason to create an official extension (yet) that may, in the worst case, also require multiple iterations to get things right.
The MCP community is just reinventing, but yes, improving, what we've done before in the previous generation: Microsoft Bot Framework, Speaktoit aka Google Dialogflow, Siri App Shortcuts / Spotlight.
And interactive UIs in chats go back at least 20 years, maybe not with an AI agent attached...
The next thing that will be reinvented is the memory/tool combination, aka a world model.
Trying to create custom agent APIs to embed apps in chat is a very "monopolist frontier lab" thing to try and do.
> If you want a focused comparison next - for example, benchmarks on coding/math, token-cost examples for a typical session, or API usage differences - I can produce a compact table with sources and numbers.
--> can be answered with yes, so please add a yes button. A no button is not needed.
I'm processing thousands of files using Copilot, even 20 at a time, and it usually skips a couple. Sometimes, when skipping, it merges the data from one file into the next without applying anything to the second file; other times it applies the data parsed from one file entirely to the second. Not a big deal, since I'm reviewing each operation manually, but the only reason the error rate is acceptable is that the files are so inconsistent that normal techniques weren't working.
Is there an equivalent to "double-keying" where two different LLMs process the same input and it only moves forward if both match perfectly?
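Something like the sketch below is what I have in mind; callModelA/callModelB are just placeholders for two different model APIs, and "match" here means normalized string equality:

    // Hypothetical "double-keying" check: run the same extraction through two
    // different models and only accept the result when they agree exactly.
    async function doubleKeyedExtract(
      fileText: string,
      callModelA: (input: string) => Promise<string>,
      callModelB: (input: string) => Promise<string>,
    ): Promise<string | null> {
      const [a, b] = await Promise.all([callModelA(fileText), callModelB(fileText)]);

      // Normalize whitespace so trivial formatting differences don't count as mismatches.
      const normalize = (s: string) => s.trim().replace(/\s+/g, " ");

      // Only move forward when both models produce the same output;
      // otherwise return null and flag the file for manual review.
      return normalize(a) === normalize(b) ? a : null;
    }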
The post title is quite editorialized.
What I am imagining is something like a meta UI tool call that just creates a menu. The whole MCP server's purpose might be to add this menu-creation capability to the chat user interface. But what you are selecting from isn't known ahead of time; it's the input to the UI.
When they select something, I assume it would output a tool call like menuItemSelected('option B'). I suppose if you want your server to do anything specific with this, you would have to handle that in the particular server. But I guess you could also just have a tool call that sends the inputs to the agent. This could make for a very slow-to-respond but extremely flexible overall UX.
I guess this is not the intended use, but suppose you give your agent generic MCP UI tools for showing any menu, showing any data table, showing a form, etc. So the inputSchemas would be somehow (if this is possible) quite loosely defined.
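Concretely, I'm picturing something like this (all names hypothetical, nothing here comes from the MCP Apps / MCP-UI spec):

    // A deliberately loose "show any menu" tool: the items aren't known ahead
    // of time, the agent supplies them per call.
    const showMenuTool = {
      name: "showMenu",
      description:
        "Render a selection menu in the chat UI. Items are supplied by the agent per call.",
      inputSchema: {
        type: "object",
        properties: {
          title: { type: "string" },
          items: {
            type: "array",
            items: { type: "string" },
            description: "Arbitrary labels to show the user.",
          },
        },
        required: ["items"],
      },
    };

    // On selection, the client could report back as another tool call or a
    // plain message, e.g. menuItemSelected("option B"), which the server either
    // handles itself or just forwards to the agent as input.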
I guess the purpose is probably more about not having to go through the LLM rather than giving it the ability to dynamically put up UI elements that it has to react to individual interactions with.
But maybe one of the inputs to the dataTable is the query parameters for its data, and the table has a refresh button. Maybe another input is the URI for the details-form MCP UI that slides over when you click a row.
Maybe there is an MCP UI for layout that allows you to embed other MCP UIs in a specific structure.
This might not make sense, but I am wondering if I can use MCP Apps as an alternative to always building custom MindRoot plugins (my Python/web components agentic app framework) to provide unique web pages and UI for each client's agentic application.
I think I may have gotten the MCP Apps and MCP UI a bit conflated here so I probably need to read it again.
E.g. you present a "display-graph-chart" tool as an MCP tool, and the agent calls it; it doesn't need to adhere to any protocol beyond the basic existing MCP protocol, and the UI that's used to interact with the agent would know the best presentation (show it as an embedded HTML graph in a web UI, as an ASCII chart in a terminal, etc.)?
Is the idea just to standardize the "output format" of the tool so that any agent UI could display stuff in the same way? so that one tool could work with any agent display?
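My mental model of it, sketched out (the ui:// URI and text/html mimeType follow the MCP-UI convention as I understand it; treat the details as illustrative): the tool returns its normal data plus an optional UI resource, and each client decides whether and how to render it.

    const toolResult = {
      content: [
        // Plain/structured content that any client (including a terminal) can use.
        {
          type: "text",
          text: JSON.stringify({ series: [{ x: 1, y: 2 }, { x: 2, y: 5 }] }),
        },
        // Optional embedded UI resource for clients that can render HTML.
        {
          type: "resource",
          resource: {
            uri: "ui://charts/sales-q3",
            mimeType: "text/html",
            text: "<div id=\"chart\"><!-- chart markup/script --></div>",
          },
        },
      ],
    };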
Given that LLMs can generate complex frontend code, why is it so difficult for Anthropic / OpenAI to prompt their chat applications to create UI on the fly that matches their chat applications 100%?
I know this is possible because this is how we do it.
The LLM generates some text that we know how to interpret and we render it on the screen.
Besides, this is exactly how their canvas thing works (both ChatGPT and Claude) when rendering documents on the side.
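For anyone curious, the pattern is roughly this (a sketch, not our actual implementation): prompt the model to wrap UI descriptions in a known marker, then parse and render them on the client:

    interface UiSpec {
      kind: "button" | "slider" | "table";
      label: string;
    }

    // Pull out every <ui>...</ui> block from the model's reply and parse it;
    // anything that isn't valid JSON just falls through as ordinary text.
    function extractUiSpecs(modelOutput: string): UiSpec[] {
      const specs: UiSpec[] = [];
      const pattern = /<ui>([\s\S]*?)<\/ui>/g;
      let match: RegExpExecArray | null;
      while ((match = pattern.exec(modelOutput)) !== null) {
        try {
          specs.push(JSON.parse(match[1]) as UiSpec);
        } catch {
          // Malformed block: ignore it and let the surrounding text render as-is.
        }
      }
      return specs;
    }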
From my perspective the challenges for vendors and SaaS providers are [1] discovery [2] monetization [3] disintermediation
I think it's less of a concern if you're Shopify or those large companies that have existing brand moats.
But if you're a startup, I don't think MCP as a channel is a clear-cut decision. Maybe you can get distribution but monetization is not defined.
Also, I'm sure the model providers will capture usage data and could easily disintermediate you, especially if your startup is just a narrow set of prompts and a UX over a specific workflow.
The Reforge guys have been talking about a channel shift and this being it, but until incentives are clear, I'm not sure this is it yet. Maybe an evolution of this.
I'm building an AI coach for job seekers / early-stage professionals (Socratify), and while I'd love more distribution from MCP UI integration, I think at this point the risk is higher than the reward...
You need to:
1. Spin up a server that returns UI components.
2. Hand-write a bunch of JSON schemas + tool wiring.
So we open-sourced a high-level MCP Server SDK that basically lets you have both the MCP server and React components in the same place:
- Every React component you put in your resources/ folder is automatically built and exposed as an MCP resource + tools. No extra registration boilerplate.
- We added a useWidget hook that takes the tool args and maps them directly into your component props, so the agent effectively “knows” what data the widget needs to render. You focus on UI + logic; the SDK handles the plumbing (rough sketch below).
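Roughly what that looks like in practice (illustrative only: the import path, hook signature, and prop names here are guesses, the docs below have the real API):

    import { useWidget } from "mcp-use/react"; // assumed import path

    interface WeatherProps {
      city: string;
      unit: "C" | "F";
    }

    // A component dropped into resources/ that reads its props straight from
    // the agent's tool-call arguments via the hook.
    export default function WeatherWidget() {
      const { city, unit } = useWidget<WeatherProps>();
      return (
        <div>
          <h3>Weather for {city}</h3>
          <p>Temperatures shown in °{unit}</p>
        </div>
      );
    }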
Docs for that flow here: https://docs.mcp-use.com/typescript/server/creating-apps-sdk...
We also shipped an MCP Inspector to make the dev loop much less painful: you can connect your MCP server, test UI components from tools (with auto-refresh), and debug how it behaves with ChatGPT/agents as you iterate. https://docs.mcp-use.com/inspector/debugging-chatgpt-apps
Both the SDK and the Inspector are open-source, and any contributions are very welcome :)
Here are the repos:
- SDK: https://github.com/mcp-use/mcp-use
- Inspector: https://github.com/mcp-use/mcp-use/tree/main/libraries/types...
If one of the vendors manages to get their protocol to become the target platform (e.g. OpenAI and the Apps SDK), that is essentially their vendor lock-in to become the next iOS/Android.
Private APIs or EEE strategies are gonna be something to keep an eye on, and I wish regulators would step in to prevent them before it's too late.
https://www.anthropic.com/engineering/code-execution-with-mc...
The agent discovers tools by exploring the filesystem: listing the ./servers/ directory to find available servers (like google-drive and salesforce), then reading the specific tool files it needs (like getDocument.ts and updateRecord.ts) to understand each tool's interface. This lets the agent load only the definitions it needs for the current task. This reduces the token usage from 150,000 tokens to 2,000 tokens—a time and cost saving of 98.7%.
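A per-tool file in that layout is shaped roughly like this (modeled on the example in the linked post; callMCPTool stands in for the bridge helper the execution harness provides, and the import path is assumed):

    // ./servers/google-drive/getDocument.ts
    import { callMCPTool } from "../../mcp-client"; // assumed path for the sketch

    interface GetDocumentInput {
      documentId: string;
    }

    interface GetDocumentResponse {
      content: string;
    }

    // Thin typed wrapper: reading this one file tells the agent everything it
    // needs to know about the tool's interface, without loading every server's
    // full tool list into context.
    export async function getDocument(input: GetDocumentInput): Promise<GetDocumentResponse> {
      return callMCPTool<GetDocumentResponse>("google_drive__get_document", input);
    }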
To me, this looks less like UI interactions and more like the MCP equivalent of maintaining state. You start your program and “click” buttons until you get the desired result, maintaining a constant state between interactions. Isn’t that currently possible if you passed through something like a session-id back to the LLM?
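In other words, something like this, where the handle just rides along in the tool result and the next call's arguments (names illustrative, not from any spec):

    // Tool result hands the model a session handle...
    const firstResult = {
      content: [{ type: "text", text: "Started wizard, step 1 of 3." }],
      structuredContent: { sessionId: "sess_8f2c", step: 1 },
    };

    // ...and the model echoes it back on the next call.
    const followUpCall = {
      name: "advanceWizard",
      arguments: { sessionId: "sess_8f2c", choice: "B" },
    };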
Am I missing something? I’m struggling to see what a UI makes possible that the current workflow does not.
I also generally see/use MCP in terms of remote access to programs through servers. Perhaps that’s where I’m getting lost. Is this exclusively something for local MCP?
We've known for decades that it's useful for APIs to be self-documented and for responses to use schemas to define the shape of the data.
XML can be verbose, and I understand why people preferred JSON for ease of use. Had we stuck with REST for the last 20 years, though, we'd be way ahead on that front, both in syntax and tooling.
A great example is GitHub: it's a significantly better dev experience having CC call out to the gh CLI for actions than trying to invoke the MCP.
It’s basically a “web App Store”, and we sidestep the existing app stores (and their content guidelines, security restrictions and billing requirements) because it’s all done via a mega-app (the MCP client).
How could it go wrong?
If only someone had done this before, we wouldn’t be stuck in Apple’s (and others’) walled gardens…
Seriously though; honest question: this is literally circumventing platform requirements to use the platform app stores. How do you imagine this is going to be allowed?
Is ChatGPT really big enough they can pull the “we’re gonna do it, watcha gonna do?” to Apple?
Who’s going to curate this app store so non-technical users (the explicitly stated audience) can discover these MCP apps?
It feels like MCP itself: half-baked. Overly ambitious. “We’ll figure the details out later.”
Honestly, I think the biggest friction for MCP adoption has been how user-unfriendly it is. It’s great for devs, but not for average users. Users don't always want to chat; sometimes they just want to click a button or adjust a slider. This feels like the answer to that problem.
Full disclosure, I'm partial here because of our work at https://usefractal.dev. We were early adopters when MCP first came out, but we always felt like something was missing. We kept wishing for a UI layer on top, and everyone said it was gonna take forever for the industry to adopt one, maybe months, maybe years.
I can't believe the adoption came so quickly. I think this is gonna be huge. What do you guys think?
I think I have to be missing what is huge here?
I'm not confident the current AI craze will be net positive for humanity. But one possible good outcome could be that many people come to prefer a simple chat UI for interacting with services, so most companies have to adopt it and are forced to provide simple, straight, no-nonsense content instead of what they want to sell. Meanwhile, LLMs are just a commodity, so unnamed Chinese companies can provide models as good as the ones from the most VC-funded company, and the UX can't be enshittified.
Sigh.
2015 WeChat mini program
...
2025 MCP-UI
I'm tired.