I’m building a native desktop editor for fiction writers. It’s written in Rust on top of gpui, the UI framework from the Zed team. Under the hood, it generates a fiction-specific AST, runs ~120 prose-craft analyzers, and applies a multi-task ONNX transformer model to your manuscript in real time, surfacing things like show-don’t-tell violations, passive voice, pacing issues, and much more.
I started out using gpui-component, but the available Input component wasn’t sufficient for the complex, CRDT-backed UI I wanted, so I ended up building my own buffer-based system and the entire text-editing UX from scratch. This is not recommended!
To manage this complexity, I embedded a full MCP server directly inside the application binary. It has since become the single most impactful architectural decision I’ve made on this project, not for users, but for how Claude and I build the product itself.
Here’s the case for why your application should do the same.
The Problem: GUI Apps Are Opaque to AI Agents
If you’re building a native desktop application in 2026, you’ve probably noticed a gap. Your AI coding assistant can read your source code, run your tests, and even propose edits. However, your AI cannot always see your running application. It can’t click a button, type into a text field, verify that a diagnostic tooltip rendered correctly, or confirm that a scrollbar stopped at the right position.
For web apps, this is a solved problem with headless browsers, Playwright, Chrome MCP, etc. For native apps, especially those built on GPU-accelerated frameworks like gpui, you’re largely on your own. There’s no DOM to query. There’s no accessibility tree you can trivially script against. The rendered output is just a texture.
I spent too long in a loop that looked like this:
- Read the source code
- Make a change
- `cargo build`
- Manually launch the app
- Manually paste in test prose
- Squint at the screen
- Screenshot it myself
- Paste the screenshot into the AI interface
- Repeat
Steps 4 through 8 are the bottleneck, and no amount of faster builds fixes that. The feedback loop is human-gated.
The Solution: Make the App Speak MCP
The Model Context Protocol is essentially a standardized JSON-RPC interface that AI agents already know how to speak. If your app exposes MCP tools, any MCP-compatible client can drive your application programmatically.
My implementation has two pieces:
1. The In-App MCP Server
When launched with --mcp, the app starts a background thread that reads newline-delimited JSON-RPC from stdin and writes responses to stdout. Commands are dispatched into the gpui event loop.
This is ~200 lines of Rust. No external dependencies beyond serde_json. The protocol surface is minimal: initialize, tools/list, and tools/call. That’s it (for now).
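For a sense of the shape, here’s a minimal sketch of such a loop using only std and serde_json. The names (McpCommand, run_mcp_stdio, tool_definitions) are illustrative, and the hop into the gpui event loop is abstracted here as an mpsc channel; the real thing does more bookkeeping.

```rust
use std::io::{self, BufRead, Write};
use std::sync::mpsc::Sender;

use serde_json::{json, Value};

/// Command forwarded from the MCP thread into the UI event loop.
/// (Illustrative; the real app would carry more context.)
pub struct McpCommand {
    pub tool: String,
    pub args: Value,
    pub respond: Sender<Value>,
}

/// Read newline-delimited JSON-RPC from stdin, answer the three supported
/// methods, and hand `tools/call` off to the UI thread.
pub fn run_mcp_stdio(ui: Sender<McpCommand>) -> io::Result<()> {
    let stdin = io::stdin();
    let mut stdout = io::stdout();

    for line in stdin.lock().lines() {
        let line = line?;
        let req: Value = match serde_json::from_str(&line) {
            Ok(v) => v,
            Err(_) => continue, // ignore malformed lines
        };
        let id = req["id"].clone();
        let result = match req["method"].as_str() {
            Some("initialize") => json!({
                "protocolVersion": "2024-11-05",
                "capabilities": { "tools": {} },
                "serverInfo": { "name": "prose-editor", "version": "0.1.0" }
            }),
            Some("tools/list") => json!({ "tools": tool_definitions() }),
            Some("tools/call") => {
                // Forward to the event loop and block this thread for the reply.
                let (tx, rx) = std::sync::mpsc::channel();
                ui.send(McpCommand {
                    tool: req["params"]["name"].as_str().unwrap_or("").to_string(),
                    args: req["params"]["arguments"].clone(),
                    respond: tx,
                })
                .ok();
                rx.recv().unwrap_or_else(|_| json!({ "error": "ui thread gone" }))
            }
            // Notifications (e.g. notifications/initialized) need no response.
            _ => continue,
        };
        writeln!(stdout, "{}", json!({ "jsonrpc": "2.0", "id": id, "result": result }))?;
        stdout.flush()?;
    }
    Ok(())
}

fn tool_definitions() -> Value {
    // Elided: one schema entry per tool (set_text, press_key, screenshot, ...).
    json!([])
}
```

Blocking on the UI reply keeps the protocol trivially simple: one request in, one response out, in order.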
2. The Lifecycle Wrapper
This is a separate binary that manages the app process. It:
- Builds the app from source on startup
- Launches it with `--mcp`
- Proxies all JSON-RPC between the MCP client and the app
- Intercepts a special `rebuild` tool call to stop the app, run `cargo build`, and relaunch, without dropping the MCP connection
The wrapper feels like a hack, and there is probably a cleaner solution, but it works: when the agent edits Rust source and calls rebuild, the app restarts with the new binary and the agent’s MCP session continues uninterrupted.
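For illustration, here’s a rough sketch of the wrapper’s main loop under the same caveats: the binary path, the rebuild response payload, and the error handling are all simplified stand-ins, not the project’s actual code.

```rust
use std::io::{BufRead, BufReader, Write};
use std::process::{Child, ChildStdin, ChildStdout, Command, Stdio};

use serde_json::{json, Value};

struct App {
    child: Child,
    reader: BufReader<ChildStdout>,
    writer: ChildStdin,
}

/// `cargo build`, then launch the app with `--mcp` and piped stdio.
/// The binary path is illustrative.
fn launch() -> std::io::Result<App> {
    Command::new("cargo").arg("build").stdout(Stdio::null()).status()?;
    let mut child = Command::new("target/debug/editor")
        .arg("--mcp")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    let writer = child.stdin.take().expect("piped stdin");
    let reader = BufReader::new(child.stdout.take().expect("piped stdout"));
    Ok(App { child, reader, writer })
}

fn main() -> std::io::Result<()> {
    let mut app = launch()?;

    for line in std::io::stdin().lock().lines() {
        let line = line?;
        let req: Value = serde_json::from_str(&line).unwrap_or(Value::Null);

        // Intercept `rebuild`: stop the app, rebuild, relaunch, and answer the
        // call ourselves. The MCP client never sees the restart.
        if req["method"] == "tools/call" && req["params"]["name"] == "rebuild" {
            app.child.kill().ok();
            app.child.wait().ok();
            app = launch()?;
            println!(
                "{}",
                json!({
                    "jsonrpc": "2.0", "id": req["id"],
                    "result": { "content": [{ "type": "text", "text": "rebuilt and relaunched" }] }
                })
            );
            continue;
        }

        // Otherwise proxy: one request line in, one response line out.
        writeln!(app.writer, "{line}")?;
        let mut response = String::new();
        app.reader.read_line(&mut response)?;
        print!("{response}");
    }
    Ok(())
}
```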
The SDLC Loop
Here’s what the development loop looks like with the MCP server in place compared to without:
Before (human-gated):
```
╭─► Edit .rs         Agent
│   cargo build      Agent
│   Launch App       Agent
│   Paste Prose      Human
│   Squint           Human
│   Screenshot       Human
│   Describe to AI   Human
╰─────────────────────────╯
```
After (agent-driven):
```
╭─► Edit .rs      Agent
│   rebuild       Agent
│   set_text      Agent
│   wait_idle     Agent
│   screenshot    Agent
╰──────────────────────╯
```

The “before” loop requires a human at every step after launch. The “after” loop is fully autonomous: the agent drives the entire cycle in ~10-second iterations. Screenshots are expensive; you can expose other tools for cheaper checks.
What Tools Does the Server Expose?
Here’s my current tool surface. Claude can quickly iterate on the available tools as it adds features too!
| Tool | What it does |
|---|---|
| `set_text` / `type_text` | Load prose or type at cursor |
| `press_key` | Simulate any keystroke (enter, backspace, Cmd+B, etc.) |
| `click` / `double_click` / `triple_click` | Click at pixel coordinates |
| `drag_select` | Click-drag selection |
| `screenshot` | Capture the window to PNG |
| `get_state` | Return cursor position, selection, text content, word count |
| `get_diagnostics` | Return structured analysis results (message, severity, source, byte range) |
| `wait_idle` | Block until both fast and semantic analysis stages complete |
| `set_view_mode` | Switch between Draft, Review, and Analyze modes |
| `set_nav_pane` | Switch sidebar panes (editor, outline, find, diagnostics, settings) |
| `list_elements` | Enumerate UI elements with rendered positions |
| `hover_diagnostic` | Programmatically hover a diagnostic card |
| `format_state` | Query which inline/block formats are active at cursor |
| `rebuild` | Stop → `cargo build` → relaunch (wrapper-only) |
The total is around 30 tools. The marginal cost of adding a new tool is about 15 minutes: write a match arm, call an existing editor method, return JSON.
I haven’t attempted to expose dynamic tools based upon the current view.
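To make the match-arm claim concrete, here’s roughly the shape of the dispatch. The EditorApi trait and its methods are hypothetical stand-ins for whatever your app exposes internally; this is a sketch, not the project’s actual code.

```rust
use serde_json::{json, Value};

/// Illustrative surface of the editor that tools call into; the real app
/// would route these to its buffer and view state on the gpui thread.
trait EditorApi {
    fn set_text(&mut self, text: &str);
    fn cursor_offset(&self) -> usize;
    fn word_count(&self) -> usize;
    fn set_view_mode(&mut self, mode: &str);
}

/// One match arm per tool: parse the arguments, call an editor method,
/// return JSON.
fn handle_tool_call(editor: &mut impl EditorApi, tool: &str, args: &Value) -> Value {
    match tool {
        "set_text" => {
            editor.set_text(args["text"].as_str().unwrap_or_default());
            json!({ "ok": true })
        }
        "get_state" => json!({
            "cursor": editor.cursor_offset(),
            "word_count": editor.word_count()
        }),
        "set_view_mode" => {
            editor.set_view_mode(args["mode"].as_str().unwrap_or("draft"));
            json!({ "ok": true })
        }
        other => json!({ "error": format!("unknown tool: {other}") }),
    }
}
```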
What This Actually Enables
AI-Driven Iteration and Verification
The agent can now verify what it built. It edits paint.rs, calls rebuild, calls set_text with sample prose, calls wait_idle to let the analyzers finish, and calls screenshot to capture the result. It reads the PNG, evaluates whether the margin notes rendered correctly, and iterates. No human in the loop.
Structured Test Authoring
Instead of asserting against internal state (which couples tests to implementation), the agent can write behavioral tests:
set_text("She felt very sad about what happened.")
wait_idle()
diagnostics = get_diagnostics()
assert any(d.source == "show_tell" for d in diagnostics)
assert any(d.source == "redundancy" for d in diagnostics) This tests what the user would experience. If I refactor the analyzer pipeline — change AST nodes, rename modules, swap out models — these tests still pass because they’re testing the product surface, not the implementation. Too many tests at these lower levels just add friction and churn especially with AI coding tools.
The “Rebuild” Pattern
This is another useful pattern. The agent can:
- Edit a `.rs` file
- Call `rebuild` (the wrapper stops the app, runs `cargo build`, relaunches)
- Immediately verify the new build
- Evaluate and iterate
The rebuild itself shouldn’t take long, thanks to Rust’s incremental compilation, and the MCP session stays connected throughout. The agent can run edit-verify cycles faster than I can switch windows.
The Broader Principle
There’s a deeper pattern here. We’re entering a period where the audience for your application’s API is not just other programmers — it’s AI agents. And agents don’t need the same things programmers need. They don’t need beautiful documentation, clever abstractions, or versioned REST endpoints. They need:
- A way to do things
- A way to wait for things
- A way to verify things
MCP in your app gives you a standardized way to expose all three. The protocol handles capability negotiation, tool discovery, and structured responses. Your job is just to wire the tools to your application’s internals.
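Tool discovery, for instance, is just the tools/list response: each tool advertises a name, a description, and a JSON Schema for its arguments. Here’s a sketch of one entry (the schema shown is illustrative):

```rust
use serde_json::{json, Value};

/// One entry in the `tools/list` response: name, description, and a JSON
/// Schema describing the arguments the agent may pass.
fn set_text_tool() -> Value {
    json!({
        "name": "set_text",
        "description": "Replace the manuscript buffer with the given prose.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "text": { "type": "string", "description": "Prose to load into the editor." }
            },
            "required": ["text"]
        }
    })
}
```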
I started the MCP server to speed up my own development loop. You might also want to expose the same MCP server to your power users!
Opinions are my own and not the views of my employer.
© 2026 by Justin Poehnelt. Licensed under CC BY-SA 4.0.