Wrapping Komodo in an MCP server

Komodo is the container manager I use for my homelab: Docker Compose stacks, multi-host orchestration, the whole thing. I built a small MCP server in TypeScript that exposes the Komodo API to Claude Code, organised into 15 read tools, 12 execute tools, and 8 write tools. The interesting part isn’t the API wrapping. It is the partitioning, and the deliberate friction between an agent and the buttons that can take production down.

Why this exists

Two truths I had to reconcile:

Komodo’s web UI is excellent for browsing and one-off operations. It is not excellent for “deploy these four stacks in order, watch the logs of each one, and roll back the third if it doesn’t come up healthy within 30 seconds.”
An LLM agent is great at exactly that kind of scripted-but-not-quite workflow — if it can talk to the underlying system.

An MCP server is the bridge. It exposes a typed surface over Komodo’s HTTP API to any MCP-compatible client (Claude Code, in my case), and the agent can now call start_stack(name="immich") instead of describing what it wishes someone would do.

The shape of it

flowchart LR
    A[Claude Code] -->|SSE / stdio| B[MCP server<br/>komodo-mcp]
    B --> C[Komodo API<br/>HTTP]
    C --> D[Komodo Core]
    D -->|Periphery| E1[Docker host 1]
    D -->|Periphery| E2[Docker host 2]
    D -->|Periphery| E3[Docker host N]
    style B fill:#e3edff,stroke:#1f4eb0
    style D fill:#e6f4ea,stroke:#2c7a3f

Six files, in TypeScript:

File	Role
`index.ts`	Express server, SSE + stdio transport, `/health`
`server.ts`	`McpServer` instance, tool registration
`komodo-client.ts`	Typed HTTP wrapper for Komodo Core
`tools/read.ts`	15 query-only operations
`tools/execute.ts`	12 runtime control operations
`tools/write.ts`	8 configuration change operations

The split between read / execute / write is the design decision that matters. Every other engineering choice is a consequence of it.
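For orientation, here is a minimal sketch of what a typed wrapper like `komodo-client.ts` could look like. It is not the actual file. It assumes Komodo Core accepts POST requests on `/read`, `/execute` and `/write` with a `{ type, params }` JSON body and API key/secret headers; the header names, the `GetStack` request type, the `stack` parameter and the injected `post` transport are all assumptions to check against the Komodo API docs for your version.

```typescript
// Sketch only: endpoint layout, body shape, and header names are assumptions.
type ApiGroup = "read" | "execute" | "write";

interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}

// Structural stand-in for fetch, so a fake transport can be injected in tests.
type HttpPost = (
  url: string,
  init: { method: "POST"; headers: Record<string, string>; body: string },
) => Promise<HttpResponse>;

interface KomodoClientOptions {
  baseUrl: string;
  apiKey: string;
  apiSecret: string;
  post: HttpPost; // in production, a thin wrapper around the global fetch
}

class KomodoClient {
  private readonly opts: KomodoClientOptions;

  constructor(opts: KomodoClientOptions) {
    this.opts = opts;
  }

  // One generic request method; typed helpers are thin wrappers around it.
  async request<T>(group: ApiGroup, type: string, params: object): Promise<T> {
    const res = await this.opts.post(`${this.opts.baseUrl}/${group}`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "X-Api-Key": this.opts.apiKey,
        "X-Api-Secret": this.opts.apiSecret,
      },
      body: JSON.stringify({ type, params }),
    });
    if (!res.ok) {
      throw new Error(`Komodo ${group}/${type} failed with HTTP ${res.status}`);
    }
    return (await res.json()) as T;
  }

  // Hypothetical typed helper; the response type is cut down for the sketch.
  getStack(name: string): Promise<{ name: string }> {
    return this.request<{ name: string }>("read", "GetStack", { stack: name });
  }
}
```

Routing every call through one `request` method that takes the API group as its first argument keeps the read / execute / write distinction visible at the client layer too.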

Why three categories, not one big bag

The naive version is “expose every Komodo endpoint as its own tool.” Komodo has roughly 35 useful endpoints; that’s 35 tools in one big bag, all marked equally dangerous. The LLM calls whatever fits the prompt.

That’s wrong for two reasons.

One: the agent can’t tell from a tool name whether it’s about to look at something or change it. update_stack and get_stack sound similar; one of them takes your evening down.

Two: the human (me) loses the ability to set policy per category. “The agent can read freely; ask me before executing; never write without confirmation” is the policy I actually want. With one big bag, the only policy is “ask me about every single tool call,” and I will stop reading those prompts after the fifth one.

The three-category split makes the policy expressible in the tool name and description. The agent reads read_* and knows the cost is zero. It reads execute_* and knows it’s about to do something. It reads write_* and knows it’s changing config that survives a restart.

Inside each category, the tool names follow Komodo’s own object model — read_stack, read_deployment, read_server — which the LLM picks up immediately because the names are predictable.
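The per-category policy can be written down as data. The sketch below is illustrative rather than the server's real code: the `Category` and `Policy` types, the `policyFor` helper and the choice to treat unknown tools as writes are all assumptions, and the tool names are a sample from this post.

```typescript
type Category = "read" | "execute" | "write";
type Policy = "allow" | "ask_first" | "require_confirmation";

// Illustrative sample of tools; a real registry would list all 35.
const TOOL_CATEGORY = new Map<string, Category>([
  ["list_stacks", "read"],
  ["get_stack", "read"],
  ["get_logs", "read"],
  ["start_stack", "execute"],
  ["stop_stack", "execute"],
  ["restart_stack", "execute"],
  ["create_stack", "write"],
  ["update_stack", "write"],
  ["delete_stack", "write"],
]);

// "Read freely; ask me before executing; never write without confirmation."
const CATEGORY_POLICY: Record<Category, Policy> = {
  read: "allow",
  execute: "ask_first",
  write: "require_confirmation",
};

function policyFor(tool: string): Policy {
  // Assumption: a tool missing from the registry gets the strictest policy.
  const category = TOOL_CATEGORY.get(tool) ?? "write";
  return CATEGORY_POLICY[category];
}
```

With one bag of tools there is nothing for `CATEGORY_POLICY` to key on; the three categories are what make a three-line policy table possible.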

The tool surface, in a table

┌───────────┬──────────────────────────────────────────────────────┐
│ READ (15) │ Query state. No side effects.                        │
├───────────┼──────────────────────────────────────────────────────┤
│           │ list_stacks, get_stack, list_servers, get_server,    │
│           │ list_deployments, get_deployment, get_logs,          │
│           │ get_stats, list_alerts, list_users, etc.             │
└───────────┴──────────────────────────────────────────────────────┘
┌───────────┬──────────────────────────────────────────────────────┐
│ EXEC (12) │ Runtime control. Reversible (mostly).                │
├───────────┼──────────────────────────────────────────────────────┤
│           │ start_stack, stop_stack, restart_stack,              │
│           │ pull_stack_images, run_stack, kill_container, etc.   │
└───────────┴──────────────────────────────────────────────────────┘
┌───────────┬──────────────────────────────────────────────────────┐
│ WRITE (8) │ Configuration changes. Persistent.                   │
├───────────┼──────────────────────────────────────────────────────┤
│           │ create_stack, update_stack, delete_stack,            │
│           │ update_server, create_alert, delete_alert, etc.      │
└───────────┴──────────────────────────────────────────────────────┘

The 15/12/8 ratio is informative: most of what you ask an agent to do with a container manager is read. Status checks, log peeks, “is everything green?” The execute tier is the second most common; writes are rare and deserve to be rare.

What a tool actually looks like

Inside tools/read.ts, a tool is a small register call:

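import { z } from "zod";

// `server` is the McpServer instance from server.ts;
// `komodo` is the typed client from komodo-client.ts.
server.tool(
  "read_stack",
  // Input schema: shown to the agent and validated on every call.
  {
    name: z.string().describe("Stack name, e.g. 'immich' or 'authentik'"),
  },
  async ({ name }) => {
    const stack = await komodo.getStack(name);
    // Return a trimmed summary rather than the full API response.
    return {
      content: [{
        type: "text",
        text: JSON.stringify({
          name: stack.name,
          status: stack.info?.state,
          // parseServices is a helper defined elsewhere in the project.
          services: stack.config?.file_contents
            ? parseServices(stack.config.file_contents)
            : [],
          last_pulled: stack.info?.latest_hash,
        }, null, 2),
      }],
    };
  }
);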

Three things worth noting:

Zod schema for inputs. The MCP SDK uses Zod to define and validate tool arguments. The agent sees the schema and knows what to send; the server gets typed inputs and a runtime check. The cost is one extra dependency; the benefit is that the agent stops sending malformed calls within the first session.
Trimmed, structured JSON output. Komodo’s raw API responses are large — full stack configs, all images, recent deployments, etc. I trim aggressively per tool to the fields an agent actually needs. The remaining response is a few hundred tokens instead of several thousand.
Descriptive descriptions. The agent’s tool-selection comes from the description, not from inference about the name. .describe("Stack name, e.g. 'immich' or 'authentik'") is worth its space.
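The trimming step is easy to isolate and test on its own. A minimal sketch, assuming a raw response shaped like the fields used in the tool above; `RawStack`, `StackSummary`, `summarizeStack` and `toToolResult` are hypothetical names, and the real response carries many more fields.

```typescript
// Hypothetical subset of a raw stack response.
interface RawStack {
  name: string;
  info?: { state?: string; latest_hash?: string };
  config?: { file_contents?: string };
  [extra: string]: unknown; // everything else the API sends back
}

interface StackSummary {
  name: string;
  status: string;
  latest_hash: string | null;
}

// Keep only the fields an agent needs to reason about the stack.
function summarizeStack(raw: RawStack): StackSummary {
  return {
    name: raw.name,
    status: raw.info?.state ?? "unknown",
    latest_hash: raw.info?.latest_hash ?? null,
  };
}

// Wrap the summary in the text-content shape an MCP tool returns.
function toToolResult(summary: StackSummary) {
  return {
    content: [{ type: "text" as const, text: JSON.stringify(summary, null, 2) }],
  };
}
```

Keeping the summary type explicit also documents, per tool, exactly which fields the agent is allowed to see.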

The friction, where it belongs

execute_* tools include a confirm parameter on the destructive ones:

server.tool(
  "stop_stack",
  {
    name: z.string(),
    confirm: z.literal("yes").describe("Must be exactly 'yes'"),
  },
  async ({ name, confirm }) => { ... }
);

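The agent has to send confirm: "yes" for the call to land. It doesn’t have to infer this from the prompt: the schema states the requirement, and validation rejects any call where confirm is missing or holds another value. Two effects: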

The agent reads the description, sees the required confirm, and either includes it (because the user obviously wants the stop) or omits it (because the user was only exploring).
If the agent gets too eager, the MCP client logs show the confirm argument — so I can audit what the agent actually committed to versus what it was musing about.
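The same gate can also be enforced inside the handler, which helps if a tool is ever called without passing through schema validation. This is a dependency-free sketch, not the shipped code: `withConfirm`, `ToolResult` and the stand-in `stopStack` handler are hypothetical, and `isError` follows the MCP convention for reporting a failed tool call.

```typescript
type ToolResult = {
  isError?: boolean;
  content: { type: "text"; text: string }[];
};

// Wrap a handler so it only runs when args.confirm is exactly "yes".
function withConfirm<A extends { confirm?: unknown }>(
  handler: (args: A) => Promise<ToolResult>,
): (args: A) => Promise<ToolResult> {
  return async (args) => {
    if (args.confirm !== "yes") {
      return {
        isError: true,
        content: [{ type: "text", text: 'Refused: pass confirm: "yes" to run this tool.' }],
      };
    }
    return handler(args);
  };
}

// Stand-in for the real stop_stack handler; it only records the call.
const stopped: string[] = [];
const stopStack = withConfirm(async (args: { name: string; confirm?: unknown }) => {
  stopped.push(args.name);
  return { content: [{ type: "text", text: `stopped ${args.name}` }] };
});
```

The refusal message names the missing argument, so an agent that really was asked to stop the stack can retry correctly on the next call.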

write_* tools go one step further: the tool requests approval from the user through the client’s UI before executing. Claude Code shows a “Tool wants to: update_stack(name='nginx')” dialog with the full diff. My fingers approve it; the agent doesn’t get to.

This is the bit that took me an embarrassing number of attempts to get right. The first version trusted the agent and got bitten in development. The second version asked for confirmation on every tool call and was insufferable. The third version — which is what ships — asks for confirmation only on write_*, surfaces a confirm parameter on destructive execute_*, and lets read_* flow freely. The agent isn’t paternalised; the dangerous edges are.

What I’d build into v2

A dry-run flag. Several execute_* and write_* tools could meaningfully accept dry_run: true and return what would happen. I have it on three tools; it ought to be on all of them.
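One way to make dry-run uniform is a helper that every execute and write handler goes through. A sketch under assumptions: `PlannedAction`, `DryRunResult` and `runOrPlan` are hypothetical names, and the real tools may report their plan differently.

```typescript
interface PlannedAction {
  tool: string;
  target: string;
  effect: string; // human-readable description of what would change
}

type DryRunResult =
  | { dry_run: true; plan: PlannedAction }
  | { dry_run: false; plan: PlannedAction; result: string };

// Return the plan without side effects when dryRun is set; otherwise execute.
async function runOrPlan(
  plan: PlannedAction,
  dryRun: boolean,
  execute: () => Promise<string>,
): Promise<DryRunResult> {
  if (dryRun) {
    return { dry_run: true, plan };
  }
  return { dry_run: false, plan, result: await execute() };
}
```

Because the plan is built before the branch, the dry-run output and the real run describe the same action by construction.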

Per-tool audit log. Right now the audit trail lives in Komodo’s own activity log, which is fine but tied to the user account the MCP server uses. I’d rather have a separate log keyed by MCP session id so I can see what a particular agent run did, not just “the agent account did things.”
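A session-keyed log does not need much. The sketch below keeps entries in memory; `SessionAuditLog`, `AuditEntry` and the idea of passing the session id in explicitly are assumptions, and a real version would persist entries and take the id from the MCP transport.

```typescript
interface AuditEntry {
  sessionId: string;
  tool: string;
  args: unknown;
  at: string; // ISO 8601 timestamp
}

class SessionAuditLog {
  private readonly entries = new Map<string, AuditEntry[]>();

  // Record one tool call under the session that made it.
  record(sessionId: string, tool: string, args: unknown, now: Date = new Date()): void {
    const list = this.entries.get(sessionId) ?? [];
    list.push({ sessionId, tool, args, at: now.toISOString() });
    this.entries.set(sessionId, list);
  }

  // Everything a single agent run did, in call order.
  forSession(sessionId: string): readonly AuditEntry[] {
    return this.entries.get(sessionId) ?? [];
  }
}
```

Calling `record` at the top of every execute and write handler would answer "what did this run do?" without going through Komodo's account-level activity log.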

A “scoping” config. I run this single-tenant on my homelab, so every tool can touch every stack. In a less-trusted context, I’d want a config file that limits which stacks/servers an MCP session can address — defence-in-depth against an over-helpful agent.
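The scoping check itself could be as small as an allowlist lookup in front of every handler. A sketch with hypothetical names (`ScopeConfig`, `isInScope`, `assertInScope`), using `"*"` as an assumed wildcard for the single-tenant case:

```typescript
interface ScopeConfig {
  stacks: string[]; // names this session may address; "*" means all
  servers: string[];
}

type ResourceKind = "stacks" | "servers";

function isInScope(scope: ScopeConfig, kind: ResourceKind, name: string): boolean {
  const allowed = new Set(scope[kind]);
  return allowed.has("*") || allowed.has(name);
}

// Throw before any Komodo call is made if the target is out of scope.
function assertInScope(scope: ScopeConfig, kind: ResourceKind, name: string): void {
  if (!isInScope(scope, kind, name)) {
    throw new Error(`"${name}" is not in the allowed ${kind} for this session`);
  }
}
```

Checking scope on the server side means the limit holds regardless of what the agent has been told about its permissions.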

What this is, in two sentences

A container manager that knows enough English to follow basic instructions is not a science-fiction premise; it’s about 1,000 lines of TypeScript and a careful tool partition. The reason it works isn’t that the agent is clever — it’s that the interface between the agent and the system is shaped to let the agent be useful and stop short of being destructive.

Related: Hard rails for an autonomous code-audit agent — same partition philosophy, different surface.