Triaging Gmail with Claude Subagents

By Justin Poehnelt, Senior Software Engineer at Google · Jun 8, 2026

#ai #agents #claude code #security #gmail #google workspace #cli #code

The email-triage-orchestrator running in Claude Code: ten isolated email-security-analyzer agents finished with all messages clean, then email-relationship-analyzer and email-content-summarizer running in parallel over the batch

Jump right to the code: https://github.com/jpoehnelt/subagent-email-triage.

I put together this demonstration project to triage my Gmail with a small constellation of Claude Code subagents that drive the gws CLI. They read incoming mail, check it for security issues, analyze relations, summarize the content and extract context, apply labels, and turn recurring patterns into persistent Gmail filters.

Six agents

The orchestrator discovers inbox messages and hands everything else to the subagents. It looks like this:

Discover. The orchestrator lists new, untriaged inbox mail.
Security. One email-security-analyzer per message, isolated and never batched.
Relationship and summary. email-relationship-analyzer and email-content-summarizer, run once over the security-clean batch.
Label. email-labeler creates the Gmail labels and applies them to the message.
Curate filters. email-filter-curator promotes stable patterns to persistent filters.
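
The stages above can be modeled as a short Python sketch. This is not how the project runs (the real orchestrator is a Claude Code agent following a prompt); it only illustrates the shape of the flow. The `triage` function, the `Agent` type, and the dictionary keys are all hypothetical names, and the sketch assumes that only messages with a "clean" verdict continue past the security stage.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical stand-in: an "agent" takes message IDs and returns one result per ID.
Agent = Callable[[list[str]], dict[str, str]]


def triage(message_ids: list[str], agents: dict[str, Agent]) -> dict[str, dict[str, str]]:
    """Run the stages in order, passing only message IDs between them."""
    # Security: one isolated call per message, never batched.
    verdicts: dict[str, str] = {}
    for mid in message_ids:
        verdicts.update(agents["security"]([mid]))

    # Assumption: only messages judged "clean" move on to the later stages.
    clean = [mid for mid in message_ids if verdicts.get(mid) == "clean"]

    # Relationship and summary: one call each over the clean batch, in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        rel_future = pool.submit(agents["relationship"], clean)
        sum_future = pool.submit(agents["summary"], clean)
        relationships, summaries = rel_future.result(), sum_future.result()

    # Label, then let the curator look for patterns worth promoting.
    labels = agents["labeler"](clean)
    agents["curator"](clean)

    return {
        "verdicts": verdicts,
        "relationships": relationships,
        "summaries": summaries,
        "labels": labels,
    }
```

The point of the sketch is the fan-out: security is called once per ID, while every later stage receives the whole surviving batch in a single call.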

Subagent	Job	Model
`email-security-analyzer`	Deception / phishing / spoofing verdict	Sonnet
`email-relationship-analyzer`	Who the sender is to me	Haiku
`email-content-summarizer`	What the mail actually says	Haiku
`email-labeler`	Apply Gmail labels	Sonnet
`email-filter-curator`	Promote stable patterns to filters	Sonnet

I started with a fat orchestrator and thin subagents: the orchestrator fetched the email bodies and headers and passed them down. I then switched to a thin orchestrator that only discovers and delegates, mostly passing message IDs.

That change isn’t free: each subagent re-fetches what it needs, so the same email gets pulled from the API more than once. In return, the orchestrator never loads message bodies and headers just to pass them down. The reads are cheap and idempotent, so repeating them is worth it to keep the coordinating context lean. Passing IDs also makes it much easier to batch tasks to subagents.

I added memory, then removed it

For a while, some subagents had memory, notes they carried across runs, so a relationship agent wouldn’t re-learn who my coworkers are every time. I kept the memory local to the agent and saw that it was capturing a structure likely to go stale. The relationship analyzer was creating a memory file for every sender. I removed it, but would consider adding it back with a better prompt for its usage.

Most token usage goes to security

Here’s the rough token split across the five subagents:

Subagent	% of token usage
`email-security-analyzer`	17%
`email-labeler`	3%
`email-relationship-analyzer`	3%
`email-filter-curator`	2%
`email-content-summarizer`	2%

Security dominates, using more tokens than the other four combined. It’s the only stage that runs once per email rather than batched. It also does the most with each email, reading all headers and inspecting the full body.

I started with a zero-tool security subagent in the fat-orchestrator pattern. That allowed hill climbing with autoresearch and some publicly available email datasets. Doing the same with tools is harder, but not impossible.

Tools, permissions, allowlists

The harness has a constraint that shaped everything: an agent’s tools are fixed at definition time. I can’t grant, narrow, or swap a subagent’s capabilities at runtime; its surface is the same for a newsletter as for a phishing email. Under a static tool model, least privilege means splitting the work across narrowly scoped agents, giving each exactly what its job needs.

This is enforced in two ways: first in settings.json, and second through a single global PreToolUse(Bash) hook that enforces a different allowlist per agent. Per-agent hooks in frontmatter stack, so a subagent’s call gets checked against the orchestrator’s hook too, which would force the orchestrator’s allowlist to be a superset of all of them. Instead, one guard reads the calling agent_type from the payload and enforces only that agent’s .allowlist, so each agent is governed independently. The security analyzer’s Bash tool allowlist is three lines:

gws gmail users messages get
gws gmail +read
gws schema

The labeler’s allowlist lets it modify labels but never read a body. The guard also blocks command substitution, subshells, redirects, and pipes into anything that isn’t a read-only filter like jq, so an allowed gws call can’t be piped into something that writes a file or runs code.
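
A guard along these lines might look like the following minimal sketch. It is not the code from the repository. It assumes the hook payload carries `agent_type` (as described above) and the Bash command under `tool_input.command`. The `ALLOWLISTS` table, `is_allowed`, and `decide` are hypothetical names, and blocking `;`, `&`, and newlines is an extra assumption beyond the constructs the post lists. The matching is deliberately conservative: it fails closed, even on harmless commands such as a jq expression that itself contains a pipe or parentheses.

```python
import re
import sys

# Hypothetical stand-in for each agent's .allowlist file.
ALLOWLISTS = {
    "email-security-analyzer": [
        "gws gmail users messages get",
        "gws gmail +read",
        "gws schema",
    ],
}

# The only commands permitted on the right-hand side of a pipe.
READ_ONLY_FILTERS = {"jq"}

# Command substitution, subshells, redirects, and (as an added assumption)
# command chaining via ';', '&', or a newline.
FORBIDDEN = re.compile(r"\$\(|`|[()<>;&\n]")


def is_allowed(agent_type: str, command: str) -> bool:
    """Return True only if the command matches this agent's allowlist."""
    if FORBIDDEN.search(command):
        return False
    # Naive split: a '|' inside quotes is treated as a pipe, so such
    # commands are rejected rather than risk a bypass.
    head, *filters = [part.strip() for part in command.split("|")]
    prefixes = ALLOWLISTS.get(agent_type, [])  # unknown agents get nothing
    if not any(head == p or head.startswith(p + " ") for p in prefixes):
        return False
    return all(f.split() and f.split()[0] in READ_ONLY_FILTERS for f in filters)


def decide(payload: dict) -> int:
    """Map a hook payload to an exit code: 0 allows, 2 blocks the tool call."""
    agent_type = payload.get("agent_type", "")
    command = payload.get("tool_input", {}).get("command", "")
    if is_allowed(agent_type, command):
        return 0
    print(f"blocked for {agent_type or 'unknown agent'}: {command}", file=sys.stderr)
    return 2

# A hook script would end with: sys.exit(decide(json.load(sys.stdin)))
```

Keying the decision on the calling agent alone is what lets each allowlist stay small: the orchestrator’s entry never has to contain the union of its subagents’ commands.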

Filters and knowledge

Every label is a signal about the pattern of mail, and the filter curator analyzes that. It looks for reusable patterns in the labels and promotes them to Gmail filters. For example, if it notices that I consistently label messages from a certain sender as “Work”, it will create a filter that automatically applies that label to future messages from that sender. It is forward-looking only and never backfills. Because it creates filters rather than proposing them, it is held to a strict threshold before it acts.
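
One way to picture the "strict threshold" is a count-and-consistency check per sender. The sketch below is only an illustration: the real curator is a prompted agent, and the function name `stable_patterns` and both threshold values are hypothetical rather than taken from the project.

```python
from collections import Counter, defaultdict


def stable_patterns(labeled, min_count=5, min_share=0.9):
    """Pick senders whose mail is labeled consistently enough to automate.

    labeled: iterable of (sender, label) pairs from already-triaged mail.
    Returns {sender: label} for patterns that clear both thresholds.
    """
    per_sender = defaultdict(Counter)
    for sender, label in labeled:
        per_sender[sender][label] += 1

    promoted = {}
    for sender, counts in per_sender.items():
        label, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        # Require enough evidence and near-unanimous labeling.
        if hits >= min_count and hits / total >= min_share:
            promoted[sender] = label
    return promoted
```

With these example thresholds, a sender labeled “Work” six times out of six would be promoted, while a sender seen only twice, or one split evenly between two labels, would not.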

This is the one place I’d reconsider statelessness. Reading existing labels is fine, as those facts live in Gmail. But the valuable version isn’t pattern-matching labels; it’s modeling my behavior over time: for example, my replies to emails and what I archive, snooze, or just ignore.

The relationship question grows in breadth, not depth. The same agent could ground its answer in Slack, a company directory, Google Contacts, or Google Calendar.

Things that were annoying

The pipeline order isn’t enforced. “Discover → security → relationship + summary → label → curate” is a prompt instruction, not a guarantee. The hooks and allowlists bound what each agent can do, not the order the orchestrator calls them in. The critical invariant (security first, per email, never batched) rests on the model following instructions.

Subagent permissions: Why can’t I define this more easily?

Subagents cannot call other agents. This forces a more linear sequence and many of the challenges above.

This was a useful exploration, but I think I would want to skip doing this within the Claude Code harness next time.

Opinions are my own and not the views of my employer.
