Part 1 was the framework pitch. This is the agent I actually wanted to build with it.
The setup is something I keep seeing across teams: JIRA triage is brutal. Tickets pile up in the new-issues queue. An EM or tech lead spends an hour every morning sorting them. Someone classifies, someone re-classifies, someone routes, someone re-routes. Half of them get assigned to the wrong team and bounce around for two days before landing.
The agent I’m going to walk through does that triage work. It reads each new ticket, classifies it, picks a team and a person, and either assigns it or comments on the ticket asking for human eyes. It never guesses. If it isn’t confident, the human gets a structured handoff instead of a wrong assignee.
The plan
Two skills:
- classify-ticket: reads the ticket and picks a category, domain, and priority. Self-reports confidence.
- route-ticket: given that classification, picks a team and a specific assignee. Also self-reports confidence.
Then a tiny piece of glue:
- If both skills come back high-confidence, assign + label, log, done.
- Otherwise, comment on the ticket with the agent’s reasoning, apply the needs-triage label, and leave the assignee blank.
The whole thing runs on a JIRA webhook. New issue created? Webhook fires, Flue runs, two API calls back to JIRA, response sent. Total wall-clock time: a few seconds.
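For orientation, this is the slice of the issue-created webhook payload the agent actually reads. The field names follow JIRA’s standard webhook shape; the interface itself is my own sketch, not something Flue defines:

```typescript
// The subset of JIRA's issue-created webhook payload this agent touches.
// Everything else in the (much larger) payload is ignored.
interface TriagePayload {
  issue: {
    key: string; // e.g. "OPS-1234"
    fields: {
      summary: string;
      description: string | null; // often empty on hastily filed tickets
      reporter: { displayName: string };
      labels?: string[]; // may be absent, hence the `?? []` in the agent
    };
  };
}

// A minimal example of what the webhook delivers:
const example: TriagePayload = {
  issue: {
    key: 'OPS-1234',
    fields: {
      summary: 'Checkout page 500s on submit',
      description: 'Started after the 14:00 deploy.',
      reporter: { displayName: 'Dana' },
      labels: ['checkout'],
    },
  },
};
```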
The agent
import type { FlueContext } from '@flue/sdk/client';
import * as v from 'valibot';

import * as jira from './jira';

export const triggers = { webhook: true };

const Confidence = v.picklist(['high', 'medium', 'low']);

const Classification = v.object({
  category: v.picklist([
    'bug',
    'feature-request',
    'support',
    'security',
    'infra',
    'docs',
    'duplicate',
    'spam',
  ]),
  domain: v.picklist([
    'frontend',
    'backend',
    'database',
    'auth',
    'billing',
    'reporting',
    'kubernetes',
    'unknown',
  ]),
  priority: v.picklist(['low', 'medium', 'high', 'critical']),
  reasoning: v.string(),
  confidence: Confidence,
});

const Routing = v.object({
  team: v.string(),
  assignee: v.nullable(v.string()),
  reasoning: v.string(),
  confidence: Confidence,
});
export default async function ({ init, payload, env }: FlueContext) {
  const { issue } = payload;
  const agent = await init({ model: 'anthropic/claude-opus-4-7' });
  const session = await agent.session();

  const classification = await session.skill('classify-ticket', {
    args: {
      summary: issue.fields.summary,
      description: issue.fields.description,
      reporter: issue.fields.reporter.displayName,
      labels: issue.fields.labels ?? [],
    },
    result: Classification,
  });

  const routing = await session.skill('route-ticket', {
    args: {
      classification,
      summary: issue.fields.summary,
      description: issue.fields.description,
    },
    result: Routing,
  });

  const confident =
    classification.confidence === 'high' &&
    routing.confidence === 'high' &&
    routing.assignee != null;

  if (confident) {
    await jira.assignIssue(env, issue.key, routing.assignee!);
    await jira.applyLabels(env, issue.key, [
      `category/${classification.category}`,
      `domain/${classification.domain}`,
      `priority/${classification.priority}`,
      `team/${routing.team}`,
    ]);
    return { action: 'assigned', issue: issue.key, ...routing };
  }

  return await escalate(env, issue.key, classification, routing);
}

async function escalate(
  env: any,
  key: string,
  c: v.InferOutput<typeof Classification>,
  r: v.InferOutput<typeof Routing>,
) {
  const body = [
    `Triage agent escalated this ticket for human review.`,
    ``,
    `*Classification (${c.confidence}):*`,
    `- Category: ${c.category}`,
    `- Domain: ${c.domain}`,
    `- Priority: ${c.priority}`,
    `- Reasoning: ${c.reasoning}`,
    ``,
    `*Suggested routing (${r.confidence}):*`,
    `- Team: ${r.team}`,
    `- Assignee: ${r.assignee ?? 'unsure'}`,
    `- Reasoning: ${r.reasoning}`,
  ].join('\n');

  await jira.addComment(env, key, body);
  await jira.applyLabels(env, key, ['needs-triage']);

  return { action: 'escalated', issue: key };
}
Two skill calls, one branch. The whole file fits on a single screen.
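The picklists are where the schemas earn their keep: if the model invents a category, parsing fails instead of a bogus label landing on the ticket. Reduced to its core, the guarantee is just membership checking. Here is a hand-rolled sketch of what a picklist enforces (not valibot’s actual implementation):

```typescript
// What a picklist schema guarantees, stripped to its essence: the value must
// be one of a finite set of options, or parsing fails loudly.
function parsePicklist<T extends string>(options: readonly T[], value: unknown): T {
  if (typeof value === 'string' && (options as readonly string[]).includes(value)) {
    return value as T;
  }
  throw new Error(
    `Expected one of [${options.join(', ')}], got ${JSON.stringify(value)}`,
  );
}

// A well-formed model output passes through:
const category = parsePicklist(['bug', 'feature-request', 'support'], 'bug');

// An invented value never reaches JIRA:
let rejected = false;
try {
  parsePicklist(['bug', 'feature-request', 'support'], 'urgent-bug');
} catch {
  rejected = true;
}
```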
The JIRA helper is small enough to inline. Three calls: assign, comment, label.
// jira.ts

const base = (env: any) => `https://${env.JIRA_HOST}/rest/api/3`;

const headers = (env: any) => ({
  Authorization: `Basic ${btoa(`${env.JIRA_EMAIL}:${env.JIRA_TOKEN}`)}`,
  'Content-Type': 'application/json',
});

export async function assignIssue(env: any, key: string, accountId: string) {
  await fetch(`${base(env)}/issue/${key}/assignee`, {
    method: 'PUT',
    headers: headers(env),
    body: JSON.stringify({ accountId }),
  });
}

export async function addComment(env: any, key: string, text: string) {
  await fetch(`${base(env)}/issue/${key}/comment`, {
    method: 'POST',
    headers: headers(env),
    body: JSON.stringify({
      body: {
        type: 'doc',
        version: 1,
        content: [{ type: 'paragraph', content: [{ type: 'text', text }] }],
      },
    }),
  });
}

export async function applyLabels(env: any, key: string, labels: string[]) {
  await fetch(`${base(env)}/issue/${key}`, {
    method: 'PUT',
    headers: headers(env),
    body: JSON.stringify({
      update: { labels: labels.map((l) => ({ add: l })) },
    }),
  });
}
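One thing the helper deliberately skips is error handling: a failed PUT fails silently. In production I would wrap each call so a non-2xx response surfaces as a thrown error the platform can log and retry. A minimal sketch (the function name and init type are mine, not part of jira.ts):

```typescript
// Wrap fetch so non-2xx responses throw instead of failing silently.
// Drop-in replacement for the bare `fetch` calls in jira.ts.
async function jiraFetch(
  url: string,
  init: { method?: string; headers?: Record<string, string>; body?: string } = {},
) {
  const res = await fetch(url, init);
  if (!res.ok) {
    // Pull whatever detail JIRA returned so the error log is actionable.
    const detail = await res.text().catch(() => '');
    throw new Error(`JIRA ${init.method ?? 'GET'} ${url} -> ${res.status}: ${detail}`);
  }
  return res;
}
```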
The skills
A Flue skill is a directory. It has a prompt template and zero or more context files the model reads before answering. That’s where the “advanced reasoning” actually lives, in the context the model gets to look at before it commits to a structured output.
Here is skills/classify-ticket/prompt.md:
You are a triage assistant for our engineering org. Classify the JIRA ticket
below.
Use these context files to understand the categories, domains, and our
priority rubric:
- /context/categories.md
- /context/domains.md
- /context/priority.md
- /context/historical-examples.md
Process:
1. Read the ticket and the context files.
2. Pick the single most accurate category and domain.
3. Pick a priority based on /context/priority.md, not based on how loudly
the reporter complains.
4. Set confidence based on how clean the match is. If you have to choose
between two domains, that is "medium" at best. If the ticket is too
vague to map confidently, that is "low".
5. In `reasoning`, write 2 to 3 sentences explaining the call. This is
what a human will read if you escalate.
Never guess. We would rather have a human triage an ambiguous ticket than
auto-assign a wrong one.
The ticket:
- Summary: {{ args.summary }}
- Description: {{ args.description }}
- Reporter: {{ args.reporter }}
- Labels: {{ args.labels }}
Three things matter in that prompt. The structured output schema enforces that the model picks from a finite set of categories. The “never guess” instruction backed by the confidence field gives it permission to bail out. The context files give it the org-specific knowledge that the base model can’t have.
skills/route-ticket/ is the same shape. Its own prompt, its own context:
- /context/teams.md: what each team owns, with examples.
- /context/people.md: who’s on each team, what they specialize in, and who’s currently OOO.
- /context/ownership-by-domain.md: table of domain to primary team, with edge cases.
The OOO file is the underrated piece. If the model’s first pick is on vacation, it should fall back to the team’s secondary, not barrel through with a stale assignment that nobody picks up for a week.
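Flue doesn’t impose a format on context files, and I haven’t shown one yet, so here is a sketch of what an entry in /context/people.md could look like (names and account IDs are made up), with the OOO status inline where the model will actually see it:

```markdown
## Billing

- Priya (accountId: 712020-aaaa) — payments pipeline, invoice generation. Primary.
- Marcus (accountId: 712020-bbbb) — subscriptions, dunning. OOO until Mar 18; do not assign.
- Elena (accountId: 712020-cccc) — secondary; covering Marcus's queue while he is out.
```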
What “confident” actually means here
The confidence field isn’t magic. It’s the model self-reporting after reading the prompt and the context files. It’s wrong sometimes. Two things make it less wrong:
Two paths, one decision. I run both skills before deciding. A “high” on classification but “medium” on routing means we ask a human, even though the model thinks it has half the answer. In practice, “the routing is shaky” is the most common reason to escalate, and I want that signal preserved instead of averaged away.
No null assignees on the happy path. If the routing skill returns assignee: null (it knows the team but not the person), I treat that as not-confident even if the model self-reports “high”. The shape of the result is doing some of the gating work for me.
You could tighten this further. Numeric confidence with explicit thresholds, an explicit “needs more info” exit code from the skill, retries with deeper context. I haven’t needed those yet. The two-skill, two-confidence pattern catches the bulk of bad routings before they happen.
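Both rules collapse into one pure function, which is also the first thing I would unit-test. This is the same logic as the `confident` branch in the agent, just extracted (the function name is mine):

```typescript
type Confidence = 'high' | 'medium' | 'low';

// The two gating rules, as a pure function:
// 1. Both skills must self-report "high".
// 2. The routing must name an actual person, not just a team.
function isConfident(
  classification: { confidence: Confidence },
  routing: { confidence: Confidence; assignee: string | null },
): boolean {
  return (
    classification.confidence === 'high' &&
    routing.confidence === 'high' &&
    routing.assignee !== null
  );
}
```

Note that a "high"/"high" pair with a null assignee still escalates, which is exactly the "shape of the result is doing some of the gating" point above.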
The escalation comment is the product
When the agent isn’t sure, the JIRA comment is where it earns its keep. Look at what it writes:
Triage agent escalated this ticket for human review.
*Classification (medium):*
- Category: support
- Domain: billing
- Priority: medium
- Reasoning: Customer reports a charge they don't recognize. Could be
a duplicate from the recent retry storm or a real billing bug. Treating
as support until billing confirms which one.
*Suggested routing (low):*
- Team: billing
- Assignee: unsure
- Reasoning: Billing team owns this domain, but they're rotating an
on-call this week and I can't tell from /context/people.md who is
actually picking things up today. Recommend an EM eyeball this and
route.
A human reading that has all the work pre-loaded. The category guess. The domain guess. The reasoning. The reason it stopped short. They’re not triaging from scratch. They’re auditing a draft.
That’s the actual UX win. Even when the agent fails, it fails in a way that saves the human time.
Deploying it
Flue runs in a few places. For something this small I’d put it on Cloudflare Workers. The request flow is “JIRA webhook in, a few seconds of model calls, JIRA REST API out.” No long-running compute, no filesystem, just a function that eats a webhook.
Point JIRA’s automation rule at your worker URL, configure secrets in Cloudflare for the JIRA token and the Anthropic key, push with wrangler deploy. The whole thing has fewer moving parts than the legacy “triage Slack channel” my team used to run.
Where this stops working
It is not a replacement for an EM. It is not a replacement for a tech lead with context. The agent is good at the volume case: routine tickets, clear ownership, well-documented domains. It will struggle at the long tail:
- Cross-team tickets that legitimately span two domains.
- Tickets where the customer is using the wrong vocabulary for the system.
- Anything political. Reorgs, ownership disputes, “you broke our launch.”
The escalation path is what catches those. As long as the agent is willing to stop and ask, the long tail goes to a human anyway. The agent’s job isn’t to be right about everything. It’s to clear the boring 70% so humans can focus on the interesting 30%.
That’s the same pattern I keep finding for these harness-based agents in general. They don’t replace the senior person. They eat the toil that the senior person hates doing, and they hand back the genuinely hard cases with a useful first draft. On net, an excellent trade.