The Daemon in the Middle

June 16, 2026 #AI

A Python process monitors open Jira tickets and decides which action, if any, is needed.

iddd is the daemon that runs this process: it notifies the agent of a task, and the agent reads the contract, runs the code, and posts the results.


My laptop runs a Python process that does almost nothing. Every 60 seconds it asks Jira for the list of open tickets, looks at each one, and decides whether anything needs to happen. Most of the time, nothing does. The whole tick takes under a second. Then it sleeps.

I recently wrote about intent-driven delivery. A customer signal turns into a contract on a Jira ticket. An agent reads the contract, does the work, posts evidence. A human approves the result. Six steps, three of them mechanical. The middle was supposed to run without a human babysitting it.

The daemon is what runs the middle. It’s called iddd (Intent-Driven Delivery Daemon) and it does almost nothing. That’s the whole design.

This post is about how the pieces fit. The daemon is not the agent. The daemon doesn’t generate code, doesn’t decide correctness, doesn’t read the contract. The daemon notices a ticket moved into the wrong status and pokes the agent to come look. Everything interesting happens inside the spawned agent process. The daemon is plumbing.

That split is the design. Most projects I’ve seen in this space conflate the orchestrator with the agent. They build something smart at the wrong layer. The agent ends up tangled with retry logic, the orchestrator tangled with prompt engineering, and neither is easy to change. Split them and the daemon stays mechanical and testable. The agent stays a Claude Code process spawned with a known command and known inputs.

The pieces

The Python package is iddd/. A handful of files, none of them long:

reconciler.py: a loop on a 60-second interval.
derive.py: one pure function that takes a Jira issue and returns the next action.
queue.py: a SQLite table with a UNIQUE constraint on a dedupe key.
worker.py: pulls jobs and spawns claude -p in a fresh repo clone.
jira.py and github.py: adapters. REST and the gh CLI.
cli.py: iddd run, iddd status, iddd drain, iddd tail.

The deps are small on purpose. apscheduler for the loop. requests for Jira REST.
pyyaml for config. sqlite3 from the stdlib for the queue. Nothing else.

The shape of a tick

A tick is what happens every 60 seconds. The reconciler runs a JQL query against Jira for all open issues in the project. For each one it calls derive_action(issue). That function returns either None (nothing to do, ticket is in a steady state) or a tuple like `("idd-dispatch", "PROJ-123")` meaning “this ticket is ready to be picked up by the dispatch agent.” The reconciler then enqueues the action against the SQLite queue.

That’s the whole brain. A polling loop, a pure function, a queue.

The pure function is the part I’m proudest of. derive_action doesn’t talk to the network on its own. It takes the issue payload (status, labels, comments, description) and returns the next action by inspection. That makes it easy to test. I have a directory of fixture issues (backlog_no_intent.json, needs_details_with_draft.json, to_do_approved.json, in_progress_pr_open.json) and a pytest run that asserts the right action for each. No mocks, no fakes, no fragile mocking-the-mocking. The function reads inputs and returns a decision. When a bug shows up in the field, I capture the issue payload that triggered it, drop it in fixtures, write a failing test, and fix the function.

The queue is the second thing I’m proud of, and it’s the smaller idea. The table looks like:

    CREATE TABLE jobs (
        id INTEGER PRIMARY KEY,
        dedupe_key TEXT UNIQUE NOT NULL,
        issue_key TEXT NOT NULL,
        action TEXT NOT NULL,
        payload TEXT NOT NULL,
        state TEXT NOT NULL,
        attempts INTEGER NOT NULL DEFAULT 0,
        created_at INTEGER NOT NULL,
        updated_at INTEGER NOT NULL
    );

dedupe_key is {issue_key}:{action}, for example PROJ-123:idd-dispatch. enqueue() does INSERT OR IGNORE. If a tick fires while the previous tick’s enqueue is still pending in the worker, the second enqueue is a no-op. Dedupe is free, and it’s stored in the database, not in process memory.
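
As a minimal sketch of the dedupe behavior described above (not the post's actual queue.py), the following uses the table definition from the post with an in-memory SQLite database. The enqueue() signature, the default payload, and the boolean return value are assumptions made for illustration.

```python
import sqlite3
import time

# Table definition as given in the post; IF NOT EXISTS is added for the sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id INTEGER PRIMARY KEY,
    dedupe_key TEXT UNIQUE NOT NULL,
    issue_key TEXT NOT NULL,
    action TEXT NOT NULL,
    payload TEXT NOT NULL,
    state TEXT NOT NULL,
    attempts INTEGER NOT NULL DEFAULT 0,
    created_at INTEGER NOT NULL,
    updated_at INTEGER NOT NULL
)
"""


def enqueue(conn, issue_key, action, payload="{}"):
    """Insert a pending job. Returns True if a row was added, False if deduped.

    Hypothetical signature: the post only says enqueue() does INSERT OR IGNORE.
    """
    now = int(time.time())
    cur = conn.execute(
        "INSERT OR IGNORE INTO jobs "
        "(dedupe_key, issue_key, action, payload, state, created_at, updated_at) "
        "VALUES (?, ?, ?, ?, 'pending', ?, ?)",
        (f"{issue_key}:{action}", issue_key, action, payload, now, now),
    )
    conn.commit()
    # rowcount is 0 when the UNIQUE constraint caused the insert to be ignored.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
first = enqueue(conn, "PROJ-123", "idd-dispatch")   # new row
second = enqueue(conn, "PROJ-123", "idd-dispatch")  # same dedupe key, ignored
```

Because the UNIQUE constraint does the work, a second tick that derives the same action simply produces no new row. How finished rows are cleared so that the same key can be enqueued again later is not described in the post and is left out of this sketch.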
The daemon can restart mid-job and the queue is intact.

The state column is pending | running | done | failed. A worker grabs the lowest-id pending row whose issue_key isn’t already in the running set, transitions it to running, does the work, transitions it to done or failed. Per-issue serialization comes from that “not in the running set” clause: one ticket can only have one job in flight at a time, even though the pool runs multiple workers in parallel across different tickets.

What the worker does

Here’s the worker’s core:

    import os
    import shutil

    # run() and GITHUB_REPO are defined elsewhere in the package (not shown).
    def run_job(job):
        # One fresh clone per job, keyed by the issue.
        clone = f"/tmp/iddd-clone/{job.issue_key}"
        run(["gh", "repo", "clone", GITHUB_REPO, clone])
        try:
            env = {**os.environ, "CODE_REPO": clone}
            # Headless agent run: the slash command plus the ticket key.
            cmd = ["claude", "-p",
                   "--output-format", "stream-json",
                   "--dangerously-skip-permissions",
                   f"/{job.action} {job.issue_key}"]
            run(cmd, cwd=os.environ["IDDD_HOME"], env=env,
                timeout=1800, stream_to_log=True)
        finally:
            # Tear the clone down whether the job succeeded or failed.
            shutil.rmtree(clone, ignore_errors=True)

A fresh gh repo clone per job. A claude -p headless invocation. A 30-minute timeout. Stream the agent’s JSON output into a log file. Tear the clone down when the job ends, succeeded or failed.

The fresh clone is cheap and worth it. Workers running in parallel never see each other’s tree. A bad agent that mangles its working directory can’t poison the next job. Worktrees could be used for this, but I’m experimenting with on-demand clones to see how it fares.

The slash command (/idd-dispatch, /idd-new, /idd-complete) is where the actual work lives. The daemon hands the agent a ticket key and a verb. The agent does everything else: reads the contract from the ticket, runs the harness, writes the code, opens the PR, posts evidence back to the ticket. The agent uses MCP servers internally for Jira and GitHub. The daemon does not.

That last point matters. The daemon talks to Jira via REST and to GitHub via the gh CLI.
It does not use MCP. The reason is testability: MCP servers expect a live claude process to mediate them, which means out-of-band callers, like a polling daemon, can’t reach them cleanly. REST and gh are stable, well-documented, and easy to fake in tests. MCP is for the agent. Everything outside the agent talks to the source systems directly.

How a ticket flows through

Here’s a full lap. A teammate files a signal: a one-line problem statement that lands as a Jira ticket in Backlog, tagged with the workflow label and carrying a YAML intent block at the top of the description. That YAML block holds the impact / urgency / clarity scores, the signal source, and a list of open questions the agent needs me to answer before it can write a falsifiable contract. The next tick fires. derive_action looks at the ticket: status Backlog. The decision is a no-op: awaiting human triage. The daemon does nothing until I move the ticket.

I drag it from Backlog to Needs Details. That’s the first human gate. I’m looking at the seed, deciding if it’s worth carrying forward. If yes, I move it. If no, I edit the description or trash the ticket.

Once in Needs Details, the next tick sees the new status. There’s no idd:feedback comment yet, so derive_action returns a no-op: awaiting feedback. I post a Jira comment whose first line is the marker idd:feedback, with answers to the seed’s open questions in the body. The next tick picks up the marker. The action is idd-draft. A worker grabs the job, clones the harness repo, spawns claude -p /idd-draft PROJ-123. The agent reads the description, folds my answers into Outcome, Signal, Scope in, Scope out, numbered Acceptance criteria, Release Gating, and Risk Surface, then rewrites the Jira description with the full contract. The Open Questions section is empty: every question is resolved. The agent posts a TP2 notification comment and exits. The ticket stays in Needs Details.

I read the contract.
If I want to push back (the Outcome is too broad, an Acceptance item isn’t falsifiable, the Scope in is wrong), I post another idd:feedback comment with the design notes I want folded in. The daemon picks that up too and the agent re-renders the contract on the next tick. I iterate until it reads right.

When I’m happy, I approve. And here’s the part that’s still in flux.

Approval is a comment, for now. The daemon watches for a comment whose first line is the marker idd:approve and treats that as the approval signal. The reconciler enqueues /idd-approve, the agent runs the full approval checklist (Outcome is one present-tense sentence, Acceptance has ≥2 falsifiable items, Scope out is non-empty, releasability holds, flag gating named if user-visible), and only then transitions the ticket Needs Details -> To Do.

The reason it’s a comment is a question I haven’t answered: who is the bot user? If the daemon transitions tickets through the real workflow, it needs Jira credentials with permission to do that, which means a service account, which means a license seat, which means a budget conversation. Using a comment-as-approval lets me prove the rest of the loop works while I sort out the identity story. It’s a temporary cheat, and I want to be honest that it’s a cheat, but the alternative was to block the whole project on a procurement question. I’d rather ship and assess. I’m sure comment-as-approval is not the right long-term answer, and I’m honest about that with myself every time I use it.

Once approved, the success hook fires an immediate reconciler tick. The ticket is now in To Do. The action is idd-dispatch.
The agent reads the contract, transitions Jira To Do -> In Progress, branches off main with a name like PROJ-123/expired-card-msg, and runs an inline chain of three skills: /idd-plan writes a per-acceptance test+code plan with no code; /idd-implement writes failing tests first, then the smallest change to green them all inside Scope in; /idd-review captures green test output and screenshots if the change is user-visible, runs parallel review agents, runs pre-commit, commits with Conventional Commits, pushes, and opens the PR with the Test Plan inlined in the body. The agent then posts an idd:completion comment on the Jira card carrying acceptance evidence, transitions Jira In Progress -> In Review, and fires the TP3 notification with the PR URL.

I do a code review on the PR. If I’m happy, I comment idd:accept. If I’m not, I comment idd:reject on the PR with the one-sentence reason, or idd:changes-requested with a revision brief. Reject closes the contract and files a new gap signal. Changes-requested triggers another executor pass on the same PR. On a clean merge the daemon watches the post-merge CI run on main via gh run watch --exit-status, and only on green transitions the ticket In Review -> In Staging. That’s the terminal state for the workflow. UAT happens there. The eventual Done transition is a separate human business decision the harness never makes. Humans own that decision.

Six phases. Three with my eyes on the work: promoting the seed, approving the contract, reviewing the PR. The other three the daemon and the agent handle between themselves.

The dashboard

The daemon also serves a small local web dashboard on a loopback port, accessible only from the same machine. It’s a single-page app, refreshed live over Server-Sent Events so the view changes within a couple of seconds of any state movement.
There’s no build step, no client framework: the daemon emits HTML fragments and htmx swaps them into place on each tick.

A Quick actions panel at the top exposes two short forms. One starts a new signal from a one-line problem statement, the other adopts an existing ticket into the workflow. Submitting either form enqueues the corresponding command.

Below that, a Pipeline band shows every ticket the workflow currently owns, grouped into columns by status (Backlog, Needs Details, To Do, In Progress, In Review, In Staging, Done). Each ticket renders as a small card with its key, summary, assignee, and a colored action-state tag along the bottom edge (needs-feedback, approval-requested, interrupt-blocked, review-requested, uat-pending, agent-running, and so on). The tag color is mirrored as the card’s left-border accent so the eye can pick out “what needs me right now?” without reading the words. A red interrupt-blocked card stands out instantly. A green agent-running card recedes into the background.

A Ticket inspector lets me type a Jira key and pull up the queue history and live state for that specific ticket.

Active, Pending, and Recent tables list the daemon’s queue jobs. Each row shows job id, state badge, action (the slash command being run), attempts, age, and a one-line snippet of the most recent activity event captured from the agent’s stream. Clicking any row reveals a panel that shows full job metadata, the last error if any, a chronological activity timeline of every event the agent emitted, and three small buttons (retry, force-fail, drop) for unsticking jobs manually.

A Metrics card sits below: running and pending counts, done and failed totals over the last 24 hours, throughput per hour, failure rate as a percentage, p50 and p95 duration.
A per-action breakdown shows total / done / failed / average time for the top job types.

A live log tail at the bottom streams the daemon’s structured output, with WARNING and ERROR lines color-highlighted.

I don’t need the dashboard to run the workflow. Everything happens through Jira comments and PR comments and Jira status transitions. But when I’m looking at five tickets at once and trying to remember which one is waiting on me versus which one is waiting on the agent, the colored tag along the bottom of each card is the difference between “context switch and read three tickets” and “look once and know.”

A note on going remote

The next move is to put the daemon on a remote host so it doesn’t depend on my laptop being awake. The shape stays the same: reconciler, queue, worker, headless agent. The host moves to a remote machine. The queue can stay SQLite a while longer. The architecture grows a real webhook receiver so we don’t wait 60 seconds for the next tick.

I’ll write about the remote build separately. The reason I’m flagging it is that the local design is deliberately shaped to make the remote move boring. The daemon already speaks REST and gh instead of relying on a local-only API. The queue is already durable. The worker already isolates jobs in clones. None of those choices were free on a single laptop, but they all pay off when the host stops being a laptop.

What surprised me

A few things that didn’t go the way I expected.

The pure function is the whole game. I thought the daemon would end up being mostly worker logic. It isn’t. The worker is fifteen lines. derive_action is where the bugs live, and the fixture-driven tests for it are what I iterate against. When I want to change how the workflow behaves, I add a fixture for the new case and update derive_action. The rest of the daemon doesn’t move.

Comments make a fine approval mechanism even when they’re a cheat. I expected the comment-as-approval pattern to feel hacky. It mostly doesn’t.
Comments are scoped, threaded, timestamped, tied to a user: everything an approval needs. The reason I’m hesitating to make them the permanent answer is bot-user identity, not the UX. The UX is fine.

SQLite is the right size of database for this. A single-file durable queue with no daemon to run and no schema migration to write is what a local automation wants. I’ll outgrow it when the daemon goes remote. I haven’t outgrown it yet.

The hardest part was deciding what not to put in the daemon. I kept catching myself adding features to the orchestrator instead of the agent. Every time, I had to talk myself out of it. The daemon stays dumb. The agent gets smart. The seam between them is the queue and the slash command name. If the seam blurs, the testability story falls apart.
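
To make the fixture-driven style concrete, here is a minimal sketch of what a pure derive_action and its fixture checks might look like. It is not the post's derive.py. The simplified issue shape (a dict with key, status, and a list of comment strings) is an assumption rather than the Jira REST payload, and so is the rule that only the most recent comment's first line counts as a marker. Only the statuses and markers named in the walkthrough are covered.

```python
from typing import Optional, Tuple

Action = Tuple[str, str]  # (slash command name, issue key)


def first_line(comment: str) -> str:
    """Return the first line of a comment body, stripped of whitespace."""
    lines = comment.strip().splitlines()
    return lines[0].strip() if lines else ""


def derive_action(issue: dict) -> Optional[Action]:
    """Pure decision function: inspects the payload, touches no network.

    Assumed payload shape: {"key": str, "status": str, "comments": [str, ...]}.
    """
    key = issue["key"]
    status = issue["status"]
    comments = issue.get("comments", [])

    if status == "Needs Details" and comments:
        # Assumption: only the latest comment is checked, so a marker that
        # the agent has already answered does not trigger a second run.
        marker = first_line(comments[-1])
        if marker == "idd:feedback":
            return ("idd-draft", key)
        if marker == "idd:approve":
            return ("idd-approve", key)
        return None
    if status == "To Do":
        return ("idd-dispatch", key)
    # Backlog and every other status: steady state, wait for a human.
    return None


# Inline stand-ins for fixture files: (issue payload, expected action).
FIXTURES = [
    ({"key": "PROJ-123", "status": "Backlog", "comments": []}, None),
    ({"key": "PROJ-123", "status": "Needs Details", "comments": []}, None),
    ({"key": "PROJ-123", "status": "Needs Details",
      "comments": ["idd:feedback\nAnswers to the open questions."]},
     ("idd-draft", "PROJ-123")),
    ({"key": "PROJ-123", "status": "Needs Details",
      "comments": ["idd:feedback\nAnswers.", "Contract drafted."]}, None),
    ({"key": "PROJ-123", "status": "Needs Details",
      "comments": ["idd:approve"]}, ("idd-approve", "PROJ-123")),
    ({"key": "PROJ-123", "status": "To Do", "comments": []},
     ("idd-dispatch", "PROJ-123")),
]

results = [derive_action(issue) for issue, _ in FIXTURES]
expected = [want for _, want in FIXTURES]
```

Each fixture is plain data, so reproducing a field bug amounts to adding one more payload and its expected action to the list, with no mocks involved.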

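Note: Out of respect for copyright, the quotation is limited to the first three paragraphs. Please see the original article for the rest.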

Read the original article ↗
