The skill extraction loop is the practice of turning a repeatedly used workflow into a reusable, delegatable skill (instructions + tooling + verification) so that agents can run it with less supervision and less variance.

You’ll also see this described as a skill-capture loop or “solve once → codify → reuse”.

The goal is not “more automation”. The goal is more reliable delegation: fewer reminders, fewer one-off explanations, and tighter feedback loops.

When to extract a skill

Extract a skill when:

  • The task recurs (or will recur) and the successful path is mostly stable.
  • You can state a clear definition of done.
  • There is a cheap verification step (tests, checks, diff review, invariants).
  • The work can be bounded with explicit inputs/outputs/constraints.

Hold off when the task is genuinely one-off, the domain is still shifting daily, or the work is primarily taste/judgment that you can’t yet explain as constraints.

The loop

  1. Do it once (with instrumentation)
    Run the task end-to-end (often with an agent), keeping a short log of what worked, what failed, what the agent misunderstood, and which checks actually caught issues.

  2. Extract the minimal runbook
    Write the smallest set of steps that reliably reaches “done”:

    • inputs required and where to find them
    • constraints (what not to change, safety limits)
    • expected outputs/artifacts

  3. Add guardrails
    Make the skill hard to misuse:

    • verification commands (tests, linters, formatters, sanity checks)
    • stop conditions (“if X fails, do Y / escalate”)
    • permission boundaries (what systems/accounts the skill can touch)

  4. Package as a skill
    Put the runbook somewhere agents can reliably load and follow, with a predictable interface (arguments, expected files, output format). A “skill” might be a Markdown procedure, a small script, or a directory that bundles instructions + tooling.

  5. Reuse and refine
    Each time the skill fails, treat the failure as a signal:

    • fix the skill (instructions/tooling) before running again
    • add a new check if the failure was preventable
    • prune steps that don’t change outcomes
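
As a rough illustration of steps 2 to 4, the sketch below packages a runbook as a small Python object with declared inputs, named checks, and a stop condition. Every name in it (Skill, Result, run_skill, the toy version-bump task) is hypothetical and chosen for illustration; it is one possible shape under those assumptions, not a prescribed format.

```python
from dataclasses import dataclass, field
from typing import Callable

# All names here are hypothetical; this is not an API from any real library.
Check = Callable[[dict], bool]


@dataclass
class Skill:
    name: str
    required_inputs: list[str]            # inputs the runbook needs
    run: Callable[[dict], dict]           # does the work, returns artifacts
    checks: dict[str, Check] = field(default_factory=dict)  # guardrails by name


@dataclass
class Result:
    status: str                           # "done", "rejected", or "escalate"
    detail: str
    artifacts: dict = field(default_factory=dict)


def run_skill(skill: Skill, inputs: dict) -> Result:
    # Refuse to start if a declared input is missing.
    missing = [key for key in skill.required_inputs if key not in inputs]
    if missing:
        return Result("rejected", f"missing inputs: {missing}")

    artifacts = skill.run(inputs)

    # Stop condition: the first failing check halts the run and escalates.
    for check_name, check in skill.checks.items():
        if not check(artifacts):
            return Result("escalate", f"check failed: {check_name}", artifacts)
    return Result("done", "all checks passed", artifacts)


# Toy task used only to exercise the sketch: bump the patch version.
def bump_patch(inputs: dict) -> dict:
    major, minor, patch = inputs["version"].split(".")
    return {"version": f"{major}.{minor}.{int(patch) + 1}"}


def is_three_part_version(artifacts: dict) -> bool:
    parts = artifacts["version"].split(".")
    return len(parts) == 3 and all(part.isdigit() for part in parts)


bump = Skill(
    name="bump-patch-version",
    required_inputs=["version"],
    run=bump_patch,
    checks={"three-part-version": is_three_part_version},
)

result = run_skill(bump, {"version": "1.2.3"})
print(result.status, result.artifacts)
```

Keeping the checks as named entries means a failure report can say which guardrail tripped, which is the information step 5 needs when deciding whether to fix the instructions or add a new check.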

What “good” looks like

A good extracted skill:

  • has a narrow scope (“upgrade dependency X safely”, not “fix the repo”)
  • produces a reviewable artifact (diff, report, checklist, PR)
  • includes a verification path and “what to do on failure”
  • is safe-by-default (least privilege, no surprising side effects)
  • is versioned like code (reviewed, updated, and retired when stale)
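
One way to make the properties above checkable is to give each skill a small manifest and lint it before the skill is published. The field names below are assumptions made for this sketch, not an established schema.

```python
# Hypothetical manifest layout; the field names are illustrative, not a standard.
REQUIRED_FIELDS = {
    "scope": "the one narrow task the skill performs",
    "artifact": "the reviewable output (diff, report, checklist, PR)",
    "verify": "checks that confirm success",
    "on_failure": "what to do when verification fails",
    "permissions": "systems the skill may touch",
    "version": "revision identifier, so the skill can be reviewed and retired",
}


def missing_fields(manifest: dict) -> list[str]:
    """Return the required fields that are absent or empty, sorted by name."""
    return sorted(key for key in REQUIRED_FIELDS if not manifest.get(key))


manifest = {
    "scope": "upgrade dependency X safely",
    "artifact": "PR",
    "verify": ["run the test suite"],
    "on_failure": "revert and escalate",
    "permissions": ["this repository only"],
    "version": "1",
}

print(missing_fields(manifest))
print(missing_fields({"scope": "fix the repo"}))
```

A lint like this only checks that each property is declared. Whether the scope is actually narrow or the permissions are actually minimal still needs human review.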

References