Agentic Harness

An agentic harness is the wrapper around an LLM that turns it into an agent by running it in a loop, giving it tools, and managing state.

Working Definition

Agentic harness = controller loop + tool runtime + context/state management + guardrails + (optional) evaluation.

In practice, the harness is the part that:

Calls the model repeatedly (plan → act → observe → update → repeat)
Executes tools (shell, files, web, APIs, repo ops) on the model’s behalf
Manages context (what to include, summarize, persist, retrieve)
Enforces constraints (step limits, timeouts, budgets, sandboxing, policy checks)
Optionally adds evaluation (tests, graders, benchmarks, self-checks)

Runtime harness - used to build/operate real agents (coding agents, research agents, ops agents).
Evaluation harness - used to run task suites and measure performance across models/variants.

Most “agent capability” comes from the harness:

A good harness usually has: