Results
The `results` command family works on existing local AgentV run workspaces and `index.jsonl` manifests. Use it after an eval run to inspect failures, validate manifests, export artifact layouts, combine or delete local run workspaces, or generate a shareable HTML report.
Remote result repository exchange is intentionally not part of `agentv results`. New eval runs can auto-export to a configured results repo when `auto_push: true` is set; manual remote status and sync are Dashboard/API workflows. See Dashboard Remote Results for configuration and sync behavior.
Subcommands
| Subcommand | Purpose |
|---|---|
| `results report` | Generate a self-contained static HTML report from an existing run workspace |
| `results export` | Materialize or normalize the artifact workspace structure for a manifest |
| `results combine` | Combine partial local run workspaces into a new local run workspace |
| `results delete` | Delete one or more local run workspaces |
| `results summary` | Print aggregate metrics for a run |
| `results failures` | Show only failing cases |
| `results show` | Display case-level rows from a run workspace |
| `results validate` | Validate that a workspace or manifest resolves correctly |
results report
The `results report` command turns an existing run workspace or `index.jsonl` manifest into a self-contained HTML report for sharing, inspection, and human review.

    agentv results report <run-workspace-or-index.jsonl>

Examples:

    # Generate report.html next to the run manifest
    agentv results report .agentv/results/runs/2026-03-14T10-32-00_claude

    # Use an explicit output path
    agentv results report .agentv/results/runs/2026-03-14T10-32-00_claude/index.jsonl \
      --out ./reports/human-review.html

What it shows:
- Summary stats — total tests, passed, failed, pass rate, duration, and cost
- Eval file groups — test cases grouped by eval file with pass rate, test count, and duration
- Expandable details — unified assertions with pass/fail indicators and type badges, collapsible input/output
- Criteria column — shows the test prompt or description inline for quick scanning
| Option | Description |
|---|---|
| `--out`, `-o` | Output HTML file (defaults to `<run-dir>/report.html`) |
| `--dir`, `-d` | Working directory used to resolve the source path |
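When several run workspaces need reports at once, the command can be wrapped in a small loop. The following is a minimal sketch, not part of AgentV: the function name `report_all_runs` and the `AGENTV_BIN` override are hypothetical, and it assumes each run workspace is a directory that contains an `index.jsonl` manifest. It uses only the `results report` invocation and the `--out` option described above.

```shell
# Hypothetical override so the loop can be dry-run (for example AGENTV_BIN=echo).
AGENTV_BIN="${AGENTV_BIN:-agentv}"

# Generate one HTML report per run workspace.
# $1: directory that contains run workspaces
# $2: directory that receives the HTML files
report_all_runs() {
  runs_dir="$1"
  out_dir="$2"
  mkdir -p "$out_dir" || return 1
  for run in "$runs_dir"/*/; do
    # Skip directories without a manifest (assumed layout: <run>/index.jsonl).
    [ -f "${run}index.jsonl" ] || continue
    name=$(basename "$run")
    "$AGENTV_BIN" results report "${run%/}" --out "$out_dir/$name.html" || return 1
  done
}
```

Calling `report_all_runs .agentv/results/runs ./reports` would write one file per run, named after the run directory. Setting `AGENTV_BIN=echo` first prints the commands instead of running them.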
results export
Use `results export` when you need the artifact workspace layout itself rather than a rendered report.

    agentv results export <run-workspace-or-index.jsonl> [--out <dir>]

This is useful when a manifest needs to be materialized into a predictable artifact tree for other tooling, review, or archiving.
Inspection helpers
For lightweight terminal workflows:

    agentv results summary .agentv/results/runs/<timestamp>
    agentv results failures .agentv/results/runs/<timestamp>
    agentv results show .agentv/results/runs/<timestamp> --test-id my-case
    agentv results validate .agentv/results/runs/<timestamp>

For a review-centric workflow built around these artifacts, see Human Review Checkpoint.
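The inspection helpers can be chained into a single post-run check. This sketch is illustrative only: `inspect_run` and `AGENTV_BIN` are hypothetical names, and it assumes that `results validate` and `results summary` exit with a non-zero status on failure, which this page does not confirm.

```shell
# Hypothetical override so the sequence can be dry-run (for example AGENTV_BIN=echo).
AGENTV_BIN="${AGENTV_BIN:-agentv}"

# Validate a run workspace, then print its summary and failing cases.
# $1: run workspace directory or index.jsonl manifest
inspect_run() {
  run_path="$1"
  # Stop early if the workspace or manifest does not resolve
  # (assumes a non-zero exit status signals a problem).
  "$AGENTV_BIN" results validate "$run_path" || return 1
  "$AGENTV_BIN" results summary "$run_path" || return 1
  "$AGENTV_BIN" results failures "$run_path"
}
```

Running validation first keeps the summary and failure listing from being read against a manifest that does not resolve.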
Remote results sync/status
The CLI contract is deliberately narrow: `agentv results` manages local result artifacts only. It does not expose `results remote status` or `results remote sync` subcommands.
Use these supported remote workflows instead:
- Automatic publishing: configure `projects[].results.auto_push: true`; new `agentv eval` and `agentv pipeline bench` runs push their artifacts after the run completes.
- Manual Dashboard sync: run `agentv dashboard`, open the project, and use Sync Project.
- Manual API sync: while Dashboard is running, call `GET /api/projects/:projectId/remote/status` or `POST /api/projects/:projectId/remote/sync` for project-scoped automation. Single-project sessions also expose `GET /api/remote/status` and `POST /api/remote/sync`.
- Git escape hatch: for advanced recovery, inspect or repair the configured `projects[].results.path` clone with `git` directly, then sync again.
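For the manual API sync route, the project-scoped endpoints listed above could be called with `curl` along these lines. This is a sketch under stated assumptions: `AGENTV_DASHBOARD_URL` is a hypothetical variable that you set to wherever your Dashboard is listening (this page does not state a host or port), the function names are illustrative, and the response format is not documented here, so the output is passed through unparsed.

```shell
# Build a project-scoped remote endpoint URL.
# $1: project id, $2: "status" or "sync"
# AGENTV_DASHBOARD_URL is a hypothetical variable holding the Dashboard base URL.
remote_url() {
  printf '%s/api/projects/%s/remote/%s\n' "${AGENTV_DASHBOARD_URL%/}" "$1" "$2"
}

# GET /api/projects/:projectId/remote/status
remote_status() {
  curl -fsS "$(remote_url "$1" status)"
}

# POST /api/projects/:projectId/remote/sync
remote_sync() {
  curl -fsS -X POST "$(remote_url "$1" sync)"
}
```

With the Dashboard running and `AGENTV_DASHBOARD_URL` exported, `remote_status my-project` followed by `remote_sync my-project` mirrors the Sync Project button; `my-project` is a placeholder id. The `-f` flag makes `curl` return a non-zero status on HTTP errors, which suits scripted use.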