Domain-agnostic evaluation framework for Tangle agent apps. Multi-turn scenario execution, multi-judge scoring, agent-driver meta-testing, convergence tracking. Every agent (tax, legal, film, gtm) imports this to get a reproducible quality harness.
## Install

```bash
npm install @tangle-network/agent-eval
```

## Quick start

```ts
import { BenchmarkRunner, ProductClient, defaultJudges } from '@tangle-network/agent-eval'

const client = new ProductClient({
  baseUrl: 'https://my-agent.tangle.tools',
  routes: {
    signup: '/api/auth/sign-up/email',
    chat: '/api/chat',
    // ...
  },
})

const runner = new BenchmarkRunner(client, {
  scenarios: myScenarios,
  judges: defaultJudges('film production'),
  systemPrompt: MY_SYSTEM_PROMPT,
})

const report = await runner.run()
```

## Components

- ProductClient — configurable HTTP client (routes are config, not code)
- ScenarioRegistry — auto-discovery + filtering
- executeScenario — multi-turn executor with artifact collection
- BenchmarkRunner — orchestrates scenarios + judges + scoring
- AgentDriver — meta-agent that plays personas against a real product
- MetricsCollector — per-turn product state metrics
- ConvergenceTracker — completion% over turns
- Reporter — markdown + console output
- Judges — 4 built-in (domain expert, code execution, coherence, adversarial) + `createCustomJudge` factory
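The `createCustomJudge` factory is named above, but its signature isn't shown. Here is a minimal self-contained sketch of what a custom judge might look like, assuming judges pair a name with a scoring function over a transcript — the `Transcript` and `JudgeResult` shapes and the factory signature below are illustrative assumptions, not the package's actual types:

```typescript
// Illustrative shapes only — the real types in @tangle-network/agent-eval may differ.
interface Transcript {
  turns: { role: 'user' | 'assistant'; content: string }[]
}

interface JudgeResult {
  score: number // normalized 0..1
  reasoning: string
}

// Hypothetical factory mirroring createCustomJudge: wraps a name and a
// scoring function into a judge object the runner can invoke per scenario.
function createCustomJudge(
  name: string,
  score: (t: Transcript) => JudgeResult,
) {
  return { name, score }
}

// Example: a citation judge that rewards assistant answers containing a URL.
const citationJudge = createCustomJudge('citation-check', (t) => {
  const answers = t.turns.filter((x) => x.role === 'assistant')
  const cited = answers.filter((x) => /https?:\/\//.test(x.content)).length
  const score = answers.length === 0 ? 0 : cited / answers.length
  return { score, reasoning: `${cited}/${answers.length} answers cite a URL` }
})
```

A deterministic judge like this complements the LLM-backed built-ins: it costs nothing per call and gives a stable baseline signal across runs.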
Marketplace tier of the agent-builder three-tier architecture. Uses @tangle-network/tcloud for judge LLM calls.
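The ConvergenceTracker listed above reports completion% over turns. A minimal sketch of that idea — recording a completion fraction each turn and declaring convergence when recent turns stop improving — might look like the following; the class name matches the component above, but this API (`record`, `hasConverged`, `series`) is an assumption, not the package's actual interface:

```typescript
// Sketch only: tracks completion fraction per turn and detects a plateau.
// The real ConvergenceTracker in @tangle-network/agent-eval may differ.
class ConvergenceTracker {
  private history: number[] = []

  // Record the fraction of scenario goals completed after a turn (0..1).
  record(completion: number): void {
    this.history.push(completion)
  }

  // Converged when the last `window` recorded turns vary by less than `epsilon`.
  hasConverged(window = 3, epsilon = 0.01): boolean {
    if (this.history.length < window) return false
    const recent = this.history.slice(-window)
    return Math.max(...recent) - Math.min(...recent) < epsilon
  }

  // Completion% per turn, e.g. for the markdown report.
  series(): number[] {
    return [...this.history]
  }
}
```

Plateau detection like this lets a runner stop a scenario early instead of always playing out the maximum turn budget.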
## Related packages

- `@tangle-network/agent-gateway` — the gateway agents publish through
- `@tangle-network/agent-client` — consumer SDK for those endpoints
- `@tangle-network/tcloud` — platform SDK (used internally by judges)
## License

MIT