tangle-network/agent-eval

@tangle-network/agent-eval

Domain-agnostic evaluation framework for Tangle agent apps. It provides multi-turn scenario execution, multi-judge scoring, agent-driver meta-testing, and convergence tracking. Every agent (tax, legal, film, GTM) imports this package to get a reproducible quality harness.
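As a rough illustration of what multi-judge scoring can mean, the sketch below combines several judges' scores into one weighted mean. The `JudgeScore` type, the `aggregateScores` function, the 0..1 score range, and the weights are all assumptions made for this example; the package's actual scoring logic and types may differ.

```typescript
// Hypothetical types for illustration; not the package's actual API.
interface JudgeScore {
  judge: string   // judge name, e.g. 'coherence'
  score: number   // assumed to be normalized to the range 0..1
  weight?: number // optional relative weight, defaults to 1
}

// Combine several judges' scores into one weighted mean.
function aggregateScores(scores: JudgeScore[]): number {
  if (scores.length === 0) {
    throw new Error('at least one judge score is required')
  }
  let weightedSum = 0
  let totalWeight = 0
  for (const { score, weight = 1 } of scores) {
    weightedSum += score * weight
    totalWeight += weight
  }
  if (totalWeight <= 0) {
    throw new Error('total weight must be positive')
  }
  return weightedSum / totalWeight
}

const overall = aggregateScores([
  { judge: 'domain expert', score: 0.8 },
  { judge: 'coherence', score: 0.6 },
  { judge: 'adversarial', score: 0.4, weight: 2 },
])
console.log(overall.toFixed(2)) // 0.55
```

A weighted mean is only one possible choice; a stricter harness might instead take the minimum across judges so that one failing judge fails the scenario.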

Install

npm install @tangle-network/agent-eval

Usage

import { BenchmarkRunner, ProductClient, defaultJudges } from '@tangle-network/agent-eval'

// HTTP client for the product under test; routes are plain config
const client = new ProductClient({
  baseUrl: 'https://my-agent.tangle.tools',
  routes: {
    signup: '/api/auth/sign-up/email',
    chat: '/api/chat',
    // ...
  },
})

// myScenarios and MY_SYSTEM_PROMPT are defined by your app
const runner = new BenchmarkRunner(client, {
  scenarios: myScenarios,
  judges: defaultJudges('film production'),
  systemPrompt: MY_SYSTEM_PROMPT,
})

// Top-level await requires an ES module
const report = await runner.run()
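The `routes` option above maps logical names to URL paths. A minimal sketch of how a config-driven client could resolve such a route into a full URL follows. `ClientConfig` and `resolveRoute` are hypothetical names introduced for this example and are not part of `@tangle-network/agent-eval`; only the standard `URL` class is relied on.

```typescript
// Hypothetical helper for illustration; not part of @tangle-network/agent-eval.
interface ClientConfig {
  baseUrl: string
  routes: Record<string, string> // logical route name -> path
}

// Look up a named route and join it with the base URL.
function resolveRoute(config: ClientConfig, name: string): string {
  const path = config.routes[name]
  if (path === undefined) {
    throw new Error(`unknown route: ${name}`)
  }
  return new URL(path, config.baseUrl).toString()
}

const config: ClientConfig = {
  baseUrl: 'https://my-agent.tangle.tools',
  routes: {
    signup: '/api/auth/sign-up/email',
    chat: '/api/chat',
  },
}

console.log(resolveRoute(config, 'chat'))
// https://my-agent.tangle.tools/api/chat
```

Keeping paths in config means two products with different endpoints can share the same scenarios and differ only in their route table.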

What's in the box

  • ProductClient — configurable HTTP client (routes are config, not code)
  • ScenarioRegistry — auto-discovery + filtering
  • executeScenario — multi-turn executor with artifact collection
  • BenchmarkRunner — orchestrates scenarios + judges + scoring
  • AgentDriver — meta-agent that plays personas against a real product
  • MetricsCollector — per-turn product state metrics
  • ConvergenceTracker — completion% over turns
  • Reporter — markdown + console output
  • Judges — 4 built-in (domain expert, code execution, coherence, adversarial) + createCustomJudge factory
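
To make "completion% over turns" concrete, here is a minimal sketch of convergence tracking. It assumes each turn reports how many required items are complete. `TurnSnapshot`, `trackConvergence`, and the item-counting model are hypothetical and chosen for illustration; they do not describe the real `ConvergenceTracker` interface.

```typescript
// Hypothetical shape for illustration; not the package's actual API.
interface TurnSnapshot {
  turn: number           // 1-based turn index
  completedItems: number // required items done after this turn
}

// Completion percentage per turn, plus the first turn that reaches 100%.
function trackConvergence(snapshots: TurnSnapshot[], totalItems: number) {
  if (totalItems <= 0) {
    throw new Error('totalItems must be positive')
  }
  const curve = snapshots.map(({ turn, completedItems }) => ({
    turn,
    completion: Math.min(100, (completedItems / totalItems) * 100),
  }))
  const converged = curve.find((point) => point.completion >= 100)
  return { curve, convergedAtTurn: converged ? converged.turn : null }
}

const result = trackConvergence(
  [
    { turn: 1, completedItems: 1 },
    { turn: 2, completedItems: 3 },
    { turn: 3, completedItems: 4 },
  ],
  4,
)
console.log(result.curve.map((point) => point.completion)) // [ 25, 75, 100 ]
console.log(result.convergedAtTurn) // 3
```

A run that never reaches 100% yields `convergedAtTurn: null`, which a report could flag as a scenario that did not converge.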

Tier

Marketplace tier of the agent-builder three-tier architecture. Uses @tangle-network/tcloud for judge LLM calls.

License

MIT
