Evaluation framework for LLM knowledge inputs: prompts, RAG corpora, skills, and agent workflows. Fix the model, vary the artifact. Built-in statistical rigor: bootstrap confidence intervals, Krippendorff's α, length debiasing, and saturation curves.
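As an illustration of the "fix the model, vary the artifact" idea, here is a minimal sketch of a paired percentile bootstrap confidence interval for the score difference between two artifacts evaluated on the same test items. It is not the framework's own API: the function name `bootstrap_ci_diff`, its parameters, and the example scores are all hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci_diff(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(scores_b - scores_a), paired by item.

    Hypothetical helper for illustration; not part of any real library.
    Returns (point_estimate, ci_low, ci_high).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("scores must be non-empty and paired by item")

    rng = random.Random(seed)  # seeded so the sketch is reproducible
    # Same model, same items: only the artifact differs, so work on paired diffs.
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    n = len(diffs)

    # Resample items with replacement and record the mean difference each time.
    resampled_means = sorted(
        mean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )

    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = max(lo_idx, int((1 - alpha / 2) * n_resamples) - 1)
    return mean(diffs), resampled_means[lo_idx], resampled_means[hi_idx]


# Hypothetical per-item scores for two prompt variants run against one fixed model.
prompt_v1 = [0.5, 0.6, 0.4, 0.7]
prompt_v2 = [0.7, 0.8, 0.5, 0.9]
point, ci_low, ci_high = bootstrap_ci_diff(prompt_v1, prompt_v2)
print(f"mean diff = {point:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

If the interval excludes zero, the difference between the two artifacts is unlikely to be resampling noise alone. With only a handful of items, as in this toy input, the interval is coarse and should be read with caution.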
Marketplace: Independent
Category: engineering