The quality operating system for Agent product teams.
Production failure patterns, automatically turned into explainable, rankable, recyclable quality assets.
Layer: Workflow / Operating
Stage: Early access
For: Agent product teams
Why Zeval
Existing tools only tell you whether the system is up.
Agent products are deploying at scale, but their failure modes are structurally
different from traditional software — APIs return 200 while tools fire at the
wrong moment and context silently disappears. Zeval sits at the workflow /
operating layer, automatically turning failure patterns in real production
sessions into explainable, rankable, recyclable quality assets.
01
Ingest & Normalize
Sessions → unified event stream
Turn real sessions, trajectories, tool calls, and session metadata into a
unified event stream. Session / trace ingestion, tool-call normalization,
user / agent turn alignment, outcome / drop / escalation labeling.
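As a rough illustration of what a unified event stream could look like, here is a minimal sketch. The raw session shape (`session_id`, `turns`, `tool_calls`, `outcome`) and the `Event` fields are assumptions made for this example, not Zeval's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    session_id: str
    seq: int      # position of the event within its session
    actor: str    # "user", "agent", "tool", or "system"
    kind: str     # "message", "tool_call", or "outcome"
    payload: dict[str, Any] = field(default_factory=dict)


def normalize_session(raw: dict[str, Any]) -> list[Event]:
    """Flatten one (hypothetical) raw session record into an ordered event list."""
    events: list[Event] = []
    sid = raw["session_id"]
    for turn in raw.get("turns", []):
        # Each user / agent turn becomes a message event.
        events.append(Event(sid, len(events), turn["role"], "message",
                            {"text": turn.get("text", "")}))
        # Tool calls made during the turn follow it as separate events.
        for call in turn.get("tool_calls", []):
            events.append(Event(sid, len(events), "tool", "tool_call",
                                {"name": call["name"], "args": call.get("args", {})}))
    # The session-level outcome / drop / escalation label closes the stream.
    events.append(Event(sid, len(events), "system", "outcome",
                        {"label": raw.get("outcome", "unknown")}))
    return events
```

Keeping every event in one flat, ordered list is one possible design choice: it lets later stages scan a session without knowing which SDK or log format produced it.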
02
Failure Discovery
From logs to failure patterns
Not a log viewer — an automated failure-pattern surface. Failure clustering,
recurring-issue detection, session-level anomalies, drop-off point recognition,
tool misuse / plan breakage / memory drift classification.
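One very simple way to picture failure clustering and recurring-issue detection is to group failure records by a signature and keep the groups that span several sessions. The sketch below assumes each record already carries a `category` (for example `tool_misuse`, `plan_breakage`, `memory_drift`) and an optional `tool` name; both the record shape and the signature are illustrative assumptions, and a real system would likely use a richer similarity measure.

```python
from collections import defaultdict


def cluster_failures(failures, min_sessions=2):
    """Group failure records by (category, tool) and keep recurring groups.

    Returns a dict mapping each signature to the sorted session ids it
    affects, ordered from the most to the least widespread pattern.
    """
    clusters = defaultdict(set)
    for f in failures:
        signature = (f["category"], f.get("tool"))
        clusters[signature].add(f["session_id"])

    # A pattern counts as recurring once it spans at least `min_sessions` sessions.
    recurring = {
        sig: sorted(sids)
        for sig, sids in clusters.items()
        if len(sids) >= min_sessions
    }
    # Largest clusters first; str() keeps tie-breaking safe when tool is None.
    return dict(sorted(recurring.items(), key=lambda kv: (-len(kv[1]), str(kv[0]))))
```

Counting distinct sessions rather than raw records keeps one noisy session from looking like a widespread pattern.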
03
Production → Eval
Failures become regression sets
Convert failures into continuously validated assets. Auto-generate eval
candidates from production traces, manage golden / regression sets, version
diffing, fix-verification workflows.
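The production-to-eval loop can be sketched as two steps: turn failed traces into deduplicated eval candidates, then replay them against a new agent version to verify a fix. Everything named here (`EvalCase`, the trace fields `input` and `category`, and the `agent` / `judge` callables) is hypothetical and only meant to show the shape of the loop.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    case_id: str
    input_text: str
    failure_category: str


def to_eval_candidates(failed_traces):
    """Turn failed production traces into deduplicated eval candidates."""
    cases, seen = [], set()
    for trace in failed_traces:
        key = (trace["input"], trace["category"])
        if key in seen:
            continue  # the same input failing the same way needs only one case
        seen.add(key)
        cases.append(EvalCase(f"case-{len(cases) + 1}", trace["input"], trace["category"]))
    return cases


def verify_fix(cases, agent, judge):
    """Replay a regression set and return the ids of cases that still fail.

    `agent` maps an input string to an output; `judge` decides whether that
    output is acceptable for the given case.
    """
    return [c.case_id for c in cases if not judge(c, agent(c.input_text))]
```

An empty list from `verify_fix` would mean every case in the regression set now passes under the supplied judge.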
04
Decision Layer
Readable for PMs and founders
Make failures readable and actionable for PMs, ops, and founders. Where is it
dropping, why, who's affected, what to fix first: translating technical errors
into product language tied to business outcomes.
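To make "what to fix first" concrete, here is a toy prioritization sketch. The impact score (affected sessions multiplied by drop rate) and the field names are assumptions chosen for illustration; the page does not state how Zeval actually ranks failures.

```python
def rank_fixes(clusters):
    """Order failure clusters by estimated impact, highest first.

    Impact is approximated as the number of sessions expected to be lost:
    affected_sessions * drop_rate. This formula is an illustrative assumption.
    """
    def impact(cluster):
        return cluster["affected_sessions"] * cluster["drop_rate"]

    return sorted(clusters, key=impact, reverse=True)
```

With this score, a pattern touching 40 sessions at a 50% drop rate would outrank one touching 120 sessions at a 10% drop rate, since it loses more sessions overall.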
Production → Diagnosis → Regression → Fix verify
Data Flywheel: self-compounding, more accurate with every run
4×
Four-layer architecture: Ingest → Discovery → Eval Loop → Decision
100th
The 100th customer is more valuable than the 1st: a network effect
57%
of enterprises have deployed Agents. Quality is the biggest deployment blocker.
∞
Data flywheel: production failures automatically feed offline eval
Four core theses
01
Today's tools sit in the infra layer and are being commoditized by hyperscalers.
The real moat lives upstream — in the workflow & operating layer.
02
Tools win on features; standards win on first-mover, data, and ecosystem trust.
We build the industry's vocabulary, not a product feature.
03
Agent failure modes are structurally different from traditional software:
APIs return 200, but tools fire at the wrong moment and context silently disappears.
04
Production data that doesn't recycle into iteration assets is the biggest
systemic waste across every Agent product shipping today.
Want early access?
We're working with a small group of Agent product teams to harden the
Production → Eval loop in real environments. Write to Roger directly.