Zeval

The quality operating system for Agent product teams. Production failure patterns, automatically turned into explainable, rankable, recyclable quality assets.

Layer: Workflow / Operating

Stage: Early access

For: Agent product teams

Why Zeval

Existing tools only tell you whether the system is up.

Agent products are deploying at scale, but their failure modes are structurally different from those of traditional software: APIs return 200 while tools fire at the wrong moment and context silently disappears. Zeval sits at the workflow / operating layer and automatically turns failure patterns in real production sessions into explainable, rankable, recyclable quality assets.

Ingest & Normalize

Sessions → unified event stream

Turn real sessions, trajectories, tool calls, and session metadata into a unified event stream: session / trace ingestion, tool-call normalization, user / agent turn alignment, and outcome / drop / escalation labeling.
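As a rough illustration of what a unified event stream could look like, here is a minimal sketch. It is not Zeval's actual schema: the `Event` fields, the `normalize` function, the raw record keys (`role`, `text`, `tool_calls`), and the outcome labels are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator


@dataclass(frozen=True)
class Event:
    """Hypothetical unified event; the field names are illustrative only."""
    session_id: str
    turn: int      # index of the user/agent exchange this event belongs to
    actor: str     # "user", "agent", or "tool"
    kind: str      # "message", "tool_call", or "outcome"
    payload: dict[str, Any]


def normalize(session_id: str, raw: Iterable[dict[str, Any]],
              outcome: str = "unknown") -> Iterator[Event]:
    """Flatten one raw session log into a turn-aligned event stream."""
    turn = -1
    for record in raw:
        role = record.get("role", "agent")
        if role == "user":
            turn += 1  # each user message opens a new turn
        aligned = max(turn, 0)
        if "text" in record:
            yield Event(session_id, aligned, role, "message", {"text": record["text"]})
        # Tool calls may arrive in different shapes; map them onto one schema.
        for call in record.get("tool_calls", []):
            yield Event(session_id, aligned, "tool", "tool_call", {
                "name": call.get("name") or call.get("function", ""),
                "args": call.get("args") or call.get("arguments") or {},
            })
    # Close the session with an assumed label: "completed", "dropped", "escalated".
    yield Event(session_id, max(turn, 0), "agent", "outcome", {"label": outcome})
```

Every downstream step then only has to understand one event shape, regardless of which framework produced the original trace.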

Failure Discovery

From logs to failure patterns

Not a log viewer: Zeval automatically discovers failure patterns. Failure clustering, recurring-issue detection, session-level anomalies, drop-off point recognition, and classification of tool misuse, plan breakage, and memory drift.
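To make recurring-issue detection concrete, the sketch below groups failed sessions by a simple signature and keeps the signatures that repeat. It is a deliberately naive stand-in for real clustering: the signature (outcome plus last tool called), the event dictionaries, and the `FAILED_OUTCOMES` labels are assumptions for illustration, not Zeval's method.

```python
from collections import defaultdict
from typing import Any, Optional

Event = dict[str, Any]  # assumed shape: {"kind": ..., "name": ..., "label": ...}

FAILED_OUTCOMES = {"dropped", "escalated"}  # assumed outcome labels


def signature(events: list[Event]) -> Optional[tuple[str, str]]:
    """Return (outcome, last tool called) for a failed session, else None."""
    outcome = next(
        (e["label"] for e in reversed(events) if e["kind"] == "outcome"), None
    )
    if outcome not in FAILED_OUTCOMES:
        return None
    tools = [e["name"] for e in events if e["kind"] == "tool_call"]
    return outcome, (tools[-1] if tools else "<no tool>")


def recurring_failures(sessions: dict[str, list[Event]], min_sessions: int = 2):
    """Group failed sessions by signature; keep signatures seen often enough."""
    clusters: dict[tuple[str, str], list[str]] = defaultdict(list)
    for session_id, events in sessions.items():
        sig = signature(events)
        if sig is not None:
            clusters[sig].append(session_id)
    recurring = {s: ids for s, ids in clusters.items() if len(ids) >= min_sessions}
    # Largest cluster first, so the most common pattern surfaces at the top.
    return sorted(recurring.items(), key=lambda item: len(item[1]), reverse=True)
```

A production system would likely replace the hand-written signature with learned or semantic grouping; the point here is only the shift from reading individual logs to counting patterns across sessions.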

Production → Eval

Failures become regression sets

Convert failures into continuously validated assets: auto-generate eval candidates from production traces, manage golden / regression sets, diff versions, and run fix-verification workflows.
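One way to picture the loop from production failure to regression set is the sketch below. Everything in it is hypothetical: `EvalCase`, the "must not call this tool" check, and the `Agent` callable are simplifications chosen for the example, not Zeval's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    """Hypothetical eval candidate derived from one failed production session."""
    case_id: str
    prompt: str          # user input replayed from the production trace
    must_not_call: str   # tool that fired at the wrong moment in production


# Assumed agent interface: prompt in, names of the tools it called out.
Agent = Callable[[str], list[str]]


def candidate_from_trace(session_id: str, user_text: str, misused_tool: str) -> EvalCase:
    """Turn an observed failure into a replayable regression case."""
    return EvalCase(f"prod-{session_id}", user_text, misused_tool)


def run_regression(cases: list[EvalCase], agent: Agent) -> dict[str, bool]:
    """A case passes when the agent no longer calls the misused tool."""
    return {c.case_id: c.must_not_call not in agent(c.prompt) for c in cases}


def diff_versions(before: dict[str, bool], after: dict[str, bool]) -> dict[str, list[str]]:
    """Compare two runs over the cases they share."""
    shared = before.keys() & after.keys()
    return {
        "fixed": sorted(k for k in shared if after[k] and not before[k]),
        "regressed": sorted(k for k in shared if before[k] and not after[k]),
    }
```

Running the same cases against the old and new agent versions and diffing the results is what turns "we think we fixed it" into a verified fix.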

Decision Layer

Readable for PMs and founders

Make failures readable and actionable for PMs, ops, and founders. Where users are dropping, why, who is affected, and what to fix first: technical errors translated into product language and tied to business metrics.
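The "what to fix first" question can be sketched as a simple ranking. The scoring rule below (users affected multiplied by drop rate) and the `FailurePattern` fields are assumptions picked for illustration; a real prioritization would presumably weigh more business signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailurePattern:
    """Hypothetical summary of one discovered failure pattern."""
    name: str
    drop_off_step: str
    users_affected: int
    drop_rate: float  # share of affected sessions that ended without completing


def priority(pattern: FailurePattern) -> float:
    """Assumed score: estimated number of users lost to this pattern."""
    return pattern.users_affected * pattern.drop_rate


def fix_first(patterns: list[FailurePattern]) -> list[FailurePattern]:
    """Order patterns from highest to lowest estimated impact."""
    return sorted(patterns, key=priority, reverse=True)


def summarize(pattern: FailurePattern) -> str:
    """Phrase a pattern in product language rather than as a stack trace."""
    return (f"{pattern.name}: {pattern.users_affected} users affected, "
            f"{pattern.drop_rate:.0%} drop at {pattern.drop_off_step}")
```

The output is a ranked list a PM can read directly, with each line naming the step, the audience, and the size of the loss.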

Production → Diagnosis → Regression → Fix verification

Data flywheel: self-compounding, more accurate with every run

4× Four-layer architecture
Ingest → Discovery → Eval Loop → Decision

100th The 100th customer
More valuable than the 1st, thanks to the network effect

57% of enterprises have deployed Agents
Uncontrolled quality is the biggest deployment blocker

∞ Data flywheel
Production failures automatically feed offline evals

Our four core theses

Today's tools sit in the infra layer and are being commoditized by hyperscalers. The real moat lives upstream, in the workflow & operating layer.

Tools win on features; standards win on first-mover advantage, data, and ecosystem trust. We are building the industry's vocabulary, not a product feature.

Agent failure modes are structurally different from those of traditional software: APIs return 200, but tools fire at the wrong moment and context silently disappears.

Production data that isn't recycled into iteration assets is the biggest systemic waste across every Agent product shipping today.

Want early access?

We're working with a small group of Agent product teams to harden the Production → Eval loop in real environments. Write to Roger directly.

Apply for early access