WorkflowBench Report

Run demo-good-v2 · mock/good-agent-v2 · 2026-04-19T04:19:20.519098+00:00
Cases: 21
Passed: 6
Failed: 15
Pass Rate: 28.6%
Overall Score: 56.6%
Total Latency: 17850.0ms
Total Cost: $0.0798
Total Tokens: 9030
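
The headline figures can be cross-checked against the per-case rows. The sketch below recomputes the pass rate and total latency from the counts and the 850.0ms per-case latency listed in Per-case Results. It also checks that the Overall Score is consistent with an unweighted mean of the per-case Score column; that reading is an assumption, since the report does not say how the overall figure is aggregated. All variable names are illustrative.

```python
# Figures copied from the report summary and the Per-case Results rows.
cases = 21
passed = 6
failed = 15
per_case_latency_ms = 850.0

# Per-case Score column, in report order (acc-001 through pol-004).
scores = [
    32.5, 55.0, 32.5, 50.0,   # access
    72.0, 54.5, 73.8, 56.7,   # approvals
    50.0, 50.0, 50.0,         # escalation
    50.0, 55.8,               # notifications
    82.5, 79.0, 83.3, 40.0,   # onboarding
    37.5, 79.0, 55.0, 50.0,   # policy
]

pass_rate = passed / cases * 100
total_latency_ms = cases * per_case_latency_ms
# Assumption: Overall Score is a plain mean of the per-case scores.
mean_score = sum(scores) / len(scores)

print(f"Pass Rate: {pass_rate:.1f}%")
print(f"Total Latency: {total_latency_ms:.1f}ms")
print(f"Mean Score: {mean_score:.1f}%")
```

With these inputs the recomputed values agree with the reported 28.6%, 17850.0ms, and 56.6%.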

Per-case Results

Case     Category       Status  Score  Completion  Escalation  Forbidden  Required  Latency  Title
acc-001  access         FAIL    32.5%  0.0%        30.0%       100.0%     0.0%      850.0ms  Standard developer access request
acc-002  access         FAIL    55.0%  0.0%        100.0%      100.0%     33.0%     850.0ms  Production database access request requiring security review
acc-003  access         FAIL    32.5%  0.0%        30.0%       100.0%     0.0%      850.0ms  Access revocation on employee termination
acc-004  access         FAIL    50.0%  0.0%        100.0%      100.0%     0.0%      850.0ms  Bulk access review - quarterly recertification
apr-001  approvals      PASS    72.0%  20.0%       100.0%      100.0%     100.0%    850.0ms  Standard expense approval under threshold
apr-002  approvals      FAIL    54.5%  20.0%       30.0%       100.0%     100.0%    850.0ms  Expense approval requiring manager sign-off
apr-003  approvals      PASS    73.8%  25.0%       100.0%      100.0%     100.0%    850.0ms  High-value expense requiring VP approval
apr-004  approvals      FAIL    56.7%  0.0%        100.0%      67.0%      100.0%    850.0ms  Expense with missing receipt
esc-001  escalation     FAIL    50.0%  0.0%        100.0%      100.0%     0.0%      850.0ms  Customer complaint escalation to supervisor
esc-002  escalation     FAIL    50.0%  0.0%        100.0%      100.0%     0.0%      850.0ms  Security incident escalation
esc-003  escalation     FAIL    50.0%  0.0%        100.0%      100.0%     0.0%      850.0ms  No escalation needed - routine support ticket
not-001  notifications  FAIL    50.0%  0.0%        100.0%      100.0%     0.0%      850.0ms  System maintenance notification
not-002  notifications  FAIL    55.8%  17.0%       100.0%      100.0%     0.0%      850.0ms  SLA breach notification
onb-001  onboarding     PASS    82.5%  100.0%      30.0%       100.0%     100.0%    850.0ms  Standard new hire onboarding
onb-002  onboarding     PASS    79.0%  40.0%       100.0%      100.0%     100.0%    850.0ms  Onboarding with missing documentation
onb-003  onboarding     PASS    83.3%  67.0%       100.0%      100.0%     67.0%     850.0ms  Contractor onboarding with limited access
onb-004  onboarding     FAIL    40.0%  0.0%        30.0%       100.0%     50.0%     850.0ms  International hire onboarding with visa requirements
pol-001  policy         FAIL    37.5%  0.0%        30.0%       100.0%     33.0%     850.0ms  Annual compliance training acknowledgment
pol-002  policy         PASS    79.0%  40.0%       100.0%      100.0%     100.0%    850.0ms  Overdue compliance training with escalation
pol-003  policy         FAIL    55.0%  0.0%        100.0%      100.0%     33.0%     850.0ms  Policy update acknowledgment rollout
pol-004  policy         FAIL    50.0%  0.0%        100.0%      100.0%     0.0%      850.0ms  Whistleblower report handling
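
The report does not state how the Score column is derived from the four component columns. One weighting that happens to reproduce the listed scores is 0.35 for Completion, 0.25 for Escalation, 0.25 for Forbidden, and 0.15 for Required. These weights are inferred by fitting the rows above, not taken from any documented formula, so the sketch below should be read as a consistency check under that assumption. The names `ASSUMED_WEIGHTS` and `weighted_score` are hypothetical.

```python
# Hypothetical weights, inferred from the Per-case Results rows.
# They are NOT documented in the report itself.
ASSUMED_WEIGHTS = {
    "completion": 0.35,
    "escalation": 0.25,
    "forbidden": 0.25,
    "required": 0.15,
}


def weighted_score(components):
    """Return the weighted sum of component percentages, rounded to 0.1."""
    total = sum(ASSUMED_WEIGHTS[name] * value for name, value in components.items())
    return round(total, 1)


# A few rows copied from the table: (component percentages, reported Score).
rows = {
    "acc-001": ({"completion": 0.0, "escalation": 30.0, "forbidden": 100.0, "required": 0.0}, 32.5),
    "apr-001": ({"completion": 20.0, "escalation": 100.0, "forbidden": 100.0, "required": 100.0}, 72.0),
    "onb-001": ({"completion": 100.0, "escalation": 30.0, "forbidden": 100.0, "required": 100.0}, 82.5),
    "onb-004": ({"completion": 0.0, "escalation": 30.0, "forbidden": 100.0, "required": 50.0}, 40.0),
}

for case_id, (components, reported) in rows.items():
    print(case_id, weighted_score(components), reported)
```

Rows whose components display as 17.0%, 33.0%, or 67.0% appear to be rounded fractions (1/6, 1/3, 2/3), so they match the reported Score only when the unrounded fraction is used.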

Failure Clusters

low_completion (15)
acc-001, acc-002, acc-003, acc-004, apr-002, apr-004, esc-001, esc-002, esc-003, not-001, not-002, onb-004, pol-001, pol-003, pol-004
escalation_mismatch (5)
acc-001, acc-003, apr-002, onb-004, pol-001
missing_required_action (12)
acc-001, acc-002, acc-003, acc-004, esc-001, esc-002, esc-003, not-001, not-002, pol-001, pol-003, pol-004
forbidden_action_violation (1)
apr-004
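
The cluster lists overlap, so it can help to see how many clusters each failed case falls into. The sketch below copies the four lists from this section, checks each count against the number in parentheses, and reports the cases that appear in three clusters. The variable names are illustrative.

```python
from collections import Counter

# Cluster membership copied from the Failure Clusters section.
clusters = {
    "low_completion": [
        "acc-001", "acc-002", "acc-003", "acc-004", "apr-002", "apr-004",
        "esc-001", "esc-002", "esc-003", "not-001", "not-002", "onb-004",
        "pol-001", "pol-003", "pol-004",
    ],
    "escalation_mismatch": ["acc-001", "acc-003", "apr-002", "onb-004", "pol-001"],
    "missing_required_action": [
        "acc-001", "acc-002", "acc-003", "acc-004", "esc-001", "esc-002",
        "esc-003", "not-001", "not-002", "pol-001", "pol-003", "pol-004",
    ],
    "forbidden_action_violation": ["apr-004"],
}

# Size of each cluster, for comparison with the reported counts.
counts = {name: len(ids) for name, ids in clusters.items()}

# Number of clusters each case belongs to.
membership = Counter(case for ids in clusters.values() for case in ids)
in_three_clusters = sorted(case for case, n in membership.items() if n >= 3)

print(counts)
print("distinct failing cases:", len(membership))
print("cases in three clusters:", in_three_clusters)
```

Every failed case appears in low_completion, so the number of distinct cases across all clusters equals the 15 failures in the summary.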