SimpleEval

Simple Evals
for VibeCoders

Connect your vibecode app & auto-suggest evals

Generate synthetic data to improve current prompts

Monitor and A/B test against real-time data

Greg Brockman

OpenAI President

evals are surprisingly often all you need

Garry Tan

Y Combinator CEO

Evals are emerging as the real moat for AI startups

Mike Krieger

Anthropic CPO

If there is one thing we can teach people, it's that writing evals is probably the most important thing.

Kevin Weil

OpenAI CPO

Writing evals is going to become a core skill for product managers. It is such a critical part of making a good product with AI.

Inspired by best-in-class evals

We dig deep into how leading AI products use evals, then adapt those practices for vibecoders.

Ramp
RAG-powered Classification

Accurate Industry Codes for Every Customer

Ramp transformed their fragmented classification system into a unified, RAG-powered pipeline that delivers precise NAICS codes for every account:

  • Consistent 6-digit NAICS taxonomy shared across Risk, RevOps & Product
  • Two-prompt RAG workflow with candidate retrieval and ranking
  • Always-on eval harness with Kafka streaming for historical comparisons
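Keeping retrieval and ranking as two separate steps is what makes it possible to score each one on its own. The minimal Python sketch below assumes each step is a plain function. `retriever`, `ranker` and `classify` are hypothetical names standing in for the two LLM prompts and are not Ramp's actual code.

```python
from typing import Callable

# Hypothetical step signatures; in production each would wrap an LLM prompt.
Retriever = Callable[[str, int], list[str]]  # (description, k) -> shortlist of NAICS codes
Ranker = Callable[[str, list[str]], str]     # (description, shortlist) -> chosen code


def classify(description: str, retriever: Retriever, ranker: Ranker, k: int = 3) -> dict:
    """Run both steps in order and keep both outputs so each can be evaluated."""
    candidates = retriever(description, k)    # step 1: shortlist candidate codes
    choice = ranker(description, candidates)  # step 2: pick one code from the shortlist
    return {"input": description, "candidates": candidates, "choice": choice}


# Usage with canned stand-ins instead of real LLM calls:
fake_retriever = lambda desc, k: ["561311", "541612", "519130"][:k]
fake_ranker = lambda desc, shortlist: shortlist[0]
result = classify("Tech recruiting firm in San Francisco", fake_retriever, fake_ranker)
print(result["choice"])  # 561311
```

Because `classify` returns the shortlist as well as the final choice, the retriever and ranker metrics can both be computed from the same logged record.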

Evaluation Framework

Retriever (LLM #1)

acc@k ↑60%

Is the correct NAICS code in the shortlist?

Example:

Input: "Tech recruiting firm in San Francisco"

Top-3 Candidates:

1. 561311 - Employment Placement Agencies

2. 541612 - HR Consulting Services

3. 519130 - Internet Publishing
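acc@k only asks whether the correct code made the shortlist, so it can be computed from labeled examples without involving the ranker. Below is a minimal Python sketch. `accuracy_at_k` is a hypothetical name, the ground truth for the recruiting-firm example is assumed to be 561311, and the second labeled example is invented for illustration.

```python
def accuracy_at_k(examples: list[tuple[list[str], str]], k: int) -> float:
    """Fraction of examples whose ground-truth code appears in the top-k candidates."""
    if not examples:
        raise ValueError("accuracy_at_k needs at least one example")
    hits = sum(1 for candidates, truth in examples if truth in candidates[:k])
    return hits / len(examples)


# Hypothetical labeled examples: (retriever shortlist, ground-truth code).
examples = [
    (["561311", "541612", "519130"], "561311"),  # hit: truth is in the shortlist
    (["541511", "541512", "518210"], "541519"),  # miss: truth was never retrieved
]
print(accuracy_at_k(examples, k=3))  # 0.5
```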

Ranker (LLM #2)

Fuzzy accuracy 85%

How close is the chosen code to ground truth?
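The page does not say how "closeness" is scored. One plausible definition uses the NAICS hierarchy, where the 2-, 3-, 4-, 5- and 6-digit prefixes are the sector, subsector, industry group, NAICS industry and national industry. Under this definition, partial credit equals the share of levels on which the two codes agree. The Python sketch below is an assumption and is not meant to reproduce the 85% figure. The names `naics_similarity` and `fuzzy_accuracy` are hypothetical. The sketch also treats every 2-digit prefix as its own sector, which is a simplification because NAICS merges some sectors (for example 31-33).

```python
# NAICS prefix lengths: sector, subsector, industry group, NAICS industry, national industry.
NAICS_LEVELS = (2, 3, 4, 5, 6)


def naics_similarity(predicted: str, truth: str) -> float:
    """Fraction of hierarchy levels on which two 6-digit codes agree (0.0 to 1.0)."""
    matched = 0
    for length in NAICS_LEVELS:
        if predicted[:length] != truth[:length]:
            break  # a mismatch at one level implies mismatches at every deeper level
        matched += 1
    return matched / len(NAICS_LEVELS)


def fuzzy_accuracy(pairs: list[tuple[str, str]]) -> float:
    """Mean similarity over (predicted, ground_truth) pairs."""
    if not pairs:
        raise ValueError("fuzzy_accuracy needs at least one pair")
    return sum(naics_similarity(p, t) for p, t in pairs) / len(pairs)


print(naics_similarity("561320", "561311"))  # 0.6 (agree down to the 4-digit industry group)
print(round(fuzzy_accuracy([("561311", "561311"), ("561320", "561311")]), 2))  # 0.8
```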

Guardrail Check

PASS

Did the ranker pick from the retriever's list?
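The guardrail is deterministic, so it needs no labeled data. It flags any run where the ranker returns a code that was not in the shortlist, for example a hallucinated code. The minimal Python sketch below uses `guardrail_check` and `guardrail_pass_rate`, which are hypothetical names.

```python
def guardrail_check(shortlist: list[str], choice: str) -> str:
    """PASS if the ranker's choice came from the retriever's shortlist, else FAIL."""
    return "PASS" if choice in shortlist else "FAIL"


def guardrail_pass_rate(runs: list[tuple[list[str], str]]) -> float:
    """Share of (shortlist, choice) runs that pass the guardrail."""
    if not runs:
        raise ValueError("guardrail_pass_rate needs at least one run")
    return sum(guardrail_check(s, c) == "PASS" for s, c in runs) / len(runs)


shortlist = ["561311", "541612", "519130"]
print(guardrail_check(shortlist, "561311"))  # PASS
print(guardrail_check(shortlist, "541511"))  # FAIL: code not in the shortlist
```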

Want to learn how big tech is doing evals?

Get a fresh case study delivered each week. No spam, ever :)