
Prompt Engineering in Practice · Prompt Evaluation

Eval sets, ground truth, metrics

Prompt Evaluation

Introduction

Why eyeballing outputs isn't enough, how to build an eval set, what a golden set is, which metrics to pick (exact match, F1, pass@k, BERTScore, rubric scoring), and how to avoid common traps (cherry-picking, leakage, imbalanced classes).
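To make three of the metrics named above concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a reference implementation: the function names, the normalization (trim and lowercase), the whitespace tokenization, and the tiny golden set are all hypothetical choices. The pass@k function uses the commonly cited unbiased estimator, 1 - C(n-c, k) / C(n, k), where n samples were generated and c of them passed.

```python
from collections import Counter
from math import comb


def exact_match(prediction: str, reference: str) -> bool:
    """True when the strings agree after trimming and lowercasing (assumed normalization)."""
    return prediction.strip().lower() == reference.strip().lower()


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 on lowercased, whitespace-split tokens."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimated chance that at least one of k samples passes, given c of n samples passed."""
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical golden set: inputs paired with human-verified expected answers.
golden_set = [
    {"input": "Capital of France?", "expected": "Paris"},
    {"input": "Largest planet?", "expected": "Jupiter"},
]
# Hypothetical model outputs, in the same order as the golden set.
predictions = ["paris", "It is Jupiter"]

em_scores = [exact_match(p, g["expected"]) for p, g in zip(predictions, golden_set)]
f1_scores = [token_f1(p, g["expected"]) for p, g in zip(predictions, golden_set)]
em_accuracy = sum(em_scores) / len(em_scores)
mean_f1 = sum(f1_scores) / len(f1_scores)
```

On this toy data the second prediction fails exact match but still earns partial F1 credit, which is the usual reason to report both. BERTScore and rubric scoring need an embedding model or a grader and are left out of the sketch.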