Robots Atlas

Prompt Engineering in Practice · Prompt Evaluation

LLM-as-judge

Introduction

How to use an LLM as a judge for automated output scoring: pointwise vs. pairwise scoring, known biases (position, self-preference, verbosity), calibration against human judgment, rubric design, judge ensembles, and the cases where an LLM judge fails.
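
To make the pairwise setting and position bias concrete, here is a minimal sketch of one common mitigation: ask the judge twice with the answer order swapped and keep only verdicts that survive the swap. Everything named here is an assumption for illustration, not an API from this lesson: `Judge` is a hypothetical callable that returns "first" or "second", and the two toy judges stand in for a real model call.

```python
from typing import Callable

# Hypothetical judge signature (an assumption, not a real library API):
# (prompt, first_answer, second_answer) -> "first" or "second"
Judge = Callable[[str, str, str], str]


def pairwise_with_swap(judge: Judge, prompt: str, answer_a: str, answer_b: str) -> str:
    """Run a pairwise comparison in both orders; return "A", "B", or "tie"."""
    first_pass = judge(prompt, answer_a, answer_b)   # A is shown first
    second_pass = judge(prompt, answer_b, answer_a)  # B is shown first

    for verdict in (first_pass, second_pass):
        if verdict not in ("first", "second"):
            raise ValueError(f"unexpected judge output: {verdict!r}")

    a_wins_first = first_pass == "first"
    a_wins_second = second_pass == "second"

    if a_wins_first and a_wins_second:
        return "A"
    if not a_wins_first and not a_wins_second:
        return "B"
    # The verdict flipped when the order flipped, so it tracks position,
    # not content. Report a tie instead of trusting either pass.
    return "tie"


def always_first(prompt: str, first: str, second: str) -> str:
    """Toy judge with pure position bias: always picks the first answer."""
    return "first"


def prefers_longer(prompt: str, first: str, second: str) -> str:
    """Toy judge with pure verbosity bias: picks the longer answer."""
    return "first" if len(first) >= len(second) else "second"


if __name__ == "__main__":
    print(pairwise_with_swap(always_first, "q", "x", "y"))
    print(pairwise_with_swap(prefers_longer, "q", "a long answer", "short"))
```

The swap neutralizes the position-biased toy judge, but the verbosity-biased one still returns a consistent winner in both orders. Order swapping addresses position bias only; verbosity and self-preference need separate checks, such as calibration against human labels.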