# Rubric Import Schema (JSON)

A rubric is the single source of truth for **what gets evaluated**. It drives both the
annotation page (Trace Review) and the Auto Judgment engine.

## Instructions for an AI assistant

You are given an evaluation goal. Produce a **single JSON object** that matches the schema
below. Output only the JSON (no prose, no code fences). The user will paste it into
**Rubrics → Import full Rubric from JSON** and click **Create from JSON**.

Rules:
- `version` must be a unique slug (lowercase, hyphenated), e.g. `agent-quality-v2`.
- Every question `key` must be lowercase `[a-z0-9_]` and unique within the rubric.
- Use objective question types (`score`, `single_select`, `multi_select`, `yes_no`,
  `button_group`) for anything the Auto Judgment model should answer; use `comment` for
  free-text rationale only.
- `yes_no` is shown as **Yes / No** in the UI and stored as **1 / 0**.

## Top-level object

```jsonc
{
  "name": "Agent Response Quality v2",   // rubric display name
  "version": "agent-quality-v2",          // unique version id across all rubrics
  "status": "published",                  // draft | published | archived
  "scale": "0-5",                         // default scale for score questions
  "guidance": [                            // overall guidance bullets (optional)
    "5 = fully correct with strong evidence.",
    "3 = usable but has gaps.",
    "0-2 = fails the core task."
  ],
  "questions": [ /* see below */ ]
}
```

If `questions` is omitted, the rubric falls back to the legacy `dimensions: string[]` field
(each dimension becomes a 0-5 score), so existing rubrics keep working unchanged.

## Question object

```jsonc
{
  "key": "correctness",        // stable id, [a-z0-9_]; unique within the rubric
  "label": "Correctness",      // shown to annotators and the judge
  "type": "score",             // see types below
  "scope": "trace",            // trace (whole sample) | turn (per conversation turn)
  "required": true,            // optional, default false
  "guidance": "5 = …",         // per-question guidance (optional)
  "answeredByAuto": true        // optional; Auto Judgment answers it (objective types only)
}
```

## Question types

| type | UI | stored answer value | answered by Auto Judgment |
|------|----|---------------------|---------------------------|
| `score` | numeric scale | integer within `scale` | yes |
| `single_select` | dropdown / segmented | one option `value` (string) | yes |
| `multi_select` | chips | array of option `value`s | yes |
| `yes_no` | Yes / No buttons | `1` (yes) or `0` (no) | yes |
| `button_group` | segmented buttons | one option `value` (string) | yes |
| `comment` | free text | string | no (optional rationale only) |

### Type-specific config

- `score` needs `scale`: `{ "min": 0, "max": 5, "step": 1 }` (falls back to the rubric `scale`).
- `single_select`, `multi_select`, `button_group` need `options`:
  `[{ "value": "tool_error", "label": "Tool error" }]`.
- `yes_no` needs no options. The UI shows **Yes / No**; the backend stores **1 / 0**.
- `comment` is free text and is never required of the Auto Judgment model.

## Answer shape (for reference)

Annotations and auto judgements store answers as:

```jsonc
{
  "trace": { "correctness": 4, "is_relevant": 1 },
  "turns": { "turn_3": { "turn_verdict": "fail", "issue_tags": ["bad_data"] } }
}
```

## Full example

```json
{
  "name": "Agent Response Quality v2",
  "version": "agent-quality-v2",
  "status": "published",
  "scale": "0-5",
  "guidance": ["5 = fully correct.", "0-2 = fails the task."],
  "questions": [
    { "key": "correctness", "label": "Correctness", "type": "score",
      "scope": "trace", "required": true, "scale": { "min": 0, "max": 5 },
      "guidance": "5 = fully correct with evidence." },
    { "key": "is_relevant", "label": "Is the answer relevant?", "type": "yes_no",
      "scope": "trace" },
    { "key": "failure_type", "label": "Failure type", "type": "single_select",
      "scope": "trace",
      "options": [
        { "value": "none", "label": "None" },
        { "value": "incorrect_answer", "label": "Incorrect answer" },
        { "value": "tool_error", "label": "Tool error" }
      ] },
    { "key": "turn_verdict", "label": "Turn verdict", "type": "button_group",
      "scope": "turn",
      "options": [{ "value": "pass", "label": "Pass" }, { "value": "fail", "label": "Fail" }] },
    { "key": "issue_tags", "label": "Issue tags", "type": "multi_select",
      "scope": "turn", "answeredByAuto": false,
      "options": [
        { "value": "ambiguous", "label": "Ambiguous" },
        { "value": "bad_data", "label": "Bad data" }
      ] },
    { "key": "comment", "label": "Comment", "type": "comment",
      "scope": "turn", "answeredByAuto": false }
  ]
}
```
