Judge Evaluation Pipeline

Scoring Formula

Diagram: scoring pipeline from LLM output to stored score. Stages, in flow order:

1. Source: LLM Output

2. Classifier: Rubric Score
   C = domain_rubric

3. Metric: Weighted Baseline Score
   M = (0.8·base + 0.2·coverage) − β·2^h

4. Weighted combination: Raw Quality (pre-risk)
   Q_raw = 0.7·C + 0.3·M

5. Damping coefficient: Domain-Aware Damping
   α_eff = α · m_domain
   d_raw = 1 − α_eff · h
   d = clip(d_raw, 0.75, 1.0)

6. Risk adjustment: Risk-Adjusted Score
   Q_risk = Q_raw · d

7. Final score: Stored Score
   final_score = Q_risk, on the range [0, 1]

Policy defaults
   tuned:   α = 0.05, β = 0.0
   legacy:  α = 0.15, β = 0.10
   domains: stricter m_domain for engineering / math
   logs:    raw_quality_score · risk_adjusted_score · policy metadata
Diagram legend: primary data flow; rubric score (C); weighted baseline (M); raw quality / risk-adjusted (Q); damping coefficient (d).
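
The formulas in the diagram can be traced end to end with a short Python sketch. This is an illustration under stated assumptions, not the pipeline's actual code: the names `Policy`, `score`, `TUNED`, and `LEGACY` are hypothetical, `h` is treated as a non-negative risk input because the diagram does not define it, and `m_domain` defaults to 1.0 because the diagram gives no concrete domain multipliers. Only the coefficients, the clip bounds, and the two α/β presets come from the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical container for the two policy parameters in the diagram."""
    alpha: float  # damping strength (α)
    beta: float   # baseline penalty weight (β)


# Presets listed under "Policy defaults" in the diagram.
TUNED = Policy(alpha=0.05, beta=0.0)
LEGACY = Policy(alpha=0.15, beta=0.10)


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def score(c: float, base: float, coverage: float, h: float,
          m_domain: float = 1.0, policy: Policy = TUNED) -> dict:
    """Apply the diagram's stages in order and return the logged values.

    c        -- rubric score C (assumed to be precomputed by the classifier)
    h        -- risk input; its meaning is not defined in the diagram
    m_domain -- domain multiplier; 1.0 is an assumed neutral default
    """
    # Weighted baseline: M = (0.8*base + 0.2*coverage) - beta * 2^h
    m = (0.8 * base + 0.2 * coverage) - policy.beta * 2 ** h
    # Raw quality (pre-risk): Q_raw = 0.7*C + 0.3*M
    q_raw = 0.7 * c + 0.3 * m
    # Domain-aware damping, clipped to [0.75, 1.0]
    alpha_eff = policy.alpha * m_domain
    d = clip(1.0 - alpha_eff * h, 0.75, 1.0)
    # Risk-adjusted score: Q_risk = Q_raw * d
    # No extra clamp to [0, 1] is applied here, since the diagram shows none.
    q_risk = q_raw * d
    return {
        "raw_quality_score": q_raw,
        "risk_adjusted_score": q_risk,
        "damping": d,
    }


if __name__ == "__main__":
    inputs = dict(c=0.8, base=0.6, coverage=0.5, h=2)
    print(score(**inputs, policy=TUNED))
    print(score(**inputs, policy=LEGACY))
```

With the illustrative inputs above, the tuned policy gives M = 0.58, Q_raw = 0.734, d = 0.9, and Q_risk ≈ 0.6606. The legacy policy subtracts β·2^h = 0.4 from the baseline, giving M = 0.18 and Q_raw = 0.614, and its unclipped damping of 0.7 is raised to the 0.75 floor, so Q_risk ≈ 0.4605. The comparison shows how the legacy settings penalize the same `h` twice, once through the exponential term in M and once through damping, whereas the tuned preset with β = 0 applies only the damping.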