Judge Evaluation Pipeline
Scoring Formula
LLM Output
SOURCE
Rubric Score
C = domain_rubric
CLASSIFIER
Weighted Baseline Score
M = (0.8·base + 0.2·coverage) − β·2^h
METRIC
Raw Quality (pre-risk)
Q_raw = 0.7C + 0.3M
WEIGHTED COMBINATION
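The two formulas above can be sketched in Python as follows. This is a minimal illustration, not the pipeline's actual code: the function names are hypothetical, and `h` is treated as a non-negative number supplied by the caller because the diagram does not define it.

```python
def weighted_baseline(base: float, coverage: float, h: float, beta: float) -> float:
    """M = (0.8*base + 0.2*coverage) - beta * 2**h, as written in the diagram."""
    return (0.8 * base + 0.2 * coverage) - beta * 2 ** h


def raw_quality(c: float, m: float) -> float:
    """Q_raw = 0.7*C + 0.3*M (pre-risk combination of rubric and baseline)."""
    return 0.7 * c + 0.3 * m
```

Note that with β = 0 the exponential penalty vanishes for every `h`, while with β > 0 the term β·2^h is already β at h = 0 and doubles with each unit increase in `h`.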
Domain-Aware Damping
α_eff = α · m_domain
d_raw = 1 − α_eff · h
d = clip(d_raw, 0.75, 1.0)
DAMPING COEFFICIENT
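A small sketch of the damping step, assuming `h` is a non-negative number and `m_domain` is a per-domain multiplier passed in by the caller. The function name and the keyword arguments `floor` and `ceil` are hypothetical; only the formula and the clip bounds 0.75 and 1.0 come from the diagram.

```python
def damping(h: float, alpha: float, m_domain: float,
            floor: float = 0.75, ceil: float = 1.0) -> float:
    """d = clip(1 - alpha * m_domain * h, floor, ceil)."""
    alpha_eff = alpha * m_domain       # domain-aware effective alpha
    d_raw = 1.0 - alpha_eff * h        # linear damping before clipping
    return min(max(d_raw, floor), ceil)
```

Because of the clip, damping never removes more than 25% of the raw score. The floor is reached once α_eff · h ≥ 0.25: with m_domain = 1 that is h ≥ 5 for α = 0.05 and h ≥ about 1.67 for α = 0.15.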
Risk-Adjusted Score
Q_risk = Q_raw · d
RISK ADJUSTMENT
FINAL SCORE
Stored Score
final_score = Q_risk, with final_score ∈ [0, 1]
POLICY DEFAULTS
tuned: α = 0.05, β = 0.0
legacy: α = 0.15, β = 0.10
domains: stricter m_domain for engineering / math
logs: raw_quality_score · risk_adjusted_score · policy metadata
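The whole flow, from rubric score to logged output, might look like the sketch below. Several parts are assumptions made for illustration: the `Policy` class, the `score` function, the `M_DOMAIN` multiplier values (the diagram only says engineering and math are stricter, without numbers), the shape of the policy metadata, and the meaning of `h`, which the diagram leaves undefined. The diagram also does not say whether the stored score is clipped to [0, 1], so this sketch stores Q_risk unchanged.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Policy:
    name: str
    alpha: float
    beta: float


# Values taken from the POLICY DEFAULTS box.
TUNED = Policy("tuned", alpha=0.05, beta=0.0)
LEGACY = Policy("legacy", alpha=0.15, beta=0.10)

# Hypothetical multipliers: the source gives no numbers, only that
# engineering and math are stricter than other domains.
M_DOMAIN = {"engineering": 1.5, "math": 1.5}


def score(c: float, base: float, coverage: float, h: float,
          domain: str, policy: Policy = TUNED) -> dict:
    """Run the scoring formula end to end and return the logged fields."""
    m = (0.8 * base + 0.2 * coverage) - policy.beta * 2 ** h   # weighted baseline M
    q_raw = 0.7 * c + 0.3 * m                                  # raw quality Q_raw
    m_domain = M_DOMAIN.get(domain, 1.0)                       # 1.0 for other domains
    d = min(max(1.0 - policy.alpha * m_domain * h, 0.75), 1.0) # clipped damping d
    q_risk = q_raw * d                                         # risk-adjusted Q_risk
    return {
        "final_score": q_risk,
        "raw_quality_score": q_raw,
        "risk_adjusted_score": q_risk,
        "policy": {**asdict(policy), "m_domain": m_domain, "damping": d},
    }
```

For example, with C = 0.8, base = 0.6, coverage = 1.0 and h = 2 in a domain without a stricter multiplier, the tuned policy gives Q_raw = 0.764, d = 0.9 and a final score of about 0.688. The legacy policy gives Q_raw = 0.644, d clipped to 0.75 and a final score of 0.483.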
Legend: primary data flow; rubric score (C); weighted baseline (M); raw quality / risk-adjusted (Q); damping coefficient (d)