class Riffer::Evals::Judge

Executes LLM-as-judge evaluations using the provider infrastructure.

The Judge class handles calling an LLM to evaluate agent outputs and parsing the structured response. It uses tool calling internally to get guaranteed structured output from the judge model.
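The tool-calling approach described above can be sketched as follows. This is a hypothetical illustration, not Riffer's actual internals: the `record_verdict` tool name, its schema, and the `Verdict` struct are assumptions made for the sketch. The idea is that the judge model is forced to answer by calling a tool whose parameter schema requires a numeric score and a reasoning string, so the response is always parseable.

```ruby
require "json"

# Assumed tool definition: the schema constrains the judge model's output
# to a numeric score and a reasoning string.
JUDGE_TOOL = {
  name: "record_verdict",
  parameters: {
    type: "object",
    properties: {
      score: { type: "number" },
      reasoning: { type: "string" }
    },
    required: %w[score reasoning]
  }
}.freeze

# Assumed result type holding the parsed verdict.
Verdict = Struct.new(:score, :reasoning, keyword_init: true)

# Parse the JSON arguments of the tool call emitted by the judge model.
def parse_verdict(tool_call_json)
  args = JSON.parse(tool_call_json)
  Verdict.new(score: args.fetch("score"), reasoning: args.fetch("reasoning"))
end

verdict = parse_verdict('{"score": 0.85, "reasoning": "Relevant answer."}')
verdict.score     # => 0.85
verdict.reasoning # => "Relevant answer."
```

Because the tool schema marks both fields as required, a malformed response surfaces as a `KeyError` from `fetch` rather than a silently missing value.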

```ruby
judge = Riffer::Evals::Judge.new(model: "anthropic/claude-opus-4-5-20251101")
result = judge.evaluate(
  instructions: "Assess answer relevancy…",
  input: "What is Ruby?",
  output: "Ruby is a programming language."
)
result.score     # => 0.85
result.reasoning # => "The response is relevant…"
```