class Riffer::Evals::Evaluator

Base class for all evaluators in the Riffer framework.

Provides a DSL for defining evaluator metadata, plus a default evaluate method. Simple evaluators only need to set instructions; the base class then calls the judge automatically.

See examples/evaluators/ for reference implementations.

class MyEvaluator < Riffer::Evals::Evaluator
  instructions "Assess medical accuracy of the response..."
  higher_is_better true
  judge_model "anthropic/claude-opus-4-5-20251101"
end

Public Class Methods

higher_is_better (value = nil)

Source

# File lib/riffer/evals/evaluator.rb, line 33
def higher_is_better(value = nil)
  return @higher_is_better.nil? || @higher_is_better if value.nil?
  @higher_is_better = value
end

Gets or sets whether higher scores are better. Returns true when no value has been set.
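The default can be seen in a small stand-alone sketch. It copies only the accessor from the source above into a hypothetical class (HigherIsBetterSketch and its subclasses are illustrative names, not part of Riffer), so it runs without the library loaded.

```ruby
# Hypothetical stand-in that copies only the higher_is_better accessor shown above.
class HigherIsBetterSketch
  def self.higher_is_better(value = nil)
    return @higher_is_better.nil? || @higher_is_better if value.nil?
    @higher_is_better = value
  end
end

# Never calls the setter, so the getter falls back to true.
class AccuracySketch < HigherIsBetterSketch; end

# Explicitly marks lower scores as better.
class LatencySketch < HigherIsBetterSketch
  higher_is_better false
end

AccuracySketch.higher_is_better # => true
LatencySketch.higher_is_better  # => false
```

Because nil is the "read" signal, passing nil to the accessor reads the current value rather than clearing it.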

instructions (value = nil)

Source

# File lib/riffer/evals/evaluator.rb, line 24
def instructions(value = nil)
  return @instructions if value.nil?
  @instructions = value.to_s
end

Gets or sets the evaluation instructions (criteria and scoring rubric).
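Two consequences of the accessor as listed are worth a quick sketch: values are stored through to_s, and the value lives in a class-level instance variable, which Ruby does not share with subclasses. The classes below are hypothetical stand-ins that copy only the accessor. The real class may handle inheritance elsewhere (for example with an inherited hook) in code this listing does not show.

```ruby
# Hypothetical stand-in that copies only the instructions accessor shown above.
class InstructionsSketch
  def self.instructions(value = nil)
    return @instructions if value.nil?
    @instructions = value.to_s
  end
end

class ParentSketch < InstructionsSketch
  instructions :be_concise # non-String values are stored via to_s
end

# Does not set instructions itself.
class ChildSketch < ParentSketch; end

ParentSketch.instructions # => "be_concise"
ChildSketch.instructions  # => nil, since @instructions is stored per class
```

If that holds for the real class, a subclass of an existing evaluator would need to declare its own instructions.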

judge_model (value = nil)

Source

# File lib/riffer/evals/evaluator.rb, line 42
def judge_model(value = nil)
  return @judge_model if value.nil?
  @judge_model = value.to_s
end

Gets or sets the judge model for LLM-as-judge evaluations. When unset, the judge falls back to Riffer.config.evals.judge_model.

Public Instance Methods

evaluate (input:, output:, ground_truth: nil, messages: [])

Source

# File lib/riffer/evals/evaluator.rb, line 62
def evaluate(input:, output:, ground_truth: nil, messages: [])
  instr = self.class.instructions
  raise NotImplementedError, "#{self.class} must set instructions or implement #evaluate" unless instr

  evaluation = judge.evaluate(
    instructions: instr,
    input: format_input(input),
    output: output,
    ground_truth: ground_truth
  )

  result(score: evaluation[:score], reason: evaluation[:reason])
end

Evaluates an input/output pair.

The default implementation calls the judge with the class-level instructions. Override this method for custom evaluation logic (e.g. rule-based evaluators).

input: the input to evaluate; String or Array of message hashes/Message objects.
output: the agent’s response to evaluate.
ground_truth: optional reference answer for comparison.
messages: the full message history from the agent conversation. The default implementation does not forward it to the judge; overrides can use it.

Raises NotImplementedError if instructions is not set and evaluate is not overridden.
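A rule-based override might look like the sketch below. So that it runs on its own, it uses stand-ins (SketchResult, SketchBase) reduced to what the source on this page shows. They are assumptions, not the real Riffer definitions. LengthSketch and its character budget are likewise invented for illustration.

```ruby
# Stand-in for Riffer::Evals::Result, carrying the keywords that #result passes.
SketchResult = Struct.new(:evaluator, :score, :reason, :metadata, :higher_is_better,
  keyword_init: true)

# Stand-in for the base class: only higher_is_better and the result helper.
class SketchBase
  def self.higher_is_better(value = nil)
    return @higher_is_better.nil? || @higher_is_better if value.nil?
    @higher_is_better = value
  end

  protected

  def result(score:, reason: nil, metadata: {})
    SketchResult.new(evaluator: self.class, score: score, reason: reason,
      metadata: metadata, higher_is_better: self.class.higher_is_better)
  end
end

# Hypothetical rule-based evaluator: 1.0 when the output fits a character
# budget, 0.0 otherwise. No judge model is involved.
class LengthSketch < SketchBase
  MAX_CHARS = 20

  def evaluate(input:, output:, ground_truth: nil, messages: [])
    length = output.to_s.length
    within = length <= MAX_CHARS
    result(
      score: within ? 1.0 : 0.0,
      reason: within ? "within budget" : "over budget",
      metadata: {length: length}
    )
  end
end

short = LengthSketch.new.evaluate(input: "Say hi", output: "Hi there")
short.score # => 1.0
```

Keeping the same keyword signature as the base method lets callers treat judge-backed and rule-based evaluators interchangeably.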

Protected Instance Methods

judge ()

Source

# File lib/riffer/evals/evaluator.rb, line 102
def judge
  @judge ||= begin
    model = self.class.judge_model || Riffer.config.evals.judge_model
    raise Riffer::ArgumentError, "No judge model configured. Set judge_model on the evaluator or Riffer.config.evals.judge_model" unless model
    Riffer::Evals::Judge.new(model: model)
  end
end

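Returns a Judge instance configured for this evaluator. The instance is memoized. The model is taken from the class-level judge_model, falling back to Riffer.config.evals.judge_model.

Raises Riffer::ArgumentError if neither is set.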
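The lookup order can be sketched in isolation. resolve_judge_model is a hypothetical helper, not part of Riffer, and it raises Ruby's built-in ArgumentError where the real method raises Riffer::ArgumentError. The "other/model" string is a placeholder.

```ruby
# Hypothetical helper mirroring the lookup order in #judge: the evaluator's own
# judge_model wins, then the global default; with neither, an error is raised.
def resolve_judge_model(class_model, global_model)
  model = class_model || global_model
  raise ArgumentError, "No judge model configured" unless model
  model
end

resolve_judge_model("anthropic/claude-opus-4-5-20251101", "other/model")
# => "anthropic/claude-opus-4-5-20251101"
resolve_judge_model(nil, "other/model")
# => "other/model"
```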

result (score:, reason: nil, metadata: {})

Source

# File lib/riffer/evals/evaluator.rb, line 114
def result(score:, reason: nil, metadata: {})
  Riffer::Evals::Result.new(
    evaluator: self.class,
    score: score,
    reason: reason,
    metadata: metadata,
    higher_is_better: self.class.higher_is_better
  )
end

Helper to build a Result object. The evaluator and higher_is_better fields are filled in from the evaluator class.