Guardrails
Guardrails provide pre-processing of input messages (before LLM calls) and post-processing of output responses (after LLM responses). They enable validation, transformation, and content filtering in the agent pipeline.
When to Use Guardrails
Use guardrails when you need to enforce rules that apply to every request โ content filtering, PII redaction, length limits, or input validation. If you need one-off conditional logic, handle it in your tool or agent instructions instead.
Quickstart
require 'riffer' class ProfanityFilter < Riffer::Guardrail def process_input(messages, context:) if messages.any? { |m| m.content&.match?(/badword/i) } block("Profanity detected") else pass(messages) end end end class MyAgent < Riffer::Agent model 'openai/gpt-5-mini' instructions 'You are a helpful assistant.' guardrail :before, with: ProfanityFilter end response = MyAgent.generate("Hello!") response.blocked? # => false response = MyAgent.generate("You are a badword") response.blocked? # => true response.tripwire.reason # => "Profanity detected"
Tip: See
examples/guardrails/for more ready-to-use reference implementations.
Overview
Guardrails can:
-
Transform - Modify messages or responses (e.g., redact PII, normalize text)
-
Pass - Allow data through unchanged
-
Block - Halt execution with a reason (e.g., content policy violation)
Defining a Guardrail
Create a guardrail by subclassing Riffer::Guardrail:
class ContentFilterGuardrail < Riffer::Guardrail def process_input(messages, context:) if contains_inappropriate_content?(messages) block("Content policy violation detected") else pass(messages) end end def process_output(response, messages:, context:) if contains_inappropriate_content?(response) block("Response contains inappropriate content") else pass(response) end end private def contains_inappropriate_content?(data) # Your content filtering logic false end end
Processing Methods
process_input
Override to process input messages before they are sent to the LLM:
def process_input(messages, context:) # messages - Array of Riffer::Messages::Base # context - Optional context passed to the agent # Return one of: pass(messages) # Continue unchanged transform(new_messages) # Continue with transformed data block("reason") # Halt execution end
process_output
Override to process the LLM response:
def process_output(response, messages:, context:) # response - Riffer::Messages::Assistant # messages - Array of Riffer::Messages::Base (conversation history) # context - Optional context passed to the agent # Return one of: pass(response) # Continue unchanged transform(new_response) # Continue with transformed data block("reason") # Halt execution end
Result Helpers
Inside guardrail methods, use these helpers to return results:
pass(data)
Continue with the data unchanged:
def process_input(messages, context:) pass(messages) end
transform(data)
Continue with transformed data:
def process_input(messages, context:) sanitized = messages.map { |m| sanitize_message(m) } transform(sanitized) end
block(reason, metadata: nil)
Halt execution with a reason:
def process_input(messages, context:) block("Content policy violation", metadata: {type: :profanity}) end
Using Guardrails with Agents
Register guardrails with the guardrail DSL method. Pass the guardrail class (not an instance) and any options:
class MyAgent < Riffer::Agent model "anthropic/claude-haiku-4-5-20251001" instructions "You are a helpful assistant." # Input-only guardrail guardrail :before, with: InputValidator # Output-only guardrail guardrail :after, with: ResponseFilter # Both input and output (around) with options guardrail :around, with: MaxLengthGuardrail, max: 1000 end
Extra keyword arguments after with: are forwarded to the guardrailโs initialize method. For example, max: 1000 above is passed as MaxLengthGuardrail.new(max: 1000).
Phases
-
:before- Runs before the LLM call on input messages -
:after- Runs after the LLM call on the response -
:around- Runs on both before and after
Multiple Guardrails
Guardrails execute sequentially in registration order:
class MyAgent < Riffer::Agent model "anthropic/claude-haiku-4-5-20251001" guardrail :before, with: FirstGuardrail # Runs first guardrail :before, with: SecondGuardrail # Runs second end
Response Object
generate returns a Riffer::Agent::Response object:
response = MyAgent.generate("Hello") response.content # The response text response.blocked? # true if a guardrail blocked execution response.tripwire # Tripwire object with block details (if blocked) response.modified? # true if any guardrail transformed data response.modifications # Array of Modification records
Handling Blocked Responses
response = MyAgent.generate("Hello") if response.blocked? puts "Blocked: #{response.tripwire.reason}" puts "Phase: #{response.tripwire.phase}" puts "Guardrail: #{response.tripwire.guardrail}" else puts response.content end
Modification Tracking
When guardrails transform data, modification records track which guardrail made changes and which messages were affected:
response = MyAgent.generate("Hello") if response.modified? response.modifications.each do |mod| puts "Guardrail: #{mod.guardrail}" puts "Phase: #{mod.phase}" puts "Changed indices: #{mod.message_indices}" end end
During streaming, GuardrailModification events are emitted when transforms occur:
MyAgent.stream("Hello").each do |event| case event when Riffer::StreamEvents::GuardrailModification puts "Modified by: #{event.guardrail} (#{event.phase})" when Riffer::StreamEvents::TextDelta print event.content end end
Streaming with Guardrails
Guardrails work with streaming. If blocked, a Riffer::StreamEvents::GuardrailTripwire event is yielded:
MyAgent.stream("Hello").each do |event| case event when Riffer::StreamEvents::TextDelta print event.content when Riffer::StreamEvents::GuardrailTripwire puts "Blocked: #{event.reason}" puts "Phase: #{event.phase}" end end
Custom Guardrail Examples
Unicode Normalizer
class UnicodeNormalizer < Riffer::Guardrail def process_input(messages, context:) normalized = messages.map do |msg| if msg.respond_to?(:content) && msg.content rebuild_message(msg, msg.content.unicode_normalize(:nfc)) else msg end end transform(normalized) end private def rebuild_message(msg, content) case msg when Riffer::Messages::User Riffer::Messages::User.new(content) when Riffer::Messages::System Riffer::Messages::System.new(content) else msg end end end
Token Limiter
class TokenLimiter < Riffer::Guardrail def initialize(limit:, strategy: :truncate) super() @limit = limit @strategy = strategy end def process_output(response, messages:, context:) content = response.content tokens = estimate_tokens(content) if tokens > @limit case @strategy when :truncate transform(truncate_response(response)) when :block block("Response exceeds token limit", metadata: {tokens: tokens, limit: @limit}) else pass(response) end else pass(response) end end private def estimate_tokens(text) text.split.size # Simplified estimate end def truncate_response(response) words = response.content.split.first(@limit) Riffer::Messages::Assistant.new(words.join(" ") + "...") end end
Content Policy Filter
class ContentPolicyFilter < Riffer::Guardrail BLOCKED_PATTERNS = [ /pattern1/i, /pattern2/i ].freeze def process_input(messages, context:) messages.each do |msg| next unless msg.respond_to?(:content) if violates_policy?(msg.content) return block("Input violates content policy") end end pass(messages) end def process_output(response, messages:, context:) if violates_policy?(response.content) block("Response violates content policy") else pass(response) end end private def violates_policy?(text) return false unless text BLOCKED_PATTERNS.any? { |pattern| text.match?(pattern) } end end
Error Handling
Exceptions raised in guardrails propagate directly to the caller. Handle them as you would any other exception:
begin response = MyAgent.generate("Hello") rescue => e puts "Error: #{e.message}" end