Guardrails are safety checks that evaluate agent inputs and outputs to detect harmful, non-compliant, or malformed content. Unlike constraints (which enforce business rules), guardrails protect against safety and quality violations at the content level. The GUARDRAILS: block defines named guardrail rules.

Overview

ABL guardrails use a three-tier evaluation model:
  1. CEL-based (Tier 1) — fast, deterministic expression checks.
  2. Model-based (Tier 2) — pre-trained safety classification models (e.g., OpenAI moderation).
  3. LLM-based (Tier 3) — natural language checks evaluated by an LLM.
Each guardrail specifies an application point (when to check), a check (a CEL expression, a model provider, or an LLM prompt, depending on the tier), and an action to take when the check fails.
GUARDRAILS:
  profanity_filter:
    kind: input
    check: not_contains_blocked_words(input)
    action: block
    message: "Your message was blocked. Please keep the conversation respectful."
    priority: 1

  pii_output_prevention:
    kind: output
    check: not_contains_ssn(response)
    action: redact
    message: "Sensitive information has been redacted."
    priority: 0

Application points

The kind property determines when the guardrail is evaluated during the agent’s processing pipeline.

| Kind | Evaluation point |
| --- | --- |
| input | Before the user’s message reaches the LLM. |
| output | After the LLM generates a response, before it is sent to the user. |
| both | Evaluated on both input and output. |
| tool_input | Before parameters are sent to a tool call. |
| tool_output | After a tool returns its result, before the result enters the LLM context. |
| handoff | Before context is passed to another agent during a handoff. |
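
As a rough illustration of how kind selects guardrails, the Python sketch below filters a set of rules by evaluation point. The function name guardrails_for and the dictionary layout are hypothetical and not part of ABL. The only behavior taken from the table is that both applies at the input and output points.

```python
def guardrails_for(point, guardrails):
    """Return the names of guardrails that apply at the given evaluation point."""
    selected = []
    for name, rule in guardrails.items():
        kind = rule["kind"]
        # "both" covers the input and output points only (per the table above).
        if kind == point or (kind == "both" and point in ("input", "output")):
            selected.append(name)
    return selected


# Hypothetical rule set, reduced to the kind property.
rules = {
    "profanity_filter": {"kind": "input"},
    "pii_output_prevention": {"kind": "output"},
    "harmful_content_detection": {"kind": "both"},
}
```

Calling guardrails_for("input", rules) selects profanity_filter and harmful_content_detection, while a point such as tool_input selects nothing from this rule set.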

Guardrail properties

| Property | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | string | Yes |  | Unique identifier for the guardrail (the YAML key). |
| kind | string | Yes |  | Application point. See Application points. |
| check | string | No |  | CEL expression to evaluate (Tier 1). Omit for model-based or LLM-based. |
| action | string | Yes |  | Action when the check fails. See Actions. |
| message | string | No |  | Human-readable message displayed or logged when the guardrail triggers. |
| priority | number | No | 100 | Evaluation priority. Lower values are evaluated first. |
| provider | string | No |  | Model provider name for Tier 2 checks (e.g., openai_moderation). |
| category | string | No |  | Safety taxonomy category for Tier 2 (e.g., hate, violence). |
| threshold | number | No |  | Score threshold (0.0 to 1.0) for model-based checks. |
| llm_check | string | No |  | Natural language prompt for Tier 3 LLM-based checks. |
| severity_actions | object | No |  | Per-severity action overrides. See Graduated actions. |
| fix_strategy | string | No |  | Fix strategy when action: fix. See Fix strategies. |
| fix_expression | string | No |  | CEL expression for the custom fix strategy. |
| max_reasks | number | No | 2 | Maximum reask attempts when action: reask. |
| filter_min_length | number | No |  | Minimum content length after filtering. Below this threshold, block instead. |
| streaming | boolean | No | false | Enable mid-stream evaluation for streaming responses. |
| streaming_interval | string | No |  | Streaming evaluation granularity. See Streaming evaluation. |

Actions

The action property determines the runtime behavior when a guardrail check fails.

| Action | Behavior |
| --- | --- |
| block | Reject the content entirely. For input, the user message is discarded. For output, the response is withheld. |
| warn | Allow the content through but emit a warning event. The message is logged, not sent to the user. |
| redact | Replace the offending content with a redaction marker and continue. The sanitized content is passed through. |
| escalate | Trigger human escalation for review. The content is held pending human decision. |
| fix | Automatically repair the content using a fix strategy. See Fix strategies. |
| reask | Reject the LLM output and re-prompt with the guardrail’s message appended as additional guidance. |
| filter | Remove the offending portions while preserving the rest of the content. |
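
The filter action interacts with the filter_min_length property: if too little content survives filtering, the guardrail blocks instead. A minimal Python sketch of that fallback follows. The function apply_filter, the use of a regular expression to find offending spans, and measuring length after whitespace cleanup are all assumptions made for illustration, not documented ABL behavior.

```python
import re


def apply_filter(text, pattern, filter_min_length=0):
    """Remove matches of pattern; fall back to block if too little remains."""
    filtered = re.sub(pattern, "", text)
    # Collapse the gaps left behind by the removed spans.
    filtered = re.sub(r"\s+", " ", filtered).strip()
    if len(filtered) < filter_min_length:
        # Not enough content left to be useful: block instead of filtering.
        return ("block", None)
    return ("filter", filtered)
```

With a minimum length of 10, a long message that loses one phone-like number is passed through in filtered form, whereas a message consisting only of the offending span is blocked.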

Three-tier implementation

Tier 1: CEL-based checks

CEL (Common Expression Language) checks are fast, deterministic rules evaluated without calling an external model. Use the check property with a CEL expression.
GUARDRAILS:
  length_limit:
    kind: output
    check: length(response) < 10000
    action: warn
    message: "Response exceeds recommended length."

  ssn_detection:
    kind: input
    check: not_matches_pattern(input, "\\b\\d{3}-\\d{2}-\\d{4}\\b")
    action: redact
    message: "SSN detected and redacted."
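
To get a feel for what the ssn_detection pattern matches, it can be tried outside ABL. The sketch below uses Python's re module, which accepts this pattern. In a raw Python string the backslashes are single, whereas the double-quoted string in the example above doubles them. The "[REDACTED]" marker and both function names are assumptions; this page does not specify the marker the runtime inserts.

```python
import re

# Same pattern as the ssn_detection check, written as a raw string.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def contains_ssn(text):
    """True when the text contains something shaped like an SSN."""
    return SSN_PATTERN.search(text) is not None


def redact_ssn(text, marker="[REDACTED]"):
    """Replace every SSN-shaped match with a marker (marker text is assumed)."""
    return SSN_PATTERN.sub(marker, text)
```

The word boundaries keep the pattern from firing inside longer digit runs, so a string such as "1234-56-78901" is not treated as an SSN.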

Tier 2: Model-based checks

Model-based checks use a pre-trained classification model to score content. You specify a provider, an optional category, and a threshold.
GUARDRAILS:
  toxicity_detection:
    kind: input
    provider: openai_moderation
    category: hate
    threshold: 0.7
    action: block
    message: "Content flagged for hateful language."
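
The threshold comparison can be pictured with a few lines of Python. This is a sketch under stated assumptions: the check is taken to fail when the category score meets or exceeds the threshold (this page does not say whether the comparison is inclusive), and the scores dictionary holds made-up values standing in for a provider response.

```python
def model_check_fails(scores, category, threshold):
    """True when the category score reaches the threshold (inclusive by assumption)."""
    # A category missing from the scores is treated as 0.0 here.
    return scores.get(category, 0.0) >= threshold


# Made-up scores for illustration only.
sample_scores = {"hate": 0.82, "violence": 0.10}
```

With these values, the toxicity_detection settings (category hate, threshold 0.7) would trigger the block action, while the same threshold applied to violence would not.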

Tier 3: LLM-based checks

LLM-based checks use a natural language prompt evaluated by an LLM. Use the llm_check property with a descriptive prompt.
GUARDRAILS:
  medical_advice_check:
    kind: output
    llm_check: "Does this response provide specific medical diagnoses or prescribe medication? Answer YES or NO."
    action: block
    message: "I'm not able to provide medical diagnoses. Please consult a healthcare professional."
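
Across the three tiers, the property a guardrail sets indicates which tier evaluates it: check for Tier 1, provider for Tier 2, llm_check for Tier 3. The Python sketch below makes that mapping explicit. The function guardrail_tier is hypothetical, and the precedence it applies when more than one of these properties is present is an assumption.

```python
def guardrail_tier(rule):
    """Infer the evaluation tier from the properties a guardrail defines."""
    if "check" in rule:
        return 1  # CEL expression
    if "provider" in rule:
        return 2  # pre-trained classification model
    if "llm_check" in rule:
        return 3  # natural language prompt evaluated by an LLM
    raise ValueError("guardrail defines no check, provider, or llm_check")
```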

Fix strategies

When action: fix, the fix_strategy property determines how content is repaired.

| Strategy | Behavior |
| --- | --- |
| truncate | Truncate content to the maximum allowed length. |
| strip_html | Remove HTML tags from the content. |
| redact_pii | Detect and replace PII patterns with redaction markers. |
| normalize | Normalize whitespace, encoding, and special characters. |
| custom | Apply a custom CEL expression defined in fix_expression. |

Example: fix with truncation

GUARDRAILS:
  response_length:
    kind: output
    check: length(response) <= 5000
    action: fix
    fix_strategy: truncate
    message: "Response was trimmed to fit the maximum length."

Example: custom fix expression

GUARDRAILS:
  normalize_whitespace:
    kind: output
    check: not_contains_excessive_whitespace(response)
    action: fix
    fix_strategy: custom
    fix_expression: "collapse_whitespace(response)"
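
To make the strategies more concrete, here is a small Python sketch of what three of these repairs could look like. These are illustrative guesses, not ABL's implementations: the max_length parameter, the regular expressions, and the behavior given to collapse_whitespace are all assumptions.

```python
import re


def truncate(text, max_length):
    """Cut the text down to at most max_length characters."""
    return text[:max_length]


def strip_html(text):
    """Naive tag removal; real HTML sanitizing needs a proper parser."""
    return re.sub(r"<[^>]+>", "", text)


def collapse_whitespace(text):
    """Replace each run of whitespace with one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()
```

Each function takes the offending content and returns the repaired content, which is the shape a fix needs so that the repaired text can continue through the pipeline.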

Graduated actions

Use severity_actions to apply different actions based on the severity of the violation. The keys are severity labels and the values are action names.
GUARDRAILS:
  content_safety:
    kind: output
    provider: openai_moderation
    threshold: 0.5
    action: warn
    severity_actions:
      low: warn
      medium: reask
      high: block
    message: "Content flagged by safety model."
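
One way to read the content_safety example is as a lookup with a fallback, sketched below in Python. The function resolve_action is hypothetical. Treating the top-level action as the fallback for a severity label that has no entry is an assumption based on severity_actions being described as overrides.

```python
def resolve_action(rule, severity):
    """Pick the per-severity action, falling back to the rule's default action."""
    return rule.get("severity_actions", {}).get(severity, rule["action"])


content_safety = {
    "action": "warn",
    "severity_actions": {"low": "warn", "medium": "reask", "high": "block"},
}
```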

Streaming evaluation

For streaming responses, guardrails can evaluate content as it is generated rather than waiting for the complete response.

| Property | Values | Description |
| --- | --- | --- |
| streaming | true, false | Enable mid-stream evaluation. |
| streaming_interval | token, sentence, chunk_size | Granularity of streaming evaluation. |

GUARDRAILS:
  realtime_safety:
    kind: output
    provider: openai_moderation
    threshold: 0.8
    action: block
    streaming: true
    streaming_interval: sentence
    message: "Response generation halted due to safety concern."
When a streaming guardrail triggers, the response generation is halted at the current point and the message is sent to the user.
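
The sentence interval can be pictured as buffering chunks until a sentence completes, checking it, and halting on the first failure. The Python generator below is a sketch under assumptions: the function name, the is_safe callback, the punctuation-based sentence detection, and the choice to hold each sentence back until it passes are all illustrative and not taken from the ABL runtime.

```python
def stream_by_sentence(chunks, is_safe, message):
    """Yield completed sentences that pass is_safe; on failure yield message and stop."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Crude sentence boundary: the buffer ends with terminal punctuation.
        if buffer.rstrip().endswith((".", "!", "?")):
            if not is_safe(buffer):
                yield message
                return  # generation halts at the current point
            yield buffer
            buffer = ""
    if buffer:
        # Trailing text without a sentence terminator is checked once at the end.
        yield buffer if is_safe(buffer) else message
```

Holding a sentence back until it passes trades a little latency for never showing a flagged sentence; a token interval would check more often at a higher evaluation cost.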

Reask behavior

When action: reask, the runtime rejects the LLM output, appends the guardrail’s message as additional guidance, and re-prompts. The max_reasks property controls how many times this can happen before falling back to a block.
GUARDRAILS:
  factual_grounding:
    kind: output
    llm_check: "Does this response make claims not supported by the provided context?"
    action: reask
    max_reasks: 3
    message: "Stick to information from the provided context. Do not make unsupported claims."
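
The reask loop can be sketched in Python as follows. The names generate_with_reask, generate, and passes_check are hypothetical stand-ins for the runtime's LLM call and guardrail check. Counting one initial attempt plus up to max_reasks re-prompts is an assumption about how the limit is applied.

```python
def generate_with_reask(generate, passes_check, message, max_reasks=2):
    """Re-prompt with the guardrail message until the check passes or reasks run out."""
    guidance = ""  # no extra guidance on the first attempt
    for _ in range(max_reasks + 1):  # initial attempt plus max_reasks re-prompts
        output = generate(guidance)
        if passes_check(output):
            return ("ok", output)
        # Rejected: the guardrail message becomes guidance for the next attempt.
        guidance = message
    return ("block", None)  # fall back to block once reasks are exhausted
```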

Priority and evaluation order

Guardrails are evaluated in order of priority (lower values first). When multiple guardrails have the same priority, they are evaluated in declaration order. A block action from any guardrail stops further evaluation. warn actions do not stop evaluation; all subsequent guardrails continue to run.
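
The ordering rules above can be sketched in Python with a stable sort, which keeps declaration order among equal priorities. The evaluate function and the failing set (names of guardrails whose check fails) are hypothetical. Letting every action other than block continue to the next guardrail is an assumption, since only block and warn are described above.

```python
DEFAULT_PRIORITY = 100  # default from the properties table


def evaluate(guardrails, failing):
    """guardrails: (name, rule) pairs in declaration order; failing: names whose check fails."""
    # sorted() is stable, so equal priorities keep their declaration order.
    ordered = sorted(guardrails, key=lambda item: item[1].get("priority", DEFAULT_PRIORITY))
    triggered = []
    for name, rule in ordered:
        if name not in failing:
            continue
        triggered.append((name, rule["action"]))
        if rule["action"] == "block":
            break  # a block stops further evaluation
    return triggered
```

A guardrail declared without a priority takes the default of 100, so it runs after any guardrail with an explicit lower value, regardless of where it is declared.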

Built-in guardrail templates

ABL provides five built-in guardrail templates that you can reference by convention:

| Template | Kind | Check | Action |
| --- | --- | --- | --- |
| account_number_masking | output | Full account numbers in response | redact |
| credential_input | input | Passwords, PINs, security codes | redact |
| ssn_protection | input | SSN patterns | redact |
| profanity_filter | input | Blocked words list | block |
| harmful_content_detection | both | Harmful instruction patterns | escalate |

Complete example

GUARDRAILS:
  account_number_masking:
    kind: output
    check: not_contains_full_account_number(response)
    action: redact
    message: "Account numbers are masked. Only the last 4 digits are displayed."
    priority: 0

  credential_input:
    kind: input
    check: not_contains_credentials(input)
    action: redact
    message: "Please never share passwords or PINs in this chat."
    priority: 0

  credit_card_detection:
    kind: input
    check: not_matches_pattern(input, "\\b\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}\\b")
    action: redact
    message: "Credit card number redacted for your security."

  toxicity_check:
    kind: output
    check: toxicity_score(response) < 0.5
    action: block
    message: "Response blocked due to potential harmful content."
    priority: 1