Guardrails

Guardrails detect unsafe, abusive, harmful, or non-compliant content in agent inputs and outputs. They support:

Input and output safety enforcement
Prompt injection detection
PII protection and redaction
Provider-based moderation
Runtime enforcement controls
Streaming response protection
Centralized policy management

Depending on configuration, guardrails can block content, warn users, redact sensitive information, escalate interactions, request rephrasing, or automatically sanitize responses.

Typical Runtime Flow

┌────────────┐
│ User Input │
└──────┬─────┘
       ↓
┌────────────────────┐
│ Input Guardrails   │
│ • PII detection    │
│ • Prompt injection │
│ • Topic checks     │
└──────┬─────────────┘
       ↓
┌────────────────────┐
│ Agent / Model      │
│ Processing         │
└──────┬─────────────┘
       ↓
┌────────────────────┐
│ Output Guardrails  │
│ • Toxicity checks  │
│ • PII redaction    │
│ • Content filtering│
└──────┬─────────────┘
       ↓
┌────────────────────┐
│ Final Response     │
│ Returned to User   │
└────────────────────┘
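The flow above can be sketched in a few lines of Python. This is an illustrative sketch only, not the platform's implementation: `run_turn`, `no_injection`, `no_email`, and the simple string and regex checks are hypothetical names and stand-ins for real guardrail rules.

```python
import re
from typing import Callable, List, Optional

# Hypothetical check type: returns a reason string when triggered, else None.
Check = Callable[[str], Optional[str]]

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def no_injection(text: str) -> Optional[str]:
    # Naive illustrative phrase match, not a real injection detector.
    if "ignore previous instructions" in text.lower():
        return "prompt injection suspected"
    return None

def no_email(text: str) -> Optional[str]:
    return "PII detected" if EMAIL.search(text) else None

def run_turn(user_input: str, model: Callable[[str], str],
             input_checks: List[Check], output_checks: List[Check]) -> str:
    # 1. Input guardrails run before the model sees the message.
    for check in input_checks:
        reason = check(user_input)
        if reason:
            return f"[blocked at input: {reason}]"
    # 2. Agent / model processing.
    response = model(user_input)
    # 3. Output guardrails run before the response is returned.
    for check in output_checks:
        reason = check(response)
        if reason:
            return f"[blocked at output: {reason}]"
    # 4. Final response returned to the user.
    return response
```

Here every triggered check blocks the turn; a real configuration could instead warn, redact, or escalate depending on the rule's action.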

Guardrail Configuration Levels

Guardrails can be configured at:

The project level using centralized guardrail policies
The agent level using agent-specific guardrails

Project Guardrails vs. Agent Guardrails

Project-level policies apply in addition to agent-specific guardrails.

Scope	Purpose	Typical usage
Project guardrails	Centralized governance and reusable safety policies	Enterprise-wide safety enforcement across agents
Agent guardrails	Agent-specific runtime safety checks	Localized rules for individual agents

Project Guardrails

Project guardrails are managed from Govern > Guardrails. They provide:

Reusable safety policies across agents
Centralized provider management
Runtime execution settings
Streaming response enforcement
Cross-agent governance controls

Use project guardrails when you want:

Consistent governance across multiple agents
Shared moderation providers
Organization-wide safety controls
Centralized runtime management

Agent Guardrails

Agent guardrails are configured directly within an agent and are managed from Agent > Guardrails. They provide:

Agent-specific safety checks
Runtime rule configuration
Input/output rule behavior
Rule-level actions and messages

Use agent guardrails when:

Safety rules are specific to one agent
Runtime behavior must be customized locally
Shared project-level governance is not required

Understand DSL and UI mapping

The platform maintains a one-to-one mapping between the UI configuration and the DSL/ABL definition. This allows you to:

Configure guardrails visually
Manage guardrails as code
Version and compare configuration changes
Switch between UI and DSL-based editing workflows

For example, when you add a guardrail rule in the UI, the platform generates the corresponding GUARDRAILS: block in the DSL/ABL. Similarly, updating the GUARDRAILS: block directly in the DSL/ABL updates the same rule configuration in the UI. For detailed guardrail syntax, runtime semantics, and advanced ABL examples, see the Guardrails section in the ABL Reference Guide.

Policy Scopes

Guardrail policies can be applied at different scopes:

Project-Level Scope

Apply the policy to all agents in the project.

{
  "scopeType": "project"
}

Agent-Level Scope

Apply the policy only to a specific agent.

{
  "scopeType": "agent",
  "agentDefId": "agent-definition-id"
}
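To make the two scope payloads concrete, here is a minimal sketch of how a scope could be matched against an agent. The function name `policy_applies` is hypothetical, and the matching logic is an assumption based only on the `scopeType` and `agentDefId` fields shown above.

```python
def policy_applies(scope: dict, agent_def_id: str) -> bool:
    """Return True if a policy with this scope covers the given agent."""
    scope_type = scope.get("scopeType")
    if scope_type == "project":
        # Project scope covers every agent in the project.
        return True
    if scope_type == "agent":
        # Agent scope covers only the agent named by agentDefId.
        return scope.get("agentDefId") == agent_def_id
    # Unknown scope types match nothing in this sketch.
    return False
```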

Guardrail Policies

Policies are reusable governance containers that define runtime safety behavior across agents and projects. Policies can be applied at:

Project level
Agent level

To manage policies, go to Govern > Guardrails > Policies. A policy contains one or more rules. Each rule defines:

What to evaluate
Where to evaluate it
Which provider to use
What action to take when triggered

Rules can support:

Input and output evaluation
Streaming responses
Pattern matching
Model-based moderation
LLM-based classification

Create a Guardrail Policy

Go to Govern > Guardrails.
On the Policies tab, click Create policy.
Enter Policy name and Description.
Select whether the policy applies to all agents in the project or only to a specific agent.
Configure the required rules and runtime settings.
Save the policy.

Rules

Field	Description
Applies To	Select where the rule is evaluated: Input, Output, or Both.
Action	Select what happens when the rule is triggered, such as Block, Warn, Redact, Escalate, Fix, Reask, or Filter.
Provider	Select the provider used for guardrail evaluation.
Category	Define the safety or content category evaluated by the rule.
Severity Threshold	Set the threshold level used to trigger the configured action.
Action Message	Enter the message shown or logged when the rule is triggered.
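The rule fields in the table can be modeled as a small data structure. The sketch below is an assumption-laden illustration: the `Rule` class, the `evaluate` function, the 0.0 to 1.0 score range, and the "score at or above threshold triggers" comparison are all hypothetical and are not taken from the platform's schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class Rule:
    applies_to: str            # "input", "output", or "both"
    action: str                # e.g. "block", "warn", "redact"
    category: str              # category the provider scores
    severity_threshold: float  # assumed to be a score in [0.0, 1.0]
    action_message: str

def evaluate(rule: Rule, direction: str,
             scores: Dict[str, float]) -> Optional[Tuple[str, str]]:
    """Return (action, message) if the rule triggers, else None."""
    # Skip rules that do not apply to this direction.
    if rule.applies_to not in (direction, "both"):
        return None
    # Assumption: a missing category counts as a score of 0.0.
    score = scores.get(rule.category, 0.0)
    if score >= rule.severity_threshold:
        return (rule.action, rule.action_message)
    return None
```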

Runtime Settings

Setting	Description
Fail Mode	Controls whether execution continues or is blocked when guardrail evaluation fails or times out. Fail-open allows execution to continue. Fail-closed blocks execution when evaluation cannot be completed successfully. Use fail-closed behavior for high-security or compliance-sensitive applications.
Local Timeout	Defines how long the platform waits for local guardrail evaluation.
Model Timeout	Defines how long the platform waits for model-based provider evaluation.
LLM Timeout	Defines how long the platform waits for LLM-based evaluation.
Streaming Evaluation	Enables guardrail evaluation while responses are streamed.
Chunk Interval	Defines whether streamed responses are evaluated by sentence, token, or chunk size.
Early Termination	Stops evaluation on the first guardrail trigger.
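The interaction between Fail Mode and the timeout settings can be sketched as follows. This is a simplified illustration, not the platform's runtime: `guarded_allow` and the `"open"` / `"closed"` values are hypothetical, and the evaluator is assumed to be any callable that returns True when the text passes.

```python
import concurrent.futures

def guarded_allow(evaluate, text, timeout_s, fail_mode):
    """Return True if execution may continue.

    evaluate(text) returns True when the text passes the guardrail.
    fail_mode is "open" or "closed" (hypothetical values).
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(evaluate, text)
    try:
        # Wait at most timeout_s seconds for a verdict.
        return future.result(timeout=timeout_s)
    except Exception:
        # Timeout or evaluator error: the verdict is unknown.
        # Fail-open continues, fail-closed blocks.
        return fail_mode == "open"
    finally:
        # Do not wait for a slow evaluator to finish.
        executor.shutdown(wait=False)
```

A completed evaluation always decides the outcome; the fail mode matters only when no verdict is available.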

Only one policy can be active per project at a time, and activating a new policy automatically deactivates the previously active policy.
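The single-active-policy rule amounts to a simple invariant, sketched below. The `activate` function and the name-to-flag dictionary are hypothetical and stand in for whatever state the platform actually stores.

```python
from typing import Dict

def activate(policies: Dict[str, bool], name: str) -> Dict[str, bool]:
    """Return a new state in which only the named policy is active."""
    if name not in policies:
        raise KeyError(name)
    # Activating one policy deactivates every other policy in the project.
    return {policy: (policy == name) for policy in policies}
```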

Custom Guardrail Policies

Custom guardrail policies provide centralized, organization-wide safety enforcement across agents and projects through reusable rules, provider-based moderation, and scoped runtime enforcement. They support:

Project-level and agent-level scopes
Streaming guardrails
Budget controls
Constitution principles
External moderation providers

When configured budgets are exceeded, guardrails can fall back to pattern-based checks. For API payloads, policy schemas, and advanced configuration examples, see the Guardrail Policy API Reference in the ABL Reference Guide.
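The budget fallback can be pictured with a small sketch. It assumes, purely for illustration, that the budget is counted in provider calls; the source does not say how budgets are measured. `BudgetedChecker` and `BLOCKLIST` are hypothetical names.

```python
import re

# Illustrative fallback pattern, not a recommended blocklist.
BLOCKLIST = re.compile(r"\b(password|ssn)\b", re.IGNORECASE)

class BudgetedChecker:
    def __init__(self, provider, budget_calls):
        self.provider = provider        # callable: text -> bool (flagged?)
        self.remaining = budget_calls   # assumption: budget counted in calls

    def is_flagged(self, text):
        if self.remaining > 0:
            # Budget available: use the provider-based check.
            self.remaining -= 1
            return self.provider(text)
        # Budget exceeded: fall back to the pattern-based check.
        return bool(BLOCKLIST.search(text))
```

The fallback is cheaper but less precise, so content a provider would flag can pass once the budget is exhausted.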

Guardrail Providers

Providers are evaluation engines used to classify or inspect content during runtime. Providers can:

Detect unsafe content
Identify PII
Classify toxicity
Evaluate prompt injection attempts
Perform model-based moderation

Supported provider types include:

OpenAI Moderation
Azure AI Content Safety
Anthropic
Lakera Guard
Custom HTTP providers
Custom webhook providers
Built-in PII providers

Configure Providers

For advanced guardrail evaluation, such as toxicity scoring and content classification, connect external providers.

Go to Govern > Guardrails.
Open the Providers tab.
Click Add provider.
Configure the following fields and save the provider.

Field	Description
Adapter Type	Select the integration type used for guardrail evaluation, such as OpenAI Moderation, Custom HTTP, Custom Webhook, or Custom LLM.
Hosting	Select the provider hosting model, such as Cloud API, Self-Hosted, or Managed Service.
Endpoint URL	Enter the provider API endpoint URL.
Model	Enter or select the model used for guardrail evaluation.
Authentication	Enable and select an authentication profile for the provider connection. Raw API keys are not accepted. Use an Auth Profile for providers that require credentials.
Default Category	Define the default moderation or safety category evaluated by the provider.
Default Threshold	Define the default score threshold that triggers enforcement actions.
Circuit Breaker	Configure provider failure handling settings: • Max Failures: Defines how many consecutive failures are allowed before the circuit breaker activates. • Reset Timeout: Defines how long the platform waits before retrying a disabled provider.
Retry Configuration	Configure retry behavior for temporary provider failures: • Max Retries: Defines how many retry attempts are made when provider evaluation fails. • Backoff Strategy: Configures the retry delay behavior between failed attempts.
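The circuit breaker and retry settings work together, and a minimal sketch may help show how. Everything here is an assumption for illustration: the class and function names are hypothetical, the breaker is assumed to open once consecutive failures reach Max Failures, and exponential backoff is only one possible backoff strategy.

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures, reset_timeout_s, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0       # consecutive failures so far
        self.opened_at = None   # time the provider was disabled, if any

    def allow(self):
        if self.opened_at is None:
            return True
        # After the reset timeout, permit a trial call.
        return self.clock() - self.opened_at >= self.reset_timeout_s

    def record(self, ok):
        if ok:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()

def call_with_retry(provider, text, breaker, max_retries,
                    base_delay_s=0.0, sleep=time.sleep):
    if not breaker.allow():
        raise RuntimeError("provider disabled by circuit breaker")
    for attempt in range(max_retries + 1):
        try:
            result = provider(text)
        except Exception:
            breaker.record(False)
            # Give up when retries run out or the breaker has opened.
            if attempt == max_retries or not breaker.allow():
                raise
            sleep(base_delay_s * (2 ** attempt))  # exponential backoff
        else:
            breaker.record(True)
            return result
```

The `clock` and `sleep` parameters exist only so the sketch can be exercised without real delays.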

Input Guardrails

Input guardrails evaluate user messages before they reach the LLM. Use input guardrails to detect unsafe content, identify prompt injection attempts, protect sensitive information, and enforce topic or policy restrictions. Set kind: input on a rule to make it an input guardrail.

GUARDRAILS:
  profanity_filter:
    kind: input
    action: block

Input guardrails support:

Pattern-based detection
Provider-based moderation
LLM-based classification
Severity-based actions
Runtime priority ordering
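Pattern-based detection combined with priority ordering could look like the sketch below. The `PatternRule` type, the `first_triggered` function, and the convention that a lower priority number runs first are assumptions for illustration, not documented platform behavior.

```python
import re
from typing import List, NamedTuple, Optional

class PatternRule(NamedTuple):
    name: str
    priority: int        # assumption: lower numbers are evaluated first
    pattern: re.Pattern
    action: str          # e.g. "block" or "warn"

def first_triggered(rules: List[PatternRule], text: str) -> Optional[PatternRule]:
    """Return the highest-priority rule whose pattern matches, else None."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.pattern.search(text):
            return rule
    return None
```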

For advanced syntax and additional examples, see the Guardrails section in the ABL Reference Guide.

Output Guardrails

Output guardrails evaluate generated responses before they are returned to the user. Use output guardrails to prevent unsafe responses, redact sensitive information, apply moderation checks, and inspect streaming output during generation. Set kind: output on a rule to make it an output guardrail.

GUARDRAILS:
  pii_output_prevention:
    kind: output
    action: block

Output guardrails support:

PII detection and redaction
Toxicity scoring
Streaming response evaluation
Bidirectional guardrails
Automatic response cleanup and fix strategies
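As a rough illustration of PII detection and redaction on output, the sketch below replaces matches with a placeholder label. The patterns are deliberately simple and hypothetical; they are not the platform's built-in PII provider and would miss many real-world formats.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}

def redact(text):
    """Return (redacted_text, labels_found)."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(label)
            # Replace every match with a placeholder such as [EMAIL].
            text = pattern.sub(f"[{label}]", text)
    return text, found
```

Returning the labels alongside the text lets the caller decide whether to log, warn, or block in addition to redacting.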

Use kind: both to apply the same rule to both input and output.

GUARDRAILS:
  phone_number_check:
    kind: both
    action: warn

Streaming output guardrails can evaluate responses while content is still being generated.

GUARDRAILS:
  streaming_safety:
    kind: output
    streaming: true

For advanced syntax and additional examples, see the Guardrails section in the ABL Reference Guide.

Best Practices

Use project guardrails for centralized governance.
Use agent guardrails for localized runtime behavior.
Start with warn before enabling block.
Test regex patterns carefully to reduce false positives.
Enable streaming guardrails for high-risk applications.
Use fail-closed behavior for compliance-sensitive workloads.
Separate business constraints from safety guardrails.
Use providers with caching and budget controls for large-scale deployments.
