Guardrails help detect unsafe, abusive, harmful, or non-compliant content in agent inputs and outputs. Guardrails support:
- Input and output safety enforcement
- Prompt injection detection
- PII protection and redaction
- Provider-based moderation
- Runtime enforcement controls
- Streaming response protection
- Centralized policy management
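Typical Runtime Flow
A user message is evaluated by input guardrails before it reaches the LLM, and the LLM's response is evaluated by output guardrails before it is returned to the user. When a rule is triggered, its configured action (for example, Block, Warn, or Redact) is applied.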
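The following minimal Python sketch illustrates the typical runtime flow. It is not the platform's implementation. It assumes each guardrail is a hypothetical function that returns a violation message (or `None` when the text passes), and it models only the Block action. `run_turn` and `BLOCKED_REPLY` are illustrative names.

```python
from typing import Callable, List, Optional

# Hypothetical check type: returns a violation message, or None if the text passes.
Check = Callable[[str], Optional[str]]

BLOCKED_REPLY = "Sorry, I can't help with that."  # hypothetical block message

def run_turn(user_message: str,
             input_checks: List[Check],
             call_llm: Callable[[str], str],
             output_checks: List[Check]) -> str:
    # 1. Input guardrails run before the message reaches the LLM.
    for check in input_checks:
        if check(user_message) is not None:
            return BLOCKED_REPLY  # the LLM is never called
    # 2. Only messages that pass every input check are sent to the model.
    response = call_llm(user_message)
    # 3. Output guardrails run before the response is returned to the user.
    for check in output_checks:
        if check(response) is not None:
            return BLOCKED_REPLY
    return response
```

A blocked input never reaches the model, so input guardrails also save LLM calls on requests that would be rejected anyway.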
Guardrail Configuration Levels
Guardrails can be configured at:
- The project level, using centralized guardrail policies
- The agent level, using agent-specific guardrails
Project Guardrails vs. Agent Guardrails
Project-level policies apply in addition to agent-specific guardrails.
| Scope | Purpose | Typical usage |
|---|---|---|
| Project guardrails | Centralized governance and reusable safety policies | Enterprise-wide safety enforcement across agents |
| Agent guardrails | Agent-specific runtime safety checks | Localized rules for individual agents |
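Because project-level policies apply in addition to agent guardrails, the rules that run for a given agent combine both sources. The Python sketch below shows one way to assemble that set, assuming a hypothetical `Policy` record with an `active` flag and an optional target agent (for the agent-level policy scope). The names and the rule ordering are assumptions for illustration; the platform may order or merge rules differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Policy:
    name: str
    active: bool
    target_agent: Optional[str] = None  # None means project-level scope (all agents)
    rules: List[str] = field(default_factory=list)

def effective_rules(agent: str, policies: List[Policy],
                    agent_rules: List[str]) -> List[str]:
    rules: List[str] = []
    # Inactive policies are ignored; the platform keeps at most one active per project.
    for policy in policies:
        if not policy.active:
            continue
        # Project-level scope applies to every agent; agent-level scope to one agent.
        if policy.target_agent is None or policy.target_agent == agent:
            rules.extend(policy.rules)
    # Project policy rules apply in addition to the agent's own guardrails.
    return rules + agent_rules
```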
Project Guardrails
Project guardrails are managed from Govern > Guardrails. Project guardrails provide:
- Reusable safety policies across agents
- Centralized provider management
- Runtime execution settings
- Streaming response enforcement
- Cross-agent governance controls

Use project guardrails when you need:
- Consistent governance across multiple agents
- Shared moderation providers
- Organization-wide safety controls
- Centralized runtime management
Agent Guardrails
Agent guardrails are configured directly within an agent and are managed from Agent > Guardrails. Agent guardrails provide:
- Agent-specific safety checks
- Runtime rule configuration
- Input/output rule behavior
- Rule-level actions and messages

Use agent guardrails when:
- Safety rules are specific to one agent
- Runtime behavior must be customized locally
- Shared project-level governance is not required
Understand DSL and UI Mapping
The platform maintains a one-to-one mapping between the UI configuration and the DSL/ABL definition. This allows you to:
- Configure guardrails visually
- Manage guardrails as code
- Version and compare configuration changes
- Switch between UI and DSL-based editing workflows
Creating or updating a guardrail rule in the UI updates the corresponding rule in the `GUARDRAILS:` block in the DSL/ABL. Similarly, updating the `GUARDRAILS:` block directly in the DSL/ABL updates the same rule configuration in the UI.
For detailed guardrail syntax, runtime semantics, and advanced ABL examples, see the Guardrails section in the ABL Reference Guide.
Policy Scopes
Guardrail policies can be applied at different scopes:
Project-Level Scope
Apply the policy to all agents in the project.
Agent-Level Scope
Apply the policy only to a specific agent.
Guardrail Policies
Policies are reusable governance containers that define runtime safety behavior across agents and projects. Policies can be applied at:
- Project level
- Agent level

Each policy defines:
- What to evaluate
- Where to evaluate it
- Which provider to use
- What action to take when triggered

Policies support:
- Input and output evaluation
- Streaming responses
- Pattern matching
- Model-based moderation
- LLM-based classification
Create a Guardrail Policy
- Go to Govern > Guardrails.
- On the Policies tab, click Create policy.
- Enter a Policy name and Description.
- Select whether the policy applies to all agents in the project or only to a specific agent.
- Configure the required rules and runtime settings.
- Save the policy.
Rules
| Field | Description |
|---|---|
| Applies To | Select where the rule is evaluated: Input, Output, or Both. |
| Action | Select what happens when the rule is triggered, such as Block, Warn, Redact, Escalate, Fix, Reask, or Filter. |
| Provider | Select the provider used for guardrail evaluation. |
| Category | Define the safety or content category evaluated by the rule. |
| Severity Threshold | Set the threshold level used to trigger the configured action. |
| Action Message | Enter the message shown or logged when the rule is triggered. |
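To show how the rule fields work together, here is a minimal Python sketch of evaluating one rule. It assumes scores on a 0.0 to 1.0 scale, assumes a score at or above the Severity Threshold triggers the rule, and uses a plain function as a hypothetical stand-in for the provider. Only the Block and Warn actions are modeled; `Rule` and `apply_rule` are illustrative names, not platform APIs.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class Rule:
    applies_to: str                 # "input", "output", or "both"
    action: str                     # only "block" and "warn" are modeled here
    category: str                   # e.g. "toxicity"
    severity_threshold: float       # assumed 0.0 to 1.0 score scale
    action_message: str
    score: Callable[[str], float]   # hypothetical stand-in for the provider call

def apply_rule(rule: Rule, text: str,
               direction: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (text to pass on, or None if blocked; action message, or None)."""
    if rule.applies_to not in (direction, "both"):
        return text, None                 # rule does not apply in this direction
    if rule.score(text) < rule.severity_threshold:
        return text, None                 # below threshold: not triggered
    if rule.action == "block":
        return None, rule.action_message
    if rule.action == "warn":
        return text, rule.action_message  # let the text through, surface the message
    raise ValueError(f"action not modeled in this sketch: {rule.action}")
```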
Runtime Settings
| Setting | Description |
|---|---|
| Fail Mode | Controls whether execution continues or is blocked if guardrail evaluation fails. Fail-open allows execution to continue if guardrail evaluation fails or times out. Fail-closed blocks execution when guardrail evaluation cannot be completed successfully. Use fail-closed behavior for high-security or compliance-sensitive applications. |
| Local Timeout | Defines how long the platform waits for local guardrail evaluation. |
| Model Timeout | Defines how long the platform waits for model-based provider evaluation. |
| LLM Timeout | Defines how long the platform waits for LLM-based evaluation. |
| Streaming Evaluation | Enables guardrail evaluation while responses are streamed. |
| Chunk Interval | Defines whether streamed responses are evaluated by sentence, token, or chunk size. |
| Early Termination | Stops evaluation on the first guardrail trigger. |
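Fail Mode and the timeout settings interact: a timeout or provider error is exactly the case where evaluation "cannot be completed successfully." The Python sketch below illustrates that interaction under stated assumptions: `evaluate_with_fail_mode` is a hypothetical helper, the check returns `True` for safe text, and `timeout_s` stands in for whichever of Local, Model, or LLM Timeout applies.

```python
import concurrent.futures
from typing import Callable

def evaluate_with_fail_mode(check: Callable[[str], bool], text: str,
                            timeout_s: float, fail_closed: bool) -> bool:
    """Return True if execution may continue. `check` returns True for safe text."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(check, text)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Timeout or evaluation error: fail-closed blocks, fail-open continues.
        return not fail_closed
    finally:
        pool.shutdown(wait=False)  # do not wait for a check that timed out
```

With `fail_closed=True`, an unreachable provider blocks every request, which is the safer default for compliance-sensitive workloads at the cost of availability.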
Only one policy can be active per project at a time, and activating a new policy automatically deactivates the previously active policy.
Custom Guardrail Policies
Custom guardrail policies provide centralized, organization-wide safety enforcement across agents and projects, with reusable rules and provider-based moderation. Custom guardrail policies support:
- Project-level and agent-level scopes
- Budget controls
- Constitution principles
- External moderation providers
Guardrail Providers
Providers are evaluation engines used to classify or inspect content at runtime. Providers can:
- Detect unsafe content
- Identify PII
- Classify toxicity
- Evaluate prompt injection attempts
- Perform model-based moderation

Supported providers include:
- OpenAI Moderation
- Azure AI Content Safety
- Anthropic
- Lakera Guard
- Custom HTTP providers
- Custom webhook providers
- Built-in PII providers
Configure Providers
For advanced guardrail evaluation, such as toxicity scoring and content classification, connect external providers.
- Go to Govern > Guardrails.
- Open the Providers tab.
- Click Add provider.
- Configure the following fields and save the provider.
| Field | Description |
|---|---|
| Adapter Type | Select the integration type used for guardrail evaluation, such as OpenAI Moderation, Custom HTTP, Custom Webhook, or Custom LLM. |
| Hosting | Select the provider hosting model, such as Cloud API, Self-Hosted, or Managed Service. |
| Endpoint URL | Enter the provider API endpoint URL. |
| Model | Enter or select the model used for guardrail evaluation. |
| Authentication | Enable and select an authentication profile for the provider connection. Raw API keys are not accepted. Use an Auth Profile for providers that require credentials. |
| Default Category | Define the default moderation or safety category evaluated by the provider. |
| Default Threshold | Define the default score threshold that triggers enforcement actions. |
| Circuit Breaker | Configure how provider failures are handled: • Max Failures: the number of consecutive failures allowed before the circuit breaker activates. • Reset Timeout: how long the platform waits before retrying a disabled provider. |
| Retry Configuration | Configure retry behavior for temporary provider failures: • Max Retries: the number of retry attempts made when provider evaluation fails. • Backoff Strategy: the delay behavior between failed attempts. |
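The Circuit Breaker and Retry Configuration settings follow a common resilience pattern. The Python sketch below shows one way the two can interact, assuming exponential backoff as the backoff strategy. `CircuitBreaker` and `call_with_retry` are hypothetical names, not platform APIs, and the platform's actual behavior may differ.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

class CircuitBreaker:
    """Opens after `max_failures` consecutive failures; calls are allowed
    again once `reset_timeout` seconds have passed since it opened."""

    def __init__(self, max_failures: int, reset_timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # After the reset timeout, calls may probe the provider again;
        # one more failure re-opens the circuit.
        return self.clock() - self.opened_at >= self.reset_timeout

    def record_success(self) -> None:
        self.failures = 0
        self.opened_at = None

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = self.clock()

def call_with_retry(fn: Callable[[], T], breaker: CircuitBreaker,
                    max_retries: int, base_delay: float,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn once, plus up to `max_retries` retries with exponential backoff."""
    for attempt in range(max_retries + 1):
        if not breaker.allow():
            raise RuntimeError("circuit open: provider temporarily disabled")
        try:
            result = fn()
        except Exception:
            breaker.record_failure()
            if attempt == max_retries:
                raise
            sleep(base_delay * (2 ** attempt))  # delays of 1x, 2x, 4x, ... base_delay
        else:
            breaker.record_success()
            return result
    raise AssertionError("unreachable")
```

Passing `clock` and `sleep` in as parameters keeps the sketch testable without real waiting.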
Input Guardrails
Input guardrails evaluate user messages before they reach the LLM. Use them to detect unsafe content, identify prompt injection attempts, protect sensitive information, and enforce topic or policy restrictions. Use `kind: input` to define an input guardrail. Input guardrails support:
- Pattern-based detection
- Provider-based moderation
- LLM-based classification
- Severity-based actions
- Runtime priority ordering
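Pattern-based detection and priority ordering can be illustrated with a short Python sketch. The rule names, regular expressions, and the convention that a lower `priority` number runs first are all hypothetical; a single regex is easy to evade, so in practice pattern rules are usually combined with provider-based moderation or LLM-based classification.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PatternRule:
    name: str
    pattern: str    # regular expression, matched case-insensitively
    priority: int   # assumption: lower number is evaluated first
    action: str     # e.g. "block" or "warn"

def first_match(message: str, rules: List[PatternRule]) -> Optional[PatternRule]:
    """Evaluate rules in priority order and return the first one that matches."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if re.search(rule.pattern, message, flags=re.IGNORECASE):
            return rule
    return None

# Hypothetical example rules.
RULES = [
    PatternRule("off-topic", r"\b(stock tips|crypto)\b", priority=20, action="warn"),
    PatternRule("prompt-injection",
                r"ignore (all )?(previous|prior) instructions", priority=10, action="block"),
]
```

When a message matches several rules, priority ordering decides which action applies; here the injection rule wins over the off-topic rule.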
Output Guardrails
Output guardrails evaluate generated responses before they are returned to the user. Use them to prevent unsafe responses, redact sensitive information, apply moderation checks, and inspect streaming output during generation. Use `kind: output` to define an output guardrail. Output guardrails support:
- PII detection and redaction
- Toxicity scoring
- Streaming response evaluation
- Bidirectional guardrails
- Automatic response cleanup and fix strategies

Use `kind: both` to apply the same rule to both input and output.
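Streaming evaluation with a sentence-level Chunk Interval and Early Termination can be sketched as follows. This is an illustration, not the platform's implementation: the email regex is a simplistic stand-in for a PII provider, `is_blocked` is a hypothetical block rule, and `guard_stream` and `STOPPED` are illustrative names. Each complete sentence is checked before it is emitted, so a blocked sentence never reaches the user.

```python
import re
from typing import Callable, Iterable, Iterator

# Simplistic email pattern standing in for a PII detector (not exhaustive).
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Split after sentence-ending punctuation followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
STOPPED = "[response stopped by guardrail]"

def redact_pii(text: str) -> str:
    return EMAIL.sub("[EMAIL]", text)

def guard_stream(chunks: Iterable[str],
                 is_blocked: Callable[[str], bool]) -> Iterator[str]:
    """Buffer streamed chunks, check each complete sentence before emitting it,
    redact emails, and end the stream at the first blocked sentence."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Every piece except the last is a complete sentence.
        *complete, buffer = SENTENCE_END.split(buffer)
        for sentence in complete:
            if is_blocked(sentence):
                yield STOPPED
                return  # early termination: remaining chunks are never read
            yield redact_pii(sentence) + " "
    # Flush whatever remains once the model finishes streaming.
    if buffer:
        yield STOPPED if is_blocked(buffer) else redact_pii(buffer)
```

Sentence-level buffering adds a small delay before each sentence appears; token-level intervals reduce that delay but give the guardrail less context per evaluation.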
Best Practices
- Use project guardrails for centralized governance.
- Use agent guardrails for localized runtime behavior.
- Start with `warn` before enabling `block`.
- Test regex patterns carefully to reduce false positives.
- Enable streaming guardrails for high-risk applications.
- Use fail-closed behavior for compliance-sensitive workloads.
- Separate business constraints from safety guardrails.
- Use providers with caching and budget controls for large-scale deployments.