Last Updated: 2026-03-16 Version: v2026.03.0

1. ABL Language

The Agent Blueprint Language (ABL) is a domain-specific language for defining agent behavior, conversation flows, and tool orchestration.

1.1 Core Constructs

| Feature | Description |
| --- | --- |
| AGENT declaration | Define an agent’s identity including its name, persona description, domain expertise, and behavioral limitations. The declaration serves as the root element of every ABL file and establishes the agent’s role within a multi-agent topology. Supports inheritance of behavior profiles for consistent agent families. |
| STEP definitions | Named execution steps for flow-based (scripted) agents, each containing RESPOND, CALL, SET, or GATHER instructions with explicit transition targets. Steps support WHEN condition branching via CEL expressions to create deterministic conversation paths. Flow agents execute steps in sequence or branch based on evaluated conditions, giving designers full control over conversation structure. |
| GATHER | Structured information collection that defines typed fields (text, number, date, email, phone, etc.) with per-field validation rules, custom prompts, extraction hints, and configurable retry logic. The runtime uses NLU entity extraction to fill fields from free-form user input and tracks completion state per field. Supports confirmation flows, lenient/strict validation styles, and inference-based extraction for complex entities. |
| RESPOND | Generate agent output messages using template interpolation with `{{variable}}` syntax and rich content variants (Markdown, Adaptive Cards, HTML, Slack, WhatsApp, carousel). RESPOND also supports SSML for voice agents and per-channel formatting via the RichContentIR schema. This enables a single agent definition to render appropriately across web, mobile, voice, and messaging channels. |
| CALL | Invoke tools and functions with named parameter binding from session state or expressions, capturing results into variables for downstream use. CALL supports all tool types (HTTP, MCP, sandbox, connector, lambda) and applies the full middleware chain including audit logging, PII scrubbing, and result validation. Tools can require user confirmation gates before execution for sensitive operations. |
| SET | Assign session-scoped variables from literal values, CEL expressions, or tool call results. SET instructions execute atomically within the session state store, ensuring consistent variable values across concurrent evaluation. Variables set here are available to all downstream steps, RESPOND templates, and CONSTRAINT conditions within the same session. |
| HANDOFF | Transfer conversation control from the current agent to a named target agent with full context passing. The handoff executor validates against self-handoff, cycle detection (via a handoff stack), and ensures the target exists in the agent registry. Supports conditional WHEN clauses and tracks return paths via HandoffSessionInfo so the conversation can return to the parent agent after resolution. |
| DELEGATE | Invoke a child agent as a sub-routine, passing mapped input and waiting for its result before continuing execution. Delegation enforces a maximum nesting depth of 10 (MAX_DELEGATE_DEPTH) to prevent runaway recursion, supports WHEN conditions, per-delegation timeouts with on_failure fallbacks, and maps child output back to parent variables via the returns configuration. Unlike HANDOFF, control always returns to the delegating agent. |
| CONSTRAINT | Conditional logic enforcement using CEL expressions evaluated against session state, with configurable on-fail actions (block, warn, escalate, fallback). Constraints can be applied at the agent level (global), per-step, or as guardrail phases (pre-tool, post-response). The dual evaluator supports both CEL and legacy expression syntax with automatic migration, ensuring backward compatibility while enforcing business rules and compliance policies. |
| LOOKUP | Reference data retrieval from static lookup tables defined in the ABL file or from external data sources. Lookup tables support key-value, range-based, and pattern-matching lookups compiled into the IR at build time. This enables agents to validate fields against controlled vocabularies, resolve codes to display names, and access reference data without runtime API calls. |
| REMEMBER / RECALL | Persistent memory operations that write (REMEMBER) and read (RECALL) facts across sessions using the FactStore. Facts are stored with namespaced keys (`user.*`, `system.*`, `agent.*`), source attribution, and configurable TTLs. RECALL retrieves previously stored facts by key or prefix pattern, enabling agents to maintain long-term context about users and conversations beyond the current session boundary. |
| ON_FAILURE | Error recovery handlers that can be defined per-step or globally at the agent level. Each handler specifies a strategy (retry with backoff, escalate to human, transition to fallback step, or respond with a safe message) and captures error context for observability. Global ON_FAILURE acts as a catch-all for unhandled exceptions, ensuring agents degrade gracefully rather than leaving conversations in a broken state. |
| SUB_INTENT | Multi-intent detection within a single user utterance, enabling the agent to queue and process secondary intents after resolving the primary one. The NLU sub-intent detector decomposes complex messages (e.g., “book a flight and reserve a hotel”) into individual intent signals with confidence scores. Queued intents are processed in priority order with a configurable max age to prevent stale intent handling. |
| DIGRESSION | Off-topic detection and conversation flow recovery that identifies when a user’s message deviates from the current gathering or step context. The digression detector uses LLM-based semantic analysis to distinguish genuine topic changes from clarifying questions. When a digression is detected, the agent can acknowledge it, briefly address the tangent, and smoothly return to the original conversation flow. |
| ACTION_HANDLER | Rich interactive responses including carousels, buttons, select dropdowns, and input fields defined via the ActionSetIR schema. Each action element has an ID, type, label, and optional validation, and ACTION_HANDLER blocks define response logic per user action (e.g., button click triggers a SET + transition). This enables agents to present structured UI beyond plain text, with channel-appropriate rendering for web widgets, Slack, WhatsApp, and Adaptive Cards. |
| REASONING zone | LLM-driven iterative reasoning block with configurable max iterations, tool dispatch, and goal completion detection. The reasoning executor sends the conversation context to the LLM, which can call tools via function calling, evaluate intermediate results, and decide whether to continue reasoning or produce a final answer. The `__fan_out__` system tool enables parallel sub-agent execution within a reasoning cycle for complex orchestration tasks. |
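
The RESPOND row above describes template interpolation with `{{variable}}` placeholders. As a rough illustration of that idea only, the sketch below substitutes session values into a template. The `interpolate` helper and the `SessionVars` type are hypothetical names for this example and are not part of the ABL runtime; leaving unknown placeholders untouched is likewise an assumption.

```typescript
// Hypothetical session variable bag; the real session state shape may differ.
type SessionVars = Record<string, string | number | boolean>;

// Replace each {{name}} placeholder with the matching session value.
// Unknown placeholders are left as-is so a missing variable stays visible.
function interpolate(template: string, vars: SessionVars): string {
  return template.replace(
    /\{\{\s*([\w.]+)\s*\}\}/g,
    (placeholder: string, name: string): string =>
      Object.prototype.hasOwnProperty.call(vars, name)
        ? String(vars[name])
        : placeholder
  );
}

const message = interpolate(
  "Hi {{first_name}}, your order {{order_id}} ships {{eta}}.",
  { first_name: "Ada", order_id: 1042 }
);
// message === "Hi Ada, your order 1042 ships {{eta}}."
```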

1.2 Parser & Compilation

| Feature | Description |
| --- | --- |
| ABL DSL parser | Custom lexer and parser that reads .abl files containing agent definitions, steps, tools, and coordination blocks. The parser produces an AST that is then compiled into AgentIR. It supports all ABL constructs including AGENT, STEP, GATHER, RESPOND, CALL, CONSTRAINT, HANDOFF, and DELEGATE with detailed source-location tracking for error reporting. |
| YAML flow parser | Alternative YAML-based flow definition format for teams that prefer declarative configuration over DSL syntax. YAML flows are parsed into the same internal AST as ABL files and compile to identical AgentIR output. This enables dual-format support where agents can be authored in either ABL or YAML without runtime differences. |
| Supervisor parser | Parses multi-agent orchestration topology definitions that declare which child agents a supervisor can route to and under what conditions. The parser validates agent references, routing rules, and intent mappings before compilation. Supervisor topologies can define both intent-based and context-based routing strategies within a single declaration. |
| Tool file parser | Parses tool definition files that declare HTTP endpoints, MCP servers, sandbox code, connector actions, and lambda functions. Each tool definition includes parameter schemas (JSON Schema), authentication bindings, timeout configuration, and optional confirmation gates. The parser validates tool schemas and resolves auth profile references at compile time. |
| Expression parser | Evaluates CEL (Common Expression Language) expressions and legacy custom expressions used in WHEN conditions, CONSTRAINT rules, SET assignments, and template interpolation. The dual evaluator supports both syntaxes simultaneously with automatic migration hints, ensuring backward compatibility while encouraging adoption of the standard CEL syntax. |
| IR compilation | Compiles parsed ABL/YAML source into AgentIR (intermediate representation), a framework-agnostic JSON schema consumed by all runtimes (digital, voice, workflow). The IR includes identity, execution config, tools, gather fields, memory settings, constraints, coordination, flow definitions, and behavior profiles. A source hash enables change detection for incremental recompilation. |
| Validation pipeline | Cross-reference validation that verifies all tool references resolve to defined tools, all HANDOFF/DELEGATE targets reference known agents, all step transitions point to valid steps, and all variable references in expressions are declared. Validation runs after parsing but before IR emission, catching broken references and type mismatches before deployment. |
| Parse warnings | Non-blocking warnings emitted during compilation for best-practice violations that do not prevent execution but may indicate design issues. Examples include missing REASONING declarations on reasoning agents, unused tool definitions, unreachable steps, and deprecated syntax patterns. Warnings appear in the Studio editor as yellow annotations without blocking deployment. |
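
The validation pipeline and parse warnings rows above mention two checks on flow steps: transitions must point to valid steps (blocking) and unreachable steps are flagged (non-blocking). The sketch below shows one way such checks could work on a simplified step list. `StepDef`, `ValidationReport`, and `validateSteps` are hypothetical names, and the real compiler operates on its own AST rather than on this shape.

```typescript
// Simplified step model, assumed for illustration only.
interface StepDef {
  name: string;
  transitions: string[]; // names of steps this step can move to
}

interface ValidationReport {
  errors: string[];   // broken references, would block compilation
  warnings: string[]; // unreachable steps, reported but non-blocking
}

function validateSteps(steps: StepDef[], entry: string): ValidationReport {
  const byName = new Map<string, StepDef>();
  for (const step of steps) byName.set(step.name, step);

  // Check 1: every transition target must be a declared step.
  const errors: string[] = [];
  for (const step of steps) {
    for (const target of step.transitions) {
      if (!byName.has(target)) {
        errors.push(`${step.name} -> ${target}: unknown step`);
      }
    }
  }

  // Check 2: walk the graph from the entry step to find reachable steps.
  const reachable = new Set<string>();
  const pending: string[] = byName.has(entry) ? [entry] : [];
  while (pending.length > 0) {
    const current = pending.pop() as string;
    if (reachable.has(current)) continue;
    reachable.add(current);
    const step = byName.get(current);
    if (!step) continue;
    for (const target of step.transitions) {
      if (byName.has(target)) pending.push(target);
    }
  }

  const warnings = steps
    .filter((step) => !reachable.has(step.name))
    .map((step) => `${step.name}: unreachable step`);
  return { errors, warnings };
}
```

For a flow where `collect` transitions to an undeclared `missing` step and an `orphan` step is never targeted, this returns one error and one warning.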

2. Agent Anatomy

2.1 Agent Types

| Feature | Description |
| --- | --- |
| Reasoning agents | LLM-driven agents that use iterative reasoning with tool dispatch to solve open-ended tasks. The reasoning executor loops through LLM calls, tool executions, and intermediate evaluations up to a configurable max iteration limit. Goal completion is detected automatically when the LLM produces a final answer without requesting additional tool calls, or when an explicit completion condition is met. |
| Flow agents | Deterministic step-based agents that execute a predefined sequence of steps with explicit transition targets and WHEN condition branching via CEL expressions. Flow agents guarantee predictable conversation paths and are ideal for structured processes like onboarding, form filling, or compliance workflows. Each step can contain RESPOND, CALL, SET, GATHER, and CONSTRAINT instructions executed in declaration order. |
| Supervisor agents | Multi-agent orchestration agents that route conversations between child agents based on detected intent, conversation context, or explicit routing rules. Supervisors maintain a topology of available agents and use either LLM-based or rule-based routing to select the appropriate child agent. They manage handoff stacks, context merging, and return-to-parent routing to maintain conversational continuity across agent boundaries. |
| Voice agents | Real-time WebSocket-based agents that process audio input via STT (Deepgram) and produce audio output via TTS (ElevenLabs) or native realtime LLM voice (OpenAI Realtime, Gemini Live, Ultravox). Voice agents support barge-in detection, connection pre-warming for low latency, and SSML-annotated responses. They operate on the same AgentIR as digital agents but with voice-specific configuration overlays for audio format, VAD settings, and prosody. |
| Digital agents | Text-based conversational agents designed for web chat, mobile, email, SMS, WhatsApp, Slack, and HTTP async channels. Digital agents render responses with channel-appropriate formatting (Markdown, Adaptive Cards, HTML, carousel) and support rich interactive elements like buttons and input forms. They share the same compilation and runtime pipeline as voice agents, enabling a single agent definition to serve multiple channels simultaneously. |
| Workflow agents | Stateful long-running workflow agents integrated with BullMQ job queues and Restate durable execution for multi-step processes that span hours or days. Workflow agents persist their state between steps, support retries with exponential backoff, and can be triggered as tool calls from other agents. This enables complex business processes like approval chains, data pipeline orchestration, and scheduled batch operations to be modeled as agent workflows. |
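
The reasoning agents row above describes a loop of LLM calls and tool executions that stops at a final answer or at the max iteration limit. The following is a minimal sketch of that control flow with a stubbed, synchronous model. `ModelTurn`, `Model`, `Tool`, and `runReasoning` are assumed names for this example, and a real executor would call an LLM asynchronously.

```typescript
// One model turn: either request a tool or produce the final answer.
type ModelTurn =
  | { kind: "tool_call"; tool: string; args: Record<string, unknown> }
  | { kind: "final"; answer: string };

type Model = (transcript: string[]) => ModelTurn; // stand-in for an LLM call
type Tool = (args: Record<string, unknown>) => string;

interface ReasoningResult {
  answer: string | null; // null when the iteration cap was hit
  iterations: number;
}

function runReasoning(
  model: Model,
  tools: Record<string, Tool>,
  userMessage: string,
  maxIterations: number
): ReasoningResult {
  const transcript: string[] = [`user: ${userMessage}`];
  for (let i = 1; i <= maxIterations; i++) {
    const turn = model(transcript);
    // A final answer with no further tool request ends the loop.
    if (turn.kind === "final") return { answer: turn.answer, iterations: i };
    const tool = tools[turn.tool];
    const result = tool ? tool(turn.args) : `error: unknown tool ${turn.tool}`;
    transcript.push(`tool ${turn.tool}: ${result}`);
  }
  return { answer: null, iterations: maxIterations };
}

// Stub model: asks for one tool call, then answers using the tool output.
const stubModel: Model = (transcript) => {
  const toolLine = transcript.find((line) => line.startsWith("tool "));
  return toolLine
    ? { kind: "final", answer: `Done (${toolLine})` }
    : { kind: "tool_call", tool: "lookup_order", args: { id: 7 } };
};

const reasoningResult = runReasoning(
  stubModel,
  { lookup_order: (args) => `order ${String(args["id"])} shipped` },
  "Where is my order?",
  5
);
// reasoningResult.iterations === 2
```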

2.2 Agent Structure

| Feature | Description |
| --- | --- |
| AgentIR schema | Complete intermediate representation containing identity metadata, execution config, tool definitions, gather fields, memory settings, constraints, coordination (handoffs/delegates), flow steps, routing rules, behavior profiles, and NLU configuration. The IR schema is versioned (currently v1.0), framework-agnostic, and consumed identically by digital, voice, and workflow runtimes. A source hash on each compiled IR enables change detection for incremental redeployment. |
| Identity & contract | Agent inputs, outputs, and signal definitions that form the agent’s public contract. Inputs define what parameters the agent expects when invoked (e.g., via delegation), outputs define what it returns, and signals define events the agent can emit (e.g., escalation, completion). This contract enables type-safe agent composition in multi-agent topologies where parent agents depend on child agent interfaces. |
| Behavior profiles | Composable, context-dependent behavior overlays that modify an agent’s instructions, voice settings, response rules, constraints, tool availability, and gather fields based on CEL conditions evaluated at runtime. Each profile has a priority for conflict resolution (higher wins) and supports tools_hide/tools_add, flow_modifications (skip/override/insert steps), and gather field overrides. This enables a single agent definition to adapt its behavior for different channels, user tiers, or business contexts. |
| Entry point resolution | Handler-based entry that maps incoming requests to the appropriate agent execution context. The runtime resolves the agent by name and version, loads the compiled IR, initializes session state, and wires up LLM clients, tool executors, and trace contexts. Entry point resolution supports both new session creation and session resumption, with channel-specific artifact resolution for returning users. |
| Completion conditions | Configurable conditions that determine when an agent considers its task complete, including self-terminating (LLM signals done), escalation signals (agent explicitly escalates to human), and max-turn limits (safety cap on conversation length). Completion triggers session finalization, trace span closure, and optional parent notification when the agent was invoked via delegation. This prevents runaway conversations and ensures resource cleanup. |
| Error handling | Per-step and global ON_FAILURE handlers that define recovery strategies when steps, tool calls, or LLM invocations fail. Each handler specifies an action (retry with configurable backoff, escalate to human, transition to a fallback step, or respond with a safe error message) and captures error context for trace events. Global handlers act as catch-all safety nets, ensuring the agent never leaves a conversation in an unrecoverable state. |
| Agent versioning | Semantic version tracking with history and rollback capability for deployed agents. Each compilation produces a versioned IR artifact linked to its source hash, enabling comparison between versions and instant rollback to a previous known-good state. Version history is queryable through the Studio UI and API, supporting audit trails and gradual rollout strategies across environments. |
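
The behavior profiles row above says profiles carry a priority (higher wins) and can hide or add tools. One plausible reading, sketched below, applies matching profiles in ascending priority so the highest-priority profile has the last word. This ordering is an assumption, as are the names `BehaviorProfile`, `SessionContext`, and `resolveTools`. A plain predicate stands in for the CEL condition.

```typescript
interface SessionContext {
  channel: string;
  tier: string;
}

interface BehaviorProfile {
  name: string;
  priority: number;
  when: (ctx: SessionContext) => boolean; // stands in for a CEL condition
  toolsHide?: string[];
  toolsAdd?: string[];
}

function resolveTools(
  baseTools: string[],
  profiles: BehaviorProfile[],
  ctx: SessionContext
): string[] {
  // Apply lower priorities first so higher priorities override them.
  const active = profiles
    .filter((profile) => profile.when(ctx))
    .sort((a, b) => a.priority - b.priority);

  const tools = new Set<string>(baseTools);
  for (const profile of active) {
    for (const tool of profile.toolsHide ?? []) tools.delete(tool);
    for (const tool of profile.toolsAdd ?? []) tools.add(tool);
  }
  return Array.from(tools).sort();
}

const exampleProfiles: BehaviorProfile[] = [
  { name: "voice", priority: 10, when: (c) => c.channel === "voice", toolsHide: ["refund"] },
  { name: "vip", priority: 20, when: (c) => c.tier === "vip", toolsAdd: ["refund", "priority_callback"] },
];

const vipVoiceTools = resolveTools(
  ["search", "refund", "transfer"],
  exampleProfiles,
  { channel: "voice", tier: "vip" }
);
// vipVoiceTools: ["priority_callback", "refund", "search", "transfer"]
```

In this example the lower-priority voice profile hides `refund`, and the higher-priority VIP profile adds it back.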

3. Multi-Agent Orchestration

3.1 Handoff

| Feature | Description |
| --- | --- |
| Agent-to-agent handoff | Transfer conversation control from the current agent to a named target agent while preserving the full conversation context. The handoff executor validates the target exists in the registry, checks it is in the allowed handoff targets list, and passes context variables and conversation history to the new agent. The user experiences a seamless transition without needing to repeat information. |
| Self-handoff prevention | The HandoffExecutor detects when an agent attempts to hand off to itself (same agent name as the current thread) and blocks the operation with a descriptive error. This prevents accidental infinite loops in supervisor routing configurations where a routing rule might inadvertently point back to the same agent, which would otherwise consume resources without making progress. |
| Cycle detection | Tracks a handoff stack (an ordered list of agent names visited in the current conversation path) to detect A-to-B-to-A circular patterns before they execute. When a cycle is detected, the handoff is blocked and an error is returned to the calling agent, which can then use its ON_FAILURE handler to gracefully recover. This prevents infinite loops in complex multi-agent topologies where transitive handoff chains could form cycles. |
| Context merging | When a handoff occurs, the parent agent’s session state (variables, gather progress, conversation history) is merged into the target agent’s execution context. The merge strategy preserves the target agent’s own defaults while overlaying parent-provided values, ensuring the receiving agent has all the context it needs without losing its own configuration. This enables information continuity across agent boundaries. |
| Return-to-parent routing | HandoffSessionInfo tracks return paths so that after a child agent completes its task, control can be routed back to the parent agent that initiated the handoff. The handoffReturnInfo map records which target agents expect a return, and the runtime uses this to pop the handoff stack when the child signals completion. This enables hierarchical conversation patterns where specialized agents handle sub-tasks and then return to the orchestrating supervisor. |
| Conditional handoff | WHEN conditions (CEL expressions evaluated against session state) are checked before a handoff executes. If the condition evaluates to false, the handoff is skipped and the agent continues with its current flow. This enables dynamic routing where handoffs only occur under specific business conditions (e.g., hand off to billing agent only when the user mentions payment issues), making multi-agent topologies context-aware. |
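
The self-handoff, cycle detection, and registry checks described above can be illustrated with a small guard function. This is a sketch under assumptions: `checkHandoff` and `HandoffCheck` are hypothetical names, and the stack is modeled as the list of agents visited before the current one, which may not match how the HandoffExecutor stores it.

```typescript
type HandoffCheck =
  | { ok: true; stack: string[] }   // stack to use once the target takes over
  | { ok: false; reason: string };

function checkHandoff(
  current: string,
  target: string,
  stack: string[],          // agents visited before `current`, oldest first
  registry: Set<string>     // known agent names
): HandoffCheck {
  if (target === current) {
    return { ok: false, reason: `self-handoff blocked: ${current}` };
  }
  if (!registry.has(target)) {
    return { ok: false, reason: `unknown agent: ${target}` };
  }
  if (stack.indexOf(target) !== -1) {
    const path = [...stack, current, target].join(" -> ");
    return { ok: false, reason: `cycle detected: ${path}` };
  }
  return { ok: true, stack: [...stack, current] };
}

const agentRegistry = new Set<string>(["A", "B", "C"]);
const firstHop = checkHandoff("A", "B", [], agentRegistry);
// firstHop: { ok: true, stack: ["A"] }
const backToStart = checkHandoff("B", "A", ["A"], agentRegistry);
// backToStart: { ok: false, reason: "cycle detected: A -> B -> A" }
```

Returning a result object rather than throwing lets the caller hand the reason to an ON_FAILURE handler, which mirrors the recovery path described in the table.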

3.2 Delegation

| Feature | Description |
| --- | --- |
| Sub-agent delegation | Invoke a child agent as a synchronous sub-routine, passing structured input and waiting for its result before the parent agent continues execution. Unlike handoff (which transfers control), delegation always returns control to the parent. The delegate executor validates the target, evaluates WHEN conditions, maps input, and enforces depth limits before spawning the child agent’s execution context. |
| Max depth enforcement | The delegate executor enforces a hard limit of MAX_DELEGATE_DEPTH=10 on nested delegation chains to prevent runaway recursion. Each delegation increments the depth counter tracked on the delegateStack, and any attempt to exceed the limit is rejected with a descriptive error. This protects against accidental infinite nesting in complex agent hierarchies where agents delegate to each other transitively. |
| Input mapping | Transform parent agent session state into structured child agent input using expression-based field mappings (input: { field: expression }). Each expression is evaluated against the parent’s data values, and fields that resolve to undefined are dropped with a warning rather than passed as null. This ensures child agents receive clean, typed input without requiring the parent to know the child’s internal state structure. |
| Return value extraction | Map child agent output back to parent variables using path-based extraction rules (returns: { parentVar: childOutputPath }). After the delegated agent completes, its output is traversed using the configured paths and the extracted values are SET into the parent’s session state. This enables data flow between agent boundaries while keeping agents loosely coupled through declared interfaces rather than shared state. |
| Timeout enforcement | Each delegation can specify a per-call timeout that limits how long the parent agent waits for the child to complete. If the timeout elapses, the delegation is aborted and the configured on_failure action executes (e.g., transition to a fallback step, respond with an error message, or escalate). This prevents slow or stuck child agents from blocking the parent conversation indefinitely. |
| Nested delegation | Agents can delegate to child agents that themselves delegate to further children, forming recursive delegation hierarchies. The delegateStack tracks the full chain of active delegations, enabling cycle detection and depth enforcement at every level. Nested delegation enables modular agent architectures where specialized micro-agents can be composed into complex workflows without flattening the hierarchy. |
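
The input mapping, return value extraction, and depth limit rows above can be sketched together. To keep the example small, dotted paths stand in for the expressions the source describes, so this is a simplification rather than the platform's evaluator. `getPath`, `mapInput`, `applyReturns`, and `canDelegate` are hypothetical names. Only the value of `MAX_DELEGATE_DEPTH` comes from the table.

```typescript
const MAX_DELEGATE_DEPTH = 10; // limit stated in the table above

type State = Record<string, unknown>;

// Resolve a dotted path such as "user.email"; returns undefined when missing.
function getPath(source: unknown, path: string): unknown {
  let current: unknown = source;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Build child input; fields that resolve to undefined are dropped with a warning.
function mapInput(
  parentState: State,
  mapping: Record<string, string>
): { input: State; warnings: string[] } {
  const input: State = {};
  const warnings: string[] = [];
  for (const [field, path] of Object.entries(mapping)) {
    const value = getPath(parentState, path);
    if (value === undefined) {
      warnings.push(`input.${field} resolved to undefined and was dropped`);
    } else {
      input[field] = value;
    }
  }
  return { input, warnings };
}

// Copy selected child output values into a new parent state object.
function applyReturns(
  parentState: State,
  childOutput: unknown,
  returns: Record<string, string>
): State {
  const next: State = { ...parentState };
  for (const [parentVar, childPath] of Object.entries(returns)) {
    next[parentVar] = getPath(childOutput, childPath);
  }
  return next;
}

// A new delegation is allowed only while the active chain is below the limit.
function canDelegate(delegateStack: string[]): boolean {
  return delegateStack.length < MAX_DELEGATE_DEPTH;
}
```

For example, mapping `{ email: "user.email", coupon: "cart.coupon" }` against a parent state with no coupon yields an input containing only `email`, plus one warning.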

3.3 Supervisor & Routing

| Feature | Description |
| --- | --- |
| Supervisor topology | Define which child agents a supervisor can route conversations to, along with the routing rules that govern selection. The topology is declared in the ABL/YAML source and compiled into the IR’s routing configuration with available_agents list. Supervisors can only route to agents explicitly listed in their topology, providing a security boundary that prevents unauthorized agent access. |
| Intent-based routing | Route conversations to the appropriate child agent based on the primary intent detected by the NLU pipeline. The supervisor’s routing rules map intent names to target agents with confidence thresholds, and the routing executor selects the best match. When no rule matches with sufficient confidence, the supervisor can fall back to LLM-based reasoning or a default agent. |
| Context-based routing | Route conversations based on session state variables, conversation history, and context values using CEL expression conditions on routing rules. This enables sophisticated routing logic such as directing VIP customers to a premium support agent, routing based on language preference, or escalating based on sentiment scores. Context-based and intent-based routing can be combined for nuanced multi-factor routing decisions. |
| Agent registry | A runtime registry that discovers and resolves agents by name and version, loading their compiled IR and wiring up execution contexts. The registry supports both local agents (deployed within the same runtime) and remote agents (accessible via A2A protocol). It provides agent card metadata (capabilities, supported channels, input/output schemas) for supervisor routing decisions and A2A discovery. |
| Remote agent execution | Cross-service agent invocation via the A2A (Agent-to-Agent) protocol, enabling agents deployed on different runtime instances to communicate over HTTP. The routing executor transparently handles remote handoffs and delegations through the A2A client, making the agent topology location-agnostic. This enables distributed agent architectures where specialized agents run on dedicated infrastructure. |
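
The intent-based routing and supervisor topology rows above combine three ideas: rules map intents to targets, each rule has a confidence threshold, and only agents listed in the topology are eligible. The sketch below shows one way to express that selection. `IntentRule`, `IntentSignal`, and `routeByIntent` are assumed names, and a fixed default agent stands in for the LLM-based fallback.

```typescript
interface IntentRule {
  intent: string;
  target: string;
  minConfidence: number;
}

interface IntentSignal {
  intent: string;
  confidence: number;
}

function routeByIntent(
  signal: IntentSignal,
  rules: IntentRule[],
  availableAgents: string[], // the supervisor's declared topology
  fallbackAgent: string
): string {
  const match = rules.find(
    (rule) =>
      rule.intent === signal.intent &&
      signal.confidence >= rule.minConfidence &&
      availableAgents.indexOf(rule.target) !== -1 // never route outside the topology
  );
  return match ? match.target : fallbackAgent;
}

const routingRules: IntentRule[] = [
  { intent: "billing_issue", target: "billing_agent", minConfidence: 0.7 },
  { intent: "book_flight", target: "travel_agent", minConfidence: 0.6 },
];
const topology = ["billing_agent", "general_agent"];

const confidentRoute = routeByIntent(
  { intent: "billing_issue", confidence: 0.82 },
  routingRules,
  topology,
  "general_agent"
);
// confidentRoute === "billing_agent"
```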

3.4 A2A Protocol

| Feature | Description |
| --- | --- |
| Agent-to-Agent communication | HTTP-based inter-agent messaging implemented via the @a2a-js/sdk, wrapped with platform concerns (tracing, tenant isolation, SSRF protection). Supports synchronous sendTask, asynchronous sendTaskAsync, and streaming sendTaskStreaming modes. The protocol includes agent card discovery, task lifecycle management, and artifact exchange between agents running on different services or even different organizations. |
| Authenticated client factory | Secure A2A client creation via createA2AClientWithAuth that injects authentication headers (JWT, API key, or custom auth) into every outbound A2A request. The OutboundAuthConfig supports multiple authentication schemes, ensuring cross-service agent calls are properly authenticated. This enables agents to communicate across trust boundaries without exposing internal credentials in the A2A message payload. |
| Task store (Redis-backed) | RedisA2ATaskStore tracks the full lifecycle of cross-agent tasks including creation, status updates, artifact attachment, and completion. Tasks are stored in Redis with tenant-scoped keys, supporting list/filter operations and TTL-based cleanup. The lazy task store defers Redis connection until first use, reducing startup latency for runtimes that may not use A2A features. |
| Push notifications | Real-time A2A event delivery via the PushNotificationDeliveryService that notifies remote agents of task status changes and artifact updates. Push notifications are delivered as HTTP callbacks to registered endpoints, enabling event-driven agent workflows where agents react to remote task completions without polling. The callback router handles incoming notifications and dispatches them to the appropriate local agent execution context. |
| SSRF protection | The SsrfEndpointValidator prevents internal network scanning by validating all outbound A2A endpoint URLs against a blocklist of private IP ranges (10.x, 172.16-31.x, 192.168.x, localhost, link-local). This protects the platform from Server-Side Request Forgery attacks where a malicious agent card could point to internal infrastructure. All A2A client calls pass through this validator before any HTTP request is made. |
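
The SSRF protection row above lists the blocked ranges: 10.x, 172.16-31.x, 192.168.x, localhost, and link-local. A minimal sketch of such a blocklist check on a hostname follows. `isBlockedHost` is a hypothetical helper, not the SsrfEndpointValidator itself, and it covers only IPv4 literals and the `localhost` name. A production validator would also need to handle IPv6 and hostnames that resolve to private addresses.

```typescript
// Returns true when the host is localhost or an IPv4 literal in a blocked range.
function isBlockedHost(hostname: string): boolean {
  const host = hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;

  const match = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!match) return false; // not an IPv4 literal; out of scope for this sketch

  const first = Number(match[1]);
  const second = Number(match[2]);

  if (first === 10) return true;                                // 10.0.0.0/8
  if (first === 127) return true;                               // loopback
  if (first === 172 && second >= 16 && second <= 31) return true; // 172.16.0.0/12
  if (first === 192 && second === 168) return true;             // 192.168.0.0/16
  if (first === 169 && second === 254) return true;             // link-local
  return false;
}

// isBlockedHost("172.16.0.1")  -> true
// isBlockedHost("172.32.0.1")  -> false
// isBlockedHost("8.8.8.8")     -> false
```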

3.5 Parallel Execution

| Feature | Description |
| --- | --- |
| Fan-out system tool | The `__fan_out__` built-in system tool enables concurrent agent execution within a reasoning cycle by spawning multiple child agent invocations in parallel. The reasoning executor recognizes this special tool call and dispatches each branch simultaneously, collecting results as they complete. This enables patterns like parallel research (query multiple knowledge bases at once) and competitive evaluation (run multiple agents and pick the best answer). |
| Result aggregation | Merge the outputs of parallel agent branches into a unified result set that the parent reasoning agent can evaluate. Each branch’s output is tagged with its source agent name and completion status, enabling the parent to select, combine, or compare results. The aggregation preserves partial results from successful branches even when other branches fail, maximizing the utility of parallel execution. |
| Promise.allSettled semantics | Parallel branches use Promise.allSettled semantics, meaning the fan-out continues and collects results even if some branches fail, time out, or are rejected. Failed branches return their error information alongside successful results, letting the parent agent make informed decisions about how to proceed. This resilient approach ensures that a single failing sub-agent does not block the entire parallel operation or lose results from successful branches. |
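
The table above describes fan-out with Promise.allSettled semantics and results tagged by source agent and status. The sketch below shows that pattern with the standard `Promise.allSettled` API. `Branch`, `BranchResult`, and `fanOut` are hypothetical names for this example and do not reflect how the `__fan_out__` tool is implemented.

```typescript
type AgentFn = (input: string) => Promise<string>;

interface Branch {
  agent: string;
  run: AgentFn;
}

interface BranchResult {
  agent: string;
  status: "fulfilled" | "rejected";
  output?: string; // present when the branch succeeded
  error?: string;  // present when the branch failed
}

// Run every branch concurrently and keep one tagged result per branch,
// even when some branches reject.
async function fanOut(branches: Branch[], input: string): Promise<BranchResult[]> {
  const settled = await Promise.allSettled(
    branches.map((branch) => branch.run(input))
  );
  return settled.map((outcome, index): BranchResult => {
    const agent = branches[index].agent;
    if (outcome.status === "fulfilled") {
      return { agent, status: "fulfilled", output: outcome.value };
    }
    const reason: unknown = outcome.reason;
    const error = reason instanceof Error ? reason.message : String(reason);
    return { agent, status: "rejected", error };
  });
}

const exampleBranches: Branch[] = [
  { agent: "kb_search", run: async (query) => `kb result for ${query}` },
  { agent: "web_search", run: async () => { throw new Error("timed out"); } },
];
```

Calling `fanOut(exampleBranches, "pricing")` resolves to two entries: a fulfilled `kb_search` result and a rejected `web_search` result carrying the message `timed out`. With `Promise.all` instead, the first rejection would discard the successful branch.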

4. Memory Management

4.1 Session Memory

| Feature | Description |
| --- | --- |
| Conversation store | Manages the complete session lifecycle including creation (with channel, agent, tenant, and contact metadata), resumption via session ID or channel artifact, querying by various filters, and graceful close with disposition tracking. The ConversationStore supports MongoDB and in-memory backends, captures all session interactions including failed or abandoned voice calls, and enforces tenant isolation on every query. |
| Session state persistence | All session state (variables, gather progress, handoff stacks, delegate stacks) is persisted to MongoDB with atomic updates to prevent race conditions in concurrent access scenarios. State updates use MongoDB’s atomic operators to ensure consistency even when multiple runtime pods process messages for the same session simultaneously. Session TTLs enable automatic cleanup of abandoned sessions. |
| Sliding window context | A configurable message window limits the number of recent messages sent to the LLM for context, preventing token budget overflow in long conversations. The window size is set per agent or project and slides forward as new messages arrive, keeping only the most recent N messages. Older messages outside the window are still persisted in the message store for history and analytics but are excluded from LLM prompts. |
| Session compaction | Summarizes older conversation messages that fall outside the sliding window into a concise summary, preserving key context while dramatically reducing token usage. The compaction process uses an LLM to generate a paragraph-level summary of the dropped messages, which is prepended to the context window as a system message. This enables long-running conversations to maintain coherence without hitting token limits. |
| Session variables | SET and GET operations for session-scoped variables that persist within a conversation and are accessible to all agents in a multi-agent topology. Variables can be set from literal values, CEL expressions, tool call results, or gather field extractions. They are stored as key-value pairs in the session state document and are available for template interpolation, condition evaluation, and tool parameter binding throughout the session lifetime. |
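
The sliding window and session compaction rows above fit together: the most recent N messages go to the LLM, and a summary of the dropped messages is prepended as a system message. A minimal sketch of that assembly follows. `ChatMessage`, `Summarizer`, and `buildContext` are assumed names, and the summarizer is a plain function standing in for the LLM call.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Stand-in for the LLM-based summarization step.
type Summarizer = (dropped: ChatMessage[]) => string;

function buildContext(
  history: ChatMessage[],
  windowSize: number,
  summarize?: Summarizer
): ChatMessage[] {
  // Index of the first message that still fits in the window.
  const start = Math.max(0, history.length - Math.max(0, windowSize));
  const recent = history.slice(start);
  const dropped = history.slice(0, start);

  // Without compaction, older messages are simply left out of the prompt.
  if (dropped.length === 0 || !summarize) return recent;

  const summary: ChatMessage = {
    role: "system",
    content: `Summary of earlier conversation: ${summarize(dropped)}`,
  };
  return [summary, ...recent];
}
```

With five messages in history and a window of two, the result is the last two messages, preceded by one summary message when a summarizer is supplied. The full history stays untouched, which matches the note that older messages remain persisted.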

4.2 Persistent Memory

| Feature | Description |
| --- | --- |
| Fact store | Long-lived persistent fact storage that spans across sessions, backed by Redis, MongoDB, ClickHouse, or PostgreSQL depending on deployment configuration. Facts are stored with namespaced keys (`user.*`, `system.*`, `agent.*`), source attribution (which agent/session/user created them), and optional TTLs. The store supports batch operations, prefix-based querying, and environment-scoped namespacing to isolate development from production data. |
| REMEMBER trigger | Declarative rules defined in the ABL source that specify when facts should be stored during agent execution. REMEMBER triggers evaluate conditions against session state and automatically persist matching data to the fact store with the appropriate namespace and source attribution. This enables agents to build up long-term knowledge about users (preferences, past issues, account details) without requiring explicit tool calls or custom code. |
| RECALL instruction | Retrieve previously stored facts by key or key prefix from the fact store, making them available in the current session context. RECALL can be used in agent steps to load user preferences, past interaction summaries, or system-wide reference data before processing the current request. Retrieved facts are injected into the session state where they can be used in template interpolation, condition evaluation, and tool parameter binding. |
| Fact TTL and eviction | Time-based expiration configured per fact entry via the expiresAt field, with a configurable default TTL on the store. Facts that exceed their TTL are automatically purged during periodic cleanup cycles. When the store approaches capacity limits, LRU (Least Recently Used) eviction removes the least recently accessed facts first, keeping the store within memory bounds while retaining recently used facts. |
| Fact compaction | Archive and compact stale facts by merging related entries and removing redundant or superseded information. Compaction runs as a background process that identifies fact clusters (e.g., multiple updates to user.preferences) and consolidates them into a single current-state fact. This reduces storage overhead and query latency for fact stores that accumulate large volumes of incremental updates over time. |
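
The fact store rows above describe namespaced keys, prefix-based recall, per-fact expiry via `expiresAt`, and LRU eviction at capacity. The in-memory sketch below illustrates those behaviors together. `InMemoryFactStore` and its method names are hypothetical. Expired facts are purged lazily on read here, whereas the source describes periodic cleanup cycles, and the clock is injected so the behavior is easy to follow.

```typescript
interface Fact {
  value: string;
  source: string;           // which agent, session, or user wrote the fact
  expiresAt: number | null; // epoch ms, or null for no expiry
  lastAccess: number;       // used for LRU eviction
}

class InMemoryFactStore {
  private facts = new Map<string, Fact>();
  private capacity: number;
  private defaultTtlMs: number;
  private now: () => number;

  constructor(capacity: number, defaultTtlMs: number, now: () => number) {
    this.capacity = capacity;
    this.defaultTtlMs = defaultTtlMs;
    this.now = now;
  }

  remember(key: string, value: string, source: string, ttlMs?: number): void {
    const time = this.now();
    const ttl = ttlMs ?? this.defaultTtlMs;
    const expiresAt = ttl > 0 ? time + ttl : null;
    this.facts.set(key, { value, source, expiresAt, lastAccess: time });
    this.evictLeastRecentlyUsed();
  }

  // Return all live facts whose key starts with the prefix, and mark them as used.
  recall(prefix: string): Record<string, string> {
    const time = this.now();
    const found: Record<string, string> = {};
    for (const [key, fact] of Array.from(this.facts.entries())) {
      if (fact.expiresAt !== null && fact.expiresAt <= time) {
        this.facts.delete(key); // lazy purge of expired facts
      } else if (key.startsWith(prefix)) {
        fact.lastAccess = time;
        found[key] = fact.value;
      }
    }
    return found;
  }

  private evictLeastRecentlyUsed(): void {
    const excess = Math.max(0, this.facts.size - this.capacity);
    const oldestFirst = Array.from(this.facts.entries()).sort(
      (a, b) => a[1].lastAccess - b[1].lastAccess
    );
    for (const [key] of oldestFirst.slice(0, excess)) this.facts.delete(key);
  }
}
```

With a capacity of two, writing a third fact evicts whichever existing fact was read or written least recently, and a later `recall("user.")` returns only the surviving `user.*` entries.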

4.3 Contact Memory

FeatureDescription
Contact storePersistent user/contact profile management supporting employees, customers, and anonymous visitors across all sessions. The ContactStore provides create, query, update, and soft-delete operations with tenant isolation, identity-based lookup (email, phone, device ID), and tag-based filtering. Contacts are stored in MongoDB or PostgreSQL with full audit trail, enabling agents to maintain long-term relationships with users across channels and sessions.
Contact attributesCustom key-value metadata storage per contact record, including displayName, department, employeeId, company, accountRef, channel, and arbitrary metadata fields. These attributes persist across sessions and are accessible to agents via RECALL or session context injection. This enables personalized agent interactions based on stored user profiles without requiring external CRM lookups for every conversation.
Contact linkingAsynchronous association of sessions with contact identity, supporting the pattern where a session starts anonymous and is linked to a known contact later via identity resolution. The linkContact operation uses SHA-256 hashed channel artifacts (caller_id, cookie, device_id) with identity tiers (0=anonymous, 1=unverified, 2=verified) to progressively establish user identity. This enables cross-session continuity even when users switch channels or devices.
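
The progressive identity pattern above (anonymous session, later linked through a hashed channel artifact) might look like the sketch below. The tier values 0/1/2 and the SHA-256 hashing come from the description; everything else, including the `ContactLinker` class, the `kind:value` hashing scheme, and the normalization step, is an assumption made for illustration.

```python
import hashlib
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Optional


class IdentityTier(IntEnum):
    ANONYMOUS = 0
    UNVERIFIED = 1
    VERIFIED = 2


def hash_artifact(kind: str, value: str) -> str:
    """SHA-256 over 'kind:value' so the raw identifier is never stored (assumed scheme)."""
    normalized = value.strip().lower()
    return hashlib.sha256(f"{kind}:{normalized}".encode("utf-8")).hexdigest()


@dataclass
class Session:
    session_id: str
    contact_id: Optional[str] = None
    tier: IdentityTier = IdentityTier.ANONYMOUS


class ContactLinker:
    """Hypothetical index from hashed artifacts to contact IDs."""

    def __init__(self) -> None:
        self._index: Dict[str, str] = {}

    def register(self, contact_id: str, kind: str, value: str) -> None:
        self._index[hash_artifact(kind, value)] = contact_id

    def link_contact(self, session: Session, kind: str, value: str,
                     verified: bool = False) -> bool:
        contact_id = self._index.get(hash_artifact(kind, value))
        if contact_id is None:
            return False                      # session stays anonymous
        session.contact_id = contact_id
        new_tier = IdentityTier.VERIFIED if verified else IdentityTier.UNVERIFIED
        session.tier = max(session.tier, new_tier)   # never downgrade a tier
        return True
```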

4.4 Message Store

FeatureDescription
Message persistenceStore all conversation messages with rich metadata including role, channel, trace ID, contact ID, PII detection flags, and idempotency keys. The MessageStore supports MongoDB, ClickHouse (production columnar storage for high-volume analytics), and in-memory backends. Messages are stored independently of session lifecycle, enabling historical analysis and replay even after sessions are closed or archived.
Message retention policiesPer-tenant TTL-based message cleanup that automatically purges messages older than the configured retention period. TTLs are set at the MessageStore level via messageTtlMs and enforced by the storage backend’s native expiration mechanisms. This ensures compliance with data retention regulations and prevents unbounded storage growth, while allowing tenants with different compliance requirements to configure appropriate retention windows.
Message scrubbing (GDPR)Right-to-erasure implementation that can delete or anonymize all messages associated with a specific contact ID or session ID, satisfying GDPR Article 17 requirements. The scrubbing operation cascades through both the primary message store and any secondary stores (ClickHouse analytics, trace events), ensuring complete data removal. Scrub operations are logged to the audit trail for compliance verification.
Cursor paginationEfficient message retrieval using cursor-based pagination that avoids the performance degradation of offset-based pagination on large datasets. The query interface supports filtering by session, tenant, roles, and system message inclusion, with configurable page sizes. Cursor pagination enables the Studio conversation viewer and API clients to efficiently navigate through long conversation histories without loading all messages into memory.
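
To make the cursor pagination idea concrete, here is a minimal sketch over an in-memory list. The cursor encoding (base64 of the last `created_at` and `id`), the field names, and the `list_messages` function are assumptions for illustration; a real backend would express the same "strictly after the cursor" condition as an indexed query rather than a scan.

```python
import base64
import json
from typing import Iterable, List, Optional, Set, Tuple


def encode_cursor(created_at: int, message_id: str) -> str:
    raw = json.dumps([created_at, message_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> Tuple[int, str]:
    created_at, message_id = json.loads(base64.urlsafe_b64decode(cursor))
    return created_at, message_id


def list_messages(messages: Iterable[dict], session_id: str, limit: int = 2,
                  cursor: Optional[str] = None,
                  roles: Optional[Set[str]] = None) -> Tuple[List[dict], Optional[str]]:
    """Return (page, next_cursor). `messages` must be sorted by (created_at, id)."""
    after = decode_cursor(cursor) if cursor else None
    page: List[dict] = []
    for m in messages:
        if m["session_id"] != session_id:
            continue
        if roles and m["role"] not in roles:
            continue
        # Resume strictly after the last item of the previous page.
        if after and (m["created_at"], m["id"]) <= after:
            continue
        page.append(m)
        if len(page) == limit:
            break
    # A full page may have more results; a short page is the last one.
    next_cursor = (encode_cursor(page[-1]["created_at"], page[-1]["id"])
                   if len(page) == limit else None)
    return page, next_cursor
```

Because the cursor carries the sort key of the last item rather than an offset, the cost of fetching a page does not grow with how deep into the history the client has paged.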

5. Tool Calling

5.1 Tool Types

FeatureDescription
HTTP toolsREST API tool calls with full-featured HTTP client capabilities including auth profile binding (OAuth2, API key, bearer token), configurable retry with exponential backoff, per-tool timeout enforcement, circuit breaker protection for unreliable endpoints, and SSRF prevention that blocks requests to private IP ranges. HTTP tools support all methods (GET, POST, PUT, PATCH, DELETE), custom headers, request body templates with variable interpolation, and response path extraction.
MCP toolsModel Context Protocol integration that discovers and executes tools hosted on external MCP servers. The MCP client connects to tool servers via stdio or HTTP transport, lists available tools with their schemas, and invokes them with parameter validation. Result size capping prevents oversized MCP tool outputs from consuming the LLM’s token budget. This enables agents to leverage a growing ecosystem of MCP-compatible tool providers without custom integration code.
Sandbox toolsIsolated code execution using gVisor-based containers that run user-provided JavaScript or Python code in a secure sandbox with resource limits (CPU, memory, execution time). The sandbox runner factory selects between gvisor (production) and lambda (development) backends. Sandbox tools enable agents to perform computations, data transformations, and custom logic that cannot be expressed in CEL expressions or covered by the other tool types, while preventing untrusted code from accessing the host system.
Lambda toolsServerless function invocation that triggers AWS Lambda (or compatible) functions as tool calls, with handler template generation for common patterns. Lambda tools support async invocation for long-running operations and synchronous invocation for quick computations. The tool definition includes the function ARN, input/output schemas, and timeout configuration, and the runtime handles credential management and result deserialization automatically.
Connector toolsPre-built connector actions powered by the ActivePieces and Nango ecosystems, providing out-of-the-box integrations with services like Slack, GitHub, Salesforce, Google Drive, and 25+ others. Connector definitions are auto-compiled into standard tool definitions at build time via the connector-to-tool compiler. OAuth credentials are managed by Nango, and context translation adapters handle the mapping between the platform’s tool calling interface and each connector’s native API.
Workflow toolsTrigger long-running Restate durable workflows or BullMQ job queues as tool calls from within agent execution. Workflow tools are ideal for operations that take minutes or hours (e.g., data processing, approval chains, batch operations) and would otherwise block the conversation. The tool call returns a workflow/job ID that can be polled or awaited, and completion events can trigger agent notifications.
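
The SSRF prevention mentioned for HTTP tools (blocking requests to private IP ranges) can be approximated with Python's standard `ipaddress` module. This is a sketch under assumptions: the function names are hypothetical, the exact set of blocked ranges on the platform is not documented here, and the injectable `resolve` parameter exists only so the check can be exercised without network access.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_blocked_address(ip: str) -> bool:
    """True for addresses an outbound tool call should not reach (assumed policy)."""
    addr = ipaddress.ip_address(ip.split("%")[0])   # drop any IPv6 scope id
    return (addr.is_private or addr.is_loopback or addr.is_link_local
            or addr.is_reserved or addr.is_multicast or addr.is_unspecified)


def check_url(url: str, resolve=socket.getaddrinfo) -> None:
    """Raise if the URL is malformed or resolves to a blocked address."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"unsupported URL: {url}")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    for info in resolve(parsed.hostname, port, type=socket.SOCK_STREAM):
        ip = info[4][0]                              # sockaddr is (host, port, ...)
        if is_blocked_address(ip):
            raise PermissionError(f"{parsed.hostname} resolves to blocked address {ip}")
```

A check like this is only effective if the request is then sent to the address that was validated; resolving once for the check and again for the connection leaves room for DNS rebinding.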

5.2 Tool Execution

FeatureDescription
Tool binding executorThe ToolBindingExecutor connects agent IR tool references to their concrete implementations (HTTP client, MCP client, sandbox runner, connector adapter) at runtime. It manages the full middleware chain (audit, PII scrubbing, validation, timing) through a composable onion-model architecture. The executor resolves tool names to implementations, prepares parameters, applies middleware, and captures results with trace context for observability.
Parameter validationJSON Schema-based validation of tool call parameters before execution, catching type mismatches, missing required fields, and constraint violations early. The tool-schema-validator checks each parameter against the ToolDefinition’s parameter schema from the compiled IR. Validation failures are returned as structured errors to the LLM, enabling it to self-correct and retry with valid parameters rather than causing downstream API errors.
Result validationThe result validation middleware validates tool call outputs against the tool’s declared ToolReturnType schema. It operates in two modes: ‘warn’ (logs mismatches but returns the result unchanged for debugging) and ‘strict’ (throws on type mismatch for production safety). This ensures that downstream agent logic receives data in the expected shape, preventing cascading errors when external APIs return unexpected response structures.
Confirmation gatesRequire explicit user approval before executing tools marked as sensitive (e.g., payment processing, data deletion, account changes). When a tool has a confirmation gate, the runtime pauses execution, presents the proposed action to the user with parameter details, and waits for approval or rejection. This human-in-the-loop pattern prevents agents from taking irreversible actions without user consent, which is critical for trust and compliance.
Parallel tool executionExecute multiple independent tool calls concurrently when the LLM requests several tools in a single response. The runtime identifies non-dependent tool calls, dispatches them in parallel using Promise.allSettled semantics (one failed call does not cancel the others), and returns all results to the LLM in the next turn. This significantly reduces latency for reasoning cycles that require multiple data lookups or API calls, because independent calls run concurrently rather than sequentially.
Tool timeout enforcementPer-tool configurable timeout specified in the tool definition’s IR, with automatic fallback when the timeout elapses. The timeout wraps the entire tool execution including middleware, preventing slow external APIs from blocking the conversation indefinitely. When a timeout fires, the tool call is aborted and either an error is returned to the LLM (for self-recovery) or the configured on_failure handler executes.
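
Parallel execution and per-tool timeouts combine naturally. The sketch below uses Python's `asyncio` as a stand-in for the `Promise.allSettled` behavior named above: every call settles to either a fulfilled or a rejected record, and a slow or failing tool does not affect its siblings. The record shape and the `run_tool` / `run_tools_parallel` names are hypothetical.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def run_tool(name: str, fn: Callable[..., Awaitable[Any]],
                   args: Dict[str, Any], timeout_s: float) -> Dict[str, Any]:
    """Run one tool with a timeout and settle to a result record instead of raising."""
    try:
        value = await asyncio.wait_for(fn(**args), timeout=timeout_s)
        return {"tool": name, "status": "fulfilled", "value": value}
    except asyncio.TimeoutError:
        return {"tool": name, "status": "rejected",
                "reason": f"timeout after {timeout_s}s"}
    except Exception as exc:  # surfaced to the caller so the LLM can self-correct
        return {"tool": name, "status": "rejected", "reason": str(exc)}


async def run_tools_parallel(calls: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Dispatch all calls concurrently; results come back in request order."""
    return await asyncio.gather(*(run_tool(**call) for call in calls))
```

Each entry in `calls` is a dict with `name`, `fn`, `args`, and `timeout_s`. Total latency is bounded by the slowest call (or its timeout) rather than the sum of all calls.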

5.3 Tool Middleware

FeatureDescription
Audit middlewareComposable middleware in the tool call chain that logs every tool invocation (name, parameters, result, latency, success/failure) to the trace context for compliance and debugging. Audit entries include tenant ID, session ID, and agent name for full attribution. The audit middleware integrates with the TraceContextManager to emit structured trace events that appear in the Observatory dashboard for post-hoc analysis of agent behavior.
Trace scrubberAutomatically redacts sensitive data from tool call inputs and outputs before they are written to trace events and observability systems. The scrubber detects and replaces Bearer tokens, Authorization headers, {{secrets.*}} placeholders, sensitive header values (cookie, x-api-key), API keys in query parameters, and PII (via the built-in PII detector). This prevents credentials and personal information from leaking into logs, dashboards, and audit trails.
Sanitizer middlewareCleans tool output for safe rendering in user-facing channels by stripping potentially dangerous content (script tags, event handlers, data URIs) from tool results. The sanitizer runs after tool execution but before the result is injected into the LLM context or displayed to the user. This prevents XSS and injection attacks from malicious tool responses while preserving the legitimate content needed for agent reasoning.
Result size cappingPrevents oversized tool results from consuming the LLM’s token budget by truncating or summarizing results that exceed a configurable byte limit. Large API responses (e.g., full database dumps, lengthy document contents) are capped with a truncation marker, and the original size is recorded in metadata. This ensures that a single verbose tool call does not crowd out the conversation history and other context from the LLM’s context window.
HTTP resilienceBuilt-in resilience for HTTP tool calls including configurable retry with exponential backoff (respecting Retry-After headers), circuit breaker that opens after consecutive failures and probes with half-open requests, and HTTP keep-alive connection pooling for reduced latency. The resilience layer is defined via resilience-interfaces and applied transparently by the HTTP tool executor, ensuring that transient network failures do not cause agent conversations to fail.
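
The circuit breaker and backoff behavior described for HTTP resilience follows a well-known state machine: closed, open after consecutive failures, then half-open for a single probe. The sketch below illustrates that pattern; the class, the threshold and timeout defaults, and the `backoff_delay` helper are assumptions, not the platform's resilience-interfaces API, and jitter is omitted for brevity.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Closed -> open after N consecutive failures -> half-open probe -> closed."""

    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "closed":
            return True
        if self.state == "open" and self.clock() - self.opened_at >= self.reset_timeout_s:
            self.state = "half_open"   # let exactly one probe request through
            return True
        return False                   # open, or a probe is already in flight

    def record_success(self) -> None:
        self.failures = 0
        self.state = "closed"

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()


def backoff_delay(attempt: int, base_s: float = 0.5, cap_s: float = 30.0,
                  retry_after_s: Optional[float] = None) -> float:
    """Exponential backoff; a server-provided Retry-After value takes precedence."""
    if retry_after_s is not None:
        return retry_after_s
    return min(cap_s, base_s * 2 ** attempt)
```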

6. Agent Development (Studio)

6.1 Editor

FeatureDescription
Monaco code editorFull-featured code editor powered by Monaco (the engine behind VS Code) with ABL-specific syntax highlighting, auto-completion for constructs (AGENT, STEP, GATHER, RESPOND, CALL), bracket matching, and inline error annotations. The editor loads the agent’s ABL source and provides a professional development experience with keyboard shortcuts, multi-cursor editing, and search/replace. Changes trigger real-time validation feedback without requiring manual compilation.
Real-time validationLive DSL parsing that runs the ABL parser and validation pipeline on every keystroke (debounced), providing instant error feedback with source-location markers. Errors appear as red annotations in the Monaco editor gutter, while warnings appear as yellow annotations. This enables developers to catch broken references, type mismatches, and syntax errors immediately as they write, dramatically reducing the edit-compile-test cycle.
Agent structure viewVisual representation of the agent’s compiled structure showing steps, tools, gather fields, constraints, handoffs, and flow transitions as a navigable tree or graph. The structure view is generated from the compiled AgentIR and updates in real-time as the source changes. Clicking on a node navigates to the corresponding source location, providing a high-level architectural overview that helps developers understand complex agent topologies at a glance.
Step editorDedicated UI for editing individual flow steps with form-based fields for respond messages, tool call configurations, SET assignments, GATHER fields, and transition targets. The step editor provides a visual alternative to raw DSL editing for teams that prefer graphical authoring. Tool binding is supported via dropdown selection from the project’s tool library, with parameter schema display and validation.
Tool pickerBrowse, search, and select tools from the project’s tool library with filtering by type (HTTP, MCP, sandbox, connector, lambda). The tool picker displays each tool’s description, parameter schema, and auth profile binding, enabling developers to quickly add tool references to agent definitions. Selected tools are inserted into the ABL source with the correct syntax and parameter placeholders, reducing boilerplate and reference errors.
Import resolutionResolve cross-agent and cross-tool references across a project’s file tree during compilation and editor validation. When an agent references another agent (HANDOFF, DELEGATE) or a tool defined in a separate file, the import resolver locates the target definition and validates its interface compatibility. Unresolved references are flagged as errors in the editor, and auto-complete suggestions include available agents and tools from the project scope.

6.2 Project Management

FeatureDescription
Project creationCreate new projects with optional template selection from a library of starter configurations (customer support, sales assistant, knowledge base agent, etc.). Project creation provisions the project’s MongoDB namespace, sets up default environment variables, and assigns the creating user as project owner. Each project is scoped to a tenant and inherits the tenant’s model provisioning and plan limits.
Project dashboardCentral overview page showing the project’s agents (with status and version), recent deployments, activity timeline, and key metrics (session count, active conversations). The dashboard provides quick access to the editor, deployment, and settings for each agent. Activity cards highlight recent changes, failed deployments, and evaluation results, giving the team a snapshot of project health at a glance.
Agent listingBrowse, filter, and search all agents within a project with sortable columns for name, type (reasoning, flow, supervisor, voice), version, and last-modified date. The listing supports bulk operations and provides quick-action buttons for editing, testing, and deploying individual agents. Filter presets enable teams to quickly find agents by type, deployment status, or recent activity.
Environment variablesManage environment variables per project and per environment (development, staging, production) through the Studio UI. Variables are available to agents at runtime via template interpolation ({{env.VARIABLE_NAME}}) and tool parameter binding. Environment-specific overrides enable different API endpoints, feature flags, and configuration values across deployment stages without changing agent source code.
Secrets managementEncrypted secret storage for API keys, OAuth tokens, database passwords, and other credentials using AES-256-GCM encryption at rest. Secrets are referenced in tool definitions via {{secrets.SECRET_NAME}} placeholders that are resolved at runtime and never exposed in logs, traces, or API responses. Access to secrets is scoped to the project level, and secret values are masked in the UI after creation for security.
Project-level LLM configConfigure which LLM models and providers are available for agents within a project, with per-model hyperparameter templates (temperature, top_p, max_tokens). Project-level configuration inherits from the tenant’s model provisioning but can restrict the available model set or override default parameters. This enables teams to standardize on specific models for consistency while allowing experimentation in development environments.
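
The `{{env.VARIABLE_NAME}}` and `{{secrets.SECRET_NAME}}` placeholders, together with per-environment overrides, can be illustrated with a small resolver. The placeholder syntax is taken from the descriptions above; the function names, the merge order, and the value-based redaction are assumptions for illustration only.

```python
import re
from typing import Dict

PLACEHOLDER = re.compile(r"\{\{\s*(env|secrets)\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def merge_env(project_vars: Dict[str, str], overrides: Dict[str, str]) -> Dict[str, str]:
    """Environment-specific values win over project-level defaults (assumed order)."""
    return {**project_vars, **overrides}


def resolve(template: str, env: Dict[str, str], secrets: Dict[str, str]) -> str:
    """Substitute env/secrets placeholders at call time; fail loudly on unknown names."""
    def substitute(match: "re.Match") -> str:
        scope, name = match.group(1), match.group(2)
        source = env if scope == "env" else secrets
        if name not in source:
            raise KeyError(f"undefined {scope} variable: {name}")
        return source[name]
    return PLACEHOLDER.sub(substitute, template)


def redact(text: str, secrets: Dict[str, str]) -> str:
    """Mask resolved secret values before text reaches logs or traces."""
    for value in secrets.values():
        if value:
            text = text.replace(value, "***")
    return text
```

Resolving placeholders as late as possible, and redacting before anything is logged, is what keeps secret values out of traces even though they appear in the outgoing request.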

6.3 Deployment

FeatureDescription
Version managementAgent versioning with semantic version tracking, full history of all published versions, and side-by-side diff comparison between any two versions. Each version captures the compiled IR, source hash, and compilation metadata, enabling precise change tracking. The version history is immutable, ensuring that any previously deployed version can be inspected or restored at any time.
Deployment workflowPromote agent versions through a multi-environment pipeline (development, staging, production) with per-environment configuration overrides. Each promotion creates a deployment record linking the agent version to the target environment. The workflow supports manual approval gates between stages, ensuring that only tested and reviewed agent versions reach production. Rollback to a previous version is available with a single action.
Lambda deploy triggerOne-click deployment that packages the agent’s compiled IR and runtime dependencies into a serverless function and deploys it to AWS Lambda or compatible infrastructure. The deploy trigger generates handler templates, configures environment variables, and sets up API Gateway routing automatically. This enables agents to be deployed as standalone serverless endpoints without managing server infrastructure.
Deployment historyComplete audit trail of all deployments including timestamp, deploying user, agent version, target environment, and deployment status (success, failed, rolled back). The history view enables filtering by environment and agent, with one-click rollback to any previous successful deployment. Failed deployments include error details and logs to aid debugging, and the history is retained indefinitely for compliance.
Git webhook integrationAuto-deploy agents on git push via webhooks from Bitbucket, GitHub, or GitLab repositories. When a push is received on a configured branch, the platform pulls the updated source, compiles it, runs validation, and deploys the new version to the target environment. Webhook payloads are verified with HMAC signatures for security, and deployment results are reported back as commit status checks.
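
HMAC verification of a webhook payload, as mentioned for the git webhook integration, typically looks like the sketch below. The `sha256=` prefix follows the convention GitHub uses for its `X-Hub-Signature-256` header; other providers use different header names and formats, so treat the header format, the function names, and the branch check as assumptions rather than the platform's actual handler.

```python
import hashlib
import hmac
from typing import Optional


def sign(secret: bytes, body: bytes) -> str:
    """Compute the expected signature over the raw request body."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, body: bytes, header_value: Optional[str]) -> bool:
    """Constant-time comparison of the received signature with the expected one."""
    expected = sign(secret, body).encode("utf-8")
    received = (header_value or "").encode("utf-8")
    return hmac.compare_digest(expected, received)


def should_deploy(ref: str, configured_branch: str) -> bool:
    """Only pushes to the configured branch trigger a deployment."""
    return ref == f"refs/heads/{configured_branch}"
```

The signature must be computed over the exact bytes received, before any JSON parsing, and compared with `hmac.compare_digest` rather than `==` to avoid leaking timing information.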

6.4 Architect (AI-Assisted)

FeatureDescription
Architect panelAn AI-assisted design panel integrated into the Studio that helps developers plan and architect agent solutions. The Architect uses conversation context and project metadata to suggest agent structures, tool selections, and flow designs. It operates as a sidebar companion that can generate code snippets, explain ABL constructs, and propose architectural improvements based on the current project state.
Onboarding wizardGuided agent creation experience that collects project requirements through an interview-based brief (target use case, expected user personas, required integrations, conversation style). The wizard asks structured questions and captures answers that feed into the Architect’s topology generation. This lowers the barrier to entry for non-technical users and ensures that new agents start with a well-considered architectural foundation rather than blank-slate guesswork.
Agent topology generationAI-generated multi-agent topology from a natural language brief that automatically determines how many agents are needed, their types (reasoning, flow, supervisor), relationships (handoffs, delegations), and routing strategies. The topology generator uses the context builder to analyze the brief, identify intents and entities, and produce a complete project structure. The generated topology can be reviewed, edited, and refined before code generation.
Spec generationGenerate complete ABL specifications (agent files, tool definitions, flow steps) from architecture descriptions produced by the topology generator or written manually. The spec generator translates high-level intent descriptions into concrete ABL constructs with appropriate gather fields, constraints, tool bindings, and response templates. Generated specs serve as a starting point that developers refine, significantly accelerating the agent development cycle.
Edit suggestionsAI-powered diff suggestions that analyze the current agent source and propose specific improvements as inline code changes. Suggestions are presented as reviewable diffs that can be accepted, rejected, or modified before applying. The suggestion engine considers conversation quality metrics, evaluation results, and best practices to recommend improvements like adding missing error handlers, refining gather prompts, or optimizing tool call patterns.

6.5 Import / Export

FeatureDescription
Project exportExport an entire project including all agents, tool definitions, auth profiles, environment variables, and configuration as a portable archive. The export captures the complete project state in a structured format that preserves cross-references between agents, tools, and configurations. Exported archives can be shared between teams, used for backup, or imported into different tenants for project duplication.
Project importImport a previously exported project archive with automatic cross-reference resolution that remaps agent names, tool IDs, and auth profile references to avoid conflicts with existing resources. The import process detects naming collisions and presents conflict resolution options (rename, overwrite, skip) before applying changes. Auth profiles with encrypted credentials require re-entry of sensitive values, as credential material is stripped during export for security.
Git syncBidirectional synchronization with Bitbucket, GitHub, and GitLab repositories that keeps the Studio project in sync with a git branch. Changes made in the Studio editor are committed to the repository, and changes pushed to the repository are reflected in the Studio. Conflict resolution follows git merge semantics, with the Studio providing a visual diff for manual resolution when automatic merge is not possible.
V2 layered importIncremental import system that supports selective overwrite of individual agents, tools, or configurations without replacing the entire project. The V2 import analyzes the archive contents against the existing project state and presents a layer-by-layer diff showing what will be added, modified, or left unchanged. This enables teams to adopt updates from a shared template or reference project while preserving their local customizations.
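
The layer-by-layer diff and selective overwrite described for the V2 layered import can be sketched with plain dictionaries. The layer names (`agents`, `tools`, `config`), the representation of each item as an opaque value, and both function names are hypothetical; the archive format itself is not specified here.

```python
from typing import Dict, List, Set

LAYERS = ("agents", "tools", "config")


def layer_diff(existing: Dict[str, object], incoming: Dict[str, object]) -> Dict[str, List[str]]:
    """Classify each incoming item as added, modified, or unchanged."""
    return {
        "added": sorted(k for k in incoming if k not in existing),
        "modified": sorted(k for k in incoming if k in existing and incoming[k] != existing[k]),
        "unchanged": sorted(k for k in incoming if k in existing and incoming[k] == existing[k]),
    }


def diff_project(existing: Dict[str, dict], archive: Dict[str, dict]) -> Dict[str, dict]:
    return {layer: layer_diff(existing.get(layer, {}), archive.get(layer, {}))
            for layer in LAYERS}


def apply_import(existing: Dict[str, dict], archive: Dict[str, dict],
                 selected: Dict[str, Set[str]]) -> Dict[str, dict]:
    """Copy only the selected items from the archive; everything else is preserved."""
    result = {layer: dict(items) for layer, items in existing.items()}
    for layer, names in selected.items():
        for name in names:
            result.setdefault(layer, {})[name] = archive[layer][name]
    return result
```

Items present locally but absent from the archive are never touched, which is what allows local customizations to survive an update from a shared template.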

7. Agent Testing & Evals

7.1 Evaluation Framework

FeatureDescription
Eval pipeline engineMulti-stage evaluation pipeline that executes test conversations against deployed agents and scores the results through configurable evaluation stages. The pipeline engine supports parallel execution of test cases, configurable LLM judges for nuanced quality assessment, and aggregation of results across multiple dimensions. Each stage can use different evaluation criteria (factual accuracy, policy compliance, conversation quality) to produce a comprehensive quality report.
Persona-based testingDefine synthetic user personas with specific characteristics (expertise level, communication style, typical queries, expected behaviors) for realistic test conversations. Personas are used by the eval pipeline to simulate diverse user interactions, ensuring agents handle edge cases, ambiguous inputs, and adversarial queries. This replaces simple input/output test cases with realistic multi-turn conversations that exercise the full agent flow.
Rubric-based scoringMulti-dimensional scoring rubrics with weighted criteria that enable fine-grained quality assessment beyond simple pass/fail. Each rubric defines scoring dimensions (e.g., helpfulness, accuracy, tone, completeness) with descriptive scales and relative weights. Rubrics are evaluated by LLM judges that produce per-dimension scores and explanations, making it easy to identify specific areas where agents need improvement.
Test case managementCreate, organize, and run test suites through the Studio UI with support for grouping test cases by category, priority, and agent. Test cases can be authored manually, generated from production conversation samples, or created by the Architect’s suggestion engine. Suites support versioning and comparison across runs, enabling teams to track quality trends over time and catch regressions before deployment.
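
Weighted rubric scoring reduces to a weighted average over per-dimension scores. The sketch below assumes a 1 to 5 scale and normalizes the result to the range 0 to 1; the scale, the dimension names, and the `weighted_score` function are illustrative assumptions, not the platform's rubric schema.

```python
from typing import Dict


def weighted_score(scores: Dict[str, float], weights: Dict[str, float],
                   scale_max: float = 5.0) -> float:
    """Combine per-dimension judge scores into one normalized score in [0, 1]."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for dimensions: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(scores[dim] * weight for dim, weight in weights.items())
    return weighted / (total_weight * scale_max)


# Example rubric: accuracy matters most, tone least.
rubric_weights = {"accuracy": 0.5, "helpfulness": 0.3, "tone": 0.2}
judge_scores = {"accuracy": 5, "helpfulness": 4, "tone": 3}
overall = weighted_score(judge_scores, rubric_weights)   # (2.5 + 1.2 + 0.6) / 5
```

Dividing by the total weight means the weights do not have to sum to 1, which makes it easier to add or remove a dimension without rebalancing the others.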

7.2 Evaluation Pipelines

FeatureDescription
Knowledge gap detectionIdentifies cases where agents respond with insufficient or missing information by comparing agent responses against expected knowledge coverage. The detector analyzes whether key facts from the knowledge base were included in the response and flags omissions that could lead to user confusion. Results are surfaced as actionable feedback that helps teams improve agent knowledge configurations and response templates.
Hallucination detectionDetects factual inconsistencies between agent responses and the authoritative source material (knowledge base documents, tool results, provided context). The detector uses a grounding validator that cross-references specific claims in the response against the sources cited, flagging statements that cannot be verified or that contradict the source material. This is critical for knowledge-intensive agents where accuracy is paramount.
Guardrail complianceVerifies that agents consistently adhere to defined guardrail policies (content restrictions, PII handling, tone requirements, topic boundaries) across all test conversations in a suite. The compliance checker replays conversations through the guardrail pipeline and reports any violations with the specific guardrail rule that was triggered. This ensures that guardrail coverage is comprehensive and that edge cases do not bypass safety policies.
LLM-based evaluationUses LLM judges (configurable model and prompt) for nuanced quality scoring that captures subjective dimensions like helpfulness, empathy, clarity, and conversational flow. LLM judges evaluate each test conversation against a rubric and produce structured scores with natural language explanations. Multiple judges can be configured for different evaluation perspectives, and their scores are aggregated with configurable weighting.
Quality metricsQuantitative quality metrics including BLEU score (lexical similarity to reference responses), semantic similarity (embedding-based comparison), factual accuracy (claim verification rate), and custom formula-based metrics. Metrics are computed per test case and aggregated per suite, with historical tracking to show quality trends over time. Dashboard visualizations highlight metric distributions and identify outlier conversations that need attention.
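
Two of the quantitative metrics listed above are simple enough to sketch directly: embedding-based semantic similarity (shown here as cosine similarity) and a claim verification rate, followed by per-suite aggregation. How the platform actually computes embeddings or verifies claims is not specified; the inputs below are assumed to be precomputed, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def claim_verification_rate(verified: Sequence[bool]) -> float:
    """Fraction of extracted claims that were verified against the sources."""
    return sum(verified) / len(verified) if verified else 0.0


def aggregate_suite(case_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Mean of each metric across the test cases that reported it."""
    buckets: Dict[str, List[float]] = {}
    for scores in case_scores:
        for name, value in scores.items():
            buckets.setdefault(name, []).append(value)
    return {name: sum(values) / len(values) for name, values in buckets.items()}
```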

7.3 Eval UI

FeatureDescription
Test management dashboardCentral UI for organizing evaluation suites, viewing run history, and launching new evaluation runs. The dashboard displays suite status (pass/fail rates, average scores), recent run results, and trend charts. Teams can create and manage suites, assign test cases to suites, configure evaluation criteria, and schedule recurring evaluation runs to monitor quality continuously.
Comparison heatmapSide-by-side visual comparison across multiple evaluation runs using a color-coded heatmap that highlights score differences per test case and dimension. This makes it easy to spot regressions (red cells) and improvements (green cells) between agent versions, model changes, or prompt modifications. The heatmap supports comparison of up to four runs simultaneously, enabling A/B testing of agent configurations.
Results drill-downInspect individual test case results with full conversation transcript, per-dimension scores, LLM judge reasoning, and source evidence. The drill-down view shows the complete execution trace including tool calls, guardrail evaluations, and NLU analysis for each turn. This level of detail enables developers to understand exactly why a test case scored as it did and what specific changes would improve the result.
Run dialogConfigure and execute evaluation runs with model selection, rubric choice, persona assignment, and parallelism settings. The dialog presents all configurable parameters with sensible defaults and validates the configuration before launch. Run progress is tracked in real-time with a progress bar and live result streaming, and completed runs generate notification alerts with summary statistics.

8. Agent Observability

8.1 Trace System

FeatureDescription
Execution tracingCapture every execution step, LLM call, tool invocation, routing decision, guardrail evaluation, and NLU analysis as structured TraceEvents via the shared TraceStore. Each event includes a type, timestamp, duration, status, and payload with relevant context data. Tracing is mandatory for all execution paths (ad-hoc logging is not an accepted substitute) and provides the foundation for the Observatory dashboard, debug protocol, and analytics pipeline.
Span hierarchyParent-child span relationships that model nested execution contexts (session > agent > step > tool call > HTTP request) as a tree of spans. Each span tracks its parent span ID, enabling reconstruction of the full execution tree for any session. The hierarchy captures delegation chains, handoff transitions, and parallel fan-out branches, providing a complete picture of how a conversation flowed through multiple agents and tools.
Event timelineChronological event stream per session that presents all trace events in temporal order with precise timestamps and duration measurements. The timeline enables post-hoc replay of any conversation, showing exactly what happened at each point in the interaction. Events include LLM prompts and responses, tool call inputs and outputs, routing decisions, guardrail evaluations, and state changes, providing a complete audit trail.
HLC timestampsHybrid Logical Clock (based on Kulkarni et al., 2014) that provides causal ordering of trace events across pods in a distributed Kubernetes deployment. HLC combines physical wall clock time with a logical counter and node ID to disambiguate events within the same millisecond even when clocks are skewed across nodes. This ensures that trace event ordering is consistent and causally correct regardless of which pod processed each event.
Trace attachmentsAttach rich context data (full LLM prompts, raw responses, tool call payloads, gather field values, session state snapshots) to individual trace events. Attachments are stored alongside the trace event and rendered in the Observatory’s event detail panel. Sensitive data in attachments is automatically scrubbed by the trace scrubber before storage, ensuring observability without compromising security or privacy.
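
The Hybrid Logical Clock update rules from Kulkarni et al. are short enough to show in full. The sketch below follows the published send/receive rules; the `Timestamp` layout (wall time in milliseconds, counter, node ID as tie-breaker) mirrors the description above, while the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True, order=True)
class Timestamp:
    wall_ms: int      # logical wall-clock component
    counter: int      # disambiguates events within the same millisecond
    node_id: str      # final tie-breaker across nodes


class HybridLogicalClock:
    def __init__(self, node_id: str, physical_clock_ms: Callable[[], int]) -> None:
        self.node_id = node_id
        self._pt = physical_clock_ms
        self._l = 0
        self._c = 0

    def now(self) -> Timestamp:
        """Timestamp a local or send event."""
        prev = self._l
        self._l = max(prev, self._pt())
        self._c = self._c + 1 if self._l == prev else 0
        return Timestamp(self._l, self._c, self.node_id)

    def receive(self, remote: Timestamp) -> Timestamp:
        """Merge a timestamp carried by an incoming event."""
        prev = self._l
        self._l = max(prev, remote.wall_ms, self._pt())
        if self._l == prev and self._l == remote.wall_ms:
            self._c = max(self._c, remote.counter) + 1
        elif self._l == prev:
            self._c += 1
        elif self._l == remote.wall_ms:
            self._c = remote.counter + 1
        else:
            self._c = 0
        return Timestamp(self._l, self._c, self.node_id)
```

Even if the receiving node's physical clock is behind the sender's, the timestamp it assigns after `receive` sorts after the remote one, which is the causal-ordering guarantee the trace system relies on.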

8.2 Observatory Dashboard

FeatureDescription
Span tree viewHierarchical tree visualization of execution spans showing parent-child relationships with timing bars that indicate duration and overlap. Each span node displays its type (agent, step, tool, LLM call), status (success, error, timeout), and latency. The tree can be expanded/collapsed at any level, and clicking a span opens its detail panel with full payload data, making it easy to identify bottlenecks and failures in complex agent execution paths.
Decision cardsVisual cards that highlight agent decision points including routing choices (which agent was selected and why), constraint evaluations (which rules passed/failed), guardrail triggers (which policies were checked), and handoff/delegation decisions. Each card shows the decision inputs, evaluated conditions, and selected outcome, making the agent’s reasoning process transparent. This is essential for debugging unexpected routing or policy enforcement behavior.
Flow visualizationGraphical flow diagram of the agent’s execution path through its defined steps, showing which steps were visited, the transitions taken, and which branches were skipped. The visualization overlays runtime data onto the agent’s static flow definition, highlighting the actual path through the conversation in color-coded segments (green for completed, yellow for active, gray for unvisited). This helps developers verify that conversations follow expected patterns.
Event detail panelFull-payload inspection view for individual trace events, showing raw request/response data, timing information, metadata, and attached context. The panel supports JSON formatting with collapsible sections, text search within payloads, and copy-to-clipboard for debugging. Sensitive data is displayed with redaction markers where the trace scrubber has removed PII or credentials, maintaining security while enabling detailed investigation.
Cost trackingPer-span LLM cost and token breakdown that calculates the monetary cost of each LLM call based on the provider’s per-token pricing (input tokens, output tokens). Costs are aggregated up the span hierarchy to show total cost per agent, per session, and per conversation. The cost view enables teams to identify expensive reasoning patterns, optimize prompt lengths, and compare costs across different models and providers.
Live debug modeReal-time trace streaming during active sessions that pushes trace events to the Observatory dashboard as they occur, without waiting for the session to complete. Events appear in the timeline and span tree views with sub-second latency, enabling developers to observe agent behavior live. This is particularly valuable for debugging long-running conversations, voice sessions, and multi-agent interactions where post-hoc analysis alone is insufficient.
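
The cost tracking row above describes per-call costs that roll up the span hierarchy. The sketch below shows one way such a roll-up could work. The Span class, its fields, and the per-million-token price units are assumptions for illustration; real prices come from the model registry.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical span node; only LLM-call spans carry token counts."""
    name: str
    input_tokens: int = 0
    output_tokens: int = 0
    input_price_per_mtok: float = 0.0   # assumed unit: currency per 1M input tokens
    output_price_per_mtok: float = 0.0  # assumed unit: currency per 1M output tokens
    children: list["Span"] = field(default_factory=list)

    def own_cost(self) -> float:
        """Cost of this span alone: tokens multiplied by per-token price."""
        return (
            self.input_tokens * self.input_price_per_mtok
            + self.output_tokens * self.output_price_per_mtok
        ) / 1_000_000

    def total_cost(self) -> float:
        """Cost of this span plus every descendant span."""
        return self.own_cost() + sum(child.total_cost() for child in self.children)
```

Calling total_cost() on an agent span yields the per-agent figure, and calling it on the session root yields the per-session figure, so no separate aggregation pass is needed in this sketch.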

8.3 Debug Protocol

FeatureDescription
BreakpointsSet breakpoints on specific agent steps, tool calls, or decision points that pause execution when reached, allowing developers to inspect the full session state before continuing. Breakpoints can be set from the Studio editor or Observatory dashboard and are scoped to debug sessions (they do not affect production traffic). Conditional breakpoints support CEL expressions, enabling developers to pause only when specific conditions are met.
Step-through executionAdvance agent execution one step at a time when paused at a breakpoint, observing the effect of each instruction (RESPOND, CALL, SET, GATHER, CONSTRAINT) on the session state. Step-through mode provides visibility into the exact order of operations within a step, including middleware execution and expression evaluation. This enables precise debugging of complex flow logic and tool call sequences that are difficult to diagnose from trace logs alone.
Variable inspectionExamine the complete session state at any breakpoint including all session variables, gather field values, handoff/delegate stacks, conversation history, and NLU analysis results. Variables are displayed in a structured tree view with type information and modification history (when the value last changed). This enables developers to verify that variable assignments, tool result extractions, and expression evaluations produce the expected values at each point in execution.
Debug session managementStart, pause, resume, and terminate debug sessions through the Studio UI or API. Debug sessions run with enhanced tracing and breakpoint support enabled but are otherwise identical to production execution. Multiple debug sessions can run concurrently for different agents, and each session maintains its own breakpoint set and execution state. Session management includes cleanup of stale debug sessions to prevent resource leaks.

9. Model Hub

9.1 LLM Providers

FeatureDescription
Anthropic ClaudeFull integration with Anthropic’s Claude model family including Claude 3.5 Sonnet (high performance, fast), Claude 3 Opus (highest capability), and Claude 3 Haiku (fastest, most cost-effective). Supports tool/function calling, streaming responses, system prompts, and multi-turn conversations through the Anthropic Messages API. Model selection is configurable per agent, per project, or per tenant with automatic credential resolution from the multi-tenant credential chain.
OpenAIIntegration with OpenAI’s model family including GPT-4, GPT-4o (multimodal), and GPT-3.5 Turbo, supporting chat completions, function calling, streaming, and JSON mode. The platform uses the provider-agnostic LLMToolDefinition format that maps to OpenAI’s function calling schema at runtime. OpenAI models can also be used as LLM judges in the evaluation pipeline and as guardrail evaluation providers for semantic content assessment.
Azure OpenAIAzure-hosted OpenAI models accessed through Azure’s dedicated API endpoints with Azure Active Directory authentication and private network support. The Azure provider adapter handles deployment-name-based routing (instead of model names), regional endpoint selection, and Azure-specific rate limit headers. This enables enterprise customers to use OpenAI models through their existing Azure infrastructure with data residency and compliance guarantees.
Google GeminiIntegration with Google’s Gemini model family including Gemini Pro (general purpose), Gemini Flash (low latency), and Gemini Live (realtime multimodal voice interaction). Gemini Live enables native audio-in/audio-out voice agents with function calling support, making it an alternative to OpenAI’s Realtime API. The Gemini provider adapter translates the platform’s unified tool format to Gemini’s function declaration schema.
DeepSeekIntegration with DeepSeek’s reasoning models, which excel at chain-of-thought problem solving and complex analytical tasks. DeepSeek models are accessed through the platform’s OpenAI-compatible provider adapter, as they implement the OpenAI chat completions API. This provides reasoning capabilities at a lower cost point, making these models suitable for analytical agent use cases where step-by-step reasoning matters more than speed.
OpenAI-compatibleSupport for any LLM provider that implements the OpenAI-compatible chat completions API, including self-hosted solutions like vLLM, Ollama, LM Studio, and cloud providers like Together AI and Fireworks. The adapter requires only a base URL and API key, and automatically discovers supported features (streaming, function calling, JSON mode) through capability probing. This enables teams to use custom fine-tuned models or on-premises deployments without modifying agent definitions.

9.2 Model Management

FeatureDescription
Model registryCentral catalog of all available LLM models with metadata including provider, model name, capability flags (streaming, function calling, vision, realtime voice), context window size, and per-token pricing (input/output). The registry is maintained in the database and serves as the source of truth for model selection dropdowns, cost calculations, and capability-based routing. Admin users can add custom models and update pricing as providers change their rates.
Tenant model provisioningAssign specific LLM models to tenants along with connection credentials (API keys, Azure deployment configs, custom endpoint URLs). Provisioning creates a tenant-to-model mapping that controls which models are available within a tenant’s projects. Credential storage uses AES-256-GCM encryption at rest, and the provisioning system supports multiple credentials per model for failover and load distribution across API key rate limits.
Provider-agnostic toolsThe unified LLMToolDefinition type abstracts tool/function calling across all LLM providers, enabling a single tool definition to work with OpenAI, Anthropic, Google, and any OpenAI-compatible provider. The platform translates LLMToolDefinition to each provider’s native format (OpenAI function schema, Anthropic tool use, Gemini function declarations) at call time. This eliminates provider lock-in and enables model switching without modifying tool definitions.
Hyperparameter templatesPre-configured sets of model hyperparameters (temperature, top_p, max_tokens, frequency_penalty, presence_penalty) per model and use case. Templates provide sensible defaults for common scenarios (creative writing: high temperature, factual Q&A: low temperature, code generation: zero temperature) and can be overridden at the tenant, project, or agent level. This ensures consistent model behavior across agents while still allowing parameters to be adjusted for specific use cases.
Token countingAccurate token counting and cost estimation per provider using provider-specific tokenizers. Token counts are recorded per LLM call (input tokens, output tokens) and aggregated per span, session, agent, project, and tenant. Cost is calculated by multiplying token counts by the per-token pricing from the model registry. This data feeds into the usage dashboards, billing metering, and plan-based limit enforcement.
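
The provider-agnostic tools row above describes translating one tool definition into each provider's native format at call time. A minimal sketch of that mapping follows. The field names on LLMToolDefinition (name, description, parameters) are assumptions, since the actual type is not shown here; the output shapes follow the publicly documented OpenAI, Anthropic, and Gemini tool schemas.

```python
from typing import Any, TypedDict


class LLMToolDefinition(TypedDict):
    """Assumed shape of the unified tool type; the real fields may differ."""
    name: str
    description: str
    parameters: dict[str, Any]  # JSON Schema describing the arguments


def to_openai(tool: LLMToolDefinition) -> dict[str, Any]:
    # OpenAI chat completions wrap the schema in a "function" object.
    return {"type": "function", "function": dict(tool)}


def to_anthropic(tool: LLMToolDefinition) -> dict[str, Any]:
    # Anthropic's Messages API names the schema field "input_schema".
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["parameters"],
    }


def to_gemini(tool: LLMToolDefinition) -> dict[str, Any]:
    # A single Gemini function declaration; callers place it inside a
    # tool's list of function declarations.
    return dict(tool)
```

Keeping the translation in small pure functions means switching an agent between providers changes only which function is called, not the tool definition itself.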

9.3 LLM Features

FeatureDescription
Tool/function callingStructured tool invocation that translates the platform’s unified LLMToolDefinition to each provider’s native function calling format and back. Tool calls are extracted from LLM responses, validated against parameter schemas, executed through the tool middleware chain, and results are returned to the LLM in the next turn. Parallel tool calling is supported when the LLM requests multiple tools simultaneously, with results batched for efficiency.
Streaming responsesReal-time token streaming that delivers LLM output tokens to the user as they are generated, so the first visible output arrives as soon as the first tokens are produced rather than after the full response completes. Streaming is implemented via SSE (Server-Sent Events) for digital channels and WebSocket frames for voice channels. The streaming pipeline handles partial tool call detection, guardrail evaluation on accumulated content, and graceful error recovery if the stream is interrupted.
System prompt managementDynamic system prompt construction that assembles the agent’s system message from identity, persona, expertise, limitations, active behavior profile instructions, gather field context, constraint rules, and conversation state. The prompt builder injects relevant context (current step, available tools, pending gather fields) to guide the LLM’s behavior. System prompts are constructed fresh for each LLM call to reflect the current execution state.
Response cachingCache LLM responses using a content-addressed hash of the system prompt, messages, model, and tools. The LLMResponseCache supports file-based storage for test acceleration and can reduce redundant LLM calls during development and evaluation runs. Cache entries include hit counters and timestamps for staleness detection. This significantly speeds up test re-runs and reduces cost during iterative agent development cycles.
Credential resolutionMulti-tenant credential lookup that resolves LLM provider credentials through a fallback chain: agent-specific override, project-level configuration, tenant provisioning, and platform default. The resolver returns the first valid credential found in the chain, enabling fine-grained credential management where most agents use the tenant default but specific agents can override to use dedicated API keys. Resolution results are cached with TTLs to minimize database lookups per LLM call.
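
The response caching row above mentions a content-addressed hash of the system prompt, messages, model, and tools. The sketch below shows one way to derive such a key. The function name cache_key, the choice of SHA-256, and the canonical JSON serialization are assumptions; the LLMResponseCache may compute its key differently.

```python
import hashlib
import json
from typing import Any


def cache_key(
    system_prompt: str,
    messages: list[dict[str, Any]],
    model: str,
    tools: list[dict[str, Any]],
) -> str:
    """Return a hex digest that is identical for semantically identical requests."""
    payload = json.dumps(
        {"system": system_prompt, "messages": messages, "model": model, "tools": tools},
        sort_keys=True,          # dict key order must not change the hash
        separators=(",", ":"),   # no whitespace variation
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting keys before hashing matters because two requests that differ only in dictionary key order should hit the same cache entry, while any change to the model or tool list should miss.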

9.4 Realtime Voice LLMs

FeatureDescription
OpenAI Realtime APINative integration with OpenAI’s Realtime API for audio-in, audio-out voice agents with function calling support. The audio stream is sent directly to the model which generates both text and audio responses, eliminating the STT/TTS pipeline for lower latency. Tool calls made by the voice model are executed through the same tool middleware chain as digital agents, ensuring consistent behavior across modalities.
Google Gemini LiveReal-time multimodal voice interaction using Google’s Gemini Live API that supports simultaneous audio and text input/output with function calling. Gemini Live enables natural voice conversations with the full capability of the Gemini model family, including vision and document understanding during voice sessions. The integration handles session management, audio format negotiation, and tool dispatch through the platform’s standard execution pipeline.
UltravoxIntegration with Ultravox, a purpose-built voice agent model optimized for low-latency conversational interactions. Ultravox processes audio directly without separate STT/TTS stages, providing faster response times for voice-first use cases. The model supports tool calling and structured output, and is integrated into the platform’s voice runtime alongside OpenAI Realtime and Gemini Live as an alternative provider for teams prioritizing voice interaction quality.
WebSocket sessionsPersistent bidirectional audio streaming via WebSocket connections that maintain a continuous session between the client (browser or telephony gateway) and the voice runtime. WebSocket sessions handle audio frame buffering, connection keep-alive, graceful reconnection, and session state synchronization. The session lifecycle (open, active, paused, closed) is tracked in the conversation store for analytics and billing purposes.
VAD (Voice Activity Detection)Server-side voice activity detection that identifies when the user is speaking versus silence, managing turn-taking between the user and agent. VAD prevents the agent from interrupting the user mid-utterance and detects end-of-speech to trigger response generation. The detection supports configurable sensitivity thresholds and silence duration parameters, with integration into barge-in handling for cases where the user intentionally interrupts the agent.
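
The VAD row above mentions sensitivity thresholds and silence duration parameters for end-of-speech detection. The sketch below illustrates how those two parameters interact, using a simple energy (RMS) threshold over PCM16 frames. This is a stand-in technique for illustration only: the platform's detector is not specified here, and the names EndOfSpeechDetector, rms_threshold, silence_ms, and frame_ms are assumptions.

```python
import math
import struct


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a little-endian PCM16 frame."""
    n = len(frame) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", frame[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n)


class EndOfSpeechDetector:
    """Signals end-of-speech after speech is followed by enough silence."""

    def __init__(self, rms_threshold: float = 500.0, silence_ms: int = 600, frame_ms: int = 20) -> None:
        self.rms_threshold = rms_threshold                   # sensitivity
        self.frames_needed = max(1, silence_ms // frame_ms)  # silence duration in frames
        self.speaking = False
        self.silent_frames = 0

    def push(self, frame: bytes) -> bool:
        """Feed one frame; returns True exactly once per detected utterance end."""
        if frame_rms(frame) >= self.rms_threshold:
            self.speaking = True
            self.silent_frames = 0
            return False
        if not self.speaking:
            return False  # leading silence, nothing to end yet
        self.silent_frames += 1
        if self.silent_frames >= self.frames_needed:
            self.speaking = False
            self.silent_frames = 0
            return True
        return False
```

A lower rms_threshold makes the detector more sensitive to quiet speech, while a longer silence_ms tolerates pauses within an utterance at the cost of slower response generation.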

10. Channels

10.1 Messaging Channels

FeatureDescription
WhatsAppWhatsApp Business integration via Twilio that supports text messages, media attachments (images, documents, audio), and WhatsApp message templates for proactive notifications. Inbound messages are routed to the appropriate agent based on the phone number and session state, with automatic session resumption for returning users. Rich content is adapted to WhatsApp’s formatting capabilities, converting platform carousel/button elements to WhatsApp interactive message formats.
SMSSMS channel via Twilio supporting inbound and outbound text messaging with automatic session management based on phone number. The SMS adapter handles message segmentation for long responses, opt-out keyword detection (STOP/UNSUBSCRIBE), and delivery status tracking. SMS sessions can be escalated to voice calls through the platform’s voice gateway, enabling seamless channel switching during a conversation.
EmailSMTP-based email channel supporting inbound message processing (parsing body, extracting attachments) and outbound email generation with HTML formatting and file attachments. The email adapter handles thread tracking via message ID references, enabling multi-turn conversations over email. Rich content from agent responses is rendered as HTML email with inline images, and action buttons are converted to email-compatible link buttons.
SlackSlack workspace app integration with event subscription handling for messages, reactions, and slash commands. The Slack adapter translates platform rich content (carousels, buttons, action cards) into Slack Block Kit format for native rendering. Conversations are tracked per Slack channel or thread, and the adapter supports multi-workspace deployments where different workspaces route to different agents or tenants.
HTTP AsyncWebhook-based asynchronous channel for custom integrations that sends and receives messages via HTTP POST callbacks. The HTTP async channel supports custom payload formats, configurable authentication (API key, HMAC signature), and retry with exponential backoff for delivery failures. This enables integration with proprietary messaging systems, IoT devices, and custom applications that do not fit standard channel adapters.
Web SDKJavaScript widget (web_chat channel) for embedding agents directly into web applications with customizable appearance, position, and behavior. The SDK handles WebSocket connection management, message rendering with rich content support (Markdown, buttons, carousels), file upload, and typing indicators. It provides a drop-in integration that can be added to any website with a single script tag and configuration object.
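
The HTTP Async row above mentions HMAC signature authentication and retry with exponential backoff. A minimal sketch of both follows. The function names, the use of HMAC-SHA256 with a hex digest, and the default delay values are assumptions; the channel's actual header name and signing scheme are not specified here.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    return hmac.compare_digest(sign(secret, body), signature)


def backoff_delays(
    attempts: int = 5, base_s: float = 1.0, factor: float = 2.0, max_s: float = 60.0
) -> list[float]:
    """Delay before each retry: base * factor^i, capped at max_s."""
    return [min(max_s, base_s * factor**i) for i in range(attempts)]
```

The signature must be computed over the exact bytes received, before any JSON parsing, because re-serializing the payload can change whitespace or key order and invalidate the check.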

10.2 Voice Channels

FeatureDescription
WebSocket voiceDirect browser-to-server real-time audio streaming via WebSocket for voice interactions without requiring telephony infrastructure. Audio is streamed as PCM16 frames at 16 kHz, processed by the voice runtime’s STT/TTS pipeline (Deepgram/ElevenLabs) or sent to realtime LLM providers (OpenAI Realtime, Gemini Live). WebSocket connections support pre-warming for low-latency first response and graceful reconnection on network interruptions.
Twilio PSTNPhone call integration via Twilio Media Streams that bridges traditional PSTN phone calls to the platform’s voice agent runtime. Twilio handles the telephony layer (call routing, PSTN connectivity, recording) and streams audio to the platform in real time. The Twilio media handler manages G.711 μ-law/A-law encoding, DTMF tone detection, and call control events (hold, transfer, conference).
Voice gatewayPSTN-to-web transfer capability that enables callers on a phone line to be seamlessly transferred to a web-based agent experience. The gateway manages the session continuity during the transfer, preserving conversation context and agent state so the user does not need to restart the interaction. This supports hybrid scenarios where voice interactions start on the phone and continue on a web interface with richer capabilities.

10.3 Channel Infrastructure

FeatureDescription
Channel registryCentral registry of available channels with provider metadata, capability flags (supports rich content, supports media, supports voice), and connection configuration templates. The registry enables the runtime to discover which channels are configured for a project and route incoming messages to the appropriate channel adapter. New channel types can be registered without modifying core runtime code, following the adapter pattern.
Channel connection modelPer-channel authentication and configuration that stores the credentials, webhook URLs, and provider-specific settings needed to connect to each channel. Connection configurations are encrypted at rest and scoped to the project level. Each connection stores its health status (connected, disconnected, error) and supports test-connection verification to validate credentials before going live.
Channel session trackingMaps user identity to channel-specific sessions using channel artifacts (phone number for SMS/WhatsApp, Slack user ID, email address, cookie for web chat). The tracker resolves returning users to their existing session for conversation continuity, or creates a new session for first-time interactions. Identity resolution supports the progressive identity tier system (anonymous, unverified, verified) for cross-channel user recognition.
Widget configurationCustomize the web chat widget’s appearance (colors, fonts, position, avatar, welcome message) and behavior (auto-open rules, proactive messages, file upload toggle) through the Studio settings UI. Configuration is stored per deployment and served to the Web SDK at load time. This enables teams to match the widget’s look and feel to their brand identity and control when and how the agent presents itself to website visitors.
Rich content renderingAdapts platform rich content types (carousels, buttons, action cards, images, Markdown) to each channel’s native rendering capabilities. The rendering layer checks the target channel’s capability flags and transforms content accordingly: carousels become Slack Block Kit attachments, WhatsApp interactive lists, or HTML card grids. Channels that do not support a content type receive a graceful text-based fallback, ensuring agents can use rich content without channel-specific conditionals.
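
The rich content rendering row above describes checking a channel's capability flags and falling back to text when a content type is unsupported. The sketch below illustrates that decision for a carousel. The flag name "rich_content", the card fields, and the output shapes are assumptions, not the platform's RichContentIR schema.

```python
def render_carousel(cards: list[dict[str, str]], capabilities: set[str]) -> dict:
    """Return a carousel payload when supported, else a numbered text list."""
    if "rich_content" in capabilities:
        return {"type": "carousel", "cards": cards}
    # Graceful fallback: one numbered line per card.
    lines = [
        f"{i}. {card['title']}: {card['subtitle']}"
        for i, card in enumerate(cards, start=1)
    ]
    return {"type": "text", "text": "\n".join(lines)}
```

Because the fallback lives in the rendering layer, the agent definition emits the same carousel regardless of channel and never needs a per-channel conditional.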

11. Integrations (Connectors)

11.1 Connector Framework

FeatureDescription
Connector registryDiscover and list all available connectors with metadata including name, description, supported actions, authentication type, and category (productivity, CRM, project management, etc.). The registry is queryable through the Studio UI and API, enabling developers to browse integrations and add them to projects without manual configuration. Connectors are versioned and can be updated independently of the platform release cycle.
Connector-to-tool compilationAutomatically generates platform-standard tool definitions from connector action schemas, translating each connector action’s input/output parameters into JSON Schema-based ToolDefinitions at compile time. This eliminates manual tool authoring for connector integrations and ensures that tool parameter validation, result handling, and middleware application work identically for connector tools and hand-written HTTP tools.
OAuth2 orchestrationManages the complete OAuth2 authorization code flow for connector authentication, including redirect URL handling, authorization code exchange, token storage, and automatic refresh. The orchestration layer handles PKCE for public clients, scope negotiation, and multi-step consent flows. Users authenticate once through the Studio UI, and the platform manages token lifecycle transparently for all subsequent agent tool calls to that connector.
Nango integrationUnified OAuth provider mapping and token management through Nango, which handles the complexity of different OAuth providers’ implementation quirks (token formats, refresh mechanisms, scope handling). Nango stores and refreshes tokens automatically, provides a connection ID abstraction that decouples agent code from credential details, and supports 100+ OAuth providers. The platform maps Nango connections to project-scoped auth profiles for secure credential resolution at runtime.
ActivePieces adaptersContext translation adapters that bridge the platform’s tool execution interface with the ActivePieces connector ecosystem. Each adapter translates platform tool call parameters into ActivePieces action input format and maps ActivePieces action output back to the platform’s structured result format. This enables the platform to leverage ActivePieces’ extensive library of pre-built connector actions (250+) without requiring custom integration code for each service.
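
The OAuth2 orchestration row above mentions PKCE for public clients. As a reference point, the sketch below generates a PKCE code verifier and its S256 code challenge as defined in RFC 7636. The function names are illustrative; how the platform or Nango wires these values into the authorization request is not shown here.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes: int = 32) -> str:
    """Random URL-safe verifier; 32 random bytes encode to 43 characters."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def code_challenge_s256(verifier: str) -> str:
    """S256 challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The client sends the challenge with the authorization request and the verifier with the token exchange, so an intercepted authorization code cannot be redeemed without the original verifier.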

11.2 Available Connectors (25+)

CategoryConnectors
ProductivitySlack (messaging, channel management, file sharing), Microsoft Teams (chat, meetings, presence), Gmail (send, read, label, search), Google Calendar (create, update, list events), Google Drive (file operations, sharing, search), Google Sheets (read, write, append rows), and Notion (pages, databases, blocks). Each connector provides multiple actions that map to platform tool definitions for agent use.
CRMSalesforce (leads, contacts, opportunities, cases, custom objects), HubSpot (contacts, deals, companies, tickets), and Pipedrive (deals, persons, organizations, activities). CRM connectors enable agents to look up customer records, create and update opportunities, log interactions, and automate sales and support workflows without leaving the conversation.
Project ManagementJira (issues, projects, sprints, transitions), Asana (tasks, projects, sections), ClickUp (tasks, lists, spaces), Linear (issues, projects, cycles), and GitHub (issues, pull requests, repositories, actions). These connectors enable agents to create tickets, update status, assign work, and query project state, bridging conversational interfaces with project management workflows.
StorageAmazon S3 (object upload, download, list, presigned URLs), Airtable (records, tables, views), and PostgreSQL (query, insert, update). Storage connectors provide agents with direct data access capabilities, enabling them to store conversation artifacts, query structured data, and manage file assets as part of their conversation flows.
CommunicationTwilio (SMS, voice calls, phone number management), SendGrid (email sending, template management, contact lists), and Discord (messages, channels, reactions). Communication connectors enable agents to reach out to users through additional channels, send notifications, and orchestrate multi-channel communication workflows.
CommerceStripe (payments, subscriptions, invoices, customers, refunds) and Shopify (orders, products, customers, inventory). Commerce connectors enable agents to handle payment processing, order management, subscription lifecycle, and product inquiries directly within conversations, supporting e-commerce and billing use cases.
AIOpenAI (completions, embeddings, image generation, moderation) and Claude (messages, completions). AI connectors enable agent-to-AI-service tool calls for specialized tasks like content generation, image creation, or text moderation that complement the agent’s primary LLM. This supports patterns where agents delegate specific subtasks to specialized AI models.
EnterpriseSharePoint with full enterprise-grade integration including delta sync (incremental content synchronization), permission crawling (sync source document permissions for access-controlled search), and webhook-based change notifications. The SharePoint connector supports document libraries, lists, and site pages with tenant-isolated credential management, making it suitable for enterprise knowledge base ingestion and document retrieval scenarios.

11.3 Sync & Data

FeatureDescription
Delta syncIncremental data synchronization that tracks change tokens (delta links) from previous sync operations to fetch only new, modified, or deleted items since the last run. Compared to a full re-crawl, delta sync reduces API call volume and processing time for large data sources, because the work scales with the number of changes rather than the size of the source. The sync state is persisted per source, and sync operations are idempotent: interrupted syncs resume from the last checkpoint without data duplication.
Permission crawlingSynchronize source document permissions (user/group access control lists) alongside content during ingestion, enabling permission-filtered search results. The permission crawler maps source-specific ACLs (SharePoint item permissions, Google Drive sharing settings) to a platform-standard permission model. At query time, search results are filtered to only return documents the requesting user has permission to access in the source system.
Webhook triggersEvent-driven execution triggered by connector webhook notifications (e.g., new file in SharePoint, updated Salesforce record, Slack message). The webhook handler validates incoming payloads using provider-specific signatures, maps webhook events to platform actions (re-sync, agent invocation, notification), and enqueues processing jobs via BullMQ. This enables real-time responsiveness to changes in connected systems without polling.
Cron schedulingTime-based scheduling for connector sync jobs using cron expressions (e.g., 0 */6 * * * for every 6 hours, 0 2 * * * for daily at 2am). Scheduled jobs are managed by the BullMQ repeatable job system with distributed locking to prevent duplicate execution across multiple runtime pods. Each sync schedule can be configured with a specific time zone, retry policy, and concurrency limit. The schedule UI in Studio displays upcoming runs and last-run status for monitoring.
Connection credential managementEncrypted credential storage per connector connection using AES-256-GCM encryption at rest with tenant-scoped encryption keys. Credentials include OAuth tokens (managed by Nango), API keys, database connection strings, and service account certificates. The credential manager handles token refresh, expiration detection, and automatic re-authentication, ensuring connector tools always have valid credentials without manual intervention.
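
The delta sync row above describes persisting a change token per source and resuming interrupted syncs from the last checkpoint. A minimal sketch of that loop follows. The fetch_page callable, the change record fields (id, deleted, content), and the state dictionary are hypothetical stand-ins for a real connector API and sync-state store.

```python
from typing import Any, Callable, Optional

# Hypothetical page fetcher: takes the last token (None on first run) and
# returns (changes, next_token, has_more).
FetchPage = Callable[[Optional[Any]], tuple[list[dict[str, Any]], Any, bool]]


def delta_sync(fetch_page: FetchPage, index: dict[str, Any], state: dict[str, Any]) -> None:
    """Apply changes since the stored token, checkpointing after each page."""
    token = state.get("delta_token")
    while True:
        changes, next_token, has_more = fetch_page(token)
        for change in changes:
            if change["deleted"]:
                index.pop(change["id"], None)           # deleting twice is harmless
            else:
                index[change["id"]] = change["content"]  # upsert, never append
        # Checkpoint only after the page is fully applied.
        state["delta_token"] = token = next_token
        if not has_more:
            break
```

Upserts and tolerant deletes make each page safe to replay, so a crash between applying a page and saving its token causes repeated work but no duplicated data.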

12. Agent NLU

12.1 Analysis Tasks

FeatureDescription
Intent detectionPrimary intent recognition that classifies user utterances against the agent’s defined intent catalog with confidence scores. The intent detector uses LLM-based classification with few-shot examples and optional embedding-based similarity matching for high-performance scenarios. Results include the top intent, confidence score, and alternative intents ranked by likelihood, enabling routing decisions and disambiguation prompts when confidence is below threshold.
Multi-intent detectionDetect multiple distinct intents within a single user utterance (e.g., “book a flight and also reserve a hotel room”). The multi-intent strategy is configurable (enabled/disabled, max intents, confidence threshold) in the project runtime config. Detected intents are queued with a configurable max age (queue_max_age_ms) and processed in priority order, ensuring that secondary intents are addressed after the primary intent is resolved.
Sub-intent detectionHierarchical intent decomposition that identifies sub-intents within a broader intent category. The sub-intent detector analyzes utterances in the context of the currently active intent to detect refinements, qualifications, and nested requests. This enables agents to handle complex requests like “change my flight to tomorrow, but keep the same seat” where the sub-intent (keep seat) qualifies the primary intent (change flight).
Entity extractionField-level extraction with semantic typing that identifies and extracts structured values (dates, numbers, phone numbers, emails, addresses, custom entities) from free-form user input. The entity extractor uses both LLM-based extraction and pattern matching, with extraction hints from the gather field configuration guiding the extraction process. Extracted entities are validated against field type constraints and mapped directly to gather fields for slot filling.
Category classificationTopic and domain classification that categorizes user utterances into predefined categories for routing, analytics, and context-aware agent behavior. The category classifier operates independently of intent detection, providing a higher-level classification that can span multiple intents. Categories are defined per agent or project and can be used in routing rules, behavior profile conditions, and analytics dashboards.
Correction detectionDetect when users correct or amend previous statements (e.g., “actually, I meant Tuesday, not Monday”) using configurable strategies. The correction detector analyzes the current utterance in the context of recent conversation history to identify contradictions and revisions. When a correction is detected, the agent can update previously gathered fields and acknowledge the correction, preventing stale data from persisting through the conversation.
Digression detectionIdentify off-topic conversation turns that deviate from the current agent task or gathering context. The digression detector uses LLM-based semantic analysis to distinguish genuine topic changes from clarifying questions or related follow-ups. Detection results include a confidence score and suggested handling (acknowledge and return to topic, briefly address the tangent, or hand off to a more appropriate agent).
Language detectionIdentify the input language of user utterances for multilingual routing and localization. The language detector analyzes the text and returns a language code with confidence score, enabling agents to switch response language, route to language-specific agents, or activate language-appropriate behavior profiles. This supports multilingual deployments where a single entry point routes users to agents that speak their language.
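
The multi-intent detection row above describes queuing detected intents with a maximum age (queue_max_age_ms) and processing them in priority order. The sketch below illustrates that behavior. The IntentQueue class is hypothetical, and using the confidence score as the priority is an assumption; the platform may rank intents differently.

```python
import heapq
from typing import Optional


class IntentQueue:
    """Hypothetical queue of pending secondary intents."""

    def __init__(self, queue_max_age_ms: int, max_intents: int) -> None:
        self.queue_max_age_ms = queue_max_age_ms
        self.max_intents = max_intents
        self._heap: list[tuple[float, int, str]] = []

    def push(self, intent: str, confidence: float, now_ms: int) -> bool:
        """Queue an intent; returns False when the queue is already full."""
        if len(self._heap) >= self.max_intents:
            return False
        # Negate confidence so the highest-confidence intent pops first.
        heapq.heappush(self._heap, (-confidence, now_ms, intent))
        return True

    def pop(self, now_ms: int) -> Optional[str]:
        """Return the best non-expired intent, discarding stale entries."""
        while self._heap:
            _, queued_at_ms, intent = heapq.heappop(self._heap)
            if now_ms - queued_at_ms <= self.queue_max_age_ms:
                return intent
        return None
```

Expiring entries at pop time keeps the agent from resurrecting a request the user made many turns ago and has likely moved past.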

12.2 NLU Engine

FeatureDescription
Combined analyzerOrchestrates multiple NLU tasks (intent detection, entity extraction, category classification, correction detection) in a single pipeline call. The combined analyzer first attempts a unified LLM prompt that performs all requested tasks in one call for efficiency, then falls back to individual task pipelines if the combined approach fails. This reduces LLM call count and latency for the common case while maintaining reliability through per-task fallback.
Few-shot examplesIn-context learning using example utterances provided in the agent’s intent and entity definitions. The NLU prompt builder injects relevant examples into the LLM prompt to improve classification accuracy without fine-tuning. Examples are selected based on similarity to the current input (when embedding-based matching is available) or included statically from the intent catalog. This enables rapid NLU customization by adding examples rather than retraining models.
Embedding-based matchingVector similarity search using pre-computed embeddings for high-performance intent and entity resolution. Intent and entity indexes store embedding vectors for canonical examples, and incoming utterances are compared against these vectors to find the closest match. The embedding provider supports multiple models (including BGE-M3 for multilingual support) and caches embeddings for repeated queries. This provides a fast first-pass classification that can supplement or replace LLM-based NLU for latency-sensitive scenarios.
Prompt loaderDynamic prompt generation system that assembles NLU task prompts from templates with variable interpolation. The prompt loader selects the appropriate template by task type (intent, entity, category, combined), injects the agent’s intent catalog, entity definitions, few-shot examples, and conversation context, and renders the final prompt. Templates are stored as separate files for easy customization and A/B testing of prompt strategies.
Model routingSelects the appropriate LLM model for each NLU task based on the task type, complexity, and configured model layers. The ModelRouter supports multi-layer configurations where a fast, cheap model handles simple tasks (language detection, keyword matching) and a more capable model handles complex tasks (multi-intent detection, entity extraction). Circuit breakers per layer enable automatic fallback when a preferred model is unavailable.
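
The embedding-based matching row above can be illustrated with a small sketch. It assumes intents are stored as pre-computed vectors and compared by cosine similarity; the `IntentExample` shape, the `matchIntent` helper, the three-dimensional toy vectors, and the 0.8 threshold are all hypothetical and are not taken from the platform.

```typescript
// Hypothetical index entry; the platform's real index format is not shown in this document.
interface IntentExample {
  intent: string;
  vector: number[]; // pre-computed embedding of a canonical example utterance
}

// Cosine similarity of two vectors (assumed equal length); returns 0 if either norm is zero.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Return the closest intent, or null when nothing clears the threshold
// (at which point a caller could fall back to LLM-based NLU).
function matchIntent(
  query: number[],
  index: IntentExample[],
  threshold = 0.8, // assumed cut-off, not a documented default
): { intent: string; score: number } | null {
  let best: { intent: string; score: number } | null = null;
  for (const example of index) {
    const score = cosineSimilarity(query, example.vector);
    if (score >= threshold && (best === null || score > best.score)) {
      best = { intent: example.intent, score };
    }
  }
  return best;
}

// Toy vectors for illustration only; real embeddings have hundreds of dimensions.
const toyIndex: IntentExample[] = [
  { intent: "order_status", vector: [1, 0, 0] },
  { intent: "cancel_order", vector: [0, 1, 0] },
];
```

Returning `null` below the threshold keeps the fast path conservative: an ambiguous utterance is handed to the slower pipeline rather than being forced into the nearest intent.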

12.3 Enterprise NLU

FeatureDescription
NLU cacheIn-memory cache for NLU analysis results keyed by tenant-scoped content hashes, following the same pattern as the LLMResponseCache. The NLUResultCache stores results with configurable TTLs, access counters, and max entry limits with LRU eviction. Cache stats (hit rate, total entries) are exposed for monitoring. When an optional encryption port is provided, cached results are encrypted at rest to protect sensitive utterance data.
Circuit breakerPer-layer circuit breaker (closed/open/half-open state machine) that wraps LLM provider calls in the NLU pipeline. When a layer’s failure count exceeds the configurable threshold, its circuit opens and the pipeline automatically skips to the next layer or fallback. The circuit transitions to half-open after a reset timeout, allowing probe requests to test if the provider has recovered. This prevents cascading failures when an LLM provider is degraded.
Multi-tenant isolationTenant-scoped NLU models, configurations, and data to ensure complete isolation between tenants. Each tenant can have its own intent catalogs, entity definitions, few-shot examples, and model layer preferences managed by the NLU tenant manager. Tenant isolation extends to the NLU cache (tenant-scoped keys), version tracking, and metrics collection, preventing any cross-tenant data leakage in the NLU pipeline.
PII guardDetects and masks personally identifiable information (names, email addresses, phone numbers, SSNs, credit card numbers) in user utterances before they are sent to NLU LLM providers. The PII guard uses regex patterns and the built-in PII detector to identify sensitive data and replaces it with placeholders. This ensures that PII does not appear in NLU prompts sent to external LLM providers, reducing compliance risk for enterprise deployments.
NLU audit loggingCompliance-grade logging of all NLU analysis decisions including the input utterance (with PII masked), detected intents with confidence scores, extracted entities, and the model/layer used. Audit logs are emitted as trace events with tenant attribution and are retained according to the tenant’s compliance policy. This provides an audit trail for regulatory inquiries and enables post-hoc analysis of NLU accuracy on production traffic.
NLU metricsCollection and reporting of NLU accuracy and performance metrics including intent detection accuracy, entity extraction precision/recall, average analysis latency, cache hit rates, and circuit breaker state. Metrics are aggregated per tenant and per agent, and exposed via the analytics pipeline for dashboard visualization. Teams can use these metrics to identify NLU quality issues and measure the impact of prompt changes or model upgrades.
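
The closed/open/half-open state machine described in the circuit breaker row can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `CircuitBreaker` class, the `pickLayer` helper, and the injectable clock are hypothetical names, and the breaker opens once the failure count exceeds the threshold, following the wording above.

```typescript
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  // The clock is injectable so the behavior can be checked without waiting.
  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  // Current state; an open circuit becomes half-open once the reset timeout has elapsed.
  getState(): BreakerState {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open";
    }
    return this.state;
  }

  // Requests are allowed when closed, and as probes when half-open.
  canRequest(): boolean {
    return this.getState() !== "open";
  }

  // A success (including a successful probe) closes the circuit and clears the count.
  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  // A failed probe reopens immediately; otherwise open once the count exceeds the threshold.
  recordFailure(): void {
    const state = this.getState();
    this.failures += 1;
    if (state === "half-open" || this.failures > this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }
}

// Pick the first layer, in preference order, whose breaker is not open.
function pickLayer(layers: string[], breakers: Map<string, CircuitBreaker>): string | null {
  for (const layer of layers) {
    const breaker = breakers.get(layer);
    if (breaker === undefined || breaker.canRequest()) return layer;
  }
  return null;
}
```

A caller would wrap each provider call with `canRequest()` before the call and `recordSuccess()` or `recordFailure()` after it, using `pickLayer` to skip layers whose circuit is open.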

13. Search AI

13.1 Ingestion

FeatureDescription
Web crawlingURL discovery and content extraction pipeline that crawls websites with configurable depth, URL pattern filtering, and SSRF protection against internal network scanning. Crawled HTML is cleaned with Readability (noise removal), uploaded to S3 for durable storage, and content-deduplicated via hash comparison to avoid reprocessing unchanged pages. The crawler tracks source status and enqueues extraction jobs through BullMQ for asynchronous processing.
Document processingMultimodal document extraction via the Docling service (port 8080) that processes PDFs, Word documents, PowerPoint, images, and HTML to extract text, images, tables, and structural metadata. Docling handles complex document layouts including multi-column text, embedded tables, and scanned images (via OCR). Extracted content is normalized into a uniform text representation suitable for chunking and embedding generation.
Chunking strategiesConfigurable document chunking that splits extracted content into embedding-sized segments using multiple strategies: fixed-size (character or token count with overlap), semantic (split on topic boundaries detected by embedding similarity), and recursive (hierarchically split by heading, paragraph, sentence). The optimal strategy depends on document type and search use case, and can be configured per source or per index.
Structured data ingestionSQL schema discovery and table-level ingestion for relational databases (PostgreSQL). The ingestion pipeline connects to the database, discovers available schemas and tables, and ingests rows as searchable documents with column metadata preserved. This enables SearchAI to answer questions about structured data by combining text search over row descriptions with the original column values for precise lookups.
Connector-based ingestionPull content from enterprise data sources (SharePoint document libraries, Google Drive folders, Amazon S3 buckets, etc.) using the platform’s connector framework. Connector-based ingestion leverages the same OAuth credentials and delta sync infrastructure as the connector tool system, ensuring consistent authentication and incremental updates. Ingested documents flow through the same extraction, chunking, and embedding pipeline as web-crawled content.
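
Of the chunking strategies listed above, fixed-size chunking with overlap is the simplest to show. The sketch below counts characters rather than tokens; the `chunkFixedSize` name and its parameters are illustrative assumptions, not the platform's API.

```typescript
// Split text into chunks of `size` characters, where each chunk repeats the
// last `overlap` characters of the previous one.
function chunkFixedSize(text: string, size: number, overlap: number): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("size must be positive and overlap must be in [0, size)");
  }
  const chunks: string[] = [];
  const step = size - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    // Stop once the current window already reaches the end of the text.
    if (start + size >= text.length) break;
  }
  return chunks;
}

// Example: 10 characters, windows of 4 with 1 character of overlap.
const exampleChunks = chunkFixedSize("abcdefghij", 4, 1);
```

The overlap means a sentence that straddles a boundary appears intact in at least one chunk, at the cost of some duplicated text in the index.
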
FeatureDescription
OpenSearch integrationVector database powered by OpenSearch that stores document embeddings alongside full-text keyword indexes. OpenSearch provides both k-NN vector search (for semantic similarity) and BM25 keyword search (for exact term matching) in a single query engine. Documents are indexed with metadata (source, permissions, chunk position) that enables filtered search and result attribution back to source documents.
Hybrid searchCombined vector similarity search and keyword search with reciprocal rank fusion (RRF) or weighted score merging to produce a single ranked result list. Hybrid search captures both semantic meaning (via embeddings) and exact lexical matches (via keywords), so it can surface relevant results that either approach alone would miss. The fusion weights are configurable per index, allowing teams to tune the balance between semantic and keyword relevance for their specific domain and content type.
BGE-M3 embeddingsMultilingual embedding model (BGE-M3, served on port 8000) that generates dense vector representations for documents and queries in 100+ languages. BGE-M3 supports multi-granularity embeddings (dense, sparse, and multi-vector) for flexible retrieval strategies. The embedding service processes documents during ingestion and queries at search time, ensuring consistent vector representations. Multilingual support enables cross-language search where queries in one language match documents in another.
Permission-filtered searchSearch results are filtered at query time to respect the source document permissions crawled during ingestion. The search engine intersects the requesting user’s identity with the permission ACLs stored per document, returning only results the user is authorized to access in the source system. This enables secure enterprise search over sensitive content (SharePoint, Google Drive) without duplicating the source system’s access control logic.
Knowledge graphEntity-relationship graph that captures structured relationships between entities extracted from ingested documents (people, organizations, products, concepts). The knowledge graph enables structured knowledge retrieval that complements vector search — for example, finding all products related to a specific technology or all people in a specific department. Graph queries can be combined with text search for multi-hop question answering.
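
Reciprocal rank fusion, named in the hybrid search row, scores each document as the sum of 1 / (k + rank) over every result list it appears in. The sketch below is a generic version of that formula, not the platform's fusion code; the function name and the constant k = 60 (a commonly used value) are assumptions.

```typescript
// Fuse several ranked lists of document IDs (best first) into one ranking.
// Each appearance contributes 1 / (k + rank), with rank starting at 1.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60, // assumed smoothing constant; not a documented platform default
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Hypothetical results from a vector query and a keyword query.
const vectorHits = ["a", "b", "c"];
const keywordHits = ["b", "d", "a"];
const fusedHits = reciprocalRankFusion([vectorHits, keywordHits]);
```

Because RRF uses only ranks, it needs no normalization between k-NN distances and BM25 scores, which is one reason it is a common alternative to weighted score merging.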

13.3 Knowledge Base Management

FeatureDescription
Domain vocabularyCustom domain-specific terms, acronyms, and definitions that enhance search quality by teaching the system domain jargon. Vocabulary entries are stored per search index and used during query expansion (adding synonyms), document indexing (recognizing domain terms), and answer generation (using correct terminology). The vocabulary can be populated manually or auto-generated from ingested documents using LLM-assisted extraction.
Taxonomy generationLLM-assisted taxonomy creation that analyzes ingested documents to automatically generate a hierarchical category structure for the knowledge base. The generated taxonomy organizes documents into topics and subtopics, enabling faceted search and content browsing. Teams can review, edit, and approve the generated taxonomy through the Studio UI before it is applied to the search index, combining AI efficiency with human oversight.
Org profile schemaDomain customization schema that captures organization-specific information (industry, products, services, terminology, audience) to improve search relevance and answer generation quality. The org profile is provided as context to the LLM during query answering, helping it frame responses appropriately for the organization’s domain. This ensures that search answers use the right terminology and context even when the query is ambiguous.
Source managementTrack and manage all crawl and ingestion sources with per-source configuration (URL patterns, crawl depth, sync schedule, authentication, chunking strategy). The source management UI displays each source’s status (active, paused, error), last sync timestamp, document count, and crawl history. Sources can be individually triggered for re-crawl, paused, or deleted with cascade cleanup of their indexed documents.

13.4 Pipeline Management

FeatureDescription
Crawl historyComplete audit log of all crawl operations including start time, end time, documents processed, errors encountered, and final status. The history is queryable per source and per index, enabling teams to track ingestion trends, identify recurring failures, and verify that scheduled syncs are executing as expected. Each crawl entry links to detailed status logs for individual documents processed during the crawl.
Progress trackingReal-time crawl progress tracking via WebSocket-based status updates (using a same-origin WebSocket URL) that show the current crawl stage (discovery, extraction, chunking, embedding, indexing), documents processed versus total, and estimated time remaining. The progress UI displays a live progress bar in the Studio dashboard, enabling teams to monitor long-running ingestion jobs without refreshing the page.
Batch operationsBulk crawl scheduling and management for triggering multiple sources simultaneously, pausing/resuming all active crawls, and scheduling coordinated re-indexing across an entire search index. Batch operations are dispatched through BullMQ with configurable concurrency limits per tenant to prevent resource exhaustion. This enables efficient management of large knowledge bases with dozens of sources that need regular synchronization.
BullMQ job queueAsynchronous job processing for heavy ingestion workloads using BullMQ with Redis-backed queues. The pipeline uses separate queues for different stages (crawling, extraction, chunking, embedding, indexing) with configurable concurrency, priority, and retry settings per queue. Failed jobs are automatically retried with exponential backoff, and dead-letter queues capture permanently failed jobs for manual investigation.
Rate limitingPer-tenant API rate limits for SearchAI endpoints that prevent any single tenant from consuming excessive search or ingestion resources. Rate limits are enforced via the sliding window algorithm in the runtime rate limiter middleware, with configurable limits per plan tier (FREE, TEAM, BUSINESS, ENTERPRISE). Rate limit headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) are included in API responses for client-side handling.
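
The sliding window algorithm and the rate limit headers from the rate limiting row can be sketched together. This is an in-memory sliding-window log for illustration only; the class name, the per-tenant timestamp map, and the interpretation of `X-RateLimit-Reset` as the epoch second at which the oldest counted request leaves the window are assumptions, not documented behavior.

```typescript
interface RateLimitDecision {
  allowed: boolean;
  headers: Record<string, string>;
}

class SlidingWindowLimiter {
  private readonly hits = new Map<string, number[]>(); // tenantId -> request timestamps (ms)
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Decide whether a request at `nowMs` is allowed, and record it if so.
  check(tenantId: string, nowMs: number): RateLimitDecision {
    const cutoff = nowMs - this.windowMs;
    // Keep only the requests that are still inside the window.
    const recent = (this.hits.get(tenantId) ?? []).filter((t) => t > cutoff);
    const allowed = recent.length < this.limit;
    if (allowed) recent.push(nowMs);
    this.hits.set(tenantId, recent);

    const oldest = recent[0] ?? nowMs;
    return {
      allowed,
      headers: {
        "X-RateLimit-Limit": String(this.limit),
        "X-RateLimit-Remaining": String(Math.max(0, this.limit - recent.length)),
        // Assumed semantics: epoch seconds when the oldest counted request expires.
        "X-RateLimit-Reset": String(Math.ceil((oldest + this.windowMs) / 1000)),
      },
    };
  }
}
```

A shared deployment would keep the timestamps in a store such as Redis rather than in process memory; the per-tenant key is what keeps one tenant's traffic from consuming another's allowance.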

14. Analytics & Insights

14.1 Conversation Analytics

FeatureDescription
Session metricsPer-session analytics including message count, conversation duration, resolution status (completed, abandoned, escalated), first response time, and turn count. Metrics are computed from trace events and session state, aggregated into daily rollups, and stored in ClickHouse for efficient querying. Dashboard visualizations show trends over time, distribution histograms, and top/bottom performers, enabling teams to identify conversations that need attention.
Intent distributionBreakdown of detected intents across all conversations for a project or agent, showing frequency, confidence distributions, and trends over time. The intent distribution dashboard helps teams identify the most common user needs, discover emerging intents not yet covered by agent definitions, and detect intent confusion where similar intents are frequently misclassified. This data drives prioritization of agent improvements and knowledge base updates.
Channel distributionConversation volume breakdown by channel (web chat, WhatsApp, SMS, email, Slack, voice, HTTP async) with trend lines and comparative metrics. Channel distribution reveals where users prefer to interact, which channels have the highest resolution rates, and where channel-specific issues exist. Teams can use this data to prioritize channel-specific optimizations and allocate resources to the highest-traffic channels.
Agent performancePer-agent performance metrics including success rate (conversations resolved without escalation), handoff frequency (how often the agent transfers to another agent or human), average conversation length, and average response latency. Performance metrics are comparable across agents and across time periods, enabling A/B testing of agent improvements. The dashboard highlights underperforming agents and correlates performance changes with deployment events.

14.2 Voice Analytics

FeatureDescription
Call duration metricsTrack voice call duration with distribution analysis showing average, median, p95, and p99 call lengths. Duration metrics are broken down by agent, call disposition (completed, abandoned, transferred), and time of day. This helps teams identify unexpectedly long calls (indicating agent confusion or stuck flows), optimize voice agent prompts for conciseness, and plan capacity based on average call handling time.
Transfer ratesMonitor the frequency and patterns of agent-to-agent voice transfers, including which agents transfer most often, which target agents receive the most transfers, and the average time-to-transfer. High transfer rates may indicate that the entry agent’s routing is not matching user intent effectively, or that the agent lacks the knowledge to resolve certain query types. Transfer analytics help teams optimize multi-agent voice topologies.
Barge-in frequencyTrack how often users interrupt the agent during speech (barge-in events), which can indicate user frustration, impatience, or overly verbose agent responses. Barge-in metrics are compared against conversation outcomes to determine whether frequent interruptions correlate with lower satisfaction or higher abandonment rates. This data guides optimization of response length and speaking pace for voice agents.
DTMF input trackingMonitor numeric keypad (DTMF) input usage in voice conversations, tracking which menu options are selected most frequently, how often users resort to keypad input versus voice, and the error rate for DTMF-driven flows. This helps teams optimize IVR menu structures, identify dead-end menu paths, and determine the optimal balance between voice-driven and keypad-driven interaction for their user base.
Realtime session monitoringLive dashboard showing active voice sessions with real-time status (ringing, in-progress, on-hold, transferring), current agent, call duration, and caller information. The monitoring view enables supervisors to observe ongoing conversations, identify calls that need human intervention, and track overall voice system utilization. Alert thresholds can be configured for long-running calls or high queue wait times.

14.3 LLM Usage & Cost

FeatureDescription
Token countingPer-call and aggregate token usage tracking broken down by model, recording input tokens and output tokens separately for accurate cost attribution. Token counts are captured in trace events for every LLM call and aggregated up the span hierarchy (call, agent, session, project, tenant). Historical token usage data is available for trend analysis and capacity planning, helping teams forecast LLM costs as usage grows.
Cost calculationLLM spend tracking that multiplies token counts by the per-token pricing defined in the model registry (separate input and output rates). Costs are calculated in real time per trace span and aggregated into daily, weekly, and monthly totals. The cost calculation accounts for provider-specific pricing tiers and volume discounts when configured. This gives teams precise visibility into their LLM spend at every level of granularity.
Daily usage aggregationDaily rollup of token usage, cost, and session counts aggregated by tenant, project, agent, and model. Aggregations are computed as background jobs and stored in ClickHouse for fast analytical queries. Daily rollups enable trend analysis, budget tracking, and anomaly detection (unexpected usage spikes) without querying raw trace event data. Aggregated data is retained for the tenant’s configured retention period.
Cost breakdownDetailed cost breakdown by provider (Anthropic, OpenAI, Azure, Google), model (GPT-4, Claude 3.5 Sonnet, etc.), and operation type (conversation, NLU analysis, evaluation, guardrail check). The breakdown dashboard shows relative spend across dimensions with drill-down capability, helping teams identify the most expensive operations and evaluate the ROI of model selection decisions. Cost allocation by project enables internal chargeback for multi-team deployments.
Usage dashboardsClickHouse-backed interactive dashboards that visualize token usage, cost trends, session volumes, and model utilization over configurable time ranges. Dashboards support filtering by tenant, project, agent, model, and channel with drill-down from summary charts to detailed per-session data. The ClickHouse columnar storage engine enables sub-second query responses even over millions of trace events, providing a responsive analytics experience.
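
The cost calculation row describes multiplying input and output token counts by separate rates. A minimal sketch of that arithmetic follows; the model names, the prices, and the choice to express rates per million tokens are made up for illustration and do not reflect the platform's model registry.

```typescript
interface ModelPricing {
  inputPerMillionTokens: number;  // currency units per 1,000,000 input tokens
  outputPerMillionTokens: number; // currency units per 1,000,000 output tokens
}

interface LlmCall {
  model: string;
  inputTokens: number;
  outputTokens: number;
}

// Hypothetical registry; real prices vary by provider and change over time.
const examplePricing: Record<string, ModelPricing> = {
  "model-a": { inputPerMillionTokens: 3, outputPerMillionTokens: 15 },
  "model-b": { inputPerMillionTokens: 0.5, outputPerMillionTokens: 1.5 },
};

// Cost of one call: input and output tokens are priced separately.
function callCost(call: LlmCall, registry: Record<string, ModelPricing>): number {
  const price = registry[call.model];
  if (price === undefined) throw new Error(`No pricing for model ${call.model}`);
  return (
    (call.inputTokens * price.inputPerMillionTokens +
      call.outputTokens * price.outputPerMillionTokens) / 1e6
  );
}

// Aggregate cost per model, as a daily rollup might do per dimension.
function costByModel(calls: LlmCall[], registry: Record<string, ModelPricing>): Map<string, number> {
  const totals = new Map<string, number>();
  for (const call of calls) {
    totals.set(call.model, (totals.get(call.model) ?? 0) + callCost(call, registry));
  }
  return totals;
}
```

The same aggregation can be keyed by tenant, project, or agent instead of model to produce the other breakdowns described in this section.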

14.4 Pipeline Analytics

FeatureDescription
Eval pipeline metricsTrack evaluation test pass/fail rates, average quality scores, and score distributions over time across evaluation runs. Metrics are visualized as trend lines and bar charts in the analytics dashboard, making it easy to see whether agent quality is improving or degrading with each change. Regression alerts can be configured to notify teams when quality scores drop below a threshold after a new deployment.
Insight generationAutomated insight extraction from conversation patterns using the analytics pipeline to identify recurring issues, common user frustrations, frequently asked questions, and conversation flow bottlenecks. Insights are generated by analyzing aggregated conversation data and surfaced as actionable recommendations (e.g., “30% of users ask about return policies but the agent lacks this knowledge”). This bridges the gap between raw analytics data and concrete agent improvement actions.
ClickHouse data warehouseColumnar analytics store (ClickHouse) purpose-built for high-volume trace event data, message storage, and analytics queries. ClickHouse handles millions of events per day with sub-second query latency for aggregation queries, time-series analysis, and ad-hoc exploration. Data is partitioned by tenant and time, with automatic TTL-based cleanup and compression. The dual-write message store writes to both MongoDB (operational) and ClickHouse (analytical) for optimal performance in each use case.

15. User Management

15.1 Authentication

FeatureDescription
Email/password signupStandard user registration with email and password, including password strength validation, duplicate email detection, and automatic email verification trigger. New users are created in a pending state until email verification is completed. The signup flow creates a personal tenant by default and supports invitation-based signup where the user joins an existing tenant with a pre-assigned role.
OAuth2 social loginSocial login integration with Google and Microsoft accounts using OAuth2 authorization code flow with PKCE. Users can sign up or sign in with a single click, and the platform creates or links accounts based on the email address from the OAuth provider’s profile. Social login tokens are exchanged for platform JWTs, maintaining a consistent session model regardless of the authentication method used.
Email verificationVerification token system that sends a unique, time-limited token to the user’s email address during signup or email change. Tokens are cryptographically generated, stored with an expiration timestamp, and validated on a single-use basis (tokens are invalidated after successful verification). Unverified accounts have limited access until verification is completed, encouraging prompt email confirmation.
Password resetSecure password reset flow via email-delivered tokens with configurable expiration. The reset process validates the token, enforces password strength requirements, invalidates all existing sessions for the user (forcing re-authentication), and logs the reset event for audit purposes. Rate limiting prevents brute-force token guessing, and the reset email includes client information (IP, user agent) for security awareness.
Device code flowOAuth2 device authorization grant that enables CLI tools and external clients (like the kore-platform-cli) to authenticate without a browser redirect flow. The client displays a user code and verification URL; the user authenticates in their browser and enters the code. The client polls for completion and receives platform JWTs upon approval. This enables headless environments and developer tooling to authenticate securely.
JWT session managementShort-lived access tokens (JWTs) paired with longer-lived refresh tokens for secure session management. Access tokens contain tenant ID, user ID, and role claims, and are validated on every API request via the unified auth middleware. Refresh tokens enable transparent token renewal without re-authentication, and can be revoked to force session termination. Token rotation ensures that each refresh token can only be used once.
MFA frameworkMulti-factor authentication support framework with pluggable second-factor providers. The framework defines the enrollment, challenge, and verification interfaces needed for MFA flows, with TOTP (time-based one-time password) as the planned first provider. The architecture supports future addition of WebAuthn, SMS codes, and recovery codes. MFA enforcement can be configured at the tenant level for organization-wide security policies.
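
The single-use refresh token behavior described in the JWT session management row can be sketched with an in-memory store. Everything here is hypothetical: the `RefreshTokenStore` class, its methods, and the counter-based token strings, which stand in for the cryptographically random values a real system would need.

```typescript
class RefreshTokenStore {
  private readonly active = new Map<string, string>(); // refresh token -> user ID
  private counter = 0;

  // Issue a new refresh token for a user.
  issue(userId: string): string {
    this.counter += 1;
    // Placeholder value for illustration; real tokens must be unguessable.
    const token = `rt_${this.counter}`;
    this.active.set(token, userId);
    return token;
  }

  // Exchange a refresh token for a new one. The old token is invalidated,
  // so presenting it a second time returns null.
  rotate(token: string): string | null {
    const userId = this.active.get(token);
    if (userId === undefined) return null;
    this.active.delete(token);
    return this.issue(userId);
  }

  // Revoke every refresh token for a user, forcing re-authentication.
  revokeAll(userId: string): void {
    for (const [token, owner] of Array.from(this.active)) {
      if (owner === userId) this.active.delete(token);
    }
  }
}
```

With rotation, a stolen refresh token is only useful until the legitimate client next refreshes, after which the stolen copy no longer matches any active entry.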

15.2 User Lifecycle

FeatureDescription
User profile managementEdit user profile fields including display name, email address (with re-verification), avatar image, notification preferences, and timezone. Profile changes are audited and email changes require verification of the new address before taking effect. The profile API supports both self-service updates and admin-initiated changes, with appropriate permission checks for each operation type.
Account suspensionAdmin-controlled account deactivation that immediately revokes all active sessions and prevents the user from authenticating. Suspended accounts retain their data and permissions configuration, allowing reactivation without reconfiguration. Suspension events are logged to the audit trail with the suspending admin’s identity and reason. Suspended users attempting to authenticate receive a clear error message directing them to their organization’s admin.
Workspace invitationsInvite users to join a tenant (workspace) with a specific role assignment (Owner, Admin, Operator, Member, Viewer). Invitations are sent via email with a secure, time-limited acceptance link. Invited users who do not have a platform account are guided through a streamlined signup flow that automatically joins them to the inviting tenant. Pending invitations can be revoked by admins, and invitation history is maintained for audit purposes.
Workspace switchingSwitch between multiple tenant memberships for users who belong to more than one organization. The workspace switcher displays all tenants the user is a member of with their role in each, and issues new JWTs scoped to the selected tenant upon switching. The active workspace persists across browser sessions, and the user’s permissions and visible projects change immediately upon switching to reflect the new tenant context.

15.3 RBAC

FeatureDescription
Tenant-level rolesFive hierarchical tenant roles (OWNER, ADMIN, OPERATOR, MEMBER, VIEWER) control access to tenant-wide resources. OWNER has full administrative control including billing and plan management; ADMIN manages members and configuration; OPERATOR handles production operations; MEMBER builds within allowed projects; VIEWER has read-only access. Role assignments are stored per user-tenant relationship and enforced by permission middleware on every API request.
Project-level rolesPer-project role assignments (admin, developer, tester, viewer) plus tenant-scoped custom project roles provide fine-grained access control within a tenant. The admin role has full project control, developer edits project resources, tester focuses on verification and analytics, and viewer is read-only. Project roles are assigned independently of tenant roles, but tenant-wide authority over all projects still comes from the workspace OWNER and ADMIN roles.
Permission guardsThe requirePermission() middleware enforces access control using object:operation pattern strings (for example, project:update, agent:update, tenant:manage_members). Permission checks evaluate the user’s role in the relevant scope (tenant or project) against the required permission, returning 404 (not 403) for unauthorized access to prevent leaking resource existence. Guards are composable and can require multiple permissions for sensitive operations.
Resource-level authorizationPer-resource permission checking that verifies the requesting user has access to the specific resource being requested, beyond role-based checks. This includes verifying that resource.projectId === req.params.projectId and resource.tenantId === req.tenantId to prevent cross-tenant and cross-project data access. Resource-level checks are applied in addition to role-based guards, providing defense-in-depth against authorization bypass vulnerabilities.
IP allowlistingCIDR-based access restriction for administrative routes that limits access to requests originating from approved IP address ranges. Allowlists are configurable per tenant and enforced at the middleware level before route handlers execute. This adds a network-based restriction for sensitive operations (user management, billing, configuration changes) in enterprise environments where admin access should be limited to corporate networks or VPNs.
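
The permission guard behavior above (object:operation strings, composable checks, and a 404 instead of a 403 on denial) can be sketched as a pure function. The role-to-permission mapping and the `guardFor` helper are hypothetical; the actual grants behind `requirePermission()` are not listed in this document.

```typescript
type TenantRole = "OWNER" | "ADMIN" | "OPERATOR" | "MEMBER" | "VIEWER";

// Hypothetical mapping for illustration only.
const ROLE_PERMISSIONS: Record<TenantRole, ReadonlySet<string>> = {
  OWNER: new Set(["tenant:manage_members", "project:update", "agent:update"]),
  ADMIN: new Set(["tenant:manage_members", "project:update", "agent:update"]),
  OPERATOR: new Set(["agent:update"]),
  MEMBER: new Set(["project:update", "agent:update"]),
  VIEWER: new Set<string>(),
};

// Build a guard that requires every listed permission.
// Denials return 404 rather than 403 so the response does not reveal
// whether the resource exists.
function guardFor(...required: string[]): (role: TenantRole) => 200 | 404 {
  return (role) => {
    const granted = ROLE_PERMISSIONS[role];
    return required.every((permission) => granted.has(permission)) ? 200 : 404;
  };
}

// A sensitive operation can require several permissions at once.
const canManageMembers = guardFor("tenant:manage_members");
const canUpdateProjectAndAgent = guardFor("project:update", "agent:update");
```

In an HTTP framework the returned status would be written to the response before the route handler runs; the sketch leaves that wiring out to stay framework-neutral.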

16. Tenant Management

16.1 Tenant Administration

FeatureDescription
Tenant creationCreate new tenants (organizations) with plan assignment, initial owner designation, and default configuration provisioning. Tenant creation sets up the MongoDB namespace, provisions encryption keys (tenant-scoped master key for AES-256-GCM), and applies plan-based resource limits. The creation flow can be triggered by user signup (personal tenant) or by platform admin (enterprise onboarding) with different default configurations per path.
Tenant configurationPer-tenant environment variables, LLM model configurations, and feature flags that control platform behavior for all projects within the tenant. Configuration values are stored encrypted and scoped to the tenant level, with inheritance to project and agent levels where not overridden. Feature flags enable gradual rollout of new capabilities and tenant-specific customizations without code changes.
Tenant member managementInvite users to the tenant, remove existing members, and change role assignments (OWNER, ADMIN, OPERATOR, MEMBER, VIEWER) through the admin settings UI and API. Member management enforces that at least one OWNER exists at all times and requires ADMIN or OWNER role to modify other users’ roles. All membership changes are logged to the audit trail with the acting user’s identity and the specific change made.
Tenant isolationAutomatic tenantId enforcement on all database queries, ensuring that every read and write operation is scoped to the requesting user’s tenant. This is enforced at the database query layer using findOne({_id, tenantId}) patterns (never findById), making cross-tenant data access structurally impossible. Cross-scope access attempts return 404 (not 403) to avoid leaking the existence of resources in other tenants. Tenant isolation is a core platform invariant verified by automated tests.
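
The `findOne({_id, tenantId})` pattern from the tenant isolation row can be illustrated without a database. The in-memory collection, the `AgentDoc` shape, and the `getAgent` helper below are stand-ins for illustration; only the filter shape and the 404 response come from the description above.

```typescript
interface AgentDoc {
  _id: string;
  tenantId: string;
  name: string;
}

// Minimal stand-in for a collection. The only lookup it offers takes both
// the document ID and the tenant ID, so there is no ID-only code path.
class InMemoryCollection {
  private readonly docs: AgentDoc[];

  constructor(docs: AgentDoc[]) {
    this.docs = docs;
  }

  findOne(filter: { _id: string; tenantId: string }): AgentDoc | null {
    const match = this.docs.find(
      (doc) => doc._id === filter._id && doc.tenantId === filter.tenantId,
    );
    return match ?? null;
  }
}

type LookupResult = { status: 200; body: AgentDoc } | { status: 404 };

// A document owned by another tenant is indistinguishable from a missing one.
function getAgent(collection: InMemoryCollection, id: string, requestTenantId: string): LookupResult {
  const doc = collection.findOne({ _id: id, tenantId: requestTenantId });
  return doc === null ? { status: 404 } : { status: 200, body: doc };
}

const exampleAgents = new InMemoryCollection([
  { _id: "agent-1", tenantId: "tenant-a", name: "Support" },
  { _id: "agent-2", tenantId: "tenant-b", name: "Billing" },
]);
```

Putting the tenant ID in the query filter, rather than checking it after the fetch, means a forgotten check cannot leak another tenant's document.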

16.2 Plans & Billing

FeatureDescription
Subscription plansFour subscription tiers (FREE, TEAM, BUSINESS, ENTERPRISE) with progressively higher resource limits, feature access, and support levels. Each plan defines maximum values for concurrent sessions, tokens per minute, API calls per day, number of projects, and number of agents. Plan details including pricing and limits are managed through the admin interface and can be customized for enterprise customers with negotiated terms.
Plan-based limitsEnforced resource limits per subscription plan including maximum concurrent sessions, tokens per minute, API rate limits, storage quotas, and model access restrictions. Limits are checked at the middleware level before request processing, and users receive clear error responses with upgrade guidance when limits are reached. The rate limiter uses a sliding window algorithm that prevents burst abuse while allowing normal usage patterns.
Credit systemCredit-based usage tracking that allocates a credit balance to each tenant based on their plan and any purchased top-ups. Credits are consumed by LLM token usage, session creation, and API calls, with per-operation credit costs configurable in the plan definition. The credit balance is checked before expensive operations, and low-balance alerts notify tenant admins when credits are running low. Top-up purchases can be made through the billing interface.
Deal managementCustom deal configuration for enterprise customers that supports multi-phase pricing (e.g., trial period, ramp-up, full pricing), volume discounts, and overage terms. Deals override the standard plan limits with custom values and can specify expiration dates, auto-renewal terms, and credit allocations per phase. The deal management interface enables sales teams to configure complex pricing structures without engineering involvement.
Billing line itemsItemized billing that tracks every billable event (LLM usage by model, session count, API calls, storage) as individual line items associated with the tenant’s billing period. Line items include timestamp, resource type, quantity, unit cost, and total cost for transparency. Invoice generation aggregates line items into periodic invoices with detailed breakdowns, enabling enterprise customers to verify charges and allocate costs to internal departments.
Usage meteringReal-time metering of token consumption, session creation, and API call volume per tenant with per-minute granularity. Metering data is collected from trace events and aggregated into time-bucketed counters used for rate limit enforcement, credit deduction, and billing. Usage metrics are exposed through the tenant settings dashboard and the admin API, enabling both self-service monitoring and platform-wide usage analysis.
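
The Plan-based limits row mentions a sliding window rate limiter. The sketch below shows one common variant, a sliding window log, purely as an illustration; the source does not say which variant the platform uses, and the class name and parameters are hypothetical. The caller passes the current time explicitly so the behavior is easy to follow.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests in any window of `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._timestamps = deque()  # accepted request times, oldest first

    def allow(self, now: float) -> bool:
        # Drop accepted requests that have slid out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.limit:
            return False  # limit reached; the caller would return an error response
        self._timestamps.append(now)
        return True
```

With a limit of 2 per 60 seconds, requests at t=0 and t=10 are accepted, a request at t=20 is rejected, and a request at t=60 is accepted again because the t=0 entry has left the window.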

16.3 Config Overrides

FeatureDescription
Per-tenant model overrideOverride the default LLM provider and model selection at the tenant level, directing all agents within the tenant to use a specific model unless further overridden at the project or agent level. This enables enterprise customers to enforce use of Azure-hosted models for compliance, restrict access to specific model families, or route all traffic through a dedicated model deployment with guaranteed capacity.
Per-tenant rate limitsCustom rate limits that override the plan defaults for specific tenants, enabling negotiated capacity for enterprise customers or temporary increases for event-driven traffic spikes. Custom limits can be set for API requests per minute, concurrent sessions, tokens per minute, and webhook delivery rate. Overrides are applied transparently by the rate limiter middleware and are auditable through the admin configuration history.
Feature flagsEnable or disable platform features per tenant using boolean or variant-based feature flags. Flags control access to beta features (new channel types, experimental NLU models), deprecated features (legacy API versions), and tenant-specific customizations. Feature flags are evaluated at runtime from the tenant configuration and can be changed without deployment, enabling gradual rollout and instant rollback of new capabilities.
Hyperparameter overridesCustom LLM hyperparameters (temperature, top_p, max_tokens, frequency_penalty, presence_penalty) set at the tenant level that override the model registry defaults. Tenant overrides apply to all LLM calls within the tenant unless further overridden at the project or agent level. This enables enterprise customers to enforce conservative generation parameters (low temperature for consistency) or customize behavior for their specific use cases.
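
The override chain described under Hyperparameter overrides can be pictured as a layered merge. The sketch below is an assumption-laden illustration: the function name is hypothetical, and it assumes that agent-level settings win over project-level settings, which the source implies but does not state.

```python
from typing import Optional


def resolve_hyperparameters(
    registry_defaults: dict,
    tenant: Optional[dict] = None,
    project: Optional[dict] = None,
    agent: Optional[dict] = None,
) -> dict:
    """Merge hyperparameter layers from least to most specific."""
    effective = dict(registry_defaults)  # copy so the defaults are not mutated
    # Assumed precedence: registry defaults < tenant < project < agent.
    for layer in (tenant, project, agent):
        if layer:
            effective.update(layer)
    return effective
```

For example, registry defaults of `temperature=0.7` and `max_tokens=1024` combined with a tenant override of `temperature=0.2` and an agent override of `max_tokens=256` yield `temperature=0.2` and `max_tokens=256`, with every other parameter left at its default.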

17. Voice

17.1 Core Voice

FeatureDescription
Browser WebSocket streamingDirect browser-to-server audio streaming via WebSocket that enables voice interactions from any modern web browser without plugins or native apps. The WebSocket carries PCM16 audio frames at 16kHz from the user’s microphone to the server, and returns TTS audio frames from the agent’s response. The connection supports full-duplex communication for natural conversational flow with overlapping speech detection.
Deepgram STTSpeech-to-text transcription via Deepgram’s streaming API using linear16 PCM audio at 16kHz sample rate. Deepgram provides real-time interim transcripts (partial results during speech) and final transcripts (after end-of-speech detection), enabling the agent to begin processing before the user finishes speaking. The integration supports language detection, punctuation, and speaker diarization for multi-speaker scenarios.
ElevenLabs TTSText-to-speech synthesis via ElevenLabs’ streaming API producing mp3_22050_32 audio output. ElevenLabs provides high-quality, natural-sounding voice synthesis with configurable voice selection, speaking rate, and emotional tone. The streaming integration delivers audio chunks as they are generated, enabling the agent to start speaking before the full response is synthesized, reducing perceived response latency.
Barge-in supportUser interruption handling that detects when the user starts speaking while the agent’s TTS audio is still playing. When barge-in is detected, the agent’s audio playback is immediately stopped, the TTS generation is cancelled, and the user’s speech is processed as a new input. This creates a natural conversational experience where users do not have to wait for the agent to finish speaking before responding or correcting.
Connection pre-warmingPre-establish WebSocket connections to STT (Deepgram) and TTS (ElevenLabs) services before the user starts speaking, eliminating the connection setup latency from the first response. Pre-warming is triggered when a voice session is created, so that by the time the user speaks, the audio processing pipeline is fully connected and ready. This reduces first-response latency by 200-500ms compared to on-demand connection establishment.
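
To make the PCM16 at 16kHz figures from the Browser WebSocket streaming row concrete, the sketch below computes frame sizes and splits a buffer into frames. The 20 ms frame duration and mono channel count are assumptions chosen for illustration; the source does not specify either.

```python
SAMPLE_RATE_HZ = 16_000   # from the source: 16kHz
BYTES_PER_SAMPLE = 2      # PCM16 means 16-bit (2-byte) samples
CHANNELS = 1              # assumption: mono microphone audio


def frame_bytes(frame_ms: int) -> int:
    """Number of bytes in one audio frame of the given duration."""
    samples = SAMPLE_RATE_HZ * frame_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def split_frames(pcm: bytes, frame_ms: int) -> list[bytes]:
    """Split raw PCM into whole frames, dropping any trailing partial frame."""
    size = frame_bytes(frame_ms)
    return [pcm[i:i + size] for i in range(0, len(pcm) - size + 1, size)]
```

Under these assumptions a 20 ms frame is 320 samples, or 640 bytes, and one second of audio (32,000 bytes) splits into 50 frames.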

17.2 Realtime Voice Agents

FeatureDescription
OpenAI Realtime APINative integration with OpenAI’s Realtime API that processes audio directly in the model, bypassing separate STT/TTS services for ultra-low latency voice interactions. The Realtime API supports function calling during voice conversations, enabling the agent to invoke tools while maintaining the audio stream. This provides the lowest latency voice experience by eliminating the audio-to-text-to-LLM-to-text-to-audio pipeline in favor of direct audio-to-audio processing.
Google Gemini LiveReal-time multimodal voice interaction using Google Gemini’s Live API that supports simultaneous audio, text, and visual input. Gemini Live processes audio natively and can reference images or documents shared during the voice session, enabling richer interactions than audio-only models. The integration manages the Gemini Live WebSocket session, handles function calling during voice, and supports seamless fallback to STT/TTS mode if the Gemini Live connection is unavailable.
UltravoxPurpose-built voice agent model designed specifically for low-latency conversational AI with native audio understanding. Ultravox processes speech directly without transcription, preserving paralinguistic cues (tone, emphasis, hesitation) that are lost in STT pipelines. The model supports tool calling and structured output while maintaining sub-second response times, making it ideal for voice-first agent deployments where natural conversation quality is the priority.
VAD (Voice Activity Detection)Server-side voice activity detection that analyzes incoming audio frames to determine speech presence, enabling automatic turn management between the user and agent. VAD distinguishes between speech, silence, and background noise with configurable sensitivity thresholds and minimum silence duration before triggering end-of-speech. The detection integrates with barge-in handling, ensuring that brief pauses within an utterance do not prematurely trigger agent response.
Audio format supportSupport for multiple audio encoding formats including PCM16 (16-bit linear PCM for browser WebSocket), G711 ulaw (for Twilio North American telephony), and G711 alaw (for international telephony). The voice runtime automatically negotiates the appropriate format based on the channel type and transcodes between formats when needed. This ensures compatibility with both browser-based and telephony-based voice channels without requiring client-side format conversion.
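
The minimum-silence rule in the VAD row can be illustrated with a simple energy-based detector. This is only a sketch: the source does not say how speech is detected, so the RMS energy test, the class name, and the threshold values are all assumptions. What it does show is how a short pause inside an utterance avoids triggering end-of-speech.

```python
import math
import struct


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a little-endian PCM16 frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[:count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


class EndOfSpeechDetector:
    def __init__(self, energy_threshold: float, min_silence_frames: int) -> None:
        self.energy_threshold = energy_threshold
        self.min_silence_frames = min_silence_frames
        self._in_speech = False
        self._silent_run = 0

    def process(self, frame: bytes) -> bool:
        """Return True on the frame where end-of-speech is detected."""
        if rms(frame) >= self.energy_threshold:
            self._in_speech = True
            self._silent_run = 0  # speech resumed, so the pause did not count
            return False
        if not self._in_speech:
            return False  # silence before any speech is ignored
        self._silent_run += 1
        if self._silent_run >= self.min_silence_frames:
            self._in_speech = False
            self._silent_run = 0
            return True
        return False
```

With `min_silence_frames=3`, a single silent frame between two loud frames resets the counter, and only three consecutive silent frames after speech end the turn.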

17.3 Telephony

FeatureDescription
Twilio media handlerHandles PSTN voice calls via Twilio’s Media Streams API, which connects traditional phone calls to the platform’s voice agent runtime. The handler receives G711-encoded audio from Twilio, processes it through the STT/TTS or realtime LLM pipeline, and returns audio for playback to the caller. It manages Twilio-specific events (call connected, disconnected, hold, transfer) and maps them to platform session lifecycle events.
DTMF supportDual-tone multi-frequency (DTMF) keypad input detection for telephony voice sessions, enabling callers to interact via numeric keypad in addition to speech. DTMF tones are detected by the Twilio media handler and delivered to the agent as structured input events. Agents can use DTMF for menu navigation, PIN entry, account number input, and other scenarios where keypad input is more reliable or convenient than speech recognition.
Call controlHold, transfer, and conference capabilities for telephony voice sessions. Hold pauses the conversation and plays hold music or a hold message. Transfer connects the caller to another agent (warm transfer with context) or to an external phone number (cold transfer). Conference enables multi-party calls where a human supervisor can join an ongoing agent conversation for monitoring or intervention. All call control actions are triggered by agent instructions or supervisor commands.
Voice gatewayPSTN-to-web agent transfer capability that bridges a phone call to a web-based agent experience, maintaining conversation continuity across the channel transition. When activated, the gateway sends the caller a link (via SMS or voice prompt) to continue the conversation in a web browser with richer capabilities (visual elements, file sharing, interactive components). The session context, conversation history, and agent state are preserved during the transfer.
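
For the PIN and account-number scenarios in the DTMF support row, an agent needs to turn individual keypad events into one value. The sketch below shows one way to do that; the `DtmfCollector` class, the `#` terminator, and the maximum-length rule are hypothetical choices, since the source only says that digits arrive as structured input events.

```python
from typing import Optional


class DtmfCollector:
    """Accumulate keypad digits until a terminator or a maximum length."""

    def __init__(self, max_digits: int, terminator: str = "#") -> None:
        self.max_digits = max_digits
        self.terminator = terminator
        self._digits = []

    def feed(self, digit: str) -> Optional[str]:
        """Return the completed input, or None while still collecting."""
        if digit == self.terminator:
            return self._flush()
        self._digits.append(digit)
        if len(self._digits) >= self.max_digits:
            return self._flush()
        return None

    def _flush(self) -> str:
        value = "".join(self._digits)
        self._digits = []  # reset for the next input
        return value
```

A four-digit PIN completes on the fourth key press, while shorter inputs complete when the caller presses the terminator.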

18. Guardrails

18.1 Guardrail Types

FeatureDescription
Regex patterns (Tier 1)Fast regex-based content matching that evaluates agent responses and user inputs against configurable regular expression patterns in microseconds. Tier 1 guardrails execute first in the multi-tier pipeline and handle deterministic pattern matching for known violations (profanity, specific forbidden phrases, format validation). Regex patterns support named capture groups for targeted redaction and can be configured with severity levels that determine the action taken on match.
Keyword matching (Tier 1)Keyword blocklist and allowlist enforcement that checks content against curated word lists for fast, deterministic content filtering. Blocklists flag content containing prohibited terms, while allowlists permit specific terms that might otherwise trigger false positives in broader pattern rules. Keyword matching runs at Tier 1 alongside regex patterns for sub-millisecond evaluation, making it suitable for high-volume traffic without adding latency to the conversation.
CEL expressions (Tier 2)Common Expression Language (CEL) rules that evaluate complex guardrail conditions against the full conversation context including session variables, gather field values, NLU analysis results, and message metadata. CEL expressions enable business logic guardrails (e.g., “block if credit amount exceeds $10,000 and user is not verified”) that require multi-field evaluation. Tier 2 guardrails execute after Tier 1 passes and add minimal latency through in-process expression evaluation.
LLM semantic evaluation (Tier 3)LLM-based guardrail evaluation for nuanced content assessment that cannot be captured by patterns or rules. The LLM judge receives the content, a policy description, and few-shot examples, and returns a structured verdict (pass/fail with reasoning). Tier 3 executes only when Tier 1 and Tier 2 pass, ensuring that expensive LLM calls are reserved for content that requires semantic understanding. LLM judges can be configured with any supported model provider for cost/quality tradeoffs.
PII detection (Built-in)Built-in personally identifiable information detection that identifies names, email addresses, phone numbers, social security numbers, credit card numbers, and other sensitive data patterns in agent responses and user inputs. When PII is detected, the configured action (redact, flag, block) is applied automatically. PII detection runs as part of the trace scrubber for observability and as a guardrail phase for response filtering, providing defense-in-depth against data leakage.
Custom HTTP webhooksExternal guardrail service integration via HTTP POST webhooks that sends content to a customer-hosted evaluation service and receives a pass/fail verdict. This enables organizations to apply proprietary content policies, industry-specific compliance checks, or specialized ML models that are not available as platform-native guardrails. Webhook calls include timeout enforcement and circuit breaker protection to prevent external service failures from blocking conversations.
OpenAI-compatible providersUse any LLM provider that implements the OpenAI-compatible chat completions API as a guardrail evaluation engine, including self-hosted models fine-tuned for content moderation. This enables organizations to use specialized moderation models (e.g., Llama Guard, custom fine-tunes) for guardrail evaluation while leveraging the platform’s multi-tier pipeline infrastructure. The provider is configured per guardrail rule, allowing different rules to use different evaluation models.
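
The named capture groups mentioned in the Regex patterns (Tier 1) row lend themselves to targeted redaction, where the group name decides the placeholder. The sketch below is illustrative only: the two patterns are simplified examples rather than the platform's built-in rules, and the `[SSN]` placeholder is an assumed label.

```python
import re

# Each alternative is a named group; the name doubles as the placeholder label.
PATTERN = re.compile(
    r"(?P<EMAIL>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
    r"|(?P<SSN>\b\d{3}-\d{2}-\d{4}\b)"
)


def redact(text: str) -> str:
    """Replace each match with a placeholder named after the group that matched."""
    def replace(match: re.Match) -> str:
        # lastgroup is the name of the named group that produced this match.
        return f"[{match.lastgroup}]"
    return PATTERN.sub(replace, text)
```

For instance, `redact("Reach me at jane@example.com, SSN 123-45-6789.")` returns `"Reach me at [EMAIL], SSN [SSN]."`, leaving the rest of the sentence intact.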

18.2 Guardrail Actions

FeatureDescription
BlockCompletely block the agent’s response and return a pre-configured safe message to the user instead. The blocked response is logged to the trace with the triggering guardrail rule for audit purposes but is never shown to the user. Block actions are used for severe violations (hate speech, illegal content, explicit policy breaches) where no modified version of the response would be acceptable.
Modify / RedactSanitize or redact specific portions of the agent’s response while allowing the rest to pass through. Redaction replaces detected PII, profanity, or sensitive data with placeholder markers ([REDACTED], [NAME], [EMAIL]) while preserving the response’s overall meaning and usefulness. Modification actions support custom replacement text per guardrail rule, enabling organizations to substitute branded or policy-compliant language for flagged content.
Flag for reviewAllow the agent’s response to reach the user but flag it for human review in the moderation queue. Flagged messages are tagged with the triggering guardrail rule, severity level, and confidence score, enabling reviewers to prioritize their queue. This action is appropriate for borderline content where blocking would create a poor user experience but the content warrants human oversight for quality assurance or compliance monitoring.
Log and allowLog the guardrail violation to the audit trail and trace events but allow the response through unchanged. This is the least restrictive action, used for monitoring and analytics when teams want to track policy adherence rates without impacting the user experience. Logged violations feed into the analytics pipeline for aggregate reporting on guardrail trigger frequency, enabling teams to identify emerging content patterns that may require stricter enforcement.
Escalate to humanTransfer the conversation from the agent to a human operator when a guardrail violation indicates that the agent cannot safely continue the interaction. Escalation creates a handoff to the human agent queue with full conversation context, the triggering guardrail details, and the blocked response for the human agent’s reference. This ensures that sensitive or complex situations are handled by a human rather than risking further inappropriate agent responses.
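
The five actions in the table above differ mainly in what the user sees and what is recorded. The sketch below models that difference with a hypothetical `apply_action` function and `Outcome` record. The safe message text is invented, and showing the safe message to the user during escalation is an assumption, since the source does not say what the user sees at that point.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical safe message; the real text is configured per deployment.
SAFE_MESSAGE = "Sorry, I can't help with that request."


@dataclass
class Outcome:
    delivered_text: str  # what the user sees
    audit_log: list = field(default_factory=list)
    review_queue: list = field(default_factory=list)
    escalated: bool = False


def apply_action(action: str, response: str, rule: str,
                 redacted: Optional[str] = None) -> Outcome:
    outcome = Outcome(delivered_text=response)
    # Every action records the triggering rule and the original response.
    outcome.audit_log.append({"rule": rule, "action": action, "original": response})
    if action == "block":
        outcome.delivered_text = SAFE_MESSAGE
    elif action == "redact":
        if redacted is None:
            raise ValueError("redact requires the sanitized text")
        outcome.delivered_text = redacted
    elif action == "flag":
        # The response still reaches the user but is queued for review.
        outcome.review_queue.append({"rule": rule, "text": response})
    elif action == "escalate":
        # Assumption: the user sees the safe message while the handoff happens.
        outcome.delivered_text = SAFE_MESSAGE
        outcome.escalated = True
    elif action != "log":
        raise ValueError(f"unknown action: {action}")
    return outcome
```

Only `block`, `redact`, and `escalate` change the delivered text; `flag` and `log` leave it untouched and differ only in whether a review item is created.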

18.3 Guardrail Infrastructure

FeatureDescription
Multi-tier pipelineCascading evaluation pipeline that runs guardrails in order of speed and cost: Tier 1 (regex/keyword, microseconds) catches obvious violations instantly, Tier 2 (CEL expressions, milliseconds) evaluates business rules, and Tier 3 (LLM semantic evaluation, seconds) handles nuanced content assessment. Each tier short-circuits on violation, so expensive LLM calls only run when fast checks pass. This design balances thorough content evaluation with minimal latency impact on the conversation.
Circuit breakerFault tolerance for external guardrail providers (LLM services, webhook endpoints) using a circuit breaker state machine (closed/open/half-open). When a guardrail provider fails repeatedly (exceeding the configurable threshold), the circuit opens and the guardrail is bypassed with a logged warning rather than blocking the conversation. After a reset timeout, probe requests test recovery. This prevents guardrail provider outages from making the entire platform unresponsive.
PII vaultSecure encrypted storage for PII instances detected during guardrail evaluation, maintaining a record of what PII was found, where it appeared, and what action was taken. The vault enables organizations to track PII exposure across conversations for compliance reporting and right-to-erasure implementation. Vault entries are encrypted at rest with tenant-scoped keys and subject to configurable retention periods with automatic purging.
PII audit loggingCompliance-grade audit trail for every PII detection event including the PII type (name, email, SSN, etc.), detection location (user input, agent response, tool result), action taken (redact, block, flag), and confidence score. Audit logs are immutable and retained according to the tenant’s compliance policy. This provides the evidence trail needed for privacy regulation compliance (GDPR, CCPA, HIPAA) and internal security audits.
Multi-tenant isolationTenant-scoped guardrail policies that ensure each tenant’s guardrail configuration (rules, keyword lists, LLM judges, severity thresholds) is completely independent. Guardrail rules defined by one tenant are invisible to other tenants, and evaluation results are stored with tenant attribution. This enables each organization to define content policies appropriate for their industry, jurisdiction, and risk tolerance without affecting other tenants on the platform.
Phase-based applicationGuardrails are applied at multiple phases of the agent execution lifecycle: pre-tool (before tool calls execute, to prevent sensitive tool invocations), post-response (before agent responses reach the user, to filter inappropriate content), and pre-handoff (before conversation transfers, to validate handoff appropriateness). Each phase evaluates the relevant guardrail rules from the agent’s IR, enabling fine-grained policy enforcement at every decision point in the conversation.
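
The short-circuit behavior described in the Multi-tier pipeline row can be sketched as a loop over ordered checks. Everything here is a simplified stand-in: the keyword list is invented, the Tier 2 rule is written in plain Python rather than CEL (it mirrors the credit-amount example given for CEL expressions), and the Tier 3 function only records that it was called instead of contacting an LLM.

```python
import re
from typing import Callable, Optional

# A check returns a violation reason, or None when the content passes.
Check = Callable[[str, dict], Optional[str]]


def tier1_keywords(text: str, context: dict) -> Optional[str]:
    blocklist = {"forbidden"}  # hypothetical keyword list
    words = set(re.findall(r"[a-z']+", text.lower()))
    hits = sorted(blocklist & words)
    return f"keyword:{hits[0]}" if hits else None


def tier2_rule(text: str, context: dict) -> Optional[str]:
    # Plain-Python stand-in for a CEL business rule.
    if context.get("credit_amount", 0) > 10_000 and not context.get("user_verified", False):
        return "rule:unverified_high_credit"
    return None


def make_tier3(calls: list) -> Check:
    def tier3(text: str, context: dict) -> Optional[str]:
        calls.append(text)  # records that the expensive tier actually ran
        return None         # a real judge would return a verdict here
    return tier3


def evaluate(text: str, context: dict, tiers: list) -> dict:
    """Run tiers in order and stop at the first violation."""
    for tier_number, check in enumerate(tiers, start=1):
        reason = check(text, context)
        if reason is not None:
            return {"passed": False, "tier": tier_number, "reason": reason}
    return {"passed": True, "tier": None, "reason": None}
```

When Tier 1 or Tier 2 reports a violation, the loop returns before the Tier 3 function is reached, so the `calls` list stays empty for that message.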

19. Project & Tenant Settings

19.1 Project Settings

FeatureDescription
Environment variablesPer-project environment variable management with support for environment-specific values (development, staging, production). Variables are accessible to agents at runtime via {{env.VAR_NAME}} template interpolation and tool parameter binding. The settings UI provides a key-value editor with import/export capability, and changes take effect on the next agent execution without requiring redeployment.
Secrets / credentialsEncrypted secret storage for API keys, OAuth tokens, and service credentials using AES-256-GCM encryption at rest with tenant-scoped encryption keys. Secrets are referenced in tool definitions via {{secrets.SECRET_NAME}} placeholders and resolved at runtime — the actual secret value never appears in compiled IR, logs, or trace events. Secret creation, rotation, and deletion are audited, and the UI masks secret values after initial entry for security.
LLM configurationProject-level model selection and hyperparameter configuration that determines which LLM models and settings are available to agents within the project. Configuration inherits from the tenant’s model provisioning and can restrict or extend the available model set. Per-model hyperparameter presets (temperature, max_tokens, top_p) can be customized at the project level, enabling teams to standardize model behavior across their agents.
API keysGenerate and manage project-scoped API keys for programmatic access to the runtime and SearchAI APIs. Each API key is associated with a project and optional permission scope, enabling external systems to invoke agents, query search indexes, and access analytics without user authentication. Keys can be rotated, revoked, and assigned descriptive labels. Usage is tracked per key for monitoring and rate limiting.
Deployment targetsConfigure deployment environments (development, staging, production) with per-environment settings including runtime URL, environment variables, model overrides, and feature flags. Each deployment target represents a distinct execution environment where agents can be deployed and tested. The target configuration drives the deployment workflow’s promotion pipeline, ensuring that agents pass through testing stages before reaching production.
Git integrationConnect the project to a Bitbucket, GitHub, or GitLab repository for bidirectional source synchronization and webhook-triggered deployments. The git integration settings capture the repository URL, branch name, authentication credentials, and sync direction (push, pull, bidirectional). Once connected, changes in the repository trigger automatic agent compilation and deployment, and changes made in the Studio editor can be committed back to the repository.
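
The `{{env.VAR_NAME}}` and `{{secrets.SECRET_NAME}}` placeholders from the Environment variables and Secrets / credentials rows can be resolved with a small substitution pass. The sketch below is a hypothetical resolver, not the platform's implementation; raising an error on an unknown name is an assumed policy.

```python
import re

# Matches {{env.NAME}} or {{secrets.NAME}} with identifier-style names.
PLACEHOLDER = re.compile(r"\{\{(env|secrets)\.([A-Za-z_][A-Za-z0-9_]*)\}\}")


def render(template: str, env: dict, secrets: dict) -> str:
    """Substitute placeholders at call time so stored templates hold no values."""
    sources = {"env": env, "secrets": secrets}

    def replace(match: re.Match) -> str:
        namespace, name = match.group(1), match.group(2)
        try:
            return sources[namespace][name]
        except KeyError:
            # Assumed policy: fail loudly rather than send a literal placeholder.
            raise KeyError(f"unresolved placeholder: {namespace}.{name}") from None

    return PLACEHOLDER.sub(replace, template)
```

Because substitution happens only when `render` is called, the template string itself, which is what would be stored or compiled, never contains the secret value.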

19.2 Tenant Settings (Admin)

FeatureDescription
Organization settingsTenant-level settings including organization name, logo (displayed in the Studio UI and web widget), domain configuration, and SSO settings (SAML 2.0 provider URL, certificate, attribute mapping). Organization settings define the tenant’s brand identity within the platform and control enterprise authentication flows. Changes to organization settings are audited and require OWNER or ADMIN role.
Plan managementView the current subscription plan details (tier, limits, pricing), compare plan features, and initiate plan changes (upgrade, downgrade). Plan changes take effect at the next billing cycle with prorated credits for mid-cycle changes. The plan management view shows current resource usage against plan limits, highlighting any limits that are approaching or exceeded. Enterprise customers with custom deals see their negotiated terms instead of standard plan pricing.
Member managementInvite, remove, and change roles for team members within the tenant through a dedicated settings page. The member list shows each user’s name, email, role, last active date, and invitation status. Bulk operations support inviting multiple users simultaneously with role assignment. Member management enforces the constraint that at least one OWNER must exist and provides clear feedback when role changes would violate this constraint.
Config overridesOverride platform default configuration values at the tenant level, including rate limits, feature flags, model defaults, and behavioral settings. Overrides take precedence over platform defaults but can themselves be overridden at the project or agent level. The override interface shows the current effective value, the platform default, and any active override with its source, making the configuration inheritance chain transparent.
Model provisioningAssign and configure LLM models for the tenant by adding provider credentials, selecting available models, and setting default hyperparameters. Provisioning creates the tenant-to-model mappings that determine which models appear in project and agent configuration dropdowns. Multiple credentials can be provisioned per model for failover, and credential health checks verify connectivity before activation.
Usage monitoringView comprehensive usage metrics including token consumption by model, session counts by channel, API call volumes, and storage utilization. The monitoring dashboard shows current-period usage against plan limits with trend charts, daily breakdowns, and per-project attribution. Configurable alerts notify tenant admins when usage approaches plan limits, enabling proactive capacity management.
Audit logView a chronological log of all administrative actions within the tenant including user management changes, configuration modifications, deployment events, and security-sensitive operations. Each audit entry records the acting user, timestamp, action type, affected resource, and before/after values for mutations. The audit log is immutable, searchable, and exportable for compliance reporting and security investigations.
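
The Member management row states that at least one OWNER must always exist. A minimal sketch of that check follows, assuming members are held as a simple mapping from user to role; the function names and the data shape are hypothetical.

```python
def _require_owner(members: dict) -> None:
    if "OWNER" not in members.values():
        raise ValueError("at least one OWNER must remain in the tenant")


def change_role(members: dict, user: str, new_role: str) -> dict:
    """Return an updated copy of the member map, or raise if no OWNER would remain."""
    if user not in members:
        raise KeyError(user)
    updated = dict(members)  # work on a copy so a rejected change has no effect
    updated[user] = new_role
    _require_owner(updated)
    return updated


def remove_member(members: dict, user: str) -> dict:
    """Return a copy without the user, or raise if no OWNER would remain."""
    if user not in members:
        raise KeyError(user)
    updated = {name: role for name, role in members.items() if name != user}
    _require_owner(updated)
    return updated
```

Validating against a copy means a rejected change leaves the original membership untouched, which is what allows the UI to report the violation without partial updates.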

20. Auth Profiles

20.1 Supported Auth Types

FeatureDescription
OAuth2Full OAuth2 authorization code flow implementation with PKCE (Proof Key for Code Exchange) for public clients, automatic token refresh with retry logic, and scope management. The auth profile stores client ID, client secret, authorization URL, token URL, and requested scopes. Token refresh is handled transparently by the credential resolver before each tool call, ensuring agents always use valid tokens without manual intervention.
API KeyAuthentication using API keys transmitted as Bearer tokens in the Authorization header or as X-API-Key custom headers. The auth profile stores the key value encrypted at rest and injects it into every HTTP tool call that references the profile. API key profiles support key rotation by updating the stored value without modifying tool definitions, and the impact analysis feature identifies which tools and agents are affected by a credential change.
Basic AuthHTTP Basic Authentication that sends base64-encoded username:password credentials in the Authorization header. The auth profile stores the username and password encrypted at rest and constructs the Basic header value at request time. While simple, Basic Auth remains common for internal APIs and legacy systems, and the profile ensures credentials are managed centrally rather than scattered across tool definitions.
SAML 2.0Enterprise SSO authentication via SAML 2.0 assertions for tools that require SAML-based authentication to access enterprise services. The auth profile manages the SAML identity provider configuration, assertion parsing, and credential extraction. This enables agents to access SAML-protected enterprise APIs and services using the organization’s existing identity infrastructure without requiring separate service account credentials.
KerberosWindows/Active Directory authentication using the Kerberos protocol for agents that need to access Kerberos-protected enterprise resources (SharePoint on-premises, SQL Server, internal web services). The auth profile manages the Kerberos ticket acquisition and renewal process, enabling agents to authenticate against Active Directory-integrated services using the organization’s domain credentials.
WS-SecuritySOAP-based web service authentication using WS-Security headers for agents that interact with legacy enterprise SOAP APIs. The auth profile handles security token generation, timestamp insertion, and message signing as required by the target service’s WS-Security policy. This enables agents to access older enterprise systems that have not migrated to REST APIs while maintaining proper authentication and message integrity.
HawkHTTP MAC-based authentication using the Hawk protocol, which provides request authentication via a shared secret and cryptographic message authentication code. The auth profile stores the Hawk credentials (key ID, key, algorithm) and generates the Authorization header with timestamp, nonce, and MAC for each request. Hawk provides replay protection and request integrity verification, making it suitable for APIs that require stronger authentication than simple API keys.
DigestHTTP Digest authentication that provides password-based authentication without transmitting the password in cleartext. The auth profile handles the challenge-response handshake automatically, computing the digest hash from the stored credentials and the server’s nonce. Digest authentication is supported for legacy systems that require it, with the profile abstracting the multi-step authentication exchange from the tool execution flow.
Custom headersArbitrary HTTP header-based authentication that allows teams to define custom header name-value pairs for tool authentication. The auth profile stores one or more custom headers (e.g., X-Custom-Auth, X-Tenant-Token) that are injected into every HTTP request for tools bound to this profile. This catch-all auth type accommodates proprietary authentication schemes that do not fit standard patterns, ensuring any HTTP API can be integrated.
JWTJSON Web Token authentication that generates or validates JWTs for tool calls to JWT-protected APIs. The auth profile can be configured to generate JWTs (signing with a stored private key, configurable claims, and expiration) or to validate incoming JWTs (verifying signatures against a stored public key or JWKS endpoint). This enables agents to access JWT-secured microservices using the platform’s centralized key management.
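
Two of the auth types above involve small, well-defined computations: the Basic Auth header value and the PKCE code challenge used in the OAuth2 flow. The sketch below shows both using only the standard library. The function names are hypothetical, and the use of the S256 challenge method is an assumption, as the source mentions PKCE without naming a method.

```python
import base64
import hashlib


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value for HTTP Basic Authentication."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def pkce_challenge(code_verifier: str) -> str:
    """Derive an S256 code challenge: base64url(SHA-256(verifier)) without padding."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The header is constructed at request time from the stored username and password, and the challenge is sent with the authorization request while the verifier is held back for the token exchange.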

20.2 Credential Management

FeatureDescription
Encrypted storageTenant-scoped credential material is encrypted at rest using envelope encryption: each credential is encrypted with a data encryption key (DEK), and the DEK is in turn wrapped by a provider-tracked key encryption key (KEK). Credential values are decrypted only at the moment of use and never appear in logs, traces, or API responses.
Token refreshAutomatic OAuth2 token refresh that detects expiring or expired access tokens before tool execution and transparently refreshes them using the stored refresh token. The refresh process includes retry with exponential backoff for transient failures and circuit breaker protection for persistently failing token endpoints. Refreshed tokens are immediately persisted to the encrypted credential store. If refresh fails after retries, the error is propagated to the agent for graceful handling.
Multi-provider supportMultiple auth profiles can be created per project, each targeting a different service or authentication provider. Tools within the same project can bind to different auth profiles, enabling a single agent to authenticate against multiple external services (e.g., Salesforce with OAuth2, internal API with API key, legacy system with Basic Auth) within the same conversation. Profile selection is configured per tool at the tool definition level.
Credential resolutionMulti-level credential lookup that resolves auth profiles through a fallback chain: tenant-level override, project-level configuration, and platform default. The resolver returns the first valid credential found in the chain, enabling centralized credential management where most tools use tenant-wide credentials while specific tools can override with project-specific ones. Resolution results are cached with TTLs to minimize database lookups during high-frequency tool calls.
Impact analysisAnalyze which tools, agents, and deployments are affected when a credential is changed, rotated, or revoked. The impact analysis traverses tool definitions that reference the auth profile, identifies agents that use those tools, and lists active deployments running those agents. This enables teams to assess the blast radius of credential changes before applying them, preventing unexpected outages from credential rotation in production environments.
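
The traversal described in the Impact analysis row (auth profile to tools, tools to agents, agents to active deployments) can be sketched over plain dictionaries. The data shapes and the `impact_of` name are hypothetical and stand in for whatever the platform stores internally.

```python
def impact_of(profile: str, tools: dict, agents: dict, deployments: dict) -> dict:
    """List the tools, agents, and active deployments that depend on an auth profile."""
    # Step 1: tools whose definition references the profile.
    affected_tools = {name for name, tool in tools.items()
                      if tool.get("auth_profile") == profile}
    # Step 2: agents that use at least one affected tool.
    affected_agents = {name for name, agent in agents.items()
                       if affected_tools.intersection(agent["tools"])}
    # Step 3: active deployments running an affected agent.
    affected_deployments = {name for name, dep in deployments.items()
                            if dep["active"] and dep["agent"] in affected_agents}
    return {
        "tools": sorted(affected_tools),
        "agents": sorted(affected_agents),
        "deployments": sorted(affected_deployments),
    }
```

Running this before rotating a credential gives the list of production deployments that would be touched, which is the blast radius the row refers to.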

20.3 Tool Integration

FeatureDescription
Auth profile bindingBind auth profiles to HTTP tools at the tool definition level so that authentication headers are automatically injected into every API request the tool makes. The binding is specified by auth profile name in the tool definition and resolved at runtime through the credential resolution chain. Auth profiles also propagate across handoffs and delegations, so tools keep their authentication context when invoked by child agents in multi-agent topologies.
Connector credentialsNango-managed OAuth credentials for connector authentication that handles the full OAuth lifecycle (authorization, token exchange, refresh, revocation) for 100+ connector providers. Connector credentials are stored as Nango connections linked to project-scoped auth profiles, with Nango handling provider-specific quirks (different refresh token behaviors, scope formats, token endpoints). This abstraction means connector tools always receive valid credentials without custom OAuth code per provider.
Per-tool credential mappingDifferent auth profiles can be assigned to different tools within the same project, enabling an agent to authenticate against multiple external services with distinct credentials in a single conversation. For example, one tool might use an OAuth2 profile for Salesforce, another might use an API key for an internal service, and a third might use Basic Auth for a legacy system. The credential mapping is configured declaratively in the tool definition and resolved transparently at execution time.

Summary

| Category | Features | Sub-categories |
| --- | --- | --- |
| ABL Language | 24 | Core Constructs, Parser & Compilation |
| Agent Anatomy | 13 | Agent Types, Agent Structure |
| Multi-Agent Orchestration | 21 | Handoff, Delegation, Supervisor, A2A, Parallel |
| Memory Management | 15 | Session, Persistent, Contact, Message |
| Tool Calling | 17 | Tool Types, Execution, Middleware |
| Agent Development | 20 | Editor, Project, Deployment, Architect, Import/Export |
| Agent Testing & Evals | 11 | Framework, Pipelines, UI |
| Agent Observability | 13 | Trace System, Dashboard, Debug Protocol |
| Model Hub | 17 | Providers, Management, Features, Realtime Voice |
| Channels | 12 | Messaging, Voice, Infrastructure |
| Integrations | 33 | Framework, Connectors (25+), Sync |
| Agent NLU | 17 | Tasks, Engine, Enterprise |
| Search AI | 16 | Ingestion, Indexing, Knowledge Base, Pipeline |
| Analytics & Insights | 14 | Conversation, Voice, LLM Cost, Pipeline |
| User Management | 13 | Authentication, Lifecycle, RBAC |
| Tenant Management | 13 | Admin, Plans & Billing, Config Overrides |
| Voice | 12 | Core, Realtime Agents, Telephony |
| Guardrails | 16 | Types, Actions, Infrastructure |
| Project & Tenant Settings | 13 | Project Settings, Tenant Settings |
| Auth Profiles | 15 | Auth Types, Credentials, Tool Integration |
| Total | ~315 | |