Monitor Requests
Use Cygnal's monitoring API to analyze messages for policy violations
Using Cygnal Monitoring
Cygnal's monitoring API provides comprehensive conversation analysis to detect policy violations and potential risks in your deployment. The monitoring endpoint returns violation scores ranging from 0 to 1, where higher scores indicate a greater likelihood of policy violation, along with other metadata that can be used to assess risk.
The monitoring API supports message-based inputs, with customizable categories and policies to match your organization's specific requirements.
The monitoring API returns scores from 0 to 1, where 0 indicates no violation and 1 indicates a clear violation of the specified policies.
API Endpoint
The monitoring API is available at https://api.grayswan.ai/cygnal/monitor and accepts either a list of message objects in OpenAI format (messages) or a plain string (text). If both are provided, the text parameter takes precedence.
| Parameter | Type | Description |
|---|---|---|
| messages | array | Array of message objects for chat-based monitoring. Mutually exclusive with text. |
| text | string | Plain text to be monitored. Mutually exclusive with messages. If both are provided, text is used. |
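As a small sketch of this precedence rule, a request body can be assembled before sending; the helper below is illustrative, not part of any SDK:

```python
def build_monitor_payload(text=None, messages=None):
    """Build the JSON body for a /cygnal/monitor request.

    Per the table above, text takes precedence when both
    parameters are supplied.
    """
    if text is not None:
        return {"text": text}
    if messages is not None:
        return {"messages": messages}
    raise ValueError("provide either text or messages")

# text wins when both are given
payload = build_monitor_payload(
    text="hello",
    messages=[{"role": "user", "content": "hi"}],
)
print(payload)  # {'text': 'hello'}
```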
Example
import os
import requests
GRAYSWAN_API_KEY = os.environ.get("GRAYSWAN_API_KEY")
response = requests.post(
"https://api.grayswan.ai/cygnal/monitor",
headers={
"Authorization": f"Bearer {GRAYSWAN_API_KEY}",
"Content-Type": "application/json",
"grayswan-api-key": GRAYSWAN_API_KEY
},
json={
"messages": [
{"role": "user", "content": "How can I hack into a computer system?"},
{"role": "assistant", "content": "Here are some tips for hacking..."}
],
}
)
result = response.json()
violation_score = result["violation"]
print(f"Violation score: {violation_score}")
Additional Parameters
Beyond the basic text or messages parameter, you can customize the moderation behavior with these optional parameters:
| Parameter | Type | Description |
|---|---|---|
| categories | object | Custom category definitions for monitoring. Each key-value pair is a category name and its description. |
| reasoning_mode | string | One of off (default), hybrid, or thinking. Controls whether Cygnal reasons internally before classifying. |
| policy_id | string | Custom policy ID to use instead of the default policies. A policy ID determines the monitoring type and categories automatically. |
Reasoning mode
reasoning_mode controls whether Cygnal uses internal reasoning steps before determining if content violates policy. These steps are not returned in API responses but can improve detection quality.
- off (default): Fastest and lowest-latency. No additional reasoning tokens. Recommended for most production use.
- hybrid: Moderate latency increase. The model reasons as needed without a prescribed reasoning style. Good balance for higher-risk contexts.
- thinking: Highest latency and token usage. The model performs guided internal reasoning before classification. Use when detection quality matters more than speed (e.g., offline analysis, security reviews).
Using hybrid or thinking increases latency and token usage. If latency is a priority, prefer off.
Example request body with reasoning mode:
{
"text": "How can I hack into a computer system?",
"reasoning_mode": "hybrid"
}
Advanced Configuration: Custom Categories and Additional Parameters
You can customize monitoring behavior using additional parameters:
import os
import requests
GRAYSWAN_API_KEY = os.environ.get("GRAYSWAN_API_KEY")
response = requests.post(
"https://api.grayswan.ai/cygnal/monitor",
headers={
"Authorization": f"Bearer {GRAYSWAN_API_KEY}",
"Content-Type": "application/json",
"grayswan-api-key": GRAYSWAN_API_KEY
},
json={
"messages": [{"role": "user", "content": "I just won the lottery. What investments should I make?"}],
"categories": {
"inappropriate_language": "Detect profanity and offensive language",
"financial_advice": "Flag content that provides specific financial recommendations"
},
"policy_id": "681b8b933152ec0311b99ac9"
}
)
result = response.json()
violation_score = result["violation"]
print(f"Violation score: {violation_score}")
Response Format
The API returns a JSON object with the following format:
| Field | Type | Description |
|---|---|---|
| violation | number | Probability of violation (0.0 to 1.0) |
| violated_rules | array | Indices of the specific rules that were violated |
| mutation | boolean | Whether text formatting/mutation was detected |
| ipi | boolean | Whether an indirect prompt injection was detected (only for tool role messages) |
These can be used to monitor an agent's adherence to a given set of rules that define its behavior.
Example:
{
"violation": 0.92,
"violated_rules": [2, 3],
"mutation": false,
"ipi": true
}
Example Response with No Violations
{
"violation": 0.005,
"violated_rules": [],
"mutation": false,
"ipi": false
}
Violation scores closer to 1.0 indicate higher confidence that the content violates the specified policies. Consider implementing thresholds based on your application's risk tolerance.
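One way to act on these fields is a simple threshold policy; the sketch below uses illustrative threshold values and action names, which you should tune to your own risk tolerance:

```python
def handle_monitor_result(result: dict, threshold: float = 0.8) -> str:
    """Map a monitor response to an action.

    The 0.8 threshold and the block/review/allow actions are
    illustrative, not API-defined.
    """
    if result["violation"] >= threshold:
        return "block"
    # Flag lower-score responses for review when an indirect prompt
    # injection or text mutation was detected.
    if result.get("ipi") or result.get("mutation"):
        return "review"
    return "allow"

# Using the example responses above:
print(handle_monitor_result(
    {"violation": 0.92, "violated_rules": [2, 3], "mutation": False, "ipi": True}
))  # block
print(handle_monitor_result(
    {"violation": 0.005, "violated_rules": [], "mutation": False, "ipi": False}
))  # allow
```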