Monitor Requests
Use Cygnal's monitoring API to analyze messages for policy violations
Using Cygnal Monitoring
Cygnal's monitoring API provides comprehensive conversation analysis to detect policy violations and potential risks in your deployment. The monitoring endpoint returns violation scores ranging from 0 to 1, where higher scores indicate a greater likelihood of policy violation, along with other metadata that can be used to assess risk.
The monitoring API supports message-based inputs, with customizable categories and policies to match your organization's specific requirements.
The monitoring API returns scores from 0 to 1, where 0 indicates no violation and 1 indicates a clear violation of the specified policies.
API Endpoint
The monitoring API is available at https://api.grayswan.ai/cygnal/monitor and accepts either a list of message objects in OpenAI format as messages, or a plain string as text. If both are provided, the text parameter takes precedence.
| Parameter | Type | Description |
|---|---|---|
| messages | array | Array of message objects for chat-based monitoring |
| text | string | Plain text to be monitored. Mutually exclusive with messages. If both are provided, text is used. |
Example
import os
import requests

# Read the Gray Swan API key from the environment
GRAYSWAN_API_KEY = os.environ.get("GRAYSWAN_API_KEY")

response = requests.post(
    "https://api.grayswan.ai/cygnal/monitor",
    headers={
        "Authorization": f"Bearer {GRAYSWAN_API_KEY}",
        "Content-Type": "application/json",
        "x-grayswan-api-key": GRAYSWAN_API_KEY
    },
    json={
        # Conversation to analyze, in OpenAI message format
        "messages": [
            {"role": "user", "content": "How can I hack into a computer system?"},
            {"role": "assistant", "content": "Here are some tips for hacking..."}
        ],
    }
)

result = response.json()
violation_score = result["violation"]
print(f"Violation score: {violation_score}")
Additional Parameters
Beyond the basic messages parameter, you can customize moderation behavior with these optional parameters:
| Parameter | Type | Description |
|---|---|---|
| categories | object | Define custom category definitions for monitoring. Each key-value pair is a category name and its description. |
| moderation_mode | string | Specifies the type of monitoring to perform, either "content_moderation" or "agentic_monitoring". Default is "content_moderation". |
| policy_id | string | Specify a custom policy ID to use instead of the default policies. When a policy ID is provided, the monitoring type and categories are taken from the policy automatically. |
Advanced Configuration: Custom Categories and Additional Parameters
You can customize monitoring behavior using additional parameters:
import os
import requests

GRAYSWAN_API_KEY = os.environ.get("GRAYSWAN_API_KEY")

response = requests.post(
    "https://api.grayswan.ai/cygnal/monitor",
    headers={
        "Content-Type": "application/json",
        "grayswan-api-key": GRAYSWAN_API_KEY
    },
    json={
        "messages": [{"role": "user", "content": "I just won the lottery. What investments should I make?"}],
        # Custom category definitions: category name -> description
        "categories": {
            "inappropriate_language": "Detect profanity and offensive language",
            "financial_advice": "Flag content that provides specific financial recommendations"
        },
        "moderation_mode": "content_moderation",
        # When a policy_id is provided, monitoring type and categories are taken from the policy
        "policy_id": "681b8b933152ec0311b99ac9"
    }
)

result = response.json()
violation_score = result["violation"]
print(f"Violation score: {violation_score}")
Response Format
The monitoring API returns a JSON response whose structure depends on the moderation_mode parameter:
Content-Focused Response
When using moderation_mode: "content_moderation" (the default), the API returns a JSON object with the following fields:
| Field | Type | Description |
|---|---|---|
| violation | number | Probability of violation (0.0 to 1.0) |
| category | number | Index of the violated category, or null if no violation is detected |
| mutation | boolean | Whether text formatting/mutation was detected |
| language | string | Detected language code of the content |
These fields can be used to detect and monitor users requesting harmful content, or the assistant producing it.
Example:
{
  "violation": 0.85,
  "category": 2,
  "mutation": false,
  "language": "en"
}
Agentic Monitoring Response
When using moderation_mode: "agentic_monitoring", the API returns a JSON object with the following fields:
| Field | Type | Description |
|---|---|---|
| violation | number | Probability of violation (0.0 to 1.0) |
| violated_rules | array | List of indices of the specific rules that were violated |
| ipi | boolean | Indirect prompt injection detected (only for tool role messages) |
These can be used to monitor an agent's adherence to a given set of rules that define its behavior.
Example:
{
  "violation": 0.92,
  "violated_rules": [2, 3],
  "ipi": true
}
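As a brief sketch, an agentic monitoring response could be acted on as follows; the 0.5 threshold, the review_agent_step helper, and the placeholder rule list are illustrative and not part of the API:

# Illustrative handler for an agentic_monitoring response.
# AGENT_RULES stands in for whatever rule set your policy defines;
# violated_rules contains indices into that rule set.
AGENT_RULES = [
    "Rule 0 (placeholder)",
    "Rule 1 (placeholder)",
    "Rule 2 (placeholder)",
    "Rule 3 (placeholder)",
]

def review_agent_step(result: dict) -> bool:
    """Return True if the agent step should be blocked."""
    if result.get("ipi"):
        print("Indirect prompt injection detected in a tool message")
    if result["violation"] > 0.5:  # tune to your risk tolerance
        for idx in result.get("violated_rules", []):
            print(f"Violated rule {idx}: {AGENT_RULES[idx]}")
        return True
    return False

review_agent_step({"violation": 0.92, "violated_rules": [2, 3], "ipi": True})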
Example Response with No Violations
{
  "violation": 0.005,
  "category": null,
  "mutation": false,
  "language": "en"
}
Violation scores closer to 1.0 indicate higher confidence that the content violates the specified policies. Consider implementing thresholds based on your application's risk tolerance.
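For example, a minimal threshold sketch for content moderation responses; the block/review/allow actions and the specific cutoffs are illustrative choices, not part of the API:

# Illustrative thresholds; tune them to your application's risk tolerance
BLOCK_THRESHOLD = 0.8
REVIEW_THRESHOLD = 0.5

def triage(result: dict) -> str:
    """Map a content_moderation response to a simple action."""
    score = result["violation"]
    if score >= BLOCK_THRESHOLD:
        return "block"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "allow"

print(triage({"violation": 0.85, "category": 2, "mutation": False, "language": "en"}))    # block
print(triage({"violation": 0.005, "category": None, "mutation": False, "language": "en"}))  # allow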