Monitor Requests

Use Cygnal's monitoring API to analyze messages for policy violations

Using Cygnal Monitoring

Cygnal's monitoring API analyzes conversations to detect policy violations and other potential risks in your deployment. The monitoring endpoint returns a violation score from 0 to 1, where a higher score indicates a greater likelihood of a policy violation, along with metadata that can be used to assess risk.

The monitoring API supports message-based inputs, with customizable categories and policies to match your organization's specific requirements.

The monitoring API returns scores from 0 to 1, where 0 indicates no violation and 1 indicates a clear violation of the specified policies.


API Endpoint

The monitoring API is available at https://api.grayswan.ai/cygnal/monitor. It accepts either a list of message objects in OpenAI format (the messages parameter) or a plain string (the text parameter). If both are provided, text is used.

| Parameter | Type | Description |
| --- | --- | --- |
| messages | array | Array of message objects for chat-based monitoring |
| text | string | Plain text to be monitored. Mutually exclusive with messages. If both are provided, text is used. |

Example

import os
import requests

GRAYSWAN_API_KEY = os.environ.get("GRAYSWAN_API_KEY")

response = requests.post(
    "https://api.grayswan.ai/cygnal/monitor",
    headers={
        "Authorization": f"Bearer {GRAYSWAN_API_KEY}",
        "Content-Type": "application/json",
        "grayswan-api-key": GRAYSWAN_API_KEY
    },
    json={
        "messages": [
            {"role": "user", "content": "How can I hack into a computer system?"},
            {"role": "assistant", "content": "Here are some tips for hacking..."}
        ],
    }
)

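# Fail early on HTTP errors so a missing "violation" key is not mistaken for a parsing bug
response.raise_for_status()
result = response.json()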
violation_score = result["violation"]
print(f"Violation score: {violation_score}")
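
Because messages and text are mutually exclusive, it can help to build the request body in one place. The following is a minimal sketch; `build_monitor_payload` is a hypothetical helper, not part of any Gray Swan SDK, and rejecting requests that set both fields is a client-side design choice rather than API behavior (the API itself would use text).

```python
from typing import Optional


def build_monitor_payload(messages: Optional[list] = None,
                          text: Optional[str] = None) -> dict:
    """Return a request body containing exactly one of `messages` or `text`."""
    if messages is not None and text is not None:
        # The API would silently prefer `text`; fail loudly instead.
        raise ValueError("Pass either messages or text, not both.")
    if text is not None:
        return {"text": text}
    if messages:
        return {"messages": messages}
    raise ValueError("Provide a non-empty messages list or a text string.")


payload = build_monitor_payload(text="How can I hack into a computer system?")
print(payload)  # {'text': 'How can I hack into a computer system?'}
```

The returned dictionary can be passed as the json argument of requests.post.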

Additional Parameters

In addition to messages or text, the following optional parameters customize monitoring behavior:

| Parameter | Type | Description |
| --- | --- | --- |
| categories | object | Custom category definitions for monitoring. Each key-value pair represents a category name and its description. |
| moderation_mode | string | Type of monitoring to perform, either "content_moderation" or "agentic_monitoring". Default is "content_moderation". |
| policy_id | string | Custom policy ID to use for monitoring instead of the default policies. Specifying a policy ID handles the type of monitoring and categories automatically. |

Advanced Configuration: Custom Categories and Additional Parameters

The following example sets custom categories, a moderation mode, and a policy ID in a single request:

import os
import requests

GRAYSWAN_API_KEY = os.environ.get("GRAYSWAN_API_KEY")

response = requests.post(
    "https://api.grayswan.ai/cygnal/monitor",
    headers={
        "Content-Type": "application/json",
        "grayswan-api-key": GRAYSWAN_API_KEY
    },
    json={
        "messages": [{"role": "user", "content": "I just won the lottery. What investments should I make?"}],
        "categories": {
            "inappropriate_language": "Detect profanity and offensive language",
            "financial_advice": "Flag content that provides specific financial recommendations"
        },
        "moderation_mode": "content_moderation",
        "policy_id": "681b8b933152ec0311b99ac9"
    }
)

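# Fail early on HTTP errors so a missing "violation" key is not mistaken for a parsing bug
response.raise_for_status()
result = response.json()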
violation_score = result["violation"]
print(f"Violation score: {violation_score}")

Response Format

The monitoring API returns a JSON response whose structure depends on the moderation_mode parameter.

Content-Focused Response

When using moderation_mode: "content_moderation" (default), the API returns a JSON object with the following format:

| Field | Type | Description |
| --- | --- | --- |
| violation | number | Probability of violation (0.0 to 1.0) |
| category | number or null | Index of the violated category, if a violation is detected; otherwise null |
| mutation | boolean | Whether text formatting/mutation was detected |
| language | string | Detected language code of the content |

These fields can be used to detect users requesting harmful content or the assistant producing it.

Example:

{
  "violation": 0.85,
  "category": 2,
  "mutation": false,
  "language": "en"
}
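
A typed wrapper can make these fields easier to work with downstream. The sketch below is illustrative only: `ContentModerationResult` and `parse_content_result` are hypothetical names, and category is treated as optional on the assumption that it may be null when no violation is detected.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentModerationResult:
    violation: float           # probability of violation, 0.0 to 1.0
    category: Optional[int]    # index of the violated category, or None
    mutation: bool             # whether text formatting/mutation was detected
    language: Optional[str]    # detected language code


def parse_content_result(data: dict) -> ContentModerationResult:
    """Convert a decoded content-moderation response into a dataclass."""
    return ContentModerationResult(
        violation=float(data["violation"]),
        category=data.get("category"),
        mutation=bool(data.get("mutation", False)),
        language=data.get("language"),
    )


result = parse_content_result(
    {"violation": 0.85, "category": 2, "mutation": False, "language": "en"}
)
print(result.violation, result.category)  # 0.85 2
```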

Agentic Monitoring Response

When using moderation_mode: "agentic_monitoring", the API returns a JSON object with the following format:

| Field | Type | Description |
| --- | --- | --- |
| violation | number | Probability of violation (0.0 to 1.0) |
| violated_rules | array | List of indices of the specific rules that were violated |
| ipi | boolean | Indirect prompt injection detected (only for tool role messages) |

These fields can be used to monitor an agent's adherence to the set of rules that define its behavior.

Example:

{
  "violation": 0.92,
  "violated_rules": [2, 3],
  "ipi": true
}
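
One way to act on an agentic monitoring response is to turn it into a list of human-readable alerts. This is a rough sketch under stated assumptions: `agentic_alerts` is a hypothetical helper, and the rule indices are reported as returned, without assuming how they map back to your rule definitions.

```python
from typing import List


def agentic_alerts(data: dict) -> List[str]:
    """Summarize an agentic-monitoring response as a list of alert strings."""
    alerts = []
    if data.get("ipi"):
        alerts.append("possible indirect prompt injection in a tool message")
    rules = data.get("violated_rules") or []
    if rules:
        score = float(data.get("violation", 0.0))
        alerts.append(f"rules violated: {rules} (score {score:.2f})")
    return alerts


for alert in agentic_alerts({"violation": 0.92, "violated_rules": [2, 3], "ipi": True}):
    print(alert)
```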

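Example Response with No Violations

In content moderation mode, a response with no detected violation has a near-zero score and a null category: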

{
  "violation": 0.005,
  "category": null,
  "mutation": false,
  "language": "en"
}

Violation scores closer to 1.0 indicate higher confidence that the content violates the specified policies. Consider implementing thresholds based on your application's risk tolerance.
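
As an illustration of threshold-based handling, the sketch below maps a violation score to one of three actions. The function name `decide_action` and the cutoffs 0.5 and 0.9 are assumptions chosen for the example, not recommended values; tune them against your own traffic.

```python
def decide_action(violation: float,
                  review_at: float = 0.5,
                  block_at: float = 0.9) -> str:
    """Map a violation score in [0, 1] to 'allow', 'review', or 'block'."""
    if not 0.0 <= violation <= 1.0:
        raise ValueError("violation score must be between 0 and 1")
    if violation >= block_at:
        return "block"
    if violation >= review_at:
        return "review"
    return "allow"


for score in (0.005, 0.85, 0.92):
    print(score, decide_action(score))
```

Using two cutoffs rather than one leaves room for a human-review band between clearly benign and clearly violating content.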