Monitor Requests
Use Cygnal's monitoring API to analyze messages for policy violations
Using Cygnal Monitoring
Cygnal's monitoring API provides comprehensive conversation analysis to detect policy violations and potential risks in your deployment. The monitoring endpoint returns violation scores ranging from 0 to 1, where higher scores indicate a greater likelihood of policy violation, along with other metadata that can be used to assess risk.
The monitoring API supports message-based inputs, with customizable categories and policies to match your organization's specific requirements.
The monitoring API returns scores from 0 to 1, where 0 indicates no violation and 1 indicates a clear violation of the specified policies.
API Endpoint
The monitoring API is available at https://api.grayswan.ai/cygnal/monitor and accepts either a list of message objects in OpenAI format as messages, or a plain string as text. If both are provided, the text parameter takes precedence.
| Parameter | Type | Description |
|---|---|---|
| messages | array | Array of message objects for chat-based monitoring |
| text | string | Plain text to be monitored. Mutually exclusive with messages. If both are provided, text is used. |
Example
import os
import requests

# Read the Gray Swan API key from the environment
GRAYSWAN_API_KEY = os.environ.get("GRAYSWAN_API_KEY")

response = requests.post(
    "https://api.grayswan.ai/cygnal/monitor",
    headers={
        "Authorization": f"Bearer {GRAYSWAN_API_KEY}",
        "Content-Type": "application/json",
        "x-grayswan-api-key": GRAYSWAN_API_KEY
    },
    json={
        # Conversation to analyze, in OpenAI message format
        "messages": [
            {"role": "user", "content": "How can I hack into a computer system?"},
            {"role": "assistant", "content": "Here are some tips for hacking..."}
        ],
    }
)

result = response.json()
violation_score = result["violation"]
print(f"Violation score: {violation_score}")
Additional Parameters
Beyond the basic messages parameter, you can customize moderation behavior with these optional parameters:
| Parameter | Type | Description |
|---|---|---|
| categories | object | Define custom category definitions for monitoring. Each key-value pair is a category name and its description. |
| moderation_mode | string | Specifies the type of monitoring to perform, either "content_moderation" or "agentic_monitoring". Default is "content_moderation". |
| policy_id | string | Specify a custom policy ID to use instead of the default policies. When a policy ID is provided, the monitoring type and categories are taken from the policy automatically. |
Advanced Configuration: Custom Categories and Additional Parameters
You can customize monitoring behavior using additional parameters:
import os
import requests

GRAYSWAN_API_KEY = os.environ.get("GRAYSWAN_API_KEY")

response = requests.post(
    "https://api.grayswan.ai/cygnal/monitor",
    headers={
        "Content-Type": "application/json",
        "grayswan-api-key": GRAYSWAN_API_KEY
    },
    json={
        "messages": [{"role": "user", "content": "I just won the lottery. What investments should I make?"}],
        # Custom category definitions: category name -> description
        "categories": {
            "inappropriate_language": "Detect profanity and offensive language",
            "financial_advice": "Flag content that provides specific financial recommendations"
        },
        "moderation_mode": "content_moderation",
        # When a policy_id is provided, monitoring type and categories are taken from the policy
        "policy_id": "681b8b933152ec0311b99ac9"
    }
)

result = response.json()
violation_score = result["violation"]
print(f"Violation score: {violation_score}")
Response Format
The monitoring API returns a JSON response whose structure depends on the moderation_mode parameter:
Content-Focused Response
When using moderation_mode: "content_moderation" (the default), the API returns a JSON object with the following fields:
| Field | Type | Description |
|---|---|---|
| violation | number | Probability of violation (0.0 to 1.0) |
| category | number | Index of the violated category, or null if no violation is detected |
| mutation | boolean | Whether text formatting/mutation was detected |
| language | string | Detected language code of the content |
These fields can be used to detect and monitor users requesting harmful content, or the assistant producing it.
Example:
{
  "violation": 0.85,
  "category": 2,
  "mutation": false,
  "language": "en"
}
Agentic Monitoring Response
When using moderation_mode: "agentic_monitoring", the API returns a JSON object with the following fields:
| Field | Type | Description |
|---|---|---|
| violation | number | Probability of violation (0.0 to 1.0) |
| violated_rules | array | List of indices of the specific rules that were violated |
| ipi | boolean | Indirect prompt injection detected (only for tool role messages) |
These can be used to monitor an agent's adherence to a given set of rules that define its behavior.
Example:
{
  "violation": 0.92,
  "violated_rules": [2, 3],
  "ipi": true
}
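As a brief sketch, an agentic monitoring response could be acted on as follows; the 0.5 threshold, the review_agent_step helper, and the placeholder rule list are illustrative and not part of the API:

# Illustrative handler for an agentic_monitoring response.
# AGENT_RULES stands in for whatever rule set your policy defines;
# violated_rules contains indices into that rule set.
AGENT_RULES = [
    "Rule 0 (placeholder)",
    "Rule 1 (placeholder)",
    "Rule 2 (placeholder)",
    "Rule 3 (placeholder)",
]

def review_agent_step(result: dict) -> bool:
    """Return True if the agent step should be blocked."""
    if result.get("ipi"):
        print("Indirect prompt injection detected in a tool message")
    if result["violation"] > 0.5:  # tune to your risk tolerance
        for idx in result.get("violated_rules", []):
            print(f"Violated rule {idx}: {AGENT_RULES[idx]}")
        return True
    return False

review_agent_step({"violation": 0.92, "violated_rules": [2, 3], "ipi": True})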
Example Response with No Violations
{
  "violation": 0.005,
  "category": null,
  "mutation": false,
  "language": "en"
}
Violation scores closer to 1.0 indicate higher confidence that the content violates the specified policies. Consider implementing thresholds based on your application's risk tolerance.
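For example, a minimal threshold sketch for content moderation responses; the block/review/allow actions and the specific cutoffs are illustrative choices, not part of the API:

# Illustrative thresholds; tune them to your application's risk tolerance
BLOCK_THRESHOLD = 0.8
REVIEW_THRESHOLD = 0.5

def triage(result: dict) -> str:
    """Map a content_moderation response to a simple action."""
    score = result["violation"]
    if score >= BLOCK_THRESHOLD:
        return "block"
    if score >= REVIEW_THRESHOLD:
        return "review"
    return "allow"

print(triage({"violation": 0.85, "category": 2, "mutation": False, "language": "en"}))    # block
print(triage({"violation": 0.005, "category": None, "mutation": False, "language": "en"}))  # allow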