Monitor Requests
Use Cygnal's monitoring API to analyze messages for policy violations
Using Cygnal Monitoring
Cygnal's monitoring API provides comprehensive conversation analysis to detect policy violations and potential risks in your deployment. The monitoring endpoint returns violation scores ranging from 0 to 1, where higher scores indicate a greater likelihood of policy violation, along with other metadata that can be used to assess risk.
The monitoring API supports message-based inputs, with customizable categories and policies to match your organization's specific requirements.
The monitoring API returns scores from 0 to 1, where 0 indicates no violation and 1 indicates a clear violation of the specified policies.
API Endpoint
The monitoring API is available at https://api.grayswan.ai/cygnal/monitor and accepts either a list of message objects in OpenAI format (messages) or a plain string (text). If both are provided, the text parameter takes precedence.
| Parameter | Type | Description |
|---|---|---|
| messages | array | Array of message objects for chat-based monitoring. Mutually exclusive with text. |
| text | string | Plain text to be monitored. Mutually exclusive with messages. If both are provided, text is used. |
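As a small sketch of this precedence rule, a request body can be assembled before sending; the helper below is illustrative, not part of any SDK:

```python
def build_monitor_payload(text=None, messages=None):
    """Build the JSON body for a /cygnal/monitor request.

    Per the table above, text takes precedence when both
    parameters are supplied.
    """
    if text is not None:
        return {"text": text}
    if messages is not None:
        return {"messages": messages}
    raise ValueError("provide either text or messages")

# text wins when both are given
payload = build_monitor_payload(
    text="hello",
    messages=[{"role": "user", "content": "hi"}],
)
print(payload)  # {'text': 'hello'}
```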
Example
import os
import requests
GRAYSWAN_API_KEY = os.environ.get("GRAYSWAN_API_KEY")
response = requests.post(
"https://api.grayswan.ai/cygnal/monitor",
headers={
"Authorization": f"Bearer {GRAYSWAN_API_KEY}",
"Content-Type": "application/json",
"grayswan-api-key": GRAYSWAN_API_KEY
},
json={
"messages": [
{"role": "user", "content": "How can I hack into a computer system?"},
{"role": "assistant", "content": "Here are some tips for hacking..."}
],
}
)
result = response.json()
violation_score = result["violation"]
print(f"Violation score: {violation_score}")
Additional Parameters
Beyond the basic text or messages parameter, you can customize the moderation behavior with these optional parameters:
| Parameter | Type | Description |
|---|---|---|
| categories | object | Custom category definitions for monitoring. Each key-value pair is a category name and its description. |
| reasoning_mode | string | One of off (default), hybrid, or thinking. Controls whether Cygnal reasons internally before classifying. |
| policy_id | string | Custom policy ID to use instead of the default policies. A policy ID determines the monitoring type and categories automatically. |
Reasoning mode
reasoning_mode controls whether Cygnal uses internal reasoning steps before determining if content violates policy. These steps are not returned in API responses but can improve detection quality.
- off (default): Fastest and lowest-latency. No additional reasoning tokens. Recommended for most production use.
- hybrid: Moderate latency increase. The model reasons as needed without a prescribed reasoning style. Good balance for higher-risk contexts.
- thinking: Highest latency and token usage. The model performs guided internal reasoning before classification. Use when detection quality matters more than speed (e.g., offline analysis, security reviews).
Using hybrid or thinking increases latency and token usage. If latency is a priority, prefer off.
Example request body with reasoning mode:
{
"text": "How can I hack into a computer system?",
"reasoning_mode": "hybrid"
}
Advanced Configuration: Custom Categories and Additional Parameters
You can customize monitoring behavior using additional parameters:
import os
import requests
GRAYSWAN_API_KEY = os.environ.get("GRAYSWAN_API_KEY")
response = requests.post(
"https://api.grayswan.ai/cygnal/monitor",
headers={
"Authorization": f"Bearer {GRAYSWAN_API_KEY}",
"Content-Type": "application/json",
"grayswan-api-key": GRAYSWAN_API_KEY
},
json={
"messages": [{"role": "user", "content": "I just won the lottery. What investments should I make?"}],
"categories": {
"inappropriate_language": "Detect profanity and offensive language",
"financial_advice": "Flag content that provides specific financial recommendations"
},
"policy_id": "681b8b933152ec0311b99ac9"
}
)
result = response.json()
violation_score = result["violation"]
print(f"Violation score: {violation_score}")
Response Format
The API returns a JSON object with the following format:
| Field | Type | Description |
|---|---|---|
| violation | number | Probability of violation (0.0 to 1.0) |
| violated_rules | array | Indices of the specific rules that were violated |
| mutation | boolean | Whether text formatting/mutation was detected |
| ipi | boolean | Whether an indirect prompt injection was detected (only for tool role messages) |
These can be used to monitor an agent's adherence to a given set of rules that define its behavior.
Example:
{
"violation": 0.92,
"violated_rules": [2, 3],
"mutation": false,
"ipi": true
}
Example Response with No Violations
{
"violation": 0.005,
"violated_rules": [],
"mutation": false,
"ipi": false
}
Violation scores closer to 1.0 indicate higher confidence that the content violates the specified policies. Consider implementing thresholds based on your application's risk tolerance.
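One way to act on these fields is a simple threshold policy; the sketch below uses illustrative threshold values and action names, which you should tune to your own risk tolerance:

```python
def handle_monitor_result(result: dict, threshold: float = 0.8) -> str:
    """Map a monitor response to an action.

    The 0.8 threshold and the block/review/allow actions are
    illustrative, not API-defined.
    """
    if result["violation"] >= threshold:
        return "block"
    # Flag lower-score responses for review when an indirect prompt
    # injection or text mutation was detected.
    if result.get("ipi") or result.get("mutation"):
        return "review"
    return "allow"

# Using the example responses above:
print(handle_monitor_result(
    {"violation": 0.92, "violated_rules": [2, 3], "mutation": False, "ipi": True}
))  # block
print(handle_monitor_result(
    {"violation": 0.005, "violated_rules": [], "mutation": False, "ipi": False}
))  # allow
```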