Configure Guardrails in AI+ Studio
Guardrails in AI+ Studio help you enforce responsible AI usage by restricting the generation or processing of harmful, inappropriate, or policy-violating content. You can configure a Guardrail to monitor and restrict specific types of content generated or submitted through generative AI models.
This configuration guide walks you through the step-by-step process of creating and customizing Guardrails, including setting up basic details, selecting harmful content categories, and defining word filters.
Prerequisites
Before configuring Guardrails in AI+ Studio, ensure that the following permissions and features are enabled in your Sprinklr environment:
AI+ Studio must be enabled in your Sprinklr instance.
You must have the following permissions based on your role:
View permission for AI+ Studio and Guardrails.
Edit permission for AI+ Studio and Guardrails.
Delete permission for Guardrails (optional; required only if you need to delete Guardrails).
The following table outlines how each permission affects your access to Guardrails:
Permission | Yes (Granted) | No (Not Granted)
View | You can view Guardrails in the record manager. | Guardrails are not visible to you.
Edit | You can create and edit Guardrails; the Add Guardrail CTA is visible. | Add Guardrail CTA is not visible.
Delete | You can delete Guardrails from the record manager. | The Delete option is not available.
Access Guardrails
Navigate to AI+ Studio from the Sprinklr launchpad. Select the Security and Compliance card from the AI+ Studio dashboard.
In the left navigation pane, select Guardrails. You will be redirected to the Guardrails record manager, where you can access all the Guardrails configured in your environment.
You can Edit or Delete existing Guardrails from the record manager screen by selecting the vertical ellipsis (⋮) button next to the Guardrail entry.
Create a Guardrail in AI+ Studio
On the Guardrail record manager screen, click the ‘+ Guardrail’ button to create a new Guardrail.
You will be redirected to the Select Generative AI Guardrail window. Choose Harmful Content Guardrail from the dropdown list.
Click the Next button. You will be redirected to the configuration steps.
1. Configure Basic Details
On the Basic Details screen, provide the following information:
Name
Enter a unique and meaningful name for the Guardrail.
Description
Provide a short description that explains the purpose or scope of the Guardrail.
Example: Blocks content that includes hate speech, threats, or other forms of harmful expression.
Apply On
Select where the Guardrail should apply:
Input – Applies the Guardrail to user inputs before they are sent to the AI model.
Output – Applies the Guardrail to AI-generated responses.
You can select one or both options based on your enforcement requirement.
Message for Blocked Input
Enter the message that should be displayed when user input is blocked by the Guardrail.
Example: Your input contains content that is not allowed. Please revise and try again.
Message for Blocked Output
Enter the message that should be shown when the AI model output is blocked.
Example: The response was blocked due to a harmful content policy.
Share Guardrails With
Specify which users or user groups can access and use this Guardrail. This setting enables collaboration and centralized governance.
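For reference, the Basic Details above can be thought of as a single configuration object. The sketch below is illustrative only, written as a Python dictionary with assumed field names; it does not represent Sprinklr's actual schema or API.

```python
# Illustrative sketch only -- field names are assumptions, not Sprinklr's actual schema or API.
guardrail_basic_details = {
    "name": "Harmful Expression Blocker",
    "description": "Blocks content that includes hate speech, threats, or other forms of harmful expression.",
    "apply_on": ["input", "output"],  # one or both, based on your enforcement requirement
    "message_for_blocked_input": "Your input contains content that is not allowed. Please revise and try again.",
    "message_for_blocked_output": "The response was blocked due to a harmful content policy.",
    "share_with": ["AI Governance Team"],  # hypothetical user group
}
```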
Click Next at the bottom-right corner to proceed to the next step.
2. Configure Harmful Content Detection
On the Harmful Content Detection screen, you can define specific categories of harmful content that the Guardrail should monitor and block. This configuration helps ensure that AI models do not produce outputs that violate safety, legal, or ethical standards.
You can add one or more harmful content types to a Guardrail. Each content type includes:
Harmful Content
This field indicates the category of harmful content that the Guardrail will detect. Select a harmful content type from the dropdown.
Supported Harmful Content Types
You can select from the following predefined categories:
Non-Violent Crimes
Violent Crimes
Sex Crimes
Defamation
Child Exploitation
Specialized Advice
Privacy
Intellectual Property
Elections
Self-Harm
Indiscriminate Weapons
Hate
Sexual Content
Code Interpreter Abuse
Tolerance
The Tolerance Level determines how aggressively the Guardrail monitors and blocks content for a selected harmful content category. It allows you to adjust sensitivity based on your business or compliance needs.
You can choose from the following levels:
Low – Applies the strictest filtering. Even borderline or potentially harmful content is blocked. Recommended for high-risk categories such as child exploitation or violent crimes.
Medium – Offers a balanced approach. Clearly harmful content is blocked, while borderline content may be allowed. Suitable for general use cases.
High – Applies minimal filtering. Only the most severe or explicit content is blocked. Use this for categories where broader flexibility is acceptable.
Choose a tolerance based on the sensitivity of the use case. For example:
Violent Crimes: Medium
Child Exploitation: Low
Best practice: Use Low tolerance for highly sensitive categories such as child exploitation or terrorism.
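Conceptually, tolerance controls how high a detected harm score must be before content is blocked: a lower tolerance blocks at lower scores, so it filters more aggressively. The sketch below is a simplified illustration of that idea only; the thresholds, scores, and function names are assumptions, not Sprinklr's actual scoring logic.

```python
# Conceptual illustration only -- thresholds and scoring are assumptions,
# not Sprinklr's actual implementation.
TOLERANCE_THRESHOLDS = {
    "low": 0.3,     # strictest: even borderline content is blocked
    "medium": 0.6,  # balanced: clearly harmful content is blocked
    "high": 0.9,    # most permissive: only the most severe content is blocked
}

def should_block(harm_score: float, tolerance: str) -> bool:
    """Block when the detected harm score meets or exceeds the tolerance threshold."""
    return harm_score >= TOLERANCE_THRESHOLDS[tolerance]

# A borderline score of 0.5 is blocked at Low tolerance but allowed at Medium or High.
print(should_block(0.5, "low"), should_block(0.5, "medium"), should_block(0.5, "high"))
# True False False
```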
Note: Descriptions are auto-generated for preconfigured Harmful Content Types. You can configure descriptions for custom harmful content types.
You can use the + Harmful Content button to add multiple harmful content types to a single Guardrail and broaden coverage.
After configuring harmful content detection, click the ‘Next’ button to proceed to the next screen.
Create Custom Harmful Content
In addition to using predefined categories, you can create custom harmful content types in AI+ Studio to address organization-specific risks.
Follow these steps to create a custom harmful content type:
On the Harmful Content tab, start typing the name of your custom content type.
In the dropdown list, select the Create option.
The system will add your entry as a new harmful content type.
Enter a description to define the purpose and scope of the custom type.
Select a Tolerance Level—Low, Medium, or High—based on the strictness you require.
Tip: Use custom harmful content types to handle domain-specific language, brand sensitivities, or regulatory terms that are not covered by default options.
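Putting the steps above together, a custom harmful content entry carries a name, a description you write yourself, and a tolerance level. The sketch below is purely illustrative; the structure, field names, and example category are assumptions, not Sprinklr's format.

```python
# Illustrative only -- structure, field names, and example category are assumptions.
custom_harmful_content = {
    "name": "Competitor Mentions",  # hypothetical organization-specific category
    "description": "Flags references to competitor brands in customer-facing responses.",
    "tolerance": "low",             # Low / Medium / High
}
```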
3. Configure Word Filters
The Word Filters screen allows you to block specific words or phrases from being included in user input or AI-generated output. This helps you enforce organization-specific guidelines or compliance requirements that are not covered by broader harmful content categories.
Filter Words
Use the Filter Words section to define custom terms that should be flagged or blocked by the Guardrail. These are manually curated terms that may be sensitive, brand-restricted, or otherwise inappropriate for your use case.
Note: Words entered here are treated as case-insensitive and matched as standalone terms unless specified otherwise.
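To make that matching behavior concrete, the snippet below shows what case-insensitive, standalone-term matching typically means in practice. It is a conceptual sketch only, not Sprinklr's actual matching engine.

```python
import re

# Conceptual sketch of case-insensitive, whole-word matching -- not Sprinklr's actual engine.
filter_words = ["confidential", "off the record"]

def is_blocked(text: str) -> bool:
    for term in filter_words:
        # \b anchors match the term as a standalone word or phrase,
        # so "confidential" matches but "confidentiality" does not.
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            return True
    return False

print(is_blocked("This is Confidential information."))   # True  (case-insensitive)
print(is_blocked("Review the confidentiality clause."))  # False (not a standalone term)
```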
Upload File
If you have a large list of filter words, you can bulk import them using a supported file.
File Upload Options
Drag and drop your file directly into the upload area, or
Select Upload File and browse your local system.
Supported File Formats
.XLS
.XLSX
.ODS
Tip: Ensure that the file contains a single column with one filter word or phrase per row. Avoid additional formatting or merged cells.
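If you prefer to generate the upload file programmatically, the snippet below builds a single-column .xlsx with one filter word per row using the openpyxl library. The file name and word list are placeholders.

```python
# Requires: pip install openpyxl
from openpyxl import Workbook

# Placeholder word list -- replace with your own filter terms.
filter_words = ["term-one", "term-two", "sensitive phrase"]

wb = Workbook()
ws = wb.active

# Single column, one filter word or phrase per row, no extra formatting or merged cells.
for word in filter_words:
    ws.append([word])

wb.save("filter_words.xlsx")
```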
Note: You must add at least one check, either a Harmful Content type or a Word Filter, before saving the Guardrail. If neither is added, the system will display an error message, as shown in the image below.
Click the ‘Save’ button to save your Guardrail.
Use Guardrail in Deployment
You can apply Guardrails within the Prompt Node of a pipeline in AI+ Studio to restrict the generation of harmful content. Follow the steps below to add Guardrails to a deployment:
Go to the Deployments Record Manager for the relevant AI use case.
To update an existing deployment, click the More options (vertical ellipsis) next to the deployment name and select Edit Pipeline.
– Alternatively, you can create a new deployment and add Guardrails during configuration. Refer to Configure Deployments for detailed steps.
In the pipeline editor, select the Prompt Node you want to configure. The Prompt Configuration page opens.
In the Settings pane, choose the Guardrails you want to apply.
– You can select one or more Guardrails to apply within a single prompt.
Click Save to apply your changes.
You can test the Prompt Node with Guardrails applied. If the system detects harmful content, an error message will appear in the Output pane, as shown in the image below.
Tip: Using multiple Guardrails ensures better coverage and stricter control over AI-generated content.
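Conceptually, a Guardrail applied to a Prompt Node checks the user input before the model call and the model output afterwards, returning the configured blocked message when a violation is detected. The sketch below illustrates that flow only; the function names and checks are assumptions, not Sprinklr's pipeline API.

```python
# Conceptual flow only -- function names and checks are assumptions, not Sprinklr's pipeline API.

BLOCKED_INPUT_MESSAGE = "Your input contains content that is not allowed. Please revise and try again."
BLOCKED_OUTPUT_MESSAGE = "The response was blocked due to a harmful content policy."

def violates_guardrails(text: str) -> bool:
    """Placeholder for the harmful-content and word-filter checks configured on the Guardrail."""
    return "blocked-term" in text.lower()  # hypothetical check

def call_model(prompt: str) -> str:
    """Placeholder for the Prompt Node's model call."""
    return f"Model response to: {prompt}"

def run_prompt_node(user_input: str) -> str:
    if violates_guardrails(user_input):   # Guardrail applied on Input
        return BLOCKED_INPUT_MESSAGE
    output = call_model(user_input)
    if violates_guardrails(output):       # Guardrail applied on Output
        return BLOCKED_OUTPUT_MESSAGE
    return output

print(run_prompt_node("Summarize this customer conversation."))
```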
Guardrails in AI+ Studio provide a powerful and flexible way to enforce responsible AI behavior by detecting and blocking harmful or policy-violating content. By configuring harmful content types, word filters, and deployment-level controls, you can ensure your AI workflows remain safe, compliant, and aligned with organizational standards.