Word Filters Guardrail
The Word Filters Guardrail allows you to block specific words or phrases from appearing in user interactions. This configuration is useful for restricting profanity, sensitive terms, or any language that violates organizational policies.
When a filtered word is detected, the system can:
Block the interaction at the input prompt, the output response, or both, depending on your configuration.
Trigger a fallback response that you define.
You can add single words, phrases, or patterns to the filter list. This configuration helps keep conversations compliant, respectful, and free from prohibited language.
Configure Word Filters Guardrail in AI+ Studio
On the Guardrail record manager screen, click the ‘+ Guardrail’ button to create a new Guardrail.
In the Select Generative AI Guardrail window that opens, choose Word Filters Guardrail from the dropdown list.
Click the Next button. You will be redirected to the configuration steps.
1. Basic Details
On the Basic Details screen, provide the following information:
Name
Enter a unique and meaningful name for the Guardrail.
Description
Provide a short description that explains the purpose or scope of the Guardrail.
Example: Blocks content that includes hate speech, threats, or other forms of harmful expression.
Apply On
Select where the Guardrail should apply:
Input – Applies the Guardrail to user inputs before they are sent to the AI model.
Output – Applies the Guardrail on the AI-generated responses.
You can select one or both options based on your enforcement requirement.
Message for Blocked Input
Enter the message that should be displayed when user input is blocked by the Guardrail.
Example: Your input contains content that is not allowed. Please revise and try again.
Message for Blocked Output
Enter the message that should be shown when the AI model output is blocked.
Example: The response was blocked due to a harmful content policy.
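The way the Apply On options and the two blocked messages interact can be sketched in a few lines of Python. This is an illustration only, not AI+ Studio's implementation: GuardrailConfig, guarded_reply, and the is_blocked callback are hypothetical names chosen to mirror the fields above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GuardrailConfig:
    # Hypothetical structure mirroring the Basic Details fields;
    # it is not the product's actual API.
    apply_on_input: bool
    apply_on_output: bool
    blocked_input_message: str
    blocked_output_message: str


def guarded_reply(
    prompt: str,
    model: Callable[[str], str],
    is_blocked: Callable[[str], bool],
    config: GuardrailConfig,
) -> str:
    """Return the model response, or a configured blocked message."""
    # Input check: runs before the prompt reaches the model.
    if config.apply_on_input and is_blocked(prompt):
        return config.blocked_input_message

    response = model(prompt)

    # Output check: runs on the generated response.
    if config.apply_on_output and is_blocked(response):
        return config.blocked_output_message

    return response
```

In this sketch, a prompt blocked at the input stage never reaches the model, which is why enabling Input as well as Output can be useful when the filtered terms should not be sent to the model at all.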
Share Guardrails With
Specify which users or user groups can access and use this Guardrail deployment. This setting enables collaboration and centralized governance.
Click ‘Next’ at the bottom-right corner to proceed to the next step.
2. Word Filters
The Word Filters screen allows you to block specific words or phrases from being included in user input or AI-generated output. This helps you enforce organization-specific guidelines or compliance requirements that are not covered by broader harmful content categories.
Filter Words
Use the Filter Words section to define custom terms that should be flagged or blocked by the Guardrail. These are manually curated terms that may be sensitive, brand-restricted, or otherwise inappropriate for your use case.
Note: Words entered here are treated as case-insensitive and matched as standalone terms unless specified otherwise.
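To illustrate what case-insensitive, standalone matching means in practice, here is a minimal Python sketch. It assumes "standalone" means the term is not embedded in a longer word. The product's actual matching rules may differ, and build_filter and find_filtered are hypothetical helper names.

```python
import re
from typing import Iterable, List, Pattern


def build_filter(terms: Iterable[str]) -> Pattern[str]:
    """Compile filter terms into one case-insensitive, whole-term pattern."""
    cleaned = [t.strip() for t in terms if t.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty filter term is required")
    # Longest terms first so a phrase is preferred over a word it contains.
    cleaned.sort(key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in cleaned)
    # Lookarounds reject matches embedded inside a longer word.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def find_filtered(text: str, pattern: Pattern[str]) -> List[str]:
    """Return every filtered term found in the text, as written there."""
    return [m.group(0) for m in pattern.finditer(text)]
```

With the terms "darn" and "Project X", this sketch flags "Darn" and "project x" in a sentence regardless of case, but does not flag "darnation", because there the term is part of a longer word.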
Upload File
If you have a large list of filter words, you can bulk import them using a supported file.
File Upload Options
Drag and drop your file directly into the upload area, or
Select Upload File and browse your local system.
Supported File Formats
.XLS
.XLSX
.ODS
Tip: Ensure that the file contains a single column with one filter word or phrase per row. Avoid additional formatting or merged cells.
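Before building the upload file, it can help to clean the word list so that each row holds exactly one term. The following Python sketch shows one way to do that. prepare_filter_rows is a hypothetical helper, and dropping case-insensitive duplicates is an assumption based on the case-insensitivity note above. Writing the rows to an .XLSX or .ODS file would need a spreadsheet library, which is not shown here.

```python
from typing import Iterable, List


def prepare_filter_rows(raw_terms: Iterable[str]) -> List[str]:
    """Return cleaned terms, one per intended spreadsheet row."""
    seen = set()
    rows = []
    for term in raw_terms:
        # Trim the term and collapse internal runs of whitespace.
        cleaned = " ".join(term.split())
        # Compare case-insensitively so "Darn" and "darn" count once.
        key = cleaned.casefold()
        if cleaned and key not in seen:
            seen.add(key)
            rows.append(cleaned)
    return rows
```

Each returned string corresponds to one cell in the single column of the upload file, in the order the terms first appeared.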