Evaluate Tasks using Historical Cases
The Evaluate Tasks feature lets you analyze AI Agent processes using real customer interaction data. By leveraging historical cases, teams can identify performance trends and improvement opportunities, and ensure workflows align with operational goals.
Steps to Evaluate Tasks
Create a new AI Agent or access the Manage page of an existing AI Agent.
Click View beside Evaluate or select Evaluate Tasks in the left pane.

On the Evaluations window, click + Create New Evaluation in the top right corner.
In the Create New Evaluation window, enter the name of the evaluation.

Enable the Take Historical Cases toggle to assess workflow performance using past cases.
Select the desired Time Range for historical cases to be included.
Click + Add Condition to apply multiple filters and refine evaluation criteria further.

Click Save and Run in the bottom right corner to initiate the evaluation.
Monitor evaluation progress and results in the Evaluation Homepage using List View for details or Feed View for insights.
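Conceptually, the Time Range and condition filters above narrow the pool of historical cases the evaluation runs against. The sketch below illustrates that selection logic in Python; the case fields (`created_at`, `channel`, `resolved`) and the function name are illustrative assumptions, not the product's actual schema or API.

```python
from datetime import datetime

# Hypothetical case records; field names are illustrative only.
cases = [
    {"id": 1, "created_at": datetime(2024, 5, 1), "channel": "email", "resolved": True},
    {"id": 2, "created_at": datetime(2024, 1, 10), "channel": "chat", "resolved": False},
    {"id": 3, "created_at": datetime(2024, 5, 20), "channel": "chat", "resolved": True},
]

def select_historical_cases(cases, start, end, conditions):
    """Keep cases inside the time range that satisfy every filter condition."""
    selected = []
    for case in cases:
        if not (start <= case["created_at"] <= end):
            continue  # outside the chosen Time Range
        if all(case.get(field) == value for field, value in conditions.items()):
            selected.append(case)
    return selected

recent_chat = select_historical_cases(
    cases,
    start=datetime(2024, 4, 1),
    end=datetime(2024, 6, 1),
    conditions={"channel": "chat"},  # e.g. one "+ Add Condition" filter
)
print([c["id"] for c in recent_chat])  # → [3]
```

Each additional condition further intersects the selection, which is why stacking filters refines the evaluation criteria.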
Review Success Rates and Improvements
Confirm Completion: Completed evaluations appear on the Evaluation Homepage.
View Results: Click the Eye icon to open the report and review success rates, improvements, and insights.
Delete (Optional): Remove evaluations no longer needed to keep the workspace organized.
Compare Performance: The left panel shows the current configuration; the right panel shows the measured success rate.
Identify Issues: Low or 0% success rates indicate prompt or logic problems.
Apply Improvements: Use system‑suggested prompts to improve accuracy and tool usage, for example, raising the success rate from 0% to 29.41%.
Review Insights: Check how results were calculated to validate recommendations.
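A success rate is simply the share of evaluated cases the workflow handled correctly. The minimal sketch below reproduces the 0% → 29.41% example from above, assuming a run of 17 historical cases where 5 succeed after improvements; the function is illustrative, not the product's actual calculation code.

```python
def success_rate(results):
    """Percentage of evaluated cases handled successfully, rounded to 2 decimals."""
    if not results:
        return 0.0
    return round(100 * sum(results) / len(results), 2)

# Before applying suggestions: none of the 17 cases succeeded.
before = [False] * 17
# After: 5 of the same 17 historical cases now succeed.
after = [True] * 5 + [False] * 12

print(success_rate(before))  # → 0.0
print(success_rate(after))   # → 29.41
```

Reviewing how many cases were counted, and which ones failed, is what the Insights view lets you validate.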

Apply System Suggestions
Applying system suggestions leads to higher success rates, less manual tuning, and more reliable AI responses.
Run Evaluation: Complete the evaluation and simulation.
Apply Updates: Click Apply Suggestions to update workflows with system‑recommended prompts.

Check Results: The system re‑evaluates performance and shows success rates before and after.
Optimize Continuously: New interactions feed back into the system, keeping tasks accurate and improving over time.
Task Restructure: Merging and Splitting
The system can help you clean up and improve your tasks by either merging similar tasks or splitting tasks with mixed purposes. This makes your AI Agent more accurate and easier to manage.
Run an Evaluation: The system checks how your tasks are working.
Merge: If two tasks do almost the same thing, the system suggests combining them.

Split: If one task is trying to do too many things, the system suggests breaking it into smaller tasks.

See Details: Suggestions show up in the Task Details and Suggestions page. Click the Eye icon next to the evaluation name to view them.
Approve or Ignore: Nothing changes until you approve. You decide whether to merge, split, or keep tasks as they are.
Smart Recommendations: The system only suggests changes when they really help. If your tasks are already fine, it just gives tips to improve them instead of forcing changes.
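One way to picture when merge or split suggestions fire is as a pair of thresholds: merge when two tasks cover nearly the same intents, split when one task covers too many. The heuristic below is purely illustrative; the intent sets, thresholds, and function name are assumptions for the sketch and do not reflect how the system actually decides.

```python
def suggest_restructure(tasks, merge_threshold=0.8, max_intents=3):
    """Illustrative heuristic: flag near-duplicate task pairs for merging
    and tasks covering too many distinct intents for splitting."""
    suggestions = []
    # Merge: two tasks whose intent sets overlap heavily (Jaccard similarity).
    for i, a in enumerate(tasks):
        for b in tasks[i + 1:]:
            overlap = len(a["intents"] & b["intents"]) / len(a["intents"] | b["intents"])
            if overlap >= merge_threshold:
                suggestions.append(("merge", a["name"], b["name"]))
    # Split: one task trying to do too many things.
    for t in tasks:
        if len(t["intents"]) > max_intents:
            suggestions.append(("split", t["name"]))
    return suggestions

tasks = [
    {"name": "Refund request", "intents": {"refund", "return"}},
    {"name": "Return item",    "intents": {"refund", "return"}},
    {"name": "Account help",   "intents": {"password", "billing", "profile", "login"}},
]
print(suggest_restructure(tasks))
# → [('merge', 'Refund request', 'Return item'), ('split', 'Account help')]
```

Tasks that trip neither threshold get no restructure suggestion, which mirrors the behavior described above: if your tasks are already fine, nothing is forced.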