Measuring Performance via Golden Test Set


Before you start, see How to Setup Golden Test Set?

Steps to view performance

  1. Make sure that the Performance Calculation for the golden test set has been processed.

  2. Switch to different versions to view and compare Golden Test Set reports. You will see the following columns for each version (a sketch showing how these metrics are computed follows these steps).

    1. Support - Number of expressions belonging to the intent according to human validation.

    2. F1 Score - Harmonic mean of precision and recall, giving an overall accuracy measure for each intent.

    3. Precision - Fraction of the AI model's predictions for a particular intent that actually belong to that intent (according to human validation).

      Formula - Number of correct AI model predictions for the intent / total number of expressions the AI model predicted as that intent.

    4. Recall - Fraction of expressions belonging to an intent (according to human validation) that the AI model is able to capture correctly.

      Formula - Number of correct AI model predictions for the intent / total number of expressions belonging to the intent according to human validation.

  3. Switch to a different language to view the Golden Test Set report by clicking the Language option button.

  4. Click the Review Predictions button in the top right corner to review the intent model's predictions on individual expressions. Based on this, you can decide which patterns of expressions need to be added to or removed from intents in order to improve model performance.

    Additionally, you can sort the columns alphabetically and use filters to narrow down the information displayed.

    Note: The Golden Test Set (GTS) interface keeps intent scores static when filters are applied in the Review Predictions view.
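
The sketch below illustrates how the columns in step 2 are computed. It is plain Python with hypothetical intent names and data, not the product's own implementation: given the human-validated label and the AI model's predicted intent for each expression in the golden test set, it derives Support, Precision, Recall, and F1 Score per intent.

```python
from collections import Counter

def golden_test_set_metrics(human_labels, model_predictions):
    """Compute per-intent Support, Precision, Recall, and F1 Score.

    Both inputs are parallel lists with one entry per expression in
    the golden test set."""
    support = Counter(human_labels)         # expressions per intent (human validation)
    predicted = Counter(model_predictions)  # expressions the model assigned to each intent
    correct = Counter(                      # correct predictions per intent
        h for h, p in zip(human_labels, model_predictions) if h == p
    )

    report = {}
    for intent in support:  # intents with no human-validated expressions are omitted
        precision = correct[intent] / predicted[intent] if predicted[intent] else 0.0
        recall = correct[intent] / support[intent]
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        report[intent] = {
            "Support": support[intent],
            "Precision": round(precision, 2),
            "Recall": round(recall, 2),
            "F1 Score": round(f1, 2),
        }
    return report

# Hypothetical example: 5 expressions, human-validated label vs. model prediction.
human = ["greeting", "greeting", "refund", "refund", "refund"]
model = ["greeting", "refund",   "refund", "refund", "greeting"]
print(golden_test_set_metrics(human, model))
# greeting: Support 2, Precision 1/2 = 0.5,  Recall 1/2 = 0.5,  F1 = 0.5
# refund:   Support 3, Precision 2/3 = 0.67, Recall 2/3 = 0.67, F1 = 0.67
```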