Duration: 5 minutes
Summary: This lesson explores accuracy in classification: how it’s calculated and used, and the concept of ground truth and its importance in evaluating model performance.
Module 2: Classification

2.4 Accuracy in Classification

Transcript

Hi and welcome. In this video, we’ll explore accuracy in classification: what it means, how it’s calculated and how it’s used. We’ll also look at the concept of ground truth and its importance in evaluating model performance.

Most engineering and product teams operate within an agile sprint cycle. Cytora follows suit, using this structure to develop a taxonomy and set target success criteria, typically aiming for a 70% accuracy rate before iteration. 

This approach also lays the groundwork for implementing downstream processes like field extraction and various business procedures.

To determine accuracy, the number of correctly classified files is divided by the total number of files evaluated and converted into a percentage.

Let’s take a hypothetical example: if there are 100 insurance claims and the model correctly classifies 70 of them, the accuracy rate is 70%.
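
The calculation just described can be sketched in a few lines of Python. This is a minimal illustration only: the function name `accuracy_percent` is an assumption made for this example and is not part of any Cytora tooling.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return classification accuracy as a percentage.

    Illustrative helper (hypothetical name): correct files divided by
    total files evaluated, converted into a percentage.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of evaluated files")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return 100.0 * correct / total


# The hypothetical example from the transcript: 70 of 100 claims are correct.
claims_accuracy = accuracy_percent(correct=70, total=100)
print(f"Accuracy: {claims_accuracy:.0f}%")
```

The input checks are a design choice for the sketch: dividing by zero files, or counting more correct files than were evaluated, would otherwise produce a meaningless figure.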

This measure of accuracy is a crucial metric for gauging how well the model is performing. By showing the percentage of cases where the model correctly classified a file, it ultimately measures the model’s ability to make appropriate predictions against the taxonomy.

This serves as a fundamental performance indicator for classification models, guiding decision-making and informing further model refinement. 

Achieving high accuracy is crucial for ensuring the model's effectiveness in real-world applications, as it directly influences the reliability and trustworthiness of the classification outcomes.

Accuracy is also often considered alongside other performance metrics, such as the number of times a classification is appropriate but does not lead to the desired outcome, or the number of times a document cannot be classified at all because the taxonomy is incomplete.

For example, suppose a document is classified as ‘Director’s CV’ and the action configured is to move anything labelled ‘Director’s CV’ to a folder called ‘archive’. This could lead to problems further on, because you might want some Director’s CVs to go to a different folder called ‘Keep’.
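
One way to track these additional outcomes alongside accuracy is to tally a review outcome for each document. The sketch below is a rough illustration under stated assumptions: the outcome names and the ten sample entries are invented for this example and are not categories defined in the lesson.

```python
from collections import Counter

# Hypothetical review outcomes for ten documents. The category names are
# illustrative assumptions, not labels defined in the lesson.
outcomes = [
    "correct", "correct", "correct", "correct",
    "correct", "correct", "correct",
    "appropriate_but_undesired",  # e.g. a Director's CV archived when it should be kept
    "unclassified",               # no suitable label exists in the taxonomy
    "incorrect",
]

counts = Counter(outcomes)
total = len(outcomes)

# Report each outcome as a count and a share of all reviewed documents.
for outcome, count in counts.most_common():
    print(f"{outcome}: {count} of {total} ({100 * count / total:.0f}%)")
```

Keeping ‘appropriate but undesired’ and ‘unclassified’ as separate tallies shows whether a shortfall comes from the model itself or from gaps in the taxonomy and the actions attached to it.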

By continuously monitoring accuracy and other relevant metrics, the model can be iterated on to improve classification, overall performance and usability.

Ground truth refers to the accurate and real-world data that’s used as a benchmark to validate and train models or algorithms. 

It's essentially the "gold standard" or reference point that you compare your predictions or outputs against to test their accuracy. In image recognition, for example, ground truth might be the labelled images indicating the objects they contain, against which a model's predictions are measured.

Imagine teaching a computer to recognise different types of shapes in pictures. You would input 100 pictures, each labelled with the shape name. The labels are the "ground truth" because they’re the accurate information that tells you what each picture contains. You can then use this labelled data to train your computer model to recognise shapes.

The model is then tested with new pictures to check its predictions against the ground truth and see how well it performs. So, ground truth is basically like having the answer sheet: it tells you what the correct answers are so you can see whether the model’s predictions match up.

In a binary classification problem, for example, where the task is to predict whether an email is spam or not, the ground truth would consist of the labels manually assigned to each email in the training dataset, defining whether each email is spam or not spam.
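
To make the ‘answer sheet’ idea concrete, here is a minimal sketch that checks a model’s predictions against manually assigned ground-truth labels for the spam example. The five emails, their labels and the predictions are all hypothetical data invented for illustration.

```python
# Hypothetical ground-truth labels, assigned manually to five emails.
ground_truth = ["spam", "not spam", "spam", "not spam", "not spam"]
# Hypothetical model predictions for the same five emails, in the same order.
predictions = ["spam", "not spam", "not spam", "not spam", "spam"]

if len(ground_truth) != len(predictions):
    raise ValueError("each prediction needs exactly one ground-truth label")

# Compare each prediction with its ground-truth label.
matches = [truth == predicted for truth, predicted in zip(ground_truth, predictions)]
correct = sum(matches)
accuracy = 100 * correct / len(ground_truth)

# Positions of the emails where the prediction disagrees with the ground truth.
mismatched = [index for index, is_match in enumerate(matches) if not is_match]

print(f"{correct} of {len(ground_truth)} correct ({accuracy:.0f}% accuracy)")
print(f"Emails to review (by position): {mismatched}")
```

Listing the mismatched positions, rather than reporting the percentage alone, makes it easier to go back and review the individual emails the model got wrong.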

However, it’s important to note that a ground truth should only be created by manually categorising each document based on its specific content. 

Because of this, be aware that ground truth outputs can sometimes become inaccurate further downstream: they may have gone through additional human intervention, rather than remaining an accurate representation of the presented data.

In the next video, we’ll look at the most effective way to define a taxonomy.
