Guide to Confusion Matrices and Type Errors

Dylan | Aug 05, 2019

For anyone working in data science or machine learning, it’s important to be able to visualize a model’s performance, whether to better understand the model ourselves or to share and present our project to others. The confusion matrix is a common and intuitive tool for just that.

Furthermore, to understand the strengths and weaknesses of our model, it’s critical to be aware of the different types of errors it can commit and, even more importantly, which types of errors are the most costly for our particular machine learning problem.

The Confusion Matrix

Confusion matrices are powerful tools for visualizing the results of a classification model’s predictions on holdout data. The size of a confusion matrix depends on the number of possible classes with a row and a column for each class. A model intended to classify handwritten digits ranging between 0 and 9 would have a confusion matrix of 10 columns by 10 rows.
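To make the "one row and one column per class" idea concrete, here is a minimal sketch in plain Python. The function name `confusion_matrix` and the toy digit labels are made up for illustration, and rows are taken to be actual classes with columns as predicted classes, matching the layout described later in this post.

```python
def confusion_matrix(actual, predicted, labels):
    """Count predictions in a len(labels) x len(labels) grid.

    Rows are actual classes, columns are predicted classes.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


digits = list(range(10))  # the ten classes 0 through 9

# Hypothetical toy data: twelve predictions, two of them wrong
# (a 3 read as an 8, and a 7 read as a 1).
actual = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 7]
predicted = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 8, 1]

digit_matrix = confusion_matrix(actual, predicted, digits)
print(len(digit_matrix), "rows by", len(digit_matrix[0]), "columns")
for row in digit_matrix:
    print(row)
```

Correct predictions land on the diagonal, while every off-diagonal cell shows which class was confused with which.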

We’ll keep it simple for our first example and assume our model is a binary classifier, meaning its only possible predictions are "Yes" and "No". Next, we’ll feed our model 100 instances of holdout data and visualize its performance with a confusion matrix.

Example Confusion Matrix

The sum of all cells in the matrix is 100, meaning every one of our 100 instances' predictions is represented. What can we deduce from the matrix?

Matrix’s Columns

The first column represents every instance the classifier predicted to be "Yes" while the second column represents the instances predicted to be "No".

Matrix’s Rows

The first row represents all of the instances that are actually "Yes" and the second row represents all of the instances that are actually "No".

Because we split our original data into two subsets, one for training and another for testing, we already know the actual target value of every instance in the holdout set and can compare each prediction against it.

Each cell in the matrix represents one of four possible outcomes of a prediction.

  1. True Positive - Represented by the top left entry, 38 instances that are actually "Yes" were correctly labeled "Yes".

  2. False Negative - Represented by the top right entry, 7 instances that are actually "Yes" were mislabeled "No".

  3. False Positive - Represented by the bottom left entry, 3 instances that are actually "No" were mislabeled "Yes".

  4. True Negative - Represented by the bottom right entry, 42 instances that are actually "No" were correctly labeled "No".
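The four outcomes can be tallied with a few lines of plain Python. This is only a sketch: the helper name `tally` is made up, and the label lists are rebuilt from the four counts as listed above rather than taken from a real model. Note that these four listed values add up to 90, so the totals printed here follow the listed counts.

```python
# Counts copied from the list above.
TP, FN, FP, TN = 38, 7, 3, 42

# Rebuild actual/predicted label lists that reproduce those counts.
actual = ["Yes"] * (TP + FN) + ["No"] * (FP + TN)
predicted = ["Yes"] * TP + ["No"] * FN + ["Yes"] * FP + ["No"] * TN


def tally(actual, predicted, positive="Yes"):
    """Count the four outcomes of a binary classifier."""
    counts = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts


counts = tally(actual, predicted)

# Same layout as the matrix: rows are actual, columns are predicted.
matrix = [
    [counts["TP"], counts["FN"]],  # actually "Yes"
    [counts["FP"], counts["TN"]],  # actually "No"
]
correct = counts["TP"] + counts["TN"]
errors = counts["FN"] + counts["FP"]

print(matrix)
print("correct:", correct, "errors:", errors)
```

The diagonal of the matrix holds the correct predictions, and the two off-diagonal cells hold the two kinds of mistakes.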

This representation paints a clear picture of our model’s performance, highlighting both its strengths and weaknesses. There are two types of errors our example model could have made. Let’s explore them in greater depth.

Type I Errors: False Positives

In data science, False Positives are also commonly referred to as Type I Errors. These errors occur when our binary classification model classifies an instance as "Yes" when it is actually "No".

Depending on the specific problem at hand, Type I Errors might become very costly. An example might be a model that classifies incoming emails as SPAM or HAM (not-spam). A Type I Error occurs every time our model mislabels a HAM email as SPAM.

A SPAM/HAM classifier with many Type I Errors threatens to flag important HAM emails as SPAM and hide them from the user. Letting a few SPAM emails pass into the inbox would be much less costly than hiding a few potentially important messages.
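One way to put a number on Type I Errors is to compute the false positive rate and the precision from the cells of the matrix. The sketch below reuses the counts from the example matrix earlier in the post and, purely as an assumption for illustration, reads "Yes" as SPAM and "No" as HAM. The function names are made up for this example.

```python
def false_positive_rate(fp, tn):
    """Share of actual "No" instances that were wrongly labeled "Yes"."""
    actual_negatives = fp + tn
    return fp / actual_negatives if actual_negatives else 0.0


def precision(tp, fp):
    """Share of predicted "Yes" instances that really were "Yes"."""
    predicted_positives = tp + fp
    return tp / predicted_positives if predicted_positives else 0.0


# Counts from the example matrix, read as "Yes" = SPAM (assumed mapping).
tp, fn, fp, tn = 38, 7, 3, 42

fpr = false_positive_rate(fp, tn)
spam_precision = precision(tp, fp)

print(f"HAM emails wrongly hidden: {fp} of {fp + tn} ({fpr:.1%})")
print(f"Flagged emails that really were SPAM: {spam_precision:.1%}")
```

For a spam filter, a low false positive rate matters more than catching every last SPAM email, so these are the numbers to watch.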

Type II Errors: False Negatives

On the flip side, False Negatives are commonly referred to as Type II Errors and occur when our binary classification model classifies an instance as "No" when it is actually "Yes".

Depending on the problem at hand, Type II Errors can be much more costly than Type I Errors. A relevant example, presented in a previous post about evaluation metrics, involves a classification model that attempts to classify passengers as terrorists or non-terrorists.

Of course, too many Type I Errors would be costly and incredibly inconvenient if many non-terrorist passengers were investigated as a result, but the cost of mislabeling the sole terrorist in a group of 1,000 passengers could be tragic.

In this scenario, you would want to take the appropriate steps to ensure your classification model reliably identifies terrorists even if it sometimes mislabels law-abiding passengers.
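The arithmetic behind this scenario is worth spelling out. The sketch below compares two hypothetical models on the 1,000 passengers: one that labels everyone a non-terrorist, and one that flags 50 passengers, including the sole terrorist. The second model's counts are invented for illustration, and the function names are made up.

```python
def accuracy(tp, fn, fp, tn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fn + fp + tn)


def recall(tp, fn):
    """Share of actual positives the model managed to find."""
    actual_positives = tp + fn
    return tp / actual_positives if actual_positives else 0.0


# Model A: labels every passenger "non-terrorist" (one Type II Error).
always_no = dict(tp=0, fn=1, fp=0, tn=999)

# Model B (hypothetical counts): flags 50 passengers, one of whom is
# the terrorist, so it makes 49 Type I Errors and no Type II Errors.
cautious = dict(tp=1, fn=0, fp=49, tn=950)

for name, m in (("always_no", always_no), ("cautious", cautious)):
    acc = accuracy(**m)
    rec = recall(m["tp"], m["fn"])
    print(f"{name}: accuracy={acc:.3f}, recall={rec:.1f}")
```

Model A scores higher on accuracy yet never catches the terrorist, which is why accuracy alone is misleading when Type II Errors are the costly ones.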


Hopefully, this post has clearly explained the basic principles behind the confusion matrix and the different types of errors. Confusion matrices become more complex as the number of possible classes increases, but with a solid understanding of the basic 2x2 binary confusion matrix, it’s much easier to understand what’s represented by a larger one, such as the 10x10 matrix for handwritten digits.

As always, thanks for reading and if anything was unclear or you have any additional questions, I look forward to helping you further in the comments below!