Confusion Matrix in Cyber Crime Cases

Hema R
3 min read · Jun 6, 2021

The confusion matrix plays a major role in classification because it helps us evaluate how good our classification model is. Let us start by understanding what a confusion matrix is.

A confusion matrix is an N×N matrix, where N is the number of target classes. A confusion matrix can be visualized using the scikit-plot module.
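For instance, here is a minimal sketch of plotting a confusion matrix with scikit-plot; the y_true and y_pred arrays below are made-up toy labels, not output from any real model:

```python
import matplotlib.pyplot as plt
import scikitplot as skplt

# Toy labels (assumed data): actual values and model predictions
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Draw the 2x2 confusion matrix as a labelled heatmap
skplt.metrics.plot_confusion_matrix(y_true, y_pred)
plt.show()
```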

Confusion matrix for binary classification.

In binary classification, we have four values: TP, FP, FN, and TN. What do they tell us?

True Positive (TP):

The predicted value matches the actual value and both are positive values.

True Negative (TN):

The predicted value matches the actual value and both are negative values.

False Positive (FP): (Type I error)

The predicted value doesn’t match the actual value.

The predicted value is positive while the actual value is negative.

False Negative (FN): (Type II error)

The predicted value doesn’t match the actual value.

The predicted value is negative and the actual value is positive.
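With scikit-learn, these four counts can be read straight off the matrix. A minimal sketch, assuming binary labels with 1 as the positive class (scikit-learn orders the 2×2 matrix as [[TN, FP], [FN, TP]]):

```python
from sklearn.metrics import confusion_matrix

# Toy labels (assumed data)
y_true = [1, 1, 0, 1, 0, 0, 1, 0]  # actual values
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # predicted values

# ravel() flattens [[TN, FP], [FN, TP]] into the four counts
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")  # TP=3 TN=3 FP=1 FN=1
```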

Consider this confusion matrix for identity theft:

                   Predicted Positive   Predicted Negative
Actual Positive         TP = 100             FN = 5
Actual Negative         FP = 10              TN = 50

Out of 165 cases:

True Positive (TP): 100 positive data points were correctly classified by the model.

True Negative (TN): 50 negative data points were correctly predicted by the model.

False Positive (FP): 10 negative data points were incorrectly predicted as positive by the model.

False Negative (FN): 5 positive data points were incorrectly predicted as negative by the model.

The problem lies with False Positives (FP) and False Negatives (FN), and FN is the most dangerous here: a missed case of identity theft means we are unaware of an issue that is actually occurring, creating an environment where we believe there are no problems.

Accuracy:

Accuracy tells us how often our classifier is right overall.

It is the ratio of the sum of all true values (TP + TN) to the total number of cases.

Let's find the accuracy of the above model.

Accuracy = (100+50)/(100+50+10+5) = 150/165 ≈ 0.9091

Accuracy = 90.91%
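The same arithmetic as a quick Python check, using the example counts:

```python
# Counts from the identity theft example above
tp, tn, fp, fn = 100, 50, 10, 5

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(round(accuracy, 4))  # 0.9091
```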

Precision:

Precision tells us what fraction of the values the model predicted as positive are actually positive.

It is the ratio of True Positives to the sum of True Positives and False Positives.

The precision value lies between 0 and 1; for an ideal classifier it is 1. As the number of False Positives grows, the precision value drops.

Precision = 100/(100+10) = 100/110 ≈ 0.9091

Precision = 0.9091
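Again as a quick Python check:

```python
# Counts from the identity theft example above
tp, fp = 100, 10

precision = tp / (tp + fp)
print(round(precision, 4))  # 0.9091
```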

Recall:

Recall tells us what fraction of the actual positive cases the model was able to identify.

It is the ratio of True Positives to the sum of True Positives and False Negatives.

Recall = 100/(100+5) = 100/105 ≈ 0.9524

Recall = 0.9524
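And the matching check in Python:

```python
# Counts from the identity theft example above
tp, fn = 100, 5

recall = tp / (tp + fn)
print(round(recall, 4))  # 0.9524
```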

F1 score:

The F1 score is the harmonic mean of Precision and Recall.

An F1 score of 1 is considered perfect, while an F1 score of 0 indicates total failure.

F1 score = 2 × (0.9091 × 0.9524)/(0.9091 + 0.9524) = 1.7317/1.8615 ≈ 0.9302

F1 score = 0.9302
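All four metrics can also be verified with scikit-learn. The sketch below rebuilds label arrays matching the example counts (100 TP, 50 TN, 10 FP, 5 FN) and computes each score:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

# Labels reconstructed from the example: 100 TP, 5 FN, 10 FP, 50 TN
y_true = [1] * 100 + [1] * 5 + [0] * 10 + [0] * 50
y_pred = [1] * 100 + [0] * 5 + [1] * 10 + [0] * 50

print(f"Accuracy : {accuracy_score(y_true, y_pred):.4f}")   # 0.9091
print(f"Precision: {precision_score(y_true, y_pred):.4f}")  # 0.9091
print(f"Recall   : {recall_score(y_true, y_pred):.4f}")     # 0.9524
print(f"F1 score : {f1_score(y_true, y_pred):.4f}")         # 0.9302
```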

Thanks for reading!!!
