Evaluation Metrics
We also need to be careful when selecting the evaluation metrics for our model. Suppose we have two algorithms with accuracies of 98% and 96%, respectively, on a dog/not-dog classification problem. At first glance the two algorithms appear to perform similarly. Recall that classification accuracy is defined as the number of correct predictions divided by the total number of predictions made; in other words, the number of True Positive (TP) and True Negative (TN) predictions divided by the total number of predictions. However, it might be the case that, along with correctly classified dog images, a large number of background or similar-looking objects are falsely classified as dogs, commonly known as False Positives (FP). Another undesirable behavior is that many dog images are misclassified as negatives, known as False Negatives (FN). Clearly, by definition, classification accuracy does not capture the notion of false positives or false negatives.
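To make this concrete, here is a minimal sketch with hypothetical confusion-matrix counts (invented for illustration, not taken from the text) showing how two classifiers can reach the quoted 98% and 96% accuracies while having very different false positive and false negative behavior:

```python
# Minimal sketch: accuracy computed from hypothetical confusion-matrix counts
# for two dog/not-dog classifiers. All counts below are illustrative assumptions.

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = correct predictions / total = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Classifier A: 98% accuracy, but 15 background patches are misclassified as dogs (FP).
acc_a = accuracy(tp=490, tn=490, fp=15, fn=5)   # 980 / 1000 = 0.98

# Classifier B: 96% accuracy, with a different error profile: it misses 35 dogs (FN).
acc_b = accuracy(tp=465, tn=495, fp=5, fn=35)   # 960 / 1000 = 0.96

print(f"Classifier A accuracy: {acc_a:.2%}")  # 98.00%
print(f"Classifier B accuracy: {acc_b:.2%}")  # 96.00%
# The accuracy numbers alone do not reveal that A produces three times the
# false positives of B, or that B misses seven times as many dogs as A.
```

Under these assumed counts, the two accuracy figures look close, yet one model is far noisier on background objects while the other misses many more dogs, which is exactly the information accuracy hides.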