Dog Breed Classifier — Udacity

Felipe Garcia
7 min readApr 17, 2021

Project Overview

This project aims to classify dog breeds and humans similarity to dog breeds. To achieve this goal, we will build a Convolutional Neural Network (CNN) to classify a given image, testing both a CNN trained from scratch and pre-trained architectures.

This project is proposed by Udacity, where it was provided a Jupyter notebook to guide the problem solving. The notebook’s structure is as follows:

  • Step 0: Import Datasets
  • Step 1: Detect Humans
  • Step 2: Detect Dogs
  • Step 3: Create a CNN to Classify Dog Breeds (from Scratch)
  • Step 4: Use a CNN to Classify Dog Breeds (using Transfer Learning)
  • Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)
  • Step 6: Write your Algorithm
  • Step 7: Test Your Algorithm

Problem Statement

In order to build a model that classifies a dog’s breed or a human’s resemblance to a dog breed, we, first, need to identify if the image contains a dog or a human.

To identify this content, we can use a pre-trained model to classify the image, verifying if it is a dog. If the image doesn’t have a dog, we can look for a human face using OpenCV’s face detector (Haar Feature-Based Cascade Classifier).

If the image has a dog, we will print it’s breed, if it contains a human, we will return the dog breed that he/her look like. Finally, if the image doesn’t have neither a dog or a human, we will return a error message.

Metrics

To evaluate our solution’s performance, we will use the accuracy, defined as follows:

Where TP stands for True Positive, TN is True Negative and FP is False Positive. This metric is the number of correct predictions divided by the total number of predictions, and, for being simple and easy to interpret, we will use the accuracy to evaluate our solution. Also, as our dataset does not have a high unbalanced class, the accuracy is a suitable metric.

Step 0: Import Datasets

In this project, we will use two datasets: the dog dataset and the human dataset. Therefore, our first step is to load those datasets:

Import Dog Dataset

There are 133 total dog categories.
There are 8351 total dog images.

There are 6680 training dog images.
There are 835 validation dog images.
There are 836 test dog images.

Also, we can check the dog breed distribution on our training data:

Import Human Dataset

There are 13233 total human images.

Sample Images

Having a quick look into our datasets, we can plot a few images to see the data that we are working with:

Step 1: Detect Humans

To detect human faces we will use Haar feature-based cascade classifiers, which compares the image with a series of pre-computed features. To use this classifier, we need to download a face detector from OpenCV’s github and load it:

Step 2: Detect Dogs

To detect dogs we use the pre-trained model ResNet-50 using the resulting weights from training the ResNet-50 on the ImageNet dataset. Where the ResNet is deep learning model which uses CNN and residuals connections.

Additionally, we need to pre-process the data, resizing each image to 224x224.

In the above code, we prepare the data and define a dog detector function, that will return true if it finds a dog in the image

Step 3: Create a CNN to Classify Dog Breeds (from Scratch)

Using the human and dog detectors, we can identify if there is a dog or a human in the image,then, create a model to classify the dog breeds.

To create this model, we will build a CNN and train it from scratch, with the goal of achieving more than 1% of accuracy on the training set. Also, we need to normalize the images, dividing each pixel by 255.

With the model built, we will also need to compile, train and evaluate the model.

With the above model and number of epochs, we were able to achieve a accuracy of 5.38 %, and the training process were slowly converging, therefore, the model would eventually improves it’s accuracy with more training time.

However, the training time for this model with a large number of epochs would be really high, even when using a GPU.

Step 4: Use a CNN to Classify Dog Breeds (using Transfer Learning)

In order to improve our accuracy with a smaller training time, we can use a model trained in another dataset and use it in our problem using transfer learning.

Our model will use the pre-trained VGG16, adding two more layers and training on our datasets for 20 epochs.

The pre-trained VGG16 were able to achieve 44.26% of accuracy with only 20 epochs, resulting on a huge improvement comparing to our model built and trained from scratch.

Step 5: Create a CNN to Classify Dog Breeds (using Transfer Learning)

We can also explore another pre-trained model, such as the Inception, adding four more layers and training for 20 epochs with our data.

With the Inception model, we were able to achieve a accuracy of 77.27%, a great improvement!

Step 6: Write your Algorithm

Using the face_detector and dog_detector functions, together with our InceptionV3_predict_breed, we can built our final classifier.

The classifier will, first, use dog_detector to look for a dog in the image, if one is found, it will return it’s breed using InceptionV3_predict_breed. Second, if a dog was not found, it will check for human faces using the face_detector, if a face is found, it will return the dog’s breed that the image looks like using InceptionV3_predict_breed. Finally, when neither dog or human is found, the algorithm will print a error message.

Step 7: Test Your Algorithm

To test our algorithm, we will apply our solution to six test images:

The results for the rest images were mostly right, with the exception of a human face being missed, showing that it could be good for the model to try different face detectors.

Results

The results are summarized in the below table, where we can see that the network using Inception as the base pre-trained model outperformed the other models, with a accuracy of 77.27%.

Additionally, the VGG16 had the lowest training time, followed by the InceptionV3 and the CNN.

Refinement

With the tests, we can see that our algorithm made some mistakes, to improve our classifier, we could increase the number of training epochs, increase the dataset using data augmentation, change the network parameters to find the optimum value using Grid Search or even test others pre-trained models.

Conclusion

In this work, we built a model to classify a dog’s breed or a human resemblance to a dog breed, creating a face detector (using OpenCV) and a dog detector (using ResNet-50), to check the presence of a dog or a human in the image.

Besides that, we created different dog breed classifiers, using a CNN trained from scratch and two pre-trained models, VGG16 and IncpetionV3. The Incpetion model achieved the best accuracy and was used on our final algorithm.

The resulting model achieved a accuracy of 77.27% in our test set, a good result but with a lot of room for improvements.

For more details, check the project on the GitHub.

--

--