Now that we know what we’re dealing with, we’ll take some simple steps to pre-process the images and pass them through a CNN. We’ll use TensorBoard to monitor the progress of training.
A good place to start would seem to be borrowing some code. We want to build a binary classifier, an algorithm that sorts an image into one of two categories. We want the categories to be exclusive – each image has either a cat or a dog, but not both. To that end, straightforward googling yields the following useful piece of code, which we’ve re-uploaded to GitHub rather than pasting it in full into this blog post.
The code takes care of pre-processing for us – it loads all of the files, splits them into a train and test set for diagnostic purposes, seemingly performs feature normalization, and augments the data set by applying a couple of transformations: left-right reflections and rotations of up to 25 degrees. This seems entirely reasonable to me, so I’m leaving it alone.
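To make those steps a bit more concrete, here is a minimal sketch of a train/validation split and a left-right reflection in plain Python. It is not the borrowed code, which relies on its own library calls; the function names, the 25% validation fraction, the fixed seed and the file names are all assumptions made for illustration. Rotations are left out because they need an image library.

```python
import random

def train_validation_split(items, validation_fraction=0.25, seed=0):
    """Shuffle a copy of `items` and split off a validation portion.

    The fraction and seed are illustrative assumptions, not values
    taken from the borrowed code.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]

def flip_left_right(image):
    """Mirror an image stored as a list of pixel rows."""
    return [list(reversed(row)) for row in image]

# Hypothetical file names, just to have something to split.
files = ["cat.%d.jpg" % i for i in range(8)] + ["dog.%d.jpg" % i for i in range(8)]
train_files, val_files = train_validation_split(files)

# A tiny 2 x 3 "image" so the reflection is easy to check by eye.
tiny_image = [[1, 2, 3],
              [4, 5, 6]]
flipped = flip_left_right(tiny_image)
```

The reflection leaves the label unchanged (a mirrored cat is still a cat), which is why it is a safe way to stretch the training set.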
The one thing that it is doing that we may want to modify later is downsizing all the images to 64 x 64. This seems reasonable to get going, but I worry that it might not be enough information for the network.
Next up, it’s building an architecture for us. At first, I didn’t know much about architectures, so I left it as is, but this architecture is the bulk of what we plan to change in the future. The code took hours to run and produced a model file. TensorBoard kept track of the training and produced some diagnostics for us to look at:
We’re seemingly doing quite well, by my expectations. The training accuracy appears to be in excess of 90%, which I found quite surprising. However, training set accuracy and loss don’t tell the whole story – we must also examine the validation set accuracy and loss. Validation accuracy seems to cap out at 90%, but more importantly, the validation loss decreases at first and then starts increasing again. This appears to indicate overfitting. In principle we combat this to some extent with dropout, but we’ll be focusing on combating overfitting over the next few posts.
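The "decrease, then increase" pattern can also be checked mechanically instead of by eyeballing a plot. The sketch below is an illustration only: the loss values are made up (they are not our TensorBoard numbers), and the function names and the `patience` threshold are assumptions.

```python
def best_epoch(val_losses):
    """Return the index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])

def looks_overfit(val_losses, patience=2):
    """True if validation loss has gone `patience` or more epochs
    without beating its best value."""
    epochs_since_best = len(val_losses) - 1 - best_epoch(val_losses)
    return epochs_since_best >= patience

# Made-up validation losses, one per epoch: falling, then rising again.
val_loss_per_epoch = [0.69, 0.52, 0.41, 0.35, 0.33, 0.36, 0.40, 0.47]

print("Best epoch:", best_epoch(val_loss_per_epoch))
print("Looks overfit:", looks_overfit(val_loss_per_epoch))
```

A check like this is the idea behind early stopping: keep the model from the best epoch rather than the last one.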
Now, we’d like to see how we do on our real test set – the one provided by Kaggle, which we don’t have the answers for. We have to load the model and produce a collection of predictions. We coded this by trial and error. We don’t need to load and pre-process the training images this time, but in order to load the model, TFLearn appears to want us to build the network first. Therefore, for our test set run, we’ll use this code, which will henceforth serve as the template for all future runs, with the architecture swapped out each time. In particular, the only new work is done by the following:
model.load('./model_cat_dog_6_final.tflearn')
outfile = open('myIntAnswers.csv', 'w')
myCount = 0
for f in test_files:
    myCount += 1
    if(myCount % 500 == 0):
        print("On file " + str(myCount) + "...")
    try:
        img = io.imread(f)
        new_img = imresize(img, (size_image, size_image, 3))
        allX = np.array(new_img)
        ans = model.predict(allX)
        outfile.write(str(f).strip('./testjpg')+','+str(ans)+'\n')
    except:
        continue
outfile.close()
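One detail worth spelling out is the `strip('./testjpg')` call. Python's `str.strip` treats its argument as a set of characters to remove from both ends, not as a prefix or suffix, so it only yields the image id here because the ids are purely numeric. The sketch below compares it with an approach based on `os.path`; the helper name `image_id` and the path layout are assumptions for illustration.

```python
import os

def image_id(path):
    """Return the file name without directory or extension."""
    return os.path.splitext(os.path.basename(path))[0]

# Hypothetical path layout; the real test directory may differ.
numeric_path = "./test/12.jpg"
stripped_numeric = numeric_path.strip("./testjpg")

# A made-up name whose stem contains letters from the strip set.
tricky_path = "./test/step7.jpg"
stripped_tricky = tricky_path.strip("./testjpg")

print(stripped_numeric, image_id(numeric_path))
print(stripped_tricky, image_id(tricky_path))
```

For numeric ids the two approaches agree, so the original line does its job; the `os.path` version simply does not depend on what characters the file name happens to contain.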
Note in particular that we’re electing to stream in the files rather than loading them all at once. We do this primarily because we only plan on visiting each file once, instead of visiting each file many times as we do in the training phase. Perhaps the code would run faster if we loaded everything up front, but it only takes a minute or two to produce the predictions, so we don’t bother trying to make it faster.
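The streaming idea can be sketched with a generator, which handles one file at a time and never holds the whole test set in memory. This is an illustration rather than the code we ran: `stream_predictions` is a hypothetical helper, and `fake_load` and `fake_predict` are stand-ins for the real image loading and `model.predict` calls.

```python
def stream_predictions(paths, load_image, predict):
    """Yield (path, prediction) pairs, loading one image at a time."""
    for path in paths:
        image = load_image(path)
        yield path, predict(image)

# Stand-ins so the sketch runs without any image or model libraries.
loaded = []

def fake_load(path):
    loaded.append(path)   # record which files have been read so far
    return [0.0]          # placeholder for pixel data

def fake_predict(image):
    return 0.5            # placeholder for a model output

stream = stream_predictions(["a.jpg", "b.jpg", "c.jpg"], fake_load, fake_predict)
first = next(stream)      # only the first file has been loaded at this point
```

Because the generator is lazy, nothing is read until a result is requested, which suits a pass where each file is visited exactly once.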
From here, we add “id,label” as the first line of the file, as requested by Kaggle, and upload. We score a loss of… 0.61?! Apparently that’s worse than random guessing! What could have gone wrong?