So, we’ve learned that making the network deeper helps, and that most of our other changes don’t matter much for accuracy, though they definitely have an impact on efficiency. Let’s do one final experiment before designing what we hope will be the best architecture we can manage on one desktop.
This time, in v6, we’ll try increasing the size of the filter. We’ve been using size 3, meaning 3 pixels by 3 pixels, which can discern a range of features, but we wonder if increasing the size to 5 would help our accuracy. We also add another convolutional layer, since that seemed useful. Unfortunately, we don’t make much progress (v6 in dark blue):
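One reason a bigger filter may not pay off: two stacked 3 x 3 convolutions already cover the same 5 x 5 receptive field, with fewer parameters. A quick back-of-the-envelope calculation (the 32-channel counts here are illustrative, not our exact layers) makes this concrete:

```python
# Rough parameter count for a conv layer: (k*k*in_channels + 1) * out_channels
# (the +1 is the bias per output channel).
def conv_params(k, in_ch, out_ch):
    return (k * k * in_ch + 1) * out_ch

# One 5x5 layer vs two stacked 3x3 layers, 32 channels in and out:
p5 = conv_params(5, 32, 32)        # 25,632 parameters
p33 = 2 * conv_params(3, 32, 32)   # 18,496 parameters, same 5x5 receptive field
```

So the stacked 3 x 3 design sees just as far while spending fewer parameters, and gets an extra nonlinearity in between; this is part of why VGG-style networks stick with size-3 filters.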
Our loss is down to 0.24, so we’re getting somewhere.
Alright, time to take everything we’ve learned and engineer the best architecture we can think of. Using 96 x 96 images pushes the limits of our RAM, and 96 pixels is nearly as wide and tall as our smallest images. We want a nice big capacity for our network, so we model it after VGGNet, hoping to achieve higher accuracy yet. We use the following architecture:
```python
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d

# img_prep and img_aug are the preprocessing and augmentation objects
# defined earlier.
network = input_data(shape=[None, 96, 96, 3],
                     data_preprocessing=img_prep,
                     data_augmentation=img_aug)

# Block 1: two 32-filter convolutions, then a 2x2 max pool.
conv_1 = conv_2d(network, 32, 3, activation='relu', name='conv_1')
conv_2 = conv_2d(conv_1, 32, 3, activation='relu', name='conv_2')
network = max_pool_2d(conv_2, 2)

# Block 2: two 64-filter convolutions.
conv_3 = conv_2d(network, 64, 3, activation='relu', name='conv_3')
conv_4 = conv_2d(conv_3, 64, 3, activation='relu', name='conv_4')
network = max_pool_2d(conv_4, 2)

# Block 3: three 128-filter convolutions.
conv_5 = conv_2d(network, 128, 3, activation='relu', name='conv_5')
conv_6 = conv_2d(conv_5, 128, 3, activation='relu', name='conv_6')
conv_7 = conv_2d(conv_6, 128, 3, activation='relu', name='conv_7')
network = max_pool_2d(conv_7, 2)

# Block 4: three 256-filter convolutions.
conv_8 = conv_2d(network, 256, 3, activation='relu', name='conv_8')
conv_9 = conv_2d(conv_8, 256, 3, activation='relu', name='conv_9')
conv_10 = conv_2d(conv_9, 256, 3, activation='relu', name='conv_10')
network = max_pool_2d(conv_10, 2)

# Classifier head.
network = fully_connected(network, 1024, activation='relu')
network = dropout(network, 0.5)
network = fully_connected(network, 2, activation='softmax')
```
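To get a feel for the size of this network, we can trace the shapes by hand. Assuming 'same' padding (so only the pools shrink the feature maps), each of the four 2x2 max pools halves the spatial size, taking 96 down to 6, and the first dense layer alone then carries over nine million parameters:

```python
# Spatial size after each 2x2 max pool (stride 2), starting from 96x96:
size = 96
for _ in range(4):      # four pooling stages
    size //= 2
# size is now 6, so the last conv block outputs a 6 x 6 x 256 volume.

flat = size * size * 256            # 9,216 inputs to the dense layer
fc_params = flat * 1024 + 1024      # weights + biases of the 1024-unit layer
```

That single fully connected layer accounts for roughly 9.4 million parameters, which goes a long way toward explaining both the RAM pressure and the two-day training time.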
Whew! Indeed, this really did push the limits, at least of my patience; even running just 50 epochs took two days. We ended up (green) a fair bit ahead of our v5 (blue and light purple), and certainly well ahead of v1 (dark purple):
Submitting to Kaggle, we get a loss of 0.17! Fantastic. This seems to correspond to an accuracy of around 96%. That is, in my opinion, good enough to stop and move on to a different project; I think we’ll talk about RNNs next. Reflecting back, though, what else could we have done?
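For context on what that 0.17 means: Kaggle scores this competition with log loss (binary cross-entropy), which penalizes confident wrong answers heavily, so the score depends on calibration as well as accuracy. A minimal sketch of the metric, with made-up predictions for illustration:

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Average binary cross-entropy, as Kaggle scores it."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# A model averaging ~84% confidence on the true class scores roughly 0.17:
score = log_loss([1, 1, 0, 0], [0.84, 0.85, 0.16, 0.15])
```

This is why a 0.17 loss can coexist with ~96% accuracy: most predictions are confident and correct, while a small number of confident mistakes contribute most of the loss.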
It seems that the current leaders in the competition (and many other people) are using transfer learning: they take a network pre-trained on the ImageNet data and fine-tune its last layer on the Kaggle data, achieving incredible losses, down below 0.05. We said at the beginning of this blog that we weren’t going to do that; since our goal is to do research and discover new algorithms, we won’t necessarily have the luxury of other people’s work to build on.
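The core idea of that fine-tuning step is simple: treat the pre-trained network’s penultimate layer as a fixed feature extractor, and train only a final linear classifier on top of it. Here is a toy sketch of just that last step, using synthetic stand-in "features" (random vectors with a class-dependent shift) rather than a real pre-trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained network: pretend its penultimate layer
# maps each image to a 64-dim feature vector. These features are synthetic,
# with the two classes offset so a linear layer can separate them.
n, d = 200, 64
X = rng.normal(size=(n, d))
y = (np.arange(n) % 2).astype(float)
X[y == 1] += 0.5                    # class-dependent shift

# "Fine-tune" only the final layer: logistic regression on frozen features.
w, b = np.zeros(d), 0.0
lr = 0.1
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))   # sigmoid output
    w -= lr * (X.T @ (p - y)) / n        # gradient of cross-entropy loss
    b -= lr * (p - y).mean()

acc = ((p > 0.5) == (y == 1)).mean()
```

Because only one small layer is trained, this converges in seconds even when the frozen network is enormous, which is a big part of why transfer learning dominates leaderboards.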
Suppose we wanted to continue making progress ourselves, without a pre-trained model. What else could we do? Well, one thing we could try is simply getting more data. ImageNet has a very large number of images; by simply searching for cat or dog, we could hopefully assemble a data set larger than Kaggle’s. We could also try different architectures; we modeled ours after VGGNet, but we could instead try an Inception-style design like the one powering GoogLeNet. Although both of these would help, I imagine the improvement in loss, while not insignificant, wouldn’t be enough to push us down to compete with the transfer learning folks. Therefore, onwards and upwards to more projects!