So, we’re back to our original architecture. What else could we try to make progress? Perhaps the capacity of our network is too low. We could increase it by making it wider, like we tried last time, or we could make the network deeper. Let’s try that and see how it goes. We prepare a v4 of our network, the key change being depth:
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d

network = input_data(shape=[None, 64, 64, 3],
                     data_preprocessing=img_prep,
                     data_augmentation=img_aug)
conv_1 = conv_2d(network, 32, 3, activation='relu', name='conv_1')
network = max_pool_2d(conv_1, 2)
conv_2 = conv_2d(network, 64, 3, activation='relu', name='conv_2')
conv_3 = conv_2d(conv_2, 64, 3, activation='relu', name='conv_3')
network = max_pool_2d(conv_3, 2)
conv_4 = conv_2d(network, 64, 3, activation='relu', name='conv_4')
conv_5 = conv_2d(conv_4, 64, 3, activation='relu', name='conv_5')
network = max_pool_2d(conv_5, 2)
network = fully_connected(network, 512, activation='relu')
network = dropout(network, 0.5)
network = fully_connected(network, 2, activation='softmax')
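As a quick sanity check (ours, not from the original post), here's what this deeper stack does to the 64×64 input. TFLearn's `conv_2d` defaults to 'same' padding, so only the three `max_pool_2d` layers shrink the feature maps:

```python
# Back-of-the-envelope shape walk-through for the v4 network above.
# Assumes TFLearn's default 'same' padding on the 3x3 convolutions,
# so spatial size is halved only by each 2x2 max-pool.
def shape_after(size, n_pools, pool=2):
    """Spatial side length after n_pools 2x2 max-pools."""
    for _ in range(n_pools):
        size //= pool
    return size

side = shape_after(64, 3)       # three max_pool_2d layers: 64 -> 32 -> 16 -> 8
flattened = side * side * 64    # conv_5 has 64 filters
print(side, flattened)          # prints: 8 4096
```

So the 512-unit fully connected layer sees a 4096-dimensional input; the extra depth adds capacity without blowing up that flattened size.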
We continue to run for only 50 epochs. Let’s see how this goes, compared to v1 still in purple:
(You’ll have to forgive the tracebacks – we later began running v5 by accident under the same tfboard name.) Clearly, we’re on the right track – the overfitting isn’t as bad, and the validation loss is definitely better than before. Submitting to Kaggle yields 0.26. Progress!
Onto v5 (code to follow); this time we add L2 regularization back in (we still have slight overfitting problems), increase the number of filters in the last two convolutional layers from 64 to 128, and switch to a sigmoid activation function at the final layer instead of softmax; the network is otherwise the same as above. This yields some progress (light blue):
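To make the softmax-to-sigmoid switch concrete, here is a small numpy illustration (ours, not the post's code): with two output units, softmax couples the outputs so they sum to 1, while sigmoid squashes each logit independently. The L2 penalty is also sketched with made-up weights:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax: outputs are coupled and sum to 1.
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    # Element-wise: each output is an independent probability.
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, -1.0])   # hypothetical cat/dog logits
print(softmax(logits))            # approximately [0.95, 0.05], sums to 1
print(sigmoid(logits))            # approximately [0.88, 0.27], need not sum to 1

# L2 regularization adds a penalty proportional to the squared weights
# (in TFLearn it can be attached per layer, e.g. conv_2d(..., regularizer='L2')).
w = np.array([0.5, -0.3, 1.2])    # made-up weights for illustration
l2_penalty = 0.001 * np.sum(w ** 2)
```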
This is looking great – the loss is improving and there’s no evidence of overfitting. When we post-process in Mathematica, though, we only get 69/100 right! Maybe this is a fluke, but it sure is worrisome. Perhaps the network simply hasn’t converged – the best-case scenario would be for the validation loss to stay constant for a while, but it looks like it was just settling down when we stopped. Therefore, we prepare a v5.2 with the same network structure, but load the network parameters from v5 and keep running. Indeed, it does converge and stays flat for a while:
This is great! However, we still only get 77/100 in post-processing. This could just be a 1.5-sigma downward fluctuation – bad luck, which we could verify by simply classifying more examples, but we’re lazy. We think we can do better, though, so we don’t submit to Kaggle.
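For the curious, the sigma estimate is simple binomial arithmetic. A rough check (ours, with an assumed true accuracy – the post doesn't state the baseline it compared against): if the model's true accuracy were around 83%, the spread in correct counts over 100 examples is a few answers, and 77/100 sits about 1.5–1.6 sigma below expectation:

```python
import math

# Binomial fluctuation of correct answers out of n classifications.
# p = 0.83 is an assumed true accuracy for illustration.
n, p = 100, 0.83
sigma = math.sqrt(n * p * (1 - p))   # std dev of correct count, ~3.76
observed, expected = 77, n * p
z = (expected - observed) / sigma    # ~1.6 sigma below expectation
print(round(sigma, 2), round(z, 2))  # prints: 3.76 1.6
```

Classifying more examples shrinks the relative spread (sigma grows only like the square root of n), which is why that would settle the fluke question.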