Transfer learning is a way to reuse already trained models to increase the performance of a new model being trained. Today, we will explore this concept.
BUT FIRST, it’s time for a brief reflection on this challenge.
When I decided to start the #100DaysOfTensorflow challenge, I had two main goals: to discover features of this ecosystem that I didn’t know yet and to not “forget” what I had already learned.
However, I don’t consider the #100DaysOf**Something** format optimal. Sometimes, we keep doing things that don’t “connect” just for the sake of doing them.
While I believe this may help us memorize mechanical details (such as “the path to Dense layers is tf.keras.layers.Dense”), I think that analytical thinking requires more work and a deeper analysis of problems.
Also, learning TensorFlow is great, but Machine Learning, and even data analysis, is not all about this one tool. There are simpler and faster ways to solve data-related problems.
So, I will be changing the format of this challenge in the following way:
- Instead of #100DaysOfTensorflow, let’s call this challenge #100DaysOfData
- I will still code every day, but I may not publish complete code on a daily basis. In my opinion, rushing is the greatest enemy of good analysis. So, instead, I’d rather comment on what I have done for the day.
That being said… Over the next days, I will explore TensorFlow data for at least 1 hour per day and post the notebooks, data and models, when they are available, to this repository.
Today’s notebook is available here.

    # do imports
    import os

    import numpy as np
    import matplotlib.pyplot as plt
    import tensorflow as tf
    import tensorflow_datasets as tfds

Get the examples from the “Cats vs. dogs” dataset.
- Train: 80%
- Validation: 10%
- Test: 10%
The dataset contains images with different shapes and 3 channels.

    (raw_train, raw_validation, raw_test), metadata = tfds.load(
        'cats_vs_dogs',
        split=['train[:80%]', 'train[80%:90%]', 'train[90%:]'],
        with_info=True,
        as_supervised=True,
    )

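To make the three slice strings above concrete, here is a small sketch of how an 80/10/10 split partitions a dataset by index. The helper `split_ranges` and the total of 1,000 examples are hypothetical and chosen only for illustration; the exact rounding that TFDS applies at the split boundaries may differ from this simple integer arithmetic.

```python
def split_ranges(total, boundaries=(80, 90)):
    """Return (start, stop) index pairs for consecutive percentage slices.

    `boundaries` lists the cut points in percent, so (80, 90) mirrors
    'train[:80%]', 'train[80%:90%]' and 'train[90%:]'.
    """
    cuts = [0] + [total * b // 100 for b in boundaries] + [total]
    return list(zip(cuts[:-1], cuts[1:]))


# Hypothetical dataset size, used only for illustration.
ranges = split_ranges(1000)
sizes = [stop - start for start, stop in ranges]
print(ranges)
print(sizes)
```

The three ranges do not overlap and together cover every example, which is the property we want from a train/validation/test split.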
The first thing we will do is resize all images to 100 x 100. The official TensorFlow example uses 160 x 160, but I would like to experiment with a smaller size to check the impact of this change.

    IMG_SIZE = 100  # All images will be resized to 100x100

    def format_example(image, label):
        # cast to float and rescale pixel values from [0, 255] to [-1, 1]
        image = tf.cast(image, tf.float32)
        image = (image / 127.5) - 1
        image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))
        return image, label

    # apply to dataset
    train = raw_train.map(format_example)
    validation = raw_validation.map(format_example)
    test = raw_test.map(format_example)

    # shuffle the dataset and batch the data
    BATCH_SIZE = 32
    SHUFFLE_BUFFER_SIZE = 1000

    train_batches = train.shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE)
    validation_batches = validation.batch(BATCH_SIZE)
    test_batches = test.batch(BATCH_SIZE)

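The rescaling step in `format_example` is easy to check in isolation. The sketch below applies the same arithmetic with NumPy to a tiny made-up "image" (the pixel values are hypothetical) and shows that the range [0, 255] is mapped to [-1, 1], which is the input range MobileNet V2 expects.

```python
import numpy as np

# Hypothetical 2x2 RGB "image" covering the extremes of the uint8 range.
image = np.array(
    [[[0, 0, 0], [255, 255, 255]],
     [[127, 128, 64], [32, 200, 16]]],
    dtype=np.uint8,
)

# Same arithmetic as format_example: cast to float, then rescale.
scaled = image.astype(np.float32) / 127.5 - 1.0

print(scaled.min(), scaled.max())
```

A pixel value of 0 ends up at -1 and a value of 255 ends up at 1, so the resize to 100 x 100 changes only the spatial dimensions, not the value range.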
Create the base model using pre-trained convnets
The base model used here comes from the official TensorFlow examples and uses the MobileNet V2 model developed at Google.
According to them, “this is pre-trained on the ImageNet dataset, a large dataset consisting of 1.4M images and 1000 classes”.

    IMG_SHAPE = (IMG_SIZE, IMG_SIZE, 3)

    # Create the base model from the pre-trained model MobileNet V2
    base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                                   include_top=False,
                                                   weights='imagenet')

We have to “freeze” the convolutional base created above so we can use it as a feature extractor: freezing keeps its pre-trained weights from being updated during training. Then, we add a classifier on top of it and train only that top-level classifier. To freeze the model, we set its trainable attribute to False.

    base_model.trainable = False

    # check model
    base_model.summary()

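The idea behind freezing can be illustrated without Keras. The following is a minimal NumPy sketch, not the MobileNet V2 setup itself: `W_base` is a hypothetical stand-in for pre-trained weights, the toy data is made up, and only the small head (`w_head`, `b_head`) is updated by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pre-trained base: a fixed projection + ReLU.
W_base = rng.normal(size=(4, 8))
W_base_before = W_base.copy()


def extract_features(x):
    """Frozen base: its weights are read but never updated."""
    return np.maximum(x @ W_base, 0.0)


def bce_from_logits(logits, labels):
    """Mean binary cross-entropy computed directly from raw logits."""
    return np.mean(np.logaddexp(0.0, logits) - labels * logits)


# Toy binary task (made-up data): the label depends on the first input.
x = rng.normal(size=(64, 4))
y = (x[:, 0] > 0).astype(np.float64)

# The base is frozen, so its features can be computed once.
features = extract_features(x)

# Trainable head: one weight per feature plus a bias, like Dense(1).
w_head = np.zeros(8)
b_head = 0.0

initial_loss = bce_from_logits(features @ w_head + b_head, y)
learning_rate = 0.05
for _ in range(200):
    logits = features @ w_head + b_head
    probs = 1.0 / (1.0 + np.exp(-logits))
    error = probs - y  # gradient of the loss with respect to the logits
    w_head -= learning_rate * features.T @ error / len(y)
    b_head -= learning_rate * error.mean()

final_loss = bce_from_logits(features @ w_head + b_head, y)
print(initial_loss, final_loss)
```

The loss goes down while `W_base` stays exactly as it was, which is the behaviour we ask Keras for when setting `trainable = False` on the base model.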
To generate predictions, we use a GlobalAveragePooling2D layer to average each feature map into a single value, and a Dense layer to convert those features into a single prediction per image.
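Before the Keras code, here is a rough NumPy sketch of what these two layers compute. The feature-map shape `(2, 4, 4, 1280)` is an assumption for illustration (the real shape is shown by `base_model.summary()`), and the `kernel` and `bias` values are random placeholders rather than trained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed feature-map shape: (batch, height, width, channels).
feature_maps = rng.normal(size=(2, 4, 4, 1280))

# Global average pooling: average over the two spatial axes.
pooled = feature_maps.mean(axis=(1, 2))

# Dense(1) without activation: a weighted sum plus a bias per image.
# These weights are random placeholders, not trained values.
kernel = rng.normal(size=(1280, 1))
bias = np.zeros(1)
logits = pooled @ kernel + bias

print(pooled.shape, logits.shape)
```

Pooling reduces each image to one vector with one entry per channel, and the dense layer reduces that vector to a single number per image.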

    # add layer to create features
    global_average_layer = tf.keras.layers.GlobalAveragePooling2D()

    # add a prediction layer (no activation, so it outputs a raw logit)
    prediction_layer = tf.keras.layers.Dense(1)

    # create model
    model = tf.keras.Sequential([
        base_model,
        global_average_layer,
        prediction_layer
    ])

    # compile (learning_rate replaces the deprecated lr argument)
    base_learning_rate = 0.0001
    model.compile(optimizer=tf.keras.optimizers.RMSprop(learning_rate=base_learning_rate),
                  loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                  metrics=['accuracy'])

    # see summary
    model.summary()

    # train model
    initial_epochs = 10
    history = model.fit(train_batches,
                        epochs=initial_epochs,
                        validation_data=validation_batches)

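Because the loss is created with `from_logits=True`, the model outputs raw scores rather than probabilities. A small sketch of how such scores relate to probabilities and class predictions, using made-up logit values and a hypothetical `sigmoid` helper:

```python
import numpy as np


def sigmoid(z):
    """Map raw logits to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical raw model outputs (logits) for four images.
logits = np.array([-2.0, -0.1, 0.0, 3.0])

probabilities = sigmoid(logits)

# Thresholding the logit at 0 equals thresholding the probability at 0.5.
predicted_from_logits = (logits > 0).astype(int)
predicted_from_probs = (probabilities > 0.5).astype(int)

print(probabilities)
print(predicted_from_logits)
```

To turn the output of `model.predict` into probabilities, the same sigmoid step would be applied to the predicted logits.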
Check the loss and the accuracy.

    acc = history.history['accuracy']
    val_acc = history.history['val_accuracy']

    loss = history.history['loss']
    val_loss = history.history['val_loss']

    plt.figure(figsize=(8, 8))

    # top panel: accuracy
    plt.subplot(2, 1, 1)
    plt.plot(acc, label='Training Accuracy')
    plt.plot(val_acc, label='Validation Accuracy')
    plt.legend(loc='lower right')
    plt.ylabel('Accuracy')
    plt.ylim([min(plt.ylim()), 1])
    plt.title('Training and Validation Accuracy')

    # bottom panel: loss
    plt.subplot(2, 1, 2)
    plt.plot(loss, label='Training Loss')
    plt.plot(val_loss, label='Validation Loss')
    plt.legend(loc='upper right')
    plt.ylabel('Cross Entropy')
    plt.ylim([0, 1.0])
    plt.title('Training and Validation Loss')
    plt.xlabel('epoch')
    plt.show()

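Besides plotting, the same curves can be summarized numerically. The sketch below uses a made-up `example_history` dictionary laid out like `history.history`; its numbers are placeholders for illustration only and are not results from the training run above.

```python
import numpy as np

# Made-up placeholder values in the same layout as history.history;
# they are NOT results from the training run above.
example_history = {
    "accuracy": [0.70, 0.80, 0.85, 0.88],
    "val_accuracy": [0.72, 0.79, 0.83, 0.82],
}

train_acc = np.array(example_history["accuracy"])
val_acc = np.array(example_history["val_accuracy"])

# Zero-based index of the epoch with the highest validation accuracy.
best_epoch = int(np.argmax(val_acc))

# Gap between training and validation accuracy at that epoch.
gap = train_acc[best_epoch] - val_acc[best_epoch]

print(best_epoch, val_acc[best_epoch], gap)
```

A gap that keeps growing while validation accuracy stalls is a common sign of overfitting, which is the kind of pattern the plots are meant to reveal.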