In the next two posts I'll cover how to use the Keras Python library and the R Shiny package to create a web app capable of producing a life insurance quote from an image of a face. To do this, we'll need a model that can estimate a person's age and gender from the input image - for this we will use a convolutional neural network (CNN).

This first post will cover creating the age/gender model.

You will need:

  • Keras (TensorFlow backend) - I am using version 2.1.6
  • The R 'shiny' package
  • The Keras R package
  • Preferably access to a GPU - although as you will see, this is not essential

Before starting, it's worth outlining (broadly) all the steps needed in order to get to the end product. These are:

  1. Gather data (if necessary - I'll expand on this shortly)
  2. Build or obtain our CNN
  3. Save the model to disk
  4. Load the model into an R session
  5. Create a 'shiny' app that loads this model
  6. Import an image into the app and predict age/gender
  7. Perform calculations using standard actuarial tables, the prediction output and other inputs defined by the user

Now we'll look at creating the model the app will use. Note - the theory/intuition behind convolutional neural networks is not covered here.

Creating the Model

Technically we require two models: one to predict age and one to predict gender. To build a model from scratch we need data, in this case lots of pictures of faces annotated with the age and gender of each subject. However, we don't necessarily need to create our own model entirely from scratch for the purposes of this app (so we may not actually need any data). It's certainly possible, and I will cover how to go about training your own model, but in the interests of time it may be preferable to use a pre-trained model that has been released for others to use. A quick Google search should return a few different pre-trained models that can predict age and gender. I am going to use this excellent example on GitHub, which has the advantage of having been built using Keras.

Option 1 - use a pre-trained age/gender classifier

There are instructions in the GitHub repository linked above, but to make things easier I am going to cover only the steps needed to get a working age/gender model in a format that will be easy to load into our R session later. The zip file available for download contains a file called wide_resnet.py, which holds the code for creating the model architecture. Place wide_resnet.py (the module) in your current working directory and execute the following, which creates the model but doesn't compile or train it.

In [ ]:
from wide_resnet import WideResNet

model = WideResNet(64)()

To see a summary of the current model architecture, use model.summary(). You'll see a printout describing the model and its layers. It is a CNN - if you don't understand what all the different layers mean, don't worry for now. Next, we need to load the weights that were computed during training; otherwise we can't use this CNN to make predictions. In the GitHub repository, navigate to 'releases' and download the weights file (a .hdf5 file). Place the weights file in your current working directory (or any other directory you like, as long as you point to it when you load it) and run the following:

In [2]:
model.load_weights('weights.18-4.06.hdf5')

Finally, save the model using model.save(). From experience, it's important that the Keras version used to create the model is the same as the version you'll use in R to load it. If not, you may see errors when you try to load it into R with the R Keras package.

In [3]:
model.save('wide_res_py.h5')

Something else worth mentioning is that you don't need to have compiled the model in order to generate predictions, so if you see warning messages about the model not being compiled, don't worry about them. We can generate a prediction on a new image to check that the model produces reasonable results.

In [4]:
import numpy as np
from keras.preprocessing import image

# Load in the image - with target size the model expects
test_image = image.load_img('/ML Files/Age_Gender/kim.jpg', target_size = (64, 64))
test_image = image.img_to_array(test_image)
test_image = np.expand_dims(test_image, axis = 0)

# Generate predictions
result = model.predict(test_image)
predicted_genders = result[0]
ages = np.arange(0, 101).reshape(101, 1)
predicted_ages = result[1].dot(ages).flatten()

# Return the predictions
print("F" if predicted_genders[0][0]>0.5 else "M")
print(predicted_ages)
F
[30.30180318]
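The age output of this model is a probability for each of 101 age classes (0 to 100), and the `result[1].dot(ages)` line collapses that distribution into a single estimate by taking its expected value: each age weighted by its probability, summed. The NumPy sketch below reproduces that step on a made-up distribution (the probabilities are hypothetical, not real model output), so the arithmetic can be checked without loading the network.

```python
import numpy as np

# Hypothetical softmax output for one image: 101 probabilities, one per age 0..100.
# All the mass sits on ages 28, 30 and 32 purely for illustration.
probs = np.zeros((1, 101))
probs[0, 28] = 0.25
probs[0, 30] = 0.50
probs[0, 32] = 0.25

# Same construction as in the notebook: a (101, 1) column of the ages 0..100
ages = np.arange(0, 101).reshape(101, 1)

# Expected age = sum over classes of p(age) * age; flatten gives one value per image
expected_age = probs.dot(ages).flatten()
print(expected_age)  # [30.]

# The single most likely class (argmax) is an alternative point estimate
most_likely_age = probs.argmax(axis=1)
print(most_likely_age)  # [30]
```

The expectation uses the whole distribution, whereas argmax keeps only the most probable class; which one suits a pricing app better is a judgment call.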

For info, here is the image I loaded (this time without a target size, so it displays more clearly). The model predicts female, age ~30, which seems reasonable.

In [6]:
import matplotlib.pyplot as plt

img = image.load_img('/ML Files/Age_Gender/kim.jpg')
plt.imshow(img)
plt.show()

Option 2 - create your own model

We could create an entirely new model from scratch, or we could still use a pre-trained model, just in a different way. The pre-trained model doesn't even have to be trained on the same dataset or for the same classification problem (although it helps if it has seen similar example images!). Rather than using a pre-trained model out of the box, we could use feature extraction and/or fine-tuning techniques to adapt it to our particular dataset and problem. If you're not familiar with the concepts of transfer learning and fine-tuning a CNN, try reading this post to get a better understanding. I will look first at how we could build an entire model from scratch and then move on to feature extraction, which involves running our data through a pre-trained network to extract the useful features learned by that network and then feeding these into a new classifier. Finally, I'll look at fine-tuning the VGG-16 architecture, trained on the ImageNet dataset. These methods can feasibly be run on a CPU with a small dataset (i.e. we don't need access to expensive GPUs to build these models), but using a GPU is likely to be much faster and will let you try more computationally heavy techniques and work with more data. Also, I'm only going to create a model to predict gender for now, but similar steps could be taken to build an age model.

First, we need to create the dataset used for training and testing. I've used quite a small dataset, collected from the UI Faces API and located here (I did not collect this data myself; I came across it in another blog post!). It includes 800 training images (400 male, 400 female) and 240 test images (also 50/50 male/female). The data is organised into a training folder and a test folder (called train and validation in the code below), and within each of these the male and female images are in separate subdirectories. If you have access to a GPU you may want to use a larger dataset, like the IMDB-WIKI or Adience datasets.
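This folder layout matters because Keras' `flow_from_directory` treats each subdirectory as a class and assigns label indices in alphanumeric order of the folder names. The sketch below mimics that rule in plain Python on a throwaway directory tree; the folder names `female` and `male` are an assumption about how the dataset is laid out, and `class_indices_for` and `count_images` are hypothetical helpers, not Keras functions.

```python
import os
import tempfile

def class_indices_for(directory):
    """Mimic how flow_from_directory infers labels: each subdirectory is a
    class, and classes are indexed in sorted (alphanumeric) order."""
    classes = sorted(
        d for d in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, d))
    )
    return {name: idx for idx, name in enumerate(classes)}

def count_images(directory):
    """Count the files in each class subdirectory."""
    return {
        name: len(os.listdir(os.path.join(directory, name)))
        for name in class_indices_for(directory)
    }

# Build a tiny stand-in for faces/train with hypothetical folder names
root = tempfile.mkdtemp()
train_dir = os.path.join(root, "faces", "train")
for cls, n in [("male", 3), ("female", 2)]:
    os.makedirs(os.path.join(train_dir, cls))
    for i in range(n):
        # empty placeholder files standing in for JPEGs
        open(os.path.join(train_dir, cls, f"{i}.jpg"), "w").close()

print(class_indices_for(train_dir))  # {'female': 0, 'male': 1}
print(count_images(train_dir))       # {'female': 2, 'male': 3}
```

With a real generator, `train_generator.class_indices` reports the mapping Keras actually used, which is worth checking before reading a sigmoid output as "probability of male".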

I will largely follow the steps outlined in the Keras blog post on how to create powerful CNN models using little training data. Rather than regurgitate a lot of the detail contained within that post, I suggest you read it to get a better understanding of the theory behind some of the steps we'll take.

To begin, let's look at a selection of training images.

In [7]:
import os
from PIL import Image as PImage
import matplotlib.pyplot as plt
import numpy as np

# Helper function for loading images
# Returns a list of PIL images from a directory
def loadImages(path):
    loadedImages = []
    for filename in sorted(os.listdir(path)):
        # Skip hidden files such as macOS's .DS_Store
        if filename.startswith('.'):
            continue
        img = PImage.open(os.path.join(path, filename))
        loadedImages.append(img)

    return loadedImages

path = "/ML Files/Age_Gender/faces/Examples/"

# Create the list of images
imgs = loadImages(path)

# Now view a selection...
cols, rows = 4, 3
img_num = cols * rows

# Function for displaying a random selection of images in a grid
def show_imgs(images):
    img_ids = np.random.choice(len(images), img_num, replace=False)

    for i, img_id in enumerate(img_ids):
        plt.subplot(rows, cols, i + 1)
        plt.imshow(images[img_id])
        plt.axis('off')

    plt.show()

show_imgs(imgs)

Now let's train a simple model from scratch using Keras to get a baseline accuracy score. This will have two convolution layers with a small number of filters in each, and will use data augmentation and dropout to limit overfitting. Before setting up the model, we need to define how the image data will be loaded and augmented.

In [ ]:
from keras.preprocessing.image import ImageDataGenerator

batch_size = 32

# Define the data augmentation applied to training data
train_datagen = ImageDataGenerator(
      rescale=1./255,
      rotation_range=40,
      width_shift_range=0.2,
      height_shift_range=0.2,
      shear_range=0.2,
      zoom_range=0.2,
      horizontal_flip=True,
      fill_mode='nearest')
# Define data aug for test - this will only be rescaled
test_datagen = ImageDataGenerator(rescale=1./255)

# Define generators to read in batches of images (from directory) and apply data augmentation
train_generator = train_datagen.flow_from_directory(
        '/ML Files/Age_Gender/faces/train',  
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        '/ML Files/Age_Gender/faces/validation',
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode='binary')
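The `rescale` and `horizontal_flip` options above are easy to picture on a raw array. This NumPy sketch applies equivalent operations to a tiny made-up 2x3 RGB array; it illustrates what the options mean rather than reproducing Keras' implementation (Keras applies `rescale` as a multiplication by 1/255, which matches the division below up to floating-point rounding). The random options (rotation, shifts, shear, zoom) are left out because their sampling is internal to `ImageDataGenerator`.

```python
import numpy as np

# Toy 2x3 "image" with 3 colour channels, uint8 values as in a decoded JPEG
img = np.array([[[0, 0, 0], [128, 128, 128], [255, 255, 255]],
                [[10, 20, 30], [40, 50, 60], [70, 80, 90]]], dtype=np.uint8)

# rescale=1./255: scale every pixel so values fall in [0, 1]
rescaled = img.astype("float32") / 255.0

# horizontal_flip=True: mirror left-right, i.e. reverse the width axis
flipped = rescaled[:, ::-1, :]

print(rescaled.min(), rescaled.max())  # 0.0 1.0
print(flipped[0, 0])                   # [1. 1. 1.]
```

A mirrored face keeps its gender label, which is why horizontal flipping is a safe augmentation for this problem.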

Now set up the model and fit to the training data.

In [ ]:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.models import Model

base = Sequential()
base.add(Conv2D(filters = 64, kernel_size=(3,3), input_shape = (150, 150, 3)))
base.add(Activation('relu'))
base.add(MaxPooling2D(pool_size=(2, 2)))

base.add(Conv2D(32, (3, 3)))
base.add(Activation('relu'))
base.add(MaxPooling2D(pool_size=(2, 2)))

base.add(Flatten())  # Convert feature maps to vector
base.add(Dense(128))
base.add(Activation('relu'))
base.add(Dropout(0.5))
base.add(Dense(1))
base.add(Activation('sigmoid'))

base.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

base.fit_generator(
        train_generator,
        steps_per_epoch=800 // batch_size,
        epochs=25,
        validation_data=validation_generator,
        validation_steps=240 // batch_size)

This gets to a validation accuracy of ~ 72% after 25 epochs. This is not great, plus the results are quite volatile.
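To see where this baseline's capacity sits, the shapes and parameter counts can be worked out by hand from the layer definitions, assuming the Keras defaults used above ('valid' padding for Conv2D, 2x2 max pooling with stride 2). The plain-Python sketch does that arithmetic; comparing it against `base.summary()` is a good sanity check.

```python
def conv2d_out(size, kernel=3):
    # 'valid' padding (the Keras default): output shrinks by kernel - 1
    return size - kernel + 1

def pool_out(size, pool=2):
    # MaxPooling2D with stride equal to the pool size floors the division
    return size // pool

def conv2d_params(kernel, in_ch, filters):
    # one kernel x kernel x in_ch weight tensor per filter, plus one bias each
    return kernel * kernel * in_ch * filters + filters

def dense_params(n_in, n_out):
    # weight matrix plus one bias per output unit
    return n_in * n_out + n_out

size = 150
size = pool_out(conv2d_out(size))  # Conv2D(64) then MaxPooling2D: 150 -> 148 -> 74
size = pool_out(conv2d_out(size))  # Conv2D(32) then MaxPooling2D: 74 -> 72 -> 36
flat = size * size * 32            # Flatten

params = {
    "conv1": conv2d_params(3, 3, 64),
    "conv2": conv2d_params(3, 64, 32),
    "dense128": dense_params(flat, 128),
    "dense1": dense_params(128, 1),
}
total = sum(params.values())
print(size, flat)  # 36 41472
print(params)
print(total)       # 5328929
```

Almost all of the ~5.3 million parameters sit in the first Dense layer, which is consistent with placing dropout there and with the model overfitting easily on 800 images.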

Feature Extraction and Fine Tuning

Now we can try to use the features learned by another network (VGG-16, trained on ImageNet) to improve this result. The faces dataset is passed through the convolutional part of the VGG-16 model, and the output is a set of arrays representing the learned features, which can then be fed into a new classification network. This part should be pretty quick: 25 epochs took me about one minute.

In [10]:
# import the vgg16 model
from keras.applications import VGG16

conv_base = VGG16(weights='imagenet',
                  include_top=False, # setting this as false imports only the convolutional part
                  input_shape=(150, 150, 3)) 

# Summarise the model architecture
conv_base.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, 150, 150, 3)       0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 150, 150, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 150, 150, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 75, 75, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 75, 75, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 75, 75, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 37, 37, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 37, 37, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 37, 37, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 37, 37, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 18, 18, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 18, 18, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 18, 18, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 18, 18, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 9, 9, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 9, 9, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 4, 4, 512)         0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
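The summary can be checked by hand. VGG-16's convolutional layers use 3x3 kernels with 'same' padding (the summary shows 150 staying 150 after each conv), so only the five pooling layers change the spatial size, and each conv layer has 3 x 3 x in x out weights plus out biases. The plain-Python sketch below, with the layer configuration read off the summary above, reproduces the (4, 4, 512) output and the 14,714,688 parameters, and shows that each image becomes 8192 numbers once flattened, which is the input size of the new classifier.

```python
# VGG-16 convolutional base: (block, number of 3x3 conv layers, filters)
VGG16_BLOCKS = [(1, 2, 64), (2, 2, 128), (3, 3, 256), (4, 3, 512), (5, 3, 512)]

def vgg16_conv_base(size=150, channels=3):
    """Return the output shape and parameter count of the conv base."""
    total_params = 0
    in_ch = channels
    for _block, n_convs, filters in VGG16_BLOCKS:
        for _ in range(n_convs):
            # 'same' padding keeps the spatial size; count weights + biases
            total_params += 3 * 3 * in_ch * filters + filters
            in_ch = filters
        size //= 2  # 2x2 max pooling with stride 2 floors odd sizes
    return (size, size, in_ch), total_params

shape, n_params = vgg16_conv_base()
print(shape)                           # (4, 4, 512)
print(n_params)                        # 14714688
print(shape[0] * shape[1] * shape[2])  # 8192 features per image after Flatten
```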
In [ ]:
import math
import numpy as np

# Not using data augmentation, other than to rescale both train
# and test data, so only one ImageDataGenerator required
datagen = ImageDataGenerator(rescale=1./255)

train_generator = datagen.flow_from_directory(
        '/ML Files/Age_Gender/faces/train',
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode=None,  # Won't generate labels
        shuffle=False)    # important! Don't want to return shuffled batches,
                          # i.e. leave them in order

# Create the features using predict_generator; round the step count up
# so that a final partial batch would still be included
train_features = conv_base.predict_generator(
        train_generator,
        steps=math.ceil(800 / batch_size))

# Since the features are in order (class folders are read alphabetically),
# it's easy to create an array of 0's & 1's to represent the labels
train_labels = np.array([0] * 400 + [1] * 400)
In [ ]:
# Repeat for validation data
import math

validation_generator = datagen.flow_from_directory(
        '/ML Files/Age_Gender/faces/validation',
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)

# 240 images with batches of 32: 7 full batches plus one of 16, so 8 steps
validation_features = conv_base.predict_generator(
        validation_generator,
        steps=math.ceil(240 / batch_size))

validation_labels = np.array([0] * 120 + [1] * 120)
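Because the labels are built by hand (400 + 400 and 120 + 120), the number of feature rows must match them exactly. 800 is a multiple of 32 but 240 is not, so `240 // 32 = 7` steps would return only 224 rows, while rounding up to 8 steps picks up the final partial batch of 16. The plain-Python sketch below simulates the batch sizes an unshuffled directory iterator yields in one pass; it is a simplified model of the Keras iterator, not its actual code, and `batch_sizes` and `features_returned` are hypothetical helpers.

```python
import math

def batch_sizes(n_samples, batch_size):
    """Sizes of the batches yielded in one unshuffled pass: full batches,
    then a final partial batch if n_samples isn't a multiple of batch_size."""
    sizes = [batch_size] * (n_samples // batch_size)
    if n_samples % batch_size:
        sizes.append(n_samples % batch_size)
    return sizes

def features_returned(n_samples, batch_size, steps):
    # predict_generator stacks the outputs of the first `steps` batches
    return sum(batch_sizes(n_samples, batch_size)[:steps])

batch_size = 32
for n in (800, 240):
    floor_steps = n // batch_size
    ceil_steps = math.ceil(n / batch_size)
    print(n, floor_steps, features_returned(n, batch_size, floor_steps),
          ceil_steps, features_returned(n, batch_size, ceil_steps))
# 800 25 800 25 800
# 240 7 224 8 240
```

With `shuffle=False` the rows come out class by class in file order, which is what lets the hand-built label arrays line up; `train_generator.classes` holds the same labels if you'd rather not hard-code them.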
In [ ]:
# Train a model using the extracted features as inputs
feat = Sequential()
feat.add(Flatten(input_shape=train_features.shape[1:]))
feat.add(Dense(256, activation='relu'))
feat.add(Dropout(0.5))
feat.add(Dense(1, activation='sigmoid'))

feat.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

history = feat.fit(train_features, train_labels,
          epochs=25,
          batch_size=batch_size,
          validation_data=(validation_features, validation_labels))

# save this model & weights
feat.save('/ML Files/Age_Gender/feat.h5')
# save the weights - will use them later
feat.save_weights('/ML Files/Age_Gender/feature_extraction_weights.h5')

We can plot the scoring history (accuracy) of the model - both training and validation - using matplotlib.

In [14]:
acc = history.history['acc']
val_acc = history.history['val_acc']

epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.show()
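The validation curve moves around quite a bit from epoch to epoch, and some of that is expected simply because the validation set is small. Treating each of the 240 validation images as an independent trial (an assumption), a single image shifts accuracy by about 0.4 percentage points, and the binomial standard error at an accuracy of 90% (a figure chosen purely for illustration) is roughly 2 points. A short sketch of that arithmetic:

```python
import math

def accuracy_standard_error(accuracy, n):
    # Binomial standard error of an accuracy measured on n independent images
    return math.sqrt(accuracy * (1 - accuracy) / n)

n_val = 240
one_image = 1 / n_val                  # accuracy change from a single image
se = accuracy_standard_error(0.9, n_val)

print(round(100 * one_image, 2))       # 0.42 (percentage points per image)
print(round(100 * se, 1))              # 1.9 (percentage points)
```

Swings of a few points between epochs are therefore within the noise you would expect from a validation set this size.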

This isn't a bad result, but there is some evidence of overfitting and the validation accuracy is quite volatile. We can also check how the model performs on images it never saw during training and that weren't used for tuning hyperparameters. To be clear, this isn't a proper holdout sample for validating the final model, as it's only a handful of images I downloaded at random from Google, but it gives a rough idea of whether the model is totally off base.

In [15]:
from keras.preprocessing import image

# Load some unseen images, predict gender and display the results
holdout = loadImages('/ML Files/Age_Gender/faces/holdout/')
cols, rows = 4, 3
img_num = cols * rows

for i in range(img_num):
    plt.subplot(rows, cols, i + 1)
    img = holdout[i].convert('RGB')  # ensure 3 channels (drops alpha / greyscale)
    img = img.resize((150, 150))  # resize to the size used in training
    test_image = image.img_to_array(img)  # create array from image
    test_image = np.expand_dims(test_image, axis=0)
    test_image = test_image.astype('float32') / 255  # rescale as in training
    features = conv_base.predict(test_image)  # extract VGG-16 features
    result = feat.predict(features)  # predict gender from VGG-16 features
    plt.imshow(img)  # show the image
    plt.title("M" if result[0][0] > 0.5 else "F")
    plt.axis('off')

plt.tight_layout(pad=0.0, w_pad=0.0, h_pad=0.5)
plt.show()

The model seems to perform pretty well, although one of the images of a female is predicted to be male!
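The holdout loop only gives meaningful answers if each image goes through the same preprocessing the training generators applied (RGB, 150x150, rescaled by 1/255) and the sigmoid output is read with the right class order. The NumPy sketch below separates those two steps from the plotting code; `prepare` and `to_label` are hypothetical helper names, and mapping class 1 to "M" assumes the class folders sort as female then male.

```python
import numpy as np

def prepare(img_array):
    """Mirror the training-time preprocessing for one RGB image array:
    add a batch axis and rescale pixel values by 1/255."""
    x = np.asarray(img_array, dtype="float32")
    if x.ndim != 3 or x.shape[-1] != 3:
        raise ValueError("expected a (height, width, 3) RGB array")
    return np.expand_dims(x, axis=0) / 255.0

def to_label(prob_male, threshold=0.5):
    # Sigmoid output is P(class 1); with folders 'female'/'male' sorted
    # alphabetically, class 1 is 'male' (an assumption about folder names)
    return "M" if prob_male > threshold else "F"

batch = prepare(np.full((150, 150, 3), 255, dtype=np.uint8))
print(batch.shape, batch.max())                 # (1, 150, 150, 3) 1.0
print([to_label(p) for p in (0.1, 0.5, 0.93)])  # ['F', 'F', 'M']
```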

Next, let's try to improve the performance by fine-tuning VGG-16 rather than just extracting features. The Keras documentation explains that the process involves taking a trained network and re-training it on a new dataset using very small weight updates. The steps we need to follow are:

  1. Add a custom network on top of an already-trained base network.
  2. Freeze the base network.
  3. Train the part we added.
  4. Unfreeze some layers in the base network.
  5. Jointly train both these layers and the added part.

Steps 1-3 have already been completed during feature extraction.

In [16]:
# VGG-16 model already imported earlier

# Create new classifier architecture - single hidden layer with dropout
model = Sequential()
model.add(Flatten(input_shape=conv_base.output_shape[1:]))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

# Since we need to start with fully trained model, load
# the weights we trained in feature extraction step
model.load_weights('/ML Files/Age_Gender/feature_extraction_weights.h5')

# Add the model on top of the convolutional base
new_model = Model(inputs=conv_base.input, outputs=model(conv_base.output))

# Unfreeze some layers of the convolutional base
new_model.trainable = True
set_trainable = False
for layer in new_model.layers:
    if layer.name == 'block5_conv1':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
    else:
        layer.trainable = False
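To check which layers the loop actually unfreezes, it can be replayed on the layer names alone. This plain-Python sketch uses the names from the VGG-16 summary above; the classifier's name (`sequential_1` here) is an assumption, since Keras numbers models by creation order, and `trainable_flags` is a hypothetical helper that copies the loop's logic.

```python
# Layer names of new_model, in order, as they appear in the VGG-16 summary;
# 'sequential_1' stands in for the classifier added on top (its real name
# depends on how many models have been created in the session)
layer_names = (
    ["input_2"]
    + [f"block1_conv{i}" for i in (1, 2)] + ["block1_pool"]
    + [f"block2_conv{i}" for i in (1, 2)] + ["block2_pool"]
    + [f"block3_conv{i}" for i in (1, 2, 3)] + ["block3_pool"]
    + [f"block4_conv{i}" for i in (1, 2, 3)] + ["block4_pool"]
    + [f"block5_conv{i}" for i in (1, 2, 3)] + ["block5_pool"]
    + ["sequential_1"]
)

def trainable_flags(names, first_trainable="block5_conv1"):
    # Same logic as the loop above: everything before the named layer is
    # frozen; that layer and everything after it is trainable
    flags = {}
    set_trainable = False
    for name in names:
        if name == first_trainable:
            set_trainable = True
        flags[name] = set_trainable
    return flags

flags = trainable_flags(layer_names)
print([name for name, t in flags.items() if t])
# ['block5_conv1', 'block5_conv2', 'block5_conv3', 'block5_pool', 'sequential_1']
```

So block 5 (three conv layers, plus a pooling layer that has no weights) and the added classifier are trained, while blocks 1-4 keep their ImageNet weights.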

Unfreezing just a few layers of the convolutional base, rather than fine-tuning all of them, should reduce the chance of overfitting. Next, compile the model (using a low learning rate) and set up the data generators (with augmentation this time, which again reduces the chance of overfitting).

Then we can fit the model. I'm only going to use 10 epochs here to save time, but you could run this for longer. This part took me around 60 minutes for 10 epochs.
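The "very small weight updates" come from the learning rate. The sketch below implements the standard RMSprop rule in NumPy (a running average of squared gradients, then a step of lr * g / sqrt(avg)); Keras' `RMSprop` follows this form, though details such as the epsilon value vary between versions, so treat the numbers as illustrative. On the first step each weight moves by roughly lr / sqrt(1 - rho) regardless of the gradient's scale, so going from the Keras default of 1e-3 to the 1e-5 used here shrinks each step about a hundredfold.

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr, rho=0.9, eps=1e-7):
    """One standard RMSprop update; returns new weights and new cache."""
    cache = rho * cache + (1 - rho) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

w0 = np.array([0.5, -0.2, 1.0])
grad = np.array([0.3, -0.01, 2.0])  # gradients of very different sizes

for lr in (1e-3, 1e-5):
    w1, _ = rmsprop_step(w0, grad, np.zeros_like(w0), lr)
    print(lr, np.abs(w1 - w0))
# each component moves by about lr / sqrt(0.1), i.e. roughly 3.16 * lr
```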

In [ ]:
import math
from keras import optimizers

# Compile new model with a low learning rate, so the unfrozen
# VGG-16 weights only receive small updates
new_model.compile(loss='binary_crossentropy',
                  optimizer=optimizers.RMSprop(lr=1e-5),
                  metrics=['acc'])

# Set up generators, with augmentation for the training data only
train_datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        '/ML Files/Age_Gender/faces/train',
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        '/ML Files/Age_Gender/faces/validation',
        target_size=(150, 150),
        batch_size=batch_size,
        class_mode='binary')

# Use fewer epochs here - only 10
history2 = new_model.fit_generator(
        train_generator,
        steps_per_epoch=800 // batch_size,
        epochs=10,
        validation_data=validation_generator,
        validation_steps=math.ceil(240 / batch_size))

Plot the scoring history again.

In [21]:
acc = history2.history['acc']
val_acc = history2.history['val_acc']

epochs = range(1, len(acc) + 1)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.show()

The validation accuracy is still a little bit lower than training accuracy but is at least less volatile now (consistently > 93% after only 10 epochs). The model can be saved to disk, ready to use at a later date.

In [22]:
new_model.save('/ML Files/Age_Gender/final_model.h5')
# Save the weights
new_model.save_weights('/ML Files/Age_Gender/final_weights.h5')

Summary

This model can now be loaded into an R session using the R Keras package when we build the app to produce a life insurance quote. You could repeat the 'option 2' steps for age (which some researchers have actually treated as a classification problem), or even for things like height and weight if you had the data; these may be important factors to consider when trying to quantify mortality risk from an image. I'm going to use the 'option 1' model to keep things simple. In reality, if I were actually going to use this app to sell life insurance, I would not use this model, because the data it was trained on is unlikely to be representative of the risks I would be taking on (it was trained using images of celebrities from the IMDB website!). However, it's fine for demonstration purposes. In summary, this post has covered:

  • Using a pre-trained model in Keras 'out of the box'
  • Creating a new model from scratch with limited data
  • Extracting features from a pre-trained model in order to build a powerful new gender classifier
  • Fine-tuning VGG-16

Part 2 will cover loading this model into R and creating an app with Shiny.

