Deploying a Pretrained Image Classifier with TensorFlow Serving
Mark Ezema / June 12, 2022
7 min read
TensorFlow Serving makes it easy to deploy machine learning models while keeping the same server architecture and APIs. It is tightly integrated with the TensorFlow stack and provides a straightforward way to deploy models.
This article draws inspiration from the official TensorFlow tutorial.
In this article we will:
- Install TF Serving.
- Load a pretrained model that classifies dogs, birds, and cats.
- Save it following the conventions TF Serving expects.
- Spin up a web server with TF Serving that accepts requests over HTTP.
- Interact with the model via a REST API.
- Learn about model versioning.
If you want to follow along, Google Colab is an excellent option.
Imports
Begin by importing the necessary packages.
import os
import zipfile
import subprocess
import numpy as np
import pandas as pd
import tensorflow as tf
from IPython.display import Image, display
Downloading the data
The model we'll be using was originally trained on images from the Cats and Dogs and Caltech Birds datasets.
# Download the images
!wget -q https://storage.googleapis.com/mlep-public/course_3/week2/images.zip
# Set a base directory
base_dir = '/tmp/data'
# Unzip images
with zipfile.ZipFile('/content/images.zip', 'r') as my_zip:
    my_zip.extractall(base_dir)
# Save paths for images of each class
dogs_dir = os.path.join(base_dir, 'images/dogs')
cats_dir = os.path.join(base_dir, 'images/cats')
birds_dir = os.path.join(base_dir, 'images/birds')
# Print number of images for each class
print(f"There are {len(os.listdir(dogs_dir))} images of dogs")
print(f"There are {len(os.listdir(cats_dir))} images of cats")
print(f"There are {len(os.listdir(birds_dir))} images of birds\n\n")
# Look at sample images of each class
print("Sample cat image:")
display(Image(filename=f"{os.path.join(cats_dir, os.listdir(cats_dir)[0])}"))
print("\nSample dog image:")
display(Image(filename=f"{os.path.join(dogs_dir, os.listdir(dogs_dir)[0])}"))
print("\nSample bird image:")
display(Image(filename=f"{os.path.join(birds_dir, os.listdir(birds_dir)[0])}"))
Load a pretrained model
The model classifies images of birds, cats, and dogs, and has been trained with image augmentation, so it yields good results.
First, download the necessary files:
!wget -q -P /content/model/ https://storage.googleapis.com/mlep-public/course_1/week2/model-augmented/saved_model.pb
!wget -q -P /content/model/variables/ https://storage.googleapis.com/mlep-public/course_1/week2/model-augmented/variables/variables.data-00000-of-00001
!wget -q -P /content/model/variables/ https://storage.googleapis.com/mlep-public/course_1/week2/model-augmented/variables/variables.index
Now, load the model into memory:
model = tf.keras.models.load_model('/content/model')
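A quick optional check that the model loaded correctly is to print its architecture:
# Print the architecture of the loaded model
model.summary()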
Saving the model
To load our trained model into TensorFlow Serving, we first need to save it in the SavedModel format. This will create a protobuf file in a well-defined directory hierarchy and will include a version number. TensorFlow Serving allows us to select which version of a model, or "servable", we want to use when we make inference requests. Each version will be exported to a different sub-directory under the given path.
# Define an export path and save the model
# The signature definition is defined by the input and output tensors,
# and stored with the default serving key
import tempfile
MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))
print(f'export_path = {export_path}\n')
# Save the model
tf.keras.models.save_model(
    model,
    export_path,
    overwrite=True,
    include_optimizer=True,
    save_format=None,
    signatures=None,
    options=None
)
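TF Serving picks up new versions automatically: it watches the model base path and, by default, serves the sub-directory with the highest version number. As a minimal sketch (for illustration only, reusing the same model object), exporting an updated model as version 2 would look like this:
# Hypothetical: export an updated model as version 2 alongside version 1
version_2_path = os.path.join(MODEL_DIR, str(2))
tf.keras.models.save_model(model, version_2_path, overwrite=True)
# TF Serving will detect the new sub-directory and switch to serving version 2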
A saved model on disk includes the following files:
- assets: a directory including arbitrary files used by the TF graph.
- variables: a directory containing information about the training checkpoints of the model.
- saved_model.pb: the protobuf file that represents the actual TF program.
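You can confirm this layout by listing the export directory (a quick optional check using the export_path defined above):
# Inspect the files that were written to disk
!ls -lR {export_path}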
Examining the saved model
To examine the saved model we can use the command-line utility saved_model_cli to look at the MetaGraphDefs (the models) and SignatureDefs (the methods you can call) in our SavedModel.
!saved_model_cli show --dir {export_path} --tag_set serve --signature_def serving_default
This tells us that the model expects our inputs to be of shape (150, 150, 3), which, in combination with the use of conv2d layers, suggests that this model expects color images with a resolution of 150 by 150 pixels. It also tells us that the output of the model is of shape (3), suggesting a softmax activation with 3 classes.
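If you prefer to check this from Python instead of the command line, a minimal alternative (not required for the rest of the tutorial) is to load the SavedModel and print its serving signature:
# Load the SavedModel and inspect the default serving signature
loaded = tf.saved_model.load(export_path)
serving_fn = loaded.signatures['serving_default']
print(serving_fn.structured_input_signature)
print(serving_fn.structured_outputs)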
Preparing the data for inference
It is time to preprocess the test images to match the shape of the data expected by the model. Keras provides an easy way to deal with a wide variety of resolutions with its ImageDataGenerator.
Using this object you can:
- Normalize pixel values.
- Standardize image resolutions.
- Set a batch size for inference.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
# Normalize pixel values
test_datagen = ImageDataGenerator(rescale=1./255)
# Point to the directory with the test images
val_gen_no_shuffle = test_datagen.flow_from_directory(
    '/tmp/data/images',
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary',
    shuffle=False)
# Print the label that is assigned to each class
print(f"labels for each class in the test generator are: {val_gen_no_shuffle.class_indices}")
Since this object is a generator, you can get a batch of images and labels using the next function:
# Get a batch of 32 images along with their true label
data_imgs, labels = next(val_gen_no_shuffle)
# Check shapes
print(f"data_imgs has shape: {data_imgs.shape}")
print(f"labels has shape: {labels.shape}")
Serving the model with TensorFlow Serving
Install TensorFlow Serving
!wget 'http://storage.googleapis.com/tensorflow-serving-apt/pool/tensorflow-model-server-universal-2.8.0/t/tensorflow-model-server-universal/tensorflow-model-server-universal_2.8.0_all.deb'
!dpkg -i tensorflow-model-server-universal_2.8.0_all.deb
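One way to confirm the installation succeeded (assuming the package installed cleanly) is to ask the binary for its version:
# Verify that the server binary is available
!tensorflow_model_server --version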
Running TensorFlow Serving
Define an environment variable with the path to where the model is saved:
os.environ["MODEL_DIR"] = MODEL_DIR
Spin up the TF Serving server:
%%bash --bg
nohup tensorflow_model_server \
--rest_api_port=8501 \
--model_name=animal_classifier \
--model_base_path="${MODEL_DIR}" >server.log 2>&1
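Give the server a few seconds to load the model, then inspect server.log (the file created by the command above) to confirm it started without errors:
import time
# Wait briefly for the server to start, then check its log
time.sleep(5)
!tail server.log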
Making a request to your model in TensorFlow Serving
Since REST expects the data to be in JSON format, and JSON does not support custom Python data types such as numpy arrays, you first need to convert these arrays into nested lists.
TF Serving expects a field called instances which contains the input tensors for the model. To pass your data to the model you should create a JSON payload with your data as the value for the key instances.
import json
# Convert numpy array to list
data_imgs_list = data_imgs.tolist()
# Create JSON to use in the request
data = json.dumps({"instances": data_imgs_list})
Make REST requests
We'll send a predict request as a POST request to our server's REST endpoint, and pass it the batch of 32 images.
The endpoint that serves the model is located at http://localhost:8501. However, this URL still needs some additional parameters to properly handle the request. You should append v1/models/name-of-your-model:predict to it so TF Serving knows which model to look for and that it should perform a predict task.
You should also pass in the data containing the list that represents the 32 images, along with a headers dictionary that specifies the content type, which is JSON in this case.
After you get a response from the server you can get the predictions out of it by inspecting the predictions field of the JSON that the response returned.
import requests
# Define headers with content-type set to json
headers = {"content-type": "application/json"}
# Capture the response by making a request to the appropriate URL with the appropriate parameters
json_response = requests.post('http://localhost:8501/v1/models/animal_classifier:predict', data=data, headers=headers)
# Parse the predictions out of the response
predictions = json.loads(json_response.text)['predictions']
# Print shape of predictions
print(f"predictions has shape: {np.asarray(predictions).shape}")
The last layer of the model is a softmax function, so it returns a value for each one of the classes. To get the actual predictions you need to find the maximum argument:
# Compute argmax
preds = np.argmax(predictions, axis=1)
# Print shape of predictions
print(f"preds has shape: {preds.shape}")
Conclusion
This article is a follow-up to an earlier post and goes deeper into the serving internals of TensorFlow Serving.
Feel free to reach out if you wanna chat about building and deploying AI-powered applications.