Testing TensorFlow Lite image classification model

Make sure that your ML model works correctly in a mobile app (part 1)

Looking for a way to automatically test a TensorFlow Lite model on a mobile device? Check the 2nd part of this article.

Building TensorFlow Lite models and deploying them in mobile applications is getting simpler over time. But even with easier-to-use libraries and APIs, there are still at least three major steps to accomplish:

  1. Build a TensorFlow model,
  2. Convert it to a TensorFlow Lite model,
  3. Implement it in the mobile app.

A set of information needs to be passed between those steps: model input/output shape, value format, etc. Even if you know them (e.g. thanks to the visualization techniques and tools described in this blog post), there is another problem that many software engineers struggle with.

Why does the model implemented in a mobile app work differently than its counterpart in a Python environment?

Software engineer

In this post, we will try to visualize the differences between TensorFlow, TensorFlow Lite, and quantized TensorFlow Lite (with post-training quantization) models. This should help with early model debugging when something goes really wrong.
Here, we will focus only on the TensorFlow side. Keep in mind that this doesn't cover the correctness of the mobile app implementation (e.g. bitmap preprocessing and data transformation). That will be described in one of the future posts.

Important notice: the code presented here and in the Colab notebook shows just some basic ideas for comparing TensorFlow and TensorFlow Lite models by eye (on a small data batch). It doesn't measure speed or any other performance factor, and it doesn't do any rigorous side-by-side cross-comparison.

TensorFlow model preparation

If you already have a TF model as a SavedModel, you can skip this section and go directly to the Load TensorFlow model from SavedModel section.

As an example, we will build a simple TensorFlow model that classifies flowers. It is built on top of MobileNet v2 using transfer learning. The code was taken from and inspired by Udacity's free TensorFlow course, which I highly recommend to everyone who wants to start working with this machine learning framework (no matter whether you are a machine learning engineer or a software engineer implementing ML solutions on the client side).

Here is the model’s structure:

Model for classifying flowers, built on top of MobileNet v2

For training, we will use Keras ImageDataGenerators and an example dataset provided by Google:

Accuracy after 10 epochs of training is ~87%. For our needs it is fine 👌.

When the model is ready, we will export it to the SavedModel format:

Load TensorFlow model from SavedModel

Now that we have the TensorFlow model saved in SavedModel format, let's load it. If you don't want to spend time building and training your model, it's perfectly fine to start from here.

Because our model uses a custom layer from TensorFlow Hub, we need to explicitly point to its implementation with the custom_objects param.

Check model’s prediction

Now we will take a batch of 32 images from the validation dataset and run inference on the loaded model:

For data visualization, we will use Pandas library. Here is what we can see when we print values via tf_pred_dataframe.head().

Prediction results represented as Pandas Dataframe

Each row here represents prediction results for a separate image (our DataFrame has 32 rows). Each cell contains the label’s confidence for this image. All values in a row sum up to 1 (because the final layer of our model uses Softmax activation function).
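As a minimal sketch of this step, the snippet below builds such a DataFrame from a batch of predictions and checks the row sums. The random logits only stand in for the model's output, and the class count of 5 and the `softmax` helper are assumptions made for illustration, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def softmax(logits):
    """Row-wise softmax, stabilized by subtracting each row's maximum."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


# Stand-in for the model output: 32 images x 5 labels (assumed sizes).
rng = np.random.default_rng(seed=0)
tf_predictions = softmax(rng.normal(size=(32, 5)))

# One row per image, one column per label.
tf_pred_dataframe = pd.DataFrame(tf_predictions)
print(tf_pred_dataframe.head())

# Each row is a probability distribution over labels, so it sums to 1.
row_sums = tf_pred_dataframe.sum(axis=1)
print(np.allclose(row_sums, 1.0))
```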

We can also print those images and predictions:

The code above will show:

TensorFlow Lite models

Convert model to TensorFlow Lite

Now we will create two TensorFlow Lite models, non-quantized and quantized, based on the one that we created.
Because of TensorFlow 2.0's eager execution, the model needs to be converted to a concrete function before the final conversion to TensorFlow Lite (more about it here).

As a result, we will get two files: flowers.tflite (TensorFlow Lite standard model) and flowers_quant.tflite (TensorFlow Lite quantized model with post-training quantization).
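A minimal sketch of this conversion, assuming a recent TensorFlow 2.x release, is shown below. `TinyClassifier` is a hypothetical stand-in for the flowers model (with a smaller 32x32 input to keep it light), and `tf.lite.Optimize.DEFAULT` is an assumption about how post-training quantization is switched on; the original notebook may use a different setting.

```python
from pathlib import Path

import tensorflow as tf


class TinyClassifier(tf.Module):
    """Hypothetical stand-in for the flowers model: flatten -> dense -> softmax."""

    def __init__(self):
        super().__init__()
        self.kernel = tf.Variable(tf.random.normal([32 * 32 * 3, 5], seed=0))

    @tf.function(input_signature=[tf.TensorSpec([None, 32, 32, 3], tf.float32)])
    def __call__(self, images):
        flat = tf.reshape(images, [-1, 32 * 32 * 3])
        return tf.nn.softmax(tf.matmul(flat, self.kernel))


model = TinyClassifier()

# The converter works on a concrete (traced) function, not on eager code.
concrete_func = model.__call__.get_concrete_function()

# Standard, non-quantized conversion.
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func], model)
tflite_model = converter.convert()

# Same conversion with post-training quantization enabled (assumed setting).
quant_converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func], model)
quant_converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_quant_model = quant_converter.convert()

# convert() returns the serialized model as bytes.
Path("flowers.tflite").write_bytes(tflite_model)
Path("flowers_quant.tflite").write_bytes(tflite_quant_model)

print("standard model bytes: ", len(tflite_model))
print("quantized model bytes:", len(tflite_quant_model))
```

With a Keras model, the concrete function would come from wrapping the model call in `tf.function` instead of using a `tf.Module`; the converter calls stay the same.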

Run TFLite models

Now let’s load the TFLite models into an Interpreter (tf.lite.Interpreter), so we can run inference on them.

By default, the interpreter runs inference on one image (input shape: 1x224x224x3).

Before we run inference, we need to resize the input and output tensors to accept a batch of 32 images:

And finally, run inference and show prediction results:
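The resize and inference steps could look roughly like the sketch below. It is self-contained, so it first converts a hypothetical stand-in model (`TinyClassifier`, not the flowers model) and feeds it random data instead of validation images. This sketch resizes only the input tensor and lets `allocate_tensors()` propagate the new batch size to the output, which is an assumption that holds for this simple model.

```python
import numpy as np
import tensorflow as tf


class TinyClassifier(tf.Module):
    """Hypothetical stand-in model: average pool -> dense -> softmax."""

    def __init__(self):
        super().__init__()
        self.kernel = tf.Variable(tf.random.normal([3, 5], seed=0))

    @tf.function(input_signature=[tf.TensorSpec([None, 224, 224, 3], tf.float32)])
    def __call__(self, images):
        pooled = tf.reduce_mean(images, axis=[1, 2])
        return tf.nn.softmax(tf.matmul(pooled, self.kernel))


model = TinyClassifier()
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [model.__call__.get_concrete_function()], model
)
tflite_model = converter.convert()

# Load the serialized model; model_path="flowers.tflite" would load from disk.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
print("default input shape:", input_details[0]["shape"])

# Random stand-in for a batch of 32 preprocessed images.
batch = np.random.default_rng(seed=0).random((32, 224, 224, 3), dtype=np.float32)

# Resize the input to the batch shape, then (re)allocate all tensors.
interpreter.resize_tensor_input(input_details[0]["index"], [32, 224, 224, 3])
interpreter.allocate_tensors()

# Run inference and read the predictions back as a NumPy array.
interpreter.set_tensor(input_details[0]["index"], batch)
interpreter.invoke()
tflite_predictions = interpreter.get_tensor(output_details[0]["index"])
print("prediction shape:", tflite_predictions.shape)
```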

Again, we put the data into a Pandas DataFrame.
Here is what we can see for tflite_pred_dataframe.head():

Prediction results from TFLite model represented as Pandas Dataframe

We will do exactly the same operations for the second model – flowers_quant.tflite.
DataFrame preview for it:

Results comparison

Now we can concatenate the DataFrames from the TF, TF Lite, and TF Lite quant models to compare the tables by eye. Inspiration for this code was taken from StackOverflow (link to the answer). 🙂

As a result, we get a DataFrame with highlighted rows that differ between the TF/TF Lite models.
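One way to sketch this comparison with plain pandas is shown below. The three tables are synthetic stand-ins for the real predictions, and the `rows_differ` helper and its tolerance are assumptions made for illustration. Instead of coloring rows, it produces boolean flags, which a notebook could then feed into pandas styling.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Synthetic stand-ins for the three prediction tables (4 images x 3 labels).
base = rng.dirichlet(np.ones(3), size=4)
tflite_values = base.copy()
tflite_values[1] += 1e-4  # only one row differs from TF, and only slightly
quant_values = base + rng.normal(scale=0.02, size=base.shape)

tf_pred = pd.DataFrame(base)
tflite_pred = pd.DataFrame(tflite_values)
quant_pred = pd.DataFrame(quant_values)

# Side-by-side table with one column group per model.
all_models = pd.concat(
    [tf_pred, tflite_pred, quant_pred],
    axis=1,
    keys=["TF", "TFLite", "TFLite quant"],
)


def rows_differ(left, right, atol=1e-6):
    """Flag rows where any value differs by more than atol."""
    close = np.isclose(left.to_numpy(), right.to_numpy(), rtol=0.0, atol=atol)
    return ~close.all(axis=1)


diff_flags = pd.DataFrame(
    {
        "TF vs TFLite": rows_differ(tf_pred, tflite_pred),
        "TF vs TFLite quant": rows_differ(tf_pred, quant_pred),
    }
)

print(all_models.round(4))
print(diff_flags)
```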

As we can see, in most cases predictions differ between all models, usually by small amounts.
High-confidence predictions of the TensorFlow and TensorFlow Lite models are very close to each other (in some cases they are even the same). The quantized model stands out the most, but this is the cost of the optimization (the model is 3-4 times smaller).
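To get an intuition for both effects, here is a simplified NumPy illustration of storing weights in 8 bits instead of 32. It is not TensorFlow Lite's actual quantization scheme, only a symmetric rounding sketch on random weights: the storage shrinks by a factor of 4, and each restored weight is off by up to half a quantization step, which is where small prediction differences can come from.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Random float32 values standing in for one layer's weights.
weights = rng.normal(scale=0.1, size=1000).astype(np.float32)

# Symmetric 8-bit quantization: map [-max_abs, max_abs] onto [-127, 127].
scale = np.abs(weights).max() / 127.0
quantized = np.round(weights / scale).astype(np.int8)

# Dequantize to see what the model effectively computes with.
restored = quantized.astype(np.float32) * scale

size_ratio = weights.nbytes / quantized.nbytes
max_error = np.abs(weights - restored).max()

print("size ratio (float32 / int8):", size_ratio)
print("largest round-trip error:   ", max_error)
print("half a quantization step:   ", scale / 2)
```

A real model file also contains non-weight data, which is one plausible reason the overall reduction lands at 3-4 times rather than exactly 4.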

To make prediction results even more readable, let’s simplify the DataFrames to show only the highest-scoring prediction and the corresponding label.

Now each DataFrame (TF, TFLite, and TFLite quant) shows only the label index and the confidence for this label.
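A minimal sketch of this reduction is shown below. The `top_prediction` helper, the column names, and the small hand-written table are hypothetical and used only for illustration.

```python
import pandas as pd


def top_prediction(pred_dataframe):
    """Reduce per-label confidences to the best label index and its confidence."""
    values = pred_dataframe.to_numpy()
    return pd.DataFrame(
        {
            "label_id": values.argmax(axis=1),
            "confidence": values.max(axis=1),
        },
        index=pred_dataframe.index,
    )


# Hand-written stand-in for a prediction table (3 images x 3 labels).
tf_pred_dataframe = pd.DataFrame(
    [
        [0.10, 0.70, 0.20],
        [0.85, 0.05, 0.10],
        [0.30, 0.30, 0.40],
    ]
)

tf_top = top_prediction(tf_pred_dataframe)
print(tf_top)
```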

Let’s concatenate DataFrames and highlight differences between them:

As you can see, despite the differences, the TFLite model usually picks the same label for an image (in our validation batch). Differences in confidence are usually very small.
The quantized TF Lite model doesn’t do as well here. There are big differences in some confidence scores, and in some cases this model picks a different label.
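These observations can also be turned into numbers. The sketch below assumes two simplified tables with hypothetical `label_id` and `confidence` columns and made-up values; it reports how often the two models agree on the label and how far apart their confidences are.

```python
import pandas as pd

# Made-up top-1 results for four images from two models.
tflite_top = pd.DataFrame(
    {"label_id": [1, 0, 2, 4], "confidence": [0.70, 0.85, 0.40, 0.95]}
)
quant_top = pd.DataFrame(
    {"label_id": [1, 0, 3, 4], "confidence": [0.66, 0.80, 0.45, 0.93]}
)

# Share of images where both models pick the same label.
same_label = tflite_top["label_id"] == quant_top["label_id"]
agreement = same_label.mean()

# Absolute confidence difference per image.
confidence_gap = (tflite_top["confidence"] - quant_top["confidence"]).abs()

# Images worth a closer look: the ones where the labels disagree.
disagreeing_images = same_label[~same_label].index.tolist()

print(f"label agreement: {agreement:.0%}")
print(f"largest confidence gap: {confidence_gap.max():.2f}")
print("images with different labels:", disagreeing_images)
```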

Here is a side-by-side comparison for TFLite and TFLite quant models, for our images batch:

Now it’s up to us to decide whether the model size reduction (3-4 times in our case) is worth it.

Next steps

In this blog post, we did a side-by-side comparison of TensorFlow, TensorFlow Lite, and quantized TensorFlow Lite models. We noticed small differences between TF and TFLite, and somewhat bigger ones for TFLite quant. But this isn’t everything we can check.

Those models were checked in the same environment (Colab or Jupyter notebook), but problems may also occur further down the line, in the mobile app implementation, e.g. in image processing or data transformation.

In future blog posts, we will look closely at what we can do to test TF Lite model implementation correctness directly on a mobile device.

Source code

Source code for this blog post is available on GitHub (Colab notebook, and mobile application in the future): https://github.com/frogermcs/TFLite-Tester

The notebook with all the code presented in this post can be run by clicking the button below:


Thanks for reading! 🙂
Please share your feedback below. 👇
