Build TensorFlow Lite model with Firebase AutoML Vision Edge

Train your first image classification model with Firebase ML Kit

For more than a year now, Firebase – Google's backend platform for mobile and web development – has had the ML Kit SDK in its portfolio. Thanks to it, implementing machine learning solutions in mobile apps is much easier, regardless of our ML skills. With APIs like Text Recognition or Image Labeling, we can add these functionalities to our app with a couple of lines of code.
ML Kit also provides a simple way of plugging in custom machine learning solutions – we provide a TensorFlow Lite model, and Firebase is responsible for deploying it into our app. It works across platforms (Android and iOS), offline or online (the model can be bundled with the app or downloaded on demand at runtime), with simplified code for implementing an interpreter.

During Google I/O 2019, there were some new announcements for ML Kit. One of the most interesting is AutoML Vision Edge – a solution for automatic model training based on your images.

In this post, we will build a simple classification model for a small subset of the GTSRB dataset (traffic signs), preview its structure, and put it into an app to test the implementation.
If you want to see another way to build a traffic signs classification model, I encourage you to check the blog post: Traffic signs classification with retrained MobileNet model.

Firebase limitations

In the Firebase platform, the free plan allows us to train a model for 1 dataset containing up to 1000 images. The training process can take no more than 1 hour, and we cannot use more than 3 hours of training overall per project. This may not sound like much, but it is usually enough to validate an initial idea on a small subset of data. And if there are indicators that the model can work, paying a couple of dollars to train on the full-size dataset doesn't have to be a bad idea – especially if we have little experience in ML or don't have access to a powerful environment.

Dataset preparation

The original GTSRB dataset contains more than 50,000 images, each classified with one of 43 labels. More information about the dataset can be found on its official website. For our demo project, we will use a subset containing 1000 images with 10 different labels (100 images each).
In the Firebase documentation, you can find more information about the requirements for the dataset. In short, it is recommended to have 100 or more images per class. Images (JPGs, PNGs, BMPs or GIFs) can be uploaded as a zip file with a directory structure similar to this:

Example data structure for Firebase AutoML (documentation)

Images are organized in directories, each named after the label shared by all files inside it.
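For our traffic-sign subset, the archive could be laid out roughly like this (the label names below are illustrative, not the exact ones used in the dataset):

```
dataset.zip
├── speed_limit_30/
│   ├── 00001.png
│   ├── 00002.png
│   └── ...
├── speed_limit_80/
│   ├── 00001.png
│   └── ...
└── stop/
    ├── 00001.png
    └── ...
```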

Here is the zip file of dataset used in this blog post: link.


When we have the desired dataset, we need to upload it to Firebase ML Kit and run the training process.
To do this:

  1. Create a new project on Firebase console,
  2. Develop -> ML Kit -> AutoML -> Add dataset,
  3. Set a name and select “Single-label classification”,
  4. Drag and drop .zip with images,
  5. When the import is finished, we can start the training process. In the last step, you will be able to choose between options trading off model size against accuracy.
    Here we pick higher accuracy, higher latency.

The training process should take a bit more than an hour. When it’s done, we should receive an email informing us about that.

The process of importing and labeling our data in ML Kit AutoML.

Model overview

When training is finished, we can go back to the Firebase console. It is a good place to start evaluating our model – its performance and accuracy. We can also see a confusion matrix there – how often images were classified correctly, and which labels were most often confused with each other.
This can be helpful when deciding what data we should add to our dataset to increase the model's performance.

An example confusion matrix showing our model's performance.

Now the trained model is ready to use. It can be published on Firebase hosting – the app will download it on demand, but this requires the Firebase SDK. Alternatively, we can download the trained model as a .tflite file and then decide how to handle it in our code (with or without the Firebase SDK). While Firebase hosting can be great for many reasons (updating the model on the fly, A/B testing models), here we will pick the second option – get the .tflite file and implement it “by hand”.
More about publishing or downloading the model can be found in the Firebase documentation.

Great, we have a .tflite file and don't want to use the Firebase SDK just yet – what do we do then? We should start by investigating the model – its input and output specification. To learn it, we will use the Netron app. If you want to learn more about investigating TensorFlow Lite models, take a look at the blog post: Inspecting TensorFlow Lite image classification model.

Here is some information about our model, taken from Netron:

  • It is quantized (ML Kit AutoML creates quantized models by default),
  • Input tensor has dimension 1x224x224x3 (a 224×224 image with 3 color channels),
  • Output tensor has dimension 1×10 (10 labels).
ML Kit AutoML model previewed in Netron

Android application

The last step is putting the TensorFlow Lite model into our mobile app. Something very similar was done in the post Inspecting TensorFlow Lite image classification model (see the TFLite-Checker GitHub repository for the implementation). In that post, we implemented *.tflite models that used float values for input and output tensors (non-quantized models). Now we are going to add support for a quantized TensorFlow Lite model.
All changes reflecting this can be seen in this commit. What are they about?

1 – The ByteBuffer for the input tensor is now an array of bytes instead of an array of floats:
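The commit contains the actual diff; as a rough sketch (class and method names are mine, and ARGB int pixels are assumed, e.g. as returned by Android's `Bitmap.getPixels`), the conversion could look like this:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class QuantizedInput {
    static final int SIZE = 224;     // model expects a 1x224x224x3 input
    static final int CHANNELS = 3;

    // Packs ARGB pixels into a direct byte buffer: one unsigned byte per
    // channel, instead of one float per channel as in the non-quantized case.
    static ByteBuffer toByteBuffer(int[] pixels) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(SIZE * SIZE * CHANNELS);
        buffer.order(ByteOrder.nativeOrder());
        for (int pixel : pixels) {
            buffer.put((byte) ((pixel >> 16) & 0xFF)); // R
            buffer.put((byte) ((pixel >> 8) & 0xFF));  // G
            buffer.put((byte) (pixel & 0xFF));         // B
        }
        return buffer;
    }
}
```

The buffer is then passed straight to `Interpreter.run()` in place of the float buffer used before.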

2 – The results are represented as a byte array. This means we need to normalize the values – convert them into floats representing probabilities in the range [0, 1].
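A minimal sketch of that normalization (the method name is mine, not from the repository; a scale of 1/255 with zero point 0 is assumed, which is the usual output quantization for these models): each unsigned byte is divided by 255 to get a float in [0, 1].

```java
public class QuantizedOutput {
    // The interpreter fills a byte[10] output (one byte per label);
    // each unsigned byte maps linearly to a probability-like float.
    static float[] normalize(byte[] raw) {
        float[] probabilities = new float[raw.length];
        for (int i = 0; i < raw.length; i++) {
            // & 0xFF treats the signed Java byte as an unsigned value 0..255
            probabilities[i] = (raw[i] & 0xFF) / 255.0f;
        }
        return probabilities;
    }
}
```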

The rest is very similar to the implementation presented in Inspecting TensorFlow Lite image classification model article.

When we run the app, the classification results are far from perfect (mixing up speed limit 30 with 80, etc.), but we need to remember that we used only 1000 of the 50,000 available images for training. It's pretty likely that a model trained on the full dataset would perform much better.
What is also important here: we haven't written a single line of code for the model training (and only added some basic code for its implementation in the app). Yet, we still have a working proof of concept. 🙂

Classification result from the example app.

Source code

Source code for this blog post is available on GitHub (Android app and Colab notebook):

Thanks for reading! 🙂
Please share your feedback below. 👇
