This is how I created my own personal soccer AI coach from scratch, using Google's AutoML with almost no code.

Enrique Gamboa
8 min read · Dec 7, 2020


In this post, I will walk you through the steps I took to create a custom classification edge model from scratch using Google's AutoML Video Intelligence object tracking, and to deploy it on a Coral Dev Board to perform inference in real time.

Here is my GitHub repo with this project.

model classification example

My personal soccer AI coach.

I wanted to combine two of my favorite things ⚽ (soccer) + 🤖 (technology), and I was looking for an easy, end-to-end solution to create a soccer coach that could identify simple rules, in such a way that these rules could be scaled and deployed to perform actions in real time.

So far I have created a model with 86% accuracy and 58% average precision that identifies and counts soccer kick-ups (dominadas in Spanish), handballs (mano in Spanish), and ball bounces (botes in Spanish). I decided to experiment with Spanish labels to make it more real.

For now, my trained model, which is a TensorFlow Lite graph living on my Coral Dev Board, uses the board's camera to perform inference, classifying and counting each dominada (kick-up) and playing a referee whistle sound 🔊 from the speaker each time it detects one.

model deployed on Coral Dev Board example

The results are not perfect, but I'm getting there; the model has been trained with just 42 videos and three labels, while "We recommend about 1000 training images per label." (Google Cloud, 2020). I'm sure that with more time and effort I will achieve my AI soccer goal 🥅 (no pun intended).

What I used to create the AI:

I summarize the project in the following steps:

Getting training videos: Google AIY Vision (you can use literally any other camera, such as your phone or your webcam)

AI modeling: AutoML Video Intelligence Object Tracking

Inference: Coral Dev Board + Coral camera (you can use any other TPU device and camera with TensorFlow installed)

How I got the training videos 📹

I captured 42 videos of 30 seconds each, performing kick-ups that naturally include handballs and bounces, from different positions and angles and wearing different clothes. As mentioned, you can use any webcam to capture the training videos; I recommend starting with short videos to make labeling easier (you'll see what I mean in the next section).

sample of 5 out of 42 training videos
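If your camera records longer clips, a quick way to get short training videos is to trim them before uploading. This is just an illustrative command, assuming you have ffmpeg installed (the file names are placeholders):

ffmpeg -i long_session.mp4 -t 30 -c copy short_clip_01.mp4

The -t 30 flag keeps only the first 30 seconds, and -c copy avoids re-encoding the video.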

After getting all the training videos, I split them in the following way: 65% for training and 25% for testing, then went to my GCP Cloud Storage and stored them like this:

gs://dominadas/VideoML/train/{myVideos}.mp4
gs://dominadas/VideoML/test/{myVideos}.mp4
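In case it helps, assuming the gsutil CLI is installed and authenticated and the videos sit in local train/ and test/ folders (the local folder names are just an example), two commands like these copy them into those locations:

gsutil -m cp train/*.mp4 gs://dominadas/VideoML/train/
gsutil -m cp test/*.mp4 gs://dominadas/VideoML/test/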

How I created the model 🔧

Choosing the right tool for your vision goal is never easy, especially in the world of AI. Google offers an end-to-end Video Intelligence object tracking service that helps you create a TensorFlow edge classification model and requires literally 0 (zero) lines of code ✌🏼.

“AutoML Video Intelligence Object Tracking enables you to train machine learning models to detect and track multiple objects in shots and segments. You can use these models to track objects in your videos according to your own pre-defined, custom labels.”(Google Cloud, 2020).

What I did was simply go to my GCP console, click on Video Intelligence, and then select AutoML Video Object Tracking. (Figure 1)

(Figure 1)
(Figure 2)

Then I hit CREATE DATASET, named my dataset "dominadas_t2", and selected the Video Object Tracking option. (Figure 2)

The next interface is for importing the training videos. The first step was to create a CSV list for each of the train and test sets. These CSV files contain the list of videos and the annotation parameters for each video (only if you already have them); in my case, I left all the parameters blank so I could label the videos myself.

My CSV lists of videos were stored in my Cloud Storage like this: dominadas/VideoML/videoTest_t.csv and dominadas/VideoML/videoTrain_t.csv.

The CSV "videoTest_t.csv" (and "videoTrain_t.csv" has the same structure) looks like this:

gs://dominadas/VideoML/test/2020-08-17_1452.mp4,,,,,,,,,,,
gs://dominadas/VideoML/test/2020-08-18_1041.mp4,,,,,,,,,,,
gs://dominadas/VideoML/test/2020-08-18_1847.mp4,,,,,,,,,,,
gs://dominadas/VideoML/test/2020-08-17_1456.mp4,,,,,,,,,,,



+ 9 more videos

Finally, I just created another CSV file that points to the CSV lists of my training and test videos.

(This was the file I used to import the videos)
gs://dominadas/dominadas_t.csv:

TRAIN,gs://dominadas/VideoML/videoTrain_t.csv
TEST,gs://dominadas/VideoML/videoTest_t.csv
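If you'd rather not write the per-split CSV files by hand, here is a minimal sketch that generates the unlabeled rows by listing the bucket. It assumes the google-cloud-storage Python client is installed and authenticated; the bucket and file names are the ones I used above:

from google.cloud import storage

BUCKET = "dominadas"
client = storage.Client()

for split, csv_name in [("train", "videoTrain_t.csv"), ("test", "videoTest_t.csv")]:
    # One unlabeled row per video: the trailing commas are the empty annotation
    # columns, which get filled in later by labeling inside the AutoML UI.
    with open(csv_name, "w") as f:
        for blob in client.list_blobs(BUCKET, prefix=f"VideoML/{split}/"):
            if blob.name.endswith(".mp4"):
                f.write(f"gs://{BUCKET}/{blob.name}" + "," * 11 + "\n")

After that, you just upload the two generated CSVs (plus the top-level dominadas_t.csv) back to the bucket.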

Time to label 🏷

After all my videos were imported, I created the labels I wanted my AI coach to identify. On the Videos tab, I clicked ADD NEW LABEL and created them.

Then the annotation process started, which is what takes the most time, so be patient and organize your videos in the smartest way to make your labeling time as efficient as possible. I annotated a total of 2,046 labels: 1,086 dominadas (kick-ups), 706 manos (handballs), and 254 botes (bounces) across my 42 videos.

The labeling process is pretty easy: just navigate through your video frames until you find the annotation you want, then draw a bounding box around the feature you want to annotate and right-click to select your label (Figure 3).

(Figure 3)

Google recommends selecting a variety of frames across your videos, meaning that you don't need to track the object you want to identify frame by frame, but you should annotate the action or feature you want to label.

They also recommend paying attention to the position, lighting, and background of your videos: "The goal is to make your training data as similar as possible to the data on which your model will make predictions" (Google Cloud, 2020). Finally, they recommend fully annotating your frames, labeling all the features of interest in each frame.

Training 👨‍🏫

After all labels were added to my videos, I trained my model. The process is pretty easy and straightforward: just go to the Train tab, click on "TRAIN NEW MODEL", name your model, and select Edge as the model type so it can be deployed on your inference device, in my case the Coral Dev Board. (Figure 4)

(Figure 4)

The time this process takes is directly proportional to the complexity of your dataset. In my case, the training for this model took somewhere between 8 and 14 hours.

When training is done, you'll get an email notification letting you know your model is ready.

Go to the TEST & USE tab of the Video Intelligence service and, under "Use your model", download your TF Lite with Edge TPU model (this will give you a .tflite and a .pbtxt file). (Figure 5)

(Figure 5)

How I deployed the model

After you get your model, copy the .tflite and .pbtxt to your TPU device; in my case, I copied these files from my Mac to my Coral Dev Board using the 'scp' command, as shown below.

On my Coral Dev Board, I copied my TF Lite model with Edge TPU to this location:

/home/mendel/dominadas-autoML/object tracking/simpleTPU/4/dominadasT2_1.tflite

/home/mendel/dominadas-autoML/object tracking/simpleTPU/4/labelsT2_1.pbtxt
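For reference, the copy itself is one scp call per file; <board-ip> is a placeholder for your own Dev Board's address, and the backslash escapes the space in the destination folder:

scp dominadasT2_1.tflite "mendel@<board-ip>:/home/mendel/dominadas-autoML/object\ tracking/simpleTPU/4/"
scp labelsT2_1.pbtxt "mendel@<board-ip>:/home/mendel/dominadas-autoML/object\ tracking/simpleTPU/4/"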

I tweaked the Coral camera inference example from Google's automl-video-ondevice repo a little, to count each dominada detected at 50% confidence or higher (look at line 69). When this happens, the script automatically draws a bounding box around the classification and plays a referee whistle sound.
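The counting logic itself is simple. Here is an illustrative sketch of what my tweak does; the detection format ((label, score) pairs), the whistle file path, and the play_whistle helper are assumptions for this example, not the actual automl-video-ondevice API:

import subprocess
import time

WHISTLE_WAV = "/home/mendel/dominadas-autoML/whistle.wav"  # hypothetical path to the whistle sound
CONFIDENCE_THRESHOLD = 0.5   # count a dominada only at 50% confidence or higher
COOLDOWN_SECONDS = 1.0       # avoid counting the same kick-up across consecutive frames

dominada_count = 0
last_count_time = 0.0

def play_whistle():
    # aplay (from alsa-utils) plays the wav through the board's audio output
    subprocess.Popen(["aplay", WHISTLE_WAV])

def handle_detections(detections):
    # 'detections' is assumed to be a list of (label, score) pairs for the current frame
    global dominada_count, last_count_time
    for label, score in detections:
        if label == "dominada" and score >= CONFIDENCE_THRESHOLD:
            now = time.time()
            if now - last_count_time >= COOLDOWN_SECONDS:
                dominada_count += 1
                last_count_time = now
                play_whistle()
                print(f"Dominadas so far: {dominada_count}")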

After tweaking the Python file to run the model with the Coral camera, I saved it on my Coral Dev Board at this location:

/home/mendel/dominadas-autoML/automl-videoOnDevice/examples/coral_Dominadas.py

This is the command I use to run the Python client and finally start inference on the Dev Board with the Coral camera:

python3 /home/mendel/dominadas-autoML/automl-videoOnDevice/examples/coral_Dominadas.py  --model "/home/mendel/dominadas-autoML/object tracking/simpleTPU/5/dominadasT2_1.tflite" --labels "/home/mendel/dominadas-autoML/object tracking/simpleTPU/5/labelsT2_1.pbtxt"

This is the model evaluation

The precision of this model is 84.9% (a high-precision model produces fewer false positives) and the recall is 100% (a high-recall model produces fewer false negatives), which means this model still has room for improvement. You can see in Figure 6 how the precision-recall curve shows a sudden drop in precision, which I expect to fix with more training.
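For reference, these are the standard definitions behind those two numbers (TP = true positives, FP = false positives, FN = false negatives):

precision = TP / (TP + FP)   → of everything the model flagged, how much was actually right
recall    = TP / (TP + FN)   → of everything that was really there, how much the model caught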

As you can see, the precision and average precision show how much this model is struggling, and the reason is that it has been trained with just a few videos. In deep learning, training data is everything: the goal 🥅 is to get closer to 100% average accuracy, which means the model needs more dominadas videos and annotations.

(Figure 6)

Next steps

my model thinking it caught a handball

The next step for this project will be improving the precision of the model by adding more dominadas videos and labeling them.

Based on my experience, I noticed that the accuracy changes depending on how the annotation is done. At the beginning I thought that what you need to annotate is the action (in this case, a player performing a dominada (kick-up) or a mano (handball)). But it looks like what you need to label is just the ball, so I will create a new dataset with better labeling methods and, for sure, more training to get closer to 100% accuracy.

Once the model is completely done, I will try to add more features, such as analytics to know which leg is used more and what type of dominada the player is performing. I was also thinking it would be a good idea to use other models, such as PoseNet, to track the ball and the player's legs and extract more analytics.

If you have more ideas about how this project can grow, I'm all ears; please feel free to send me an email.

References

Hart, C. (2020, November 14). Computer Vision Training, the AIY Vision Kit, and Cats. Retrieved December 07, 2020, from https://cogint.ai/custom-vision-training-on-the-aiy-vision-kit/

Preparing your training data | Cloud AutoML Vision Documentation. (n.d.). Retrieved December 03, 2020, from https://cloud.google.com/vision/automl/docs/prepare

Sarma, M. (2020, July 06). Build a Custom Image Classification Model Using Google AutoML. Retrieved December 07, 2020, from https://medium.com/analytics-vidhya/build-a-custom-image-classification-model-using-google-automl-221e45690aef

Villevald, D. (2019, February 22). Transfer Learning Model on Google AIY Vision Kit. Retrieved December 07, 2020, from https://www.hackster.io/dvillevald/transfer-learning-model-on-google-aiy-vision-kit-1aa600

buymeacoffee.com/jegamboafuentes


Enrique Gamboa

If art is a human abstraction, Artificial Intelligence is the abstraction of humanity 🦾