How to make a Facial Expression Recognition app?

Marvin Sernee
5 min read · Jan 17, 2021
Photo by jurien huggins

Artificial Intelligence is an up-and-coming technology. With the latest progress in the field of Deep Learning, we can make a computer think and function in a way similar to our human brain. The neurons in our brains are fundamental units that receive input from our environment. They connect with each other through electrical signals.

What if I told you that you could build such a model yourself?🤔

With a little help from TensorFlow, even you and I can make such a program happen. Today I’ll teach you how to create a Python script that runs a Facial Expression Recognition application. Let’s dive a little deeper!

The training model is a Convolutional Neural Network (CNN). It is an algorithm that uses several layers of filters and neurons working together to distinguish abstract features by recognising visual patterns. This makes it possible for our system to not just recognise a face, but to look at your facial expression and see which emotion you are expressing.

We’ll start by importing every module that we’re going to need.

We’ll use NumPy to change the dimensions of the image arrays. For the image processing and data generation we’ll import Keras. Our Machine Learning platform will be TensorFlow. And last but not least we’ll import OpenCV to use the webcam and draw a bounding box around the faces it recognises.
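Here’s a minimal sketch of what those imports could look like, assuming TensorFlow 2 with its bundled Keras:

import numpy as np                     # reshaping image arrays
import cv2                             # OpenCV: webcam capture and bounding boxes
import tensorflow as tf                # our Machine Learning platform
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense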

💡If you want to learn more about these platforms, check their documentation.

To train our model, we’ll need data. The dataset is crucial for every Machine Learning model: it is the program’s reference when looking for visual patterns. For our project we used the FER-2013 dataset, but feel free to use any data suitable for the task. Just make sure you adapt the code to your type of images and check your data for any potential biases. You can download the FER-2013 dataset here. Import the directories with data into your project folder and call them within Python.

We’ll have two directories: one for training data and another for validation. The validation directory will be the dataset’s ‘test’ folder, so don’t get confused by the names. If they don’t suit you, feel free to change the directory names to something else. Just make sure they remain recognisable for someone else.
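For reference, a possible layout; the folder names here are my own assumption, so point them at wherever you unpacked FER-2013:

train_dir = 'data/train'   # training images, one sub-folder per emotion
val_dir = 'data/test'      # the dataset's 'test' folder, used here for validation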

The next step is to extract data from the images. This is where the ImageDataGenerator class from Keras will be our saviour! It generates batches of tensor image data with real-time data augmentation. We’ll call it with a rescale parameter of 1./255, so the RGB coefficients of our images are scaled from the 0–255 range down to values between 0 and 1. It’s possible to add different parameters; check the documentation for more info. But for now we’ll stick with just one.
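A sketch of the two generators, using only the rescale parameter mentioned above:

train_datagen = ImageDataGenerator(rescale=1./255)   # scale pixels from 0-255 to 0-1
val_datagen = ImageDataGenerator(rescale=1./255)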

Now it’s time to do something with our data generators. Our goal is to use them to feed the model data for training and validation. By creating a generator for each of those tasks, we’ll do just that.

We’ll give them several parameters. The first is the directories we created earlier, containing our data. We’ll give it a target_size, which tells our model what size our images have. In our case, we use greyscale images sized 48px by 48px. The next parameter is the color mode. This is important to clarify, since we’ll give our training model an input shape with a third dimension for the colour channel. If we don’t specify the color_mode, it will cause problems later on. The last parameter just tells our model to use classification and label our data.
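Put together, the two generators could look like this; batch_size=64 matches the batch count mentioned during training later on:

train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(48, 48),      # FER-2013 images are 48x48 pixels
    color_mode='grayscale',    # single channel, matching the model's input shape
    batch_size=64,
    class_mode='categorical')  # one-hot labels for the seven emotions

validation_generator = val_datagen.flow_from_directory(
    val_dir,
    target_size=(48, 48),
    color_mode='grayscale',
    batch_size=64,
    class_mode='categorical')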

Next up is our CNN model itself. It uses several types of layers. Don’t worry if it scares you right now; we’ll walk through them one by one:
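As a reference while reading the walkthrough, here’s a sketch of such a model. The exact number of layers and filter counts are a common choice for FER-2013, not necessarily the ones from the original repository:

model = Sequential([
    # convolutional layers: the first number is the number of filters
    Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(48, 48, 1)),
    Conv2D(64, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),

    Conv2D(128, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Conv2D(128, kernel_size=(3, 3), activation='relu'),
    MaxPooling2D(pool_size=(2, 2)),
    Dropout(0.25),

    # classifier head
    Flatten(),
    Dense(1024, activation='relu'),
    Dropout(0.5),
    Dense(7, activation='softmax'),   # one output per emotion
])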

As you can see we’ve got several Conv2D layers. These are our Convolutional Layers. They each take a set of filters, represented by the first number. The first Convolutional Layer takes an extra parameter with the input shape: the expected shape of the input images. The activation parameter is set to relu. This applies the rectified linear unit activation, which passes positive values through unchanged and sets negative values to zero, giving the network its non-linearity.

The other recurring layers are MaxPooling2D and Dropout. The MaxPooling2D layer’s name gives it away: it’s a max pooling layer. It downsamples the feature maps produced by the filters convolved over the image, keeping only the strongest activation in each window. This gives the image an abstracted form and helps prevent overfitting our model. The Dropout layers are a way of preventing overfitting as well: they select random neurons during training and ignore them. This makes the model less sensitive to the specific weights of individual neurons and ultimately generalises our model.

The lower layers are the Flatten and Dense layers. The Flatten layer simply flattens our input; it doesn’t affect the batch size. The Dense layer is a regular densely-connected Neural Network layer. The first parameter of the first Dense layer is the number of neurons we want to work with, in our case 1024. The more neurons we use, the more abstract features our model can learn. The second Dense layer takes two new parameters. The first is the number of classes or labels it can generate. We’ve got seven emotions to detect:

  • Angry
  • Disgusted
  • Fear
  • Happy
  • Neutral
  • Sad
  • Surprised

So in our case we use 7 classes. The softmax activation converts our vector of real values into a vector of categorical probabilities. With those 7 emotion labels, we’ve got to do something more: with a Python dictionary we can make 7 key-value pairs that connect our model’s output to an emotion.
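A possible mapping, assuming the class indices follow the alphabetical order of the emotion folder names (which is what flow_from_directory does by default):

emotion_dict = {0: 'Angry', 1: 'Disgusted', 2: 'Fear', 3: 'Happy',
                4: 'Neutral', 5: 'Sad', 6: 'Surprised'}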

Now to compile our model we’ll use the compile method. Our parameters will be the loss function, the optimizer and the built-in metrics we want to track. The loss function computes the crossentropy loss between the labels and the predictions, while our optimizer uses the Adam algorithm. This algorithm combines the benefits of both AdaGrad and RMSProp: instead of adapting the parameter learning rates based only on the average first moment, it also makes use of the average of the second moments of the gradients. You can learn more about the algorithm here.
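In code, the compile call could look like this; the learning rate is my own assumption and just a common starting point:

model.compile(loss='categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
              metrics=['accuracy'])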

The last step for our CNN is to run the learning algorithm. With the fit method we tell Keras which model needs to be trained and how we want to do it. We’ll specify the generators for training and validating, the verbose level and the number of epochs. Each epoch is one full pass over your dataset. Our dataset is split into batches of 64, which means (almost) every epoch contains 449 batches. This could differ for you if you use a different dataset. The verbose option just tells the model how much progress information to show us.
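A sketch of the fit call; the number of epochs and the step counts are assumptions you’ll want to tune for your own dataset and hardware:

model.fit(
    train_generator,
    steps_per_epoch=449,                 # ~28,709 training images / batch size 64
    epochs=50,                           # assumption: adjust to taste
    validation_data=validation_generator,
    validation_steps=112,                # ~7,178 validation images / batch size 64
    verbose=1)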

And there you have it!🥳 Your own Convolutional Neural Network that can be trained into a model. Don’t forget to save the model of course!
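Saving can be as simple as one call; the filename here is just a placeholder:

model.save('facial_expression_model.h5')   # or model.save_weights(...) if you only need the weights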

Check out the full repository here.


Marvin Sernee

Front-End Developer who is passionate about technology and design. Eager to learn and loves to share knowledge.