#100DaysOfCode, #100DaysOfData, #100DaysOfTensorflow

Confusion Matrix

For today’s challenge, we will use a confusion matrix instead of accuracy to better understand how our model behaves.

Over the coming days, I will explore TensorFlow for at least one hour per day and post the notebooks, data and models to this repository.

Today’s notebook is available here.

The Confusion Matrix

Accuracy is a fine way to get a feel for performance, but there is a more informative tool: the confusion matrix.

This matrix makes it easier to compare the real values with the predicted ones, so you can spot false negatives and false positives.
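To see why accuracy alone can mislead, here is a small toy example (made-up data, not from the notebook) with an imbalanced dataset:

```python
import numpy as np

# toy data: 9 negative samples, 1 positive sample
y_true = np.array([0] * 9 + [1])
y_pred = np.zeros(10, dtype=int)  # a "model" that always predicts negative

accuracy = (y_true == y_pred).mean()
false_negatives = int(((y_true == 1) & (y_pred == 0)).sum())
print(accuracy)         # 0.9: looks good on paper
print(false_negatives)  # 1: the only positive sample was missed
```

Accuracy reports 90%, yet this model never detects a single positive case; a confusion matrix makes that failure visible immediately.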

To start, let’s reload the packages and data, preprocess it and retrain the model.

# imports
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

import numpy as np
import pandas as pd

# get data
!wget --no-check-certificate \
    -O /tmp/sentiment.csv https://drive.google.com/uc?id=13ySLC_ue6Umt9RJYSeM2t-V0kCv-4C-P

# define get_data function
def get_data(path):
  data = pd.read_csv(path, index_col=0)
  return data

# get the data
data = get_data('/tmp/sentiment.csv')

# clone package repository
!git clone https://github.com/vallantin/atalaia.git

# navigate to atalaia directory
%cd atalaia

# install packages requirements
!pip install -r requirements.txt

# install package
!python setup.py install

# import it
from atalaia.atalaia import Atalaia

# define the preprocessing function
def preprocess(panda_series):
  atalaia = Atalaia('en')

  # lower case everything and remove double spaces
  panda_series = (atalaia.lower_remove_white(t) for t in panda_series)

  # expand contractions
  panda_series = (atalaia.expand_contractions(t) for t in panda_series)

  # remove punctuation
  panda_series = (atalaia.remove_punctuation(t) for t in panda_series)

  # remove numbers
  panda_series = (atalaia.remove_numbers(t) for t in panda_series)

  # remove stopwords
  panda_series = (atalaia.remove_stopwords(t) for t in panda_series)

  # remove excessive spaces
  panda_series = (atalaia.remove_excessive_spaces(t) for t in panda_series)

  return panda_series

# preprocess it
preprocessed_text = preprocess(data.text)

# assign preprocessed texts to dataset
data['text']      = list(preprocessed_text)

# split train/test
# shuffle the dataset
data = data.sample(frac=1)

# separate all classes present on the dataset
classes_dict = {}
for label in [0,1]:
  classes_dict[label] = data[data['sentiment'] == label]

# get 80% of each label
size = int(len(classes_dict[0].text) * 0.8)
X_train = list(classes_dict[0].text[0:size])      + list(classes_dict[1].text[0:size])
X_test  = list(classes_dict[0].text[size:])       + list(classes_dict[1].text[size:])
y_train = list(classes_dict[0].sentiment[0:size]) + list(classes_dict[1].sentiment[0:size])
y_test  = list(classes_dict[0].sentiment[size:])  + list(classes_dict[1].sentiment[size:])

# Convert labels to Numpy arrays
y_train = np.array(y_train)
y_test = np.array(y_test)

# Let's consider the vocab size as the number of words
# that compose 90% of the vocabulary
atalaia    = Atalaia('en')
vocab_size = len(atalaia.representative_tokens(0.9,
                                               ' '.join(X_train)))

oov_tok = "<OOV>"

# create the tokenizer
tokenizer = Tokenizer(num_words=vocab_size, oov_token=oov_tok)

# fit on the training data only
# we don't fit on the test set because, in real life, our model will have to
# deal with words it never saw before, so it makes sense to fit only on
# training data. When the tokenizer finds a word it never saw, it will assign
# the <OOV> token to it.
tokenizer.fit_on_texts(X_train)
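As a quick sanity check (toy sentences, not the notebook's corpus), you can verify that a word absent from the training texts is mapped to the `<OOV>` index:

```python
from tensorflow.keras.preprocessing.text import Tokenizer

toy_tok = Tokenizer(num_words=100, oov_token="<OOV>")
toy_tok.fit_on_texts(["the movie was great"])

# "terrible" was never seen during fitting, so it becomes <OOV>
seq = toy_tok.texts_to_sequences(["the movie was terrible"])[0]
print(seq[-1] == toy_tok.word_index["<OOV>"])  # True
```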

# get the word index
word_index = tokenizer.word_index

# transform into sequences
# this will assign an index to each token present in the corpus
sequences = tokenizer.texts_to_sequences(X_train)

# define max_length 
max_length = 100

# post: pad or truncate at the end of the sequence.
# pre: pad or truncate at the beginning of the sequence.

padded = pad_sequences(sequences,
                       maxlen=max_length,
                       padding='post',
                       truncating='post')

# tokenize and pad the test sentences
# these will be used later to evaluate the model
X_test_sequences = tokenizer.texts_to_sequences(X_test)

X_test_padded    = pad_sequences(X_test_sequences,
                                 maxlen=max_length,
                                 padding='post',
                                 truncating='post')
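The effect of the padding parameter is easy to see on a toy pair of sequences:

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

seqs = [[1, 2, 3], [4, 5]]
post = pad_sequences(seqs, maxlen=4, padding='post')  # zeros appended at the end
pre  = pad_sequences(seqs, maxlen=4, padding='pre')   # zeros added at the beginning
print(post)
print(pre)
```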

# create the reverse word index
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])

# create the decoder
def text_decoder(text):
    return ' '.join([reverse_word_index.get(i, '?') for i in text])
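With a toy reverse index (hypothetical values, just for illustration), the decoder behaves like this: known indices map back to tokens, and unknown ones become `?`.

```python
# hypothetical reverse index: id -> token
toy_reverse_index = {1: '<OOV>', 2: 'the', 3: 'movie'}

def toy_decoder(text):
    return ' '.join(toy_reverse_index.get(i, '?') for i in text)

print(toy_decoder([2, 3, 1, 99]))  # "the movie <OOV> ?"
```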

# Build network
embedding_dim = 16

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length=max_length),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(6, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])

# compile the model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# train the model
num_epochs = 10
history = model.fit(padded, y_train, epochs=num_epochs,
                    validation_data=(X_test_padded, y_test))

Instead of checking accuracy, we will create our confusion matrix. The first thing to do is make some predictions. We will reuse the test set, but if you have enough data, you could use a separate validation set.

# predict
y_pred = model.predict(X_test_padded)

Predictions come out as probabilities, but our labels are either 0 or 1. Let’s round the predictions: if a prediction is greater than 0.5 we call it positive; otherwise, we call it negative.
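For instance, with three made-up probabilities:

```python
probs = [0.9, 0.2, 0.6]
labels = [1 if p > 0.5 else 0 for p in probs]
print(labels)  # [1, 0, 1]
```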

# round
y_pred = [1 if y > 0.5 else 0 for y in y_pred]

# confusion matrix
matrix = tf.math.confusion_matrix(y_test, y_pred)

matrix = np.array(matrix)

# tf.math.confusion_matrix puts the real labels on the rows and the
# predictions on the columns; here, label 0 is negative and label 1 is positive
matrix = pd.DataFrame(matrix,
                      index=['Negative (real)', 'Positive (real)'],
                      columns=['Negative (predicted)', 'Positive (predicted)'])

Confusion Matrix

And voilà, our matrix. In the first row, the first cell shows the true negatives, while the second shows the false positives (they were negative, but the model predicted them as positive).

In the second row, the first cell shows the false negatives (they were positive, but the model predicted them as negative), while the last cell shows the true positives.
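On a toy example (made-up labels) you can check how `tf.math.confusion_matrix` lays out those four cells, with the real labels on the rows and the predictions on the columns:

```python
import tensorflow as tf

y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]

m = tf.math.confusion_matrix(y_true, y_pred).numpy()
tn, fp = m[0, 0], m[0, 1]  # real negatives: correctly classified vs. false positives
fn, tp = m[1, 0], m[1, 1]  # real positives: false negatives vs. correctly classified
print(tn, fp, fn, tp)  # 1 1 0 2
```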

False positives and false negatives are a big problem if you work in the health industry (but not only there). Imagine you are training a model to detect cancer: a false positive is a headache, but a false negative could cost someone’s life.

What we learned today

Accuracy is one way to inspect model performance, but it is neither the only one nor always the best one.

Knowing in detail how your model “thinks” is even a legal matter in some places. The European Union’s General Data Protection Regulation guarantees a right to explanation for decisions made by Artificial Intelligence.

So we had better know exactly what is happening before we deploy a model in production.

Do you want to connect? It will be a pleasure to discuss Machine Learning with you. Drop me a message on LinkedIn.
