The math behind neural networks – Analysing a Dense Layer

Wilame
Marketing Data Scientist and Master's student interested in everything concerning Data, Text Mining, and Natural Language Processing. Currently speaking Brazilian Portuguese, French, English, and a tiny bit of German. Want to connect? You can send me a message. For more information about me, you can visit this page.


In the article “The difference between Machine Learning and traditional software development – Machine Learning crash course”, I talked about Dense layers and asked you to set them aside for a moment.

Now, it’s time to discuss this subject a little bit more.

In the last article, I referred to what we call a neuron. Neurons are simple units in deep learning networks responsible for analyzing input data and producing an output based on weights and a bias.

For example, do you still remember the equation we used to calculate inputs and outputs in the previous article?

It was:

`y = (x * 2) + 3`

When we created a neural network, we actually created a structure containing one neuron that received input and calculated an output. Remember, also, that we only used one layer, right?
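To make this concrete, here is a minimal sketch of what that one-neuron network does during training. This is plain Python rather than the exact code from the previous article: the network repeatedly nudges a weight and a bias toward values that reproduce `y = (x * 2) + 3`.

```python
# A single "neuron" with one weight and one bias, trained by
# gradient descent on data generated from y = (x * 2) + 3.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x * 2 + 3 for x in xs]

w, b = 0.0, 0.0   # start from arbitrary values
lr = 0.01         # learning rate: how big each adjustment is

for _ in range(5000):
    # Gradients of the mean squared error with respect to w and b
    grad_w = sum(2 * ((w * x + b) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * ((w * x + b) - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # converges toward w ≈ 2.0, b ≈ 3.0
```

This is the same adjustment loop that TensorFlow performs for you behind the scenes when you call `fit`.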

But for more complex problems, we can use multiple layers. These layers receive the inputs, compute on them, and pass the results to the next layer, which does the same thing until the final layer produces the output.
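That hand-off between layers can be sketched like this; the sizes and numbers below are made up for illustration, not taken from any real model. Each layer simply takes the previous layer's output vector and produces a new one:

```python
import numpy as np

def dense(inputs, weights, bias):
    """One dense layer: a weighted sum of the inputs plus a bias, per neuron."""
    return weights @ inputs + bias

# Made-up weights for a tiny 2 -> 3 -> 1 network
x = np.array([1.0, 2.0])                      # input layer
h = dense(x, np.array([[0.5, -1.0],
                       [1.0,  0.5],
                       [-0.5, 1.0]]),
          np.array([0.1, 0.2, 0.3]))          # hidden layer (3 neurons)
y = dense(h, np.array([[1.0, -1.0, 0.5]]),
          np.array([0.0]))                    # output layer (1 neuron)
print(y)
```

The output of the hidden layer becomes the input of the output layer; with more layers, the chain just gets longer.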

Hidden layers

The layers between the input and the output layers are called hidden layers.

There’s an excellent course on Udacity called “Intro to TensorFlow for Deep Learning” that explains in more detail, step by step, how a neural network is built.

I have used the image below from this same course, so you can understand what I am talking about, but I strongly advise you to take this course, since it’s also free.

Observe that, in the image, you have 3 layers of neurons. Notice, also, that each neuron performs very simple math on the received information.

So, the output of the neuron a1, for instance, is the sum of each of the previous neurons' outputs multiplied by their respective weights, plus the bias.

The simplest way to describe what’s happening is using the formula:

`y = Σ (w * input) + b`

Where “w” is a weight, “b” is the bias, and “y” is the output we are looking for. The weights control how much influence each input has on the output.

The bias is a constant that is not influenced by the previous layer; it guarantees that even when all the inputs are zero, the neuron can still produce a nonzero activation.
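The formula above can be written out directly for a single neuron. The weights, inputs, and bias here are arbitrary illustration values:

```python
# y = Σ (w * input) + b for one neuron with three incoming connections
weights = [0.2, -0.5, 1.0]   # one weight per incoming input
inputs = [3.0, 2.0, 1.0]     # outputs of the previous layer's neurons
b = 0.5                      # the bias, added once per neuron

y = sum(w * x for w, x in zip(weights, inputs)) + b
print(y)  # 0.2*3.0 - 0.5*2.0 + 1.0*1.0 + 0.5 ≈ 1.1
```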

The weights and biases are the values that actually get adjusted after each iteration, until the adjustments produce a result that is as close as possible to the real output.

In the end, the training process is just the neural network repeatedly multiplying the input by the weights and adding a bias, trying to produce a result that is close to the observed output.

Why did it work so well on the problem we tried to solve in the last article? Because our equation was all about multiplying our input by 2 and adding 3. Do you remember?

If you go back to the previous article, you will see that our neural network concluded that we were multiplying the input by 2.0123928 and adding 1.3683615 to the result. We got these numbers using the get_weights method. It’s not exactly our equation, but it comes close. Check it out:

Of course, the result is not perfect, but as we saw, the error is relatively small: around 2 in most cases.
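You can check how close those recovered numbers are by plugging them into the original equation yourself. This small comparison uses the weight and bias quoted above and a few arbitrary inputs:

```python
# The model's learned approximation vs. the true y = (x * 2) + 3
learned_w, learned_b = 2.0123928, 1.3683615

for x in [0.0, 5.0, 10.0]:
    true_y = x * 2 + 3
    pred_y = x * learned_w + learned_b
    print(x, true_y, round(pred_y, 3), round(abs(true_y - pred_y), 3))
```

The predictions stay within a couple of units of the true values, which is exactly the kind of "close but not exact" behavior described above.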

Machine learning math is approximate. The goal is not to have a perfect algorithm, but to get as close as possible to reality.

This happens because, with machine learning, there’s no way to prove that the weights and biases discovered by your model are the true ones… you can only check whether they work.

This is where things get complicated: when you are developing a neural network, you have to experiment with different setups, structures, neurons, and layers until you start getting results that come closest to reality.

It’s a trial-and-error process, and this is basically how every machine learning framework works. Whether you decide to work with TensorFlow, Keras, or PyTorch, what you get is approximate math used to figure out the hidden logic behind a given problem.

The fact is: you don’t need to understand the math behind all of this (but knowing it won’t hurt either). What’s interesting about this logic is that you start to perceive the very nature of what machine learning is.