Deep-Learning As Explained To A 6th Grader

James Ramadan
Published in Analytics Vidhya
Jan 6, 2020

Deep learning is a type of machine learning and a popular way to make predictions from data these days. To apply deep learning models well, you usually need a solid understanding of linear algebra, and to build the models from scratch you also need a background in calculus.

Lucky for you, today you have me, so we won’t need linear algebra or calculus! I’m going to explain deep learning using only basic algebra.

Let’s get started!

So let’s say we have a data set that shows the relationship between plant growth and time, i.e. we water some plants over the course of several days and record their growth on a chart!

As one might expect, our plants grow taller over time! Now let’s say we have a new plant (shown in blue on day 0) and we want to use our knowledge of past plant growth to predict how tall our plant will grow by day 4. How will we do this?

Well, if we want to keep our guess simple, we assume a linear relationship between height and time, and we project out where the blue dot will be at day 4. From our yellow trend line, it appears our plant will be around 4.5cm on day 4.

Simple enough. This method is called linear regression, and it was a standard way to make predictions long before deep learning became popular.
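For readers who want to see the trend line as arithmetic, here is a minimal sketch of fitting a straight line and reading off day 4. The measurements and the helper name `fit_line` are made up for illustration, so the result will not match the chart exactly.

```python
# Hypothetical measurements from past plants: day number and height in cm.
# These numbers are invented for illustration, not taken from the chart.
days = [1, 2, 3, 4, 5]
heights = [1.2, 2.3, 3.1, 4.6, 5.3]

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y move together, divided by how much x moves on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(days, heights)
predicted_day4 = slope * 4 + intercept
print(f"Predicted height on day 4: {predicted_day4:.2f} cm")
```

The slope says how many centimeters the plant is expected to grow per day, and the intercept is the height the line assigns to day 0.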

Now let’s say we watered our plant for the next 4 days and recorded its actual growth (the blue dots). We then checked how close our predictions were to the plant’s actual height and marked the difference in red.

From the graph above, we see that our predictions did well. The red lines show how far off our predictions were (the prediction errors) at various points in time. The blue dots indicate how tall our plant was at a given time, and the yellow line shows where we expected our plant to be at a given time.
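The red lines can be computed directly: each one is the gap between a predicted height and the measured height. The sketch below uses made-up numbers for days 1 to 4, purely to show the arithmetic.

```python
# Hypothetical values for days 1 to 4, in cm (invented for illustration).
predicted = [1.2, 2.3, 3.4, 4.5]  # where the trend line said the plant would be
actual = [1.0, 2.6, 3.1, 4.9]     # what we measured

# Each red line is the distance between a prediction and the measurement.
errors = [abs(p - a) for p, a in zip(predicted, actual)]
average_error = sum(errors) / len(errors)

for day, err in enumerate(errors, start=1):
    print(f"Day {day}: off by {err:.1f} cm")
print(f"Average error: {average_error:.2f} cm")
```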

So we are doing pretty well, but can we do better? Often we can, by using deep learning.

To build a deep learning prediction model, we make no assumptions about the relationship between height and time. Our deep learning model will learn this relationship on its own. And, in fact, in real data sets the relationship often isn’t easily interpretable by humans.

To start, we set plant height predictions to always be 3cm, no matter the time. Don’t worry, our deep learning model will quickly learn this assumption is incorrect. And yes we could have started with any trend, including the linear one above, but we will just keep our predictions as simple as possible, since the deep learning model figures out the best relationship anyway.

At a minimum, we need to specify 3 parameters to build a deep learning model:

  1. An error formula (the loss function) that tells our deep learning model how to quantify prediction error, e.g. the red lines we used above
  2. How many opportunities (the number of epochs) we want to give the model to analyze the training data (e.g. the black dots) before it settles on a “final” estimate of the true relationship between the variables (e.g. the yellow trend line), which it then uses to predict the test data (e.g. the blue dot)
  3. A number (the learning rate) that specifies how aggressively to adjust our predicted relationship when our model makes bad predictions

The first parameter, the loss function, tells the model what “right” looks like. Without it, the model has no way to compare its predictions with the actual results, so it cannot learn to get better and be smarter.
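As one concrete example of an error formula, the sketch below uses mean squared error, a common choice of loss function. That choice is an assumption for illustration (this article does not name a specific formula), and the heights are made up. It scores the “always 3cm” starting guess against a set of trend-line predictions.

```python
def mean_squared_error(predictions, actuals):
    """Average of the squared gaps between predictions and actual values."""
    return sum((p - a) ** 2 for p, a in zip(predictions, actuals)) / len(actuals)

# Hypothetical heights in cm for days 1 to 4 (invented for illustration).
actual = [1.0, 2.6, 3.1, 4.9]
always_3cm = [3.0, 3.0, 3.0, 3.0]   # the "always 3cm" starting guess
trend_line = [1.2, 2.3, 3.4, 4.5]   # predictions from a straight trend line

loss_constant = mean_squared_error(always_3cm, actual)
loss_trend = mean_squared_error(trend_line, actual)
print(f"Loss for 'always 3cm': {loss_constant:.3f}")
print(f"Loss for trend line:   {loss_trend:.3f}")
```

A lower number means the predictions sit closer to the measurements, which is the signal the model uses to decide whether an adjustment helped.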

The second parameter, the number of epochs, relates to our computer’s processing power and our patience. Having more learning opportunities (epochs) means our computer will take longer to run the model, but we are more likely to get the best predictions. However, if we set this value TOO high, our model can become too specific to the training data (e.g. the black dots), a problem known as overfitting, and it will not accurately predict cases (e.g. the blue dots) that vary slightly from the training data.

The third parameter, the learning rate, controls how much our model adjusts its predictions when it is incorrect. If the learning rate is high, our model panics when it is wrong and adjusts greatly. This lets the model improve its predictions faster but, as a result, it can overshoot and lack the precision to reach the best final predictions. If the learning rate is low, the model adjusts its predictions slowly and, while it needs more computer processing, it will eventually settle on more accurate predictions. Slow and steady wins the race.
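To see the three parameters working together, here is a minimal sketch of a training loop. It is not a real deep learning model: as a stand-in, it uses a straight line with just two adjustable numbers (`weight` and `bias`), a mean squared error loss, and made-up data, all of which are assumptions for illustration. It starts from the “always 3cm” guess and nudges the two numbers once per epoch.

```python
# Hypothetical training data: day number and height in cm (invented).
days = [1, 2, 3, 4, 5]
heights = [1.2, 2.3, 3.1, 4.6, 5.3]

def loss(weight, bias):
    """Loss function: mean squared error of the predictions weight * day + bias."""
    return sum((weight * d + bias - h) ** 2 for d, h in zip(days, heights)) / len(days)

def train(epochs, learning_rate):
    """Adjust weight and bias once per epoch, starting from 'always 3cm'."""
    weight, bias = 0.0, 3.0  # predicts 3cm no matter the day
    n = len(days)
    for _ in range(epochs):
        # How the loss changes if we nudge weight or bias (the gradients).
        grad_w = sum(2 * (weight * d + bias - h) * d for d, h in zip(days, heights)) / n
        grad_b = sum(2 * (weight * d + bias - h) for d, h in zip(days, heights)) / n
        # The learning rate scales how big each adjustment is.
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias

start_loss = loss(0.0, 3.0)
print(f"Before training: loss = {start_loss:.4f}")

# More epochs give the model more chances to shrink the loss.
for epochs in (10, 100, 5000):
    w, b = train(epochs, learning_rate=0.01)
    print(f"{epochs:>5} epochs: loss = {loss(w, b):.4f}")

# A learning rate that is too high for this data overshoots on every step.
w_bad, b_bad = train(50, learning_rate=0.1)
print(f"Learning rate 0.1 after 50 epochs: loss = {loss(w_bad, b_bad):.1f}")
```

With the small learning rate, the loss keeps shrinking as the number of epochs grows. With the large one, every adjustment overshoots and the loss ends up worse than where it started. A real deep learning model follows the same loop, but with many more adjustable numbers.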

Note that we don’t usually specify the relationship formula for deep learning models, as in our first linear model example. This fact surprised me when I first learned about deep learning. I enjoyed examining the plots and trying to guess which relationship was best (e.g. linear, log, etc.). In deep learning, we give that part up. We trust the model to get it right.

Let’s see how much better a deep learning model can do:

As we can see, for more complex relationships, deep learning models have the potential to make better predictions. There is less red line distance, indicating the model has smaller prediction errors. But we also see that the shape of the yellow trend line is harder to describe, and therefore that the relationship between height and time is harder to define.

In our example above we only analyzed two variables, height and time. But in more complex models, we may want to include several other variables (e.g. type of plant, temperature, etc.) to generate even more accurate predictions for height. As we can anticipate, deep learning lets us predict results from a richer representation of how the variables relate than if we assumed a simpler relationship, e.g. a linear one.

The best things we can do to build good deep learning models are to 1) feed high quality data into the model and 2) set up our parameters reasonably. Getting the parameters exactly right is more art than science sometimes, and even the pros use trial-and-error to get them right.

Well that’s the gist. We would need more math and coding skills to actually apply a deep learning model in practice, and therefore we would need to graduate 6th grade, but this explanation provides a good high-level background of deep learning.

The more you know.

Thanks for reading :P
