‘How do neural nets learn?’ A step by step explanation using the H2O Deep Learning algorithm
In my last blogpost about Random Forests I introduced the codecentric.ai Bootcamp. The next part I published was about Neural Networks and Deep Learning. Every video of our bootcamp will have example code and tasks to promote hands-on learning. While the practical parts of the bootcamp will be using Python, below you will find the English R version of this Neural Nets Practical Example, where I explain how neural nets learn and how the concepts and techniques translate to training neural nets in R with the H2O Deep Learning function.
You can find the video on YouTube but, as before, it is only available in German. The same goes for the slides, which are also currently German only. See the end of this article for the embedded video and slides.
Neural Nets and Deep Learning
Just like Random Forests, neural nets are a machine learning method and can be used for supervised, unsupervised and reinforcement learning. The idea behind neural nets was already developed back in the 1940s as a way to mimic how the human brain learns. That’s why neural nets in machine learning are also called ANNs (Artificial Neural Networks).
When we say Deep Learning, we talk about big and complex neural nets, which are able to solve complex tasks, like image or language understanding. Deep Learning has gained traction and success particularly with the recent developments in GPUs and TPUs (Tensor Processing Units), the increase in computing power and data in general, as well as the development of easy-to-use frameworks, like Keras and TensorFlow. We find Deep Learning in our everyday lives, e.g. in voice recognition, computer vision, recommender systems, reinforcement learning and many more.
The simplest type of ANN has only one node (also called a neuron) and is called a perceptron. Incoming data flows into this neuron, where a result is calculated, e.g. by summing up all incoming data. Each of the incoming data points is multiplied by a weight; weights can basically be any number and are used to modify the result calculated by the neuron: if we change a weight, the result changes as well. Optionally, we can add a so-called bias to modify the result even further.
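The perceptron described above can be sketched in a few lines of R (a minimal illustration of the weighted sum, bias and activation; the function and variable names are my own, this is not H2O code):

```r
# A minimal perceptron: weighted sum of the inputs plus a bias,
# passed through a simple step activation function.
perceptron <- function(x, w, b) {
  z <- sum(x * w) + b   # multiply each data point by its weight, sum, add bias
  ifelse(z > 0, 1, 0)   # step activation: fire (1) if the sum is positive
}

x <- c(0.5, -1.2, 3.0)  # example incoming data points
w <- c(0.8, 0.4, 0.1)   # example weights
b <- -0.2               # example bias

perceptron(x, w, b)     # returns 1 for these inputs
```

Changing any of the weights (or the bias) changes the weighted sum and can flip the output, which is exactly the lever that learning algorithms use.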
But how do neural nets learn? Below, I will show this with an example that uses common techniques and principles.
First, we will load all the packages we need:
tidyverse for data wrangling and plotting
readr for reading in a csv
h2o for Deep Learning (h2o.init initializes the cluster)
library(tidyverse)
library(readr)
library(h2o)
h2o.init(nthreads = -1)
## Connection successful!
## R is connected to the H2O cluster:
## H2O cluster uptime: 3 hours 46 minutes
## H2O cluster timezone: Europe/Berlin
## H2O data parsing timezone: UTC
## H2O cluster version: 22.214.171.124
## H2O cluster version age: 1 month and 16 days
## H2O cluster name: H2O_started_from_R_shiringlander_jpa775
## H2O cluster total nodes: 1
## H2O cluster total memory: 3.16 GB
## H2O cluster total cores: 8
## H2O cluster allowed cores: 8
## H2O cluster healthy: TRUE
## H2O Connection ip: localhost
## H2O Connection port: 54321
## H2O Connection proxy: NA
## H2O Internal Security: FALSE
## H2O API Extensions: XGBoost, Algos, AutoML, Core V3, Core V4
## R Version: R version 3.5.1 (2018-07-02)
The dataset used in this example is a customer churn dataset from Kaggle. Each row represents a customer; each column contains one of the customer’s attributes.
We will load the data from a csv file:
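A sketch of the loading step with `readr` (the file name matches the Kaggle Telco customer churn download, and the data frame name `dat` is my choice; both are assumptions, so adjust them to your setup):

```r
library(readr)

# file name assumed from the Kaggle Telco churn dataset;
# point the path at wherever you saved the csv
dat <- read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")
```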
For a first look, we can plot density distributions of the numeric variables (here I assume the data frame read from the csv is called `dat`) …

dat %>%
  select_if(is.numeric) %>%
  gather() %>%
  ggplot(aes(x = value)) +
  facet_wrap(~ key, scales = "free", ncol = 4) +
  geom_density()

## Warning: Removed 11 rows containing non-finite values (stat_density).
… and barcharts for categorical variables.
dat %>%
  select_if(is.character) %>%
  gather() %>%
  ggplot(aes(x = value)) +
  facet_wrap(~ key, scales = "free", ncol = 3) +
  geom_bar()
Before we can work with h2o, we need to convert our data into an h2o frame object. Note that I am also converting character columns to categorical columns, otherwise h2o will ignore them. Moreover, we will need our response variable to be in categorical format in order to perform classification on this data.
hf <- dat %>%
  mutate_if(is.character, as.factor) %>%
  as.h2o()
Next, I’ll create a vector of the feature names I want to use for modeling (I am leaving out the customer ID because it doesn’t add useful information about customer churn).
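One way to build that feature vector (a sketch; the column names `customerID` and `Churn` come from the Kaggle Telco churn data, and the variable names are my own assumptions):

```r
# use every column except the customer ID and the response as features
response <- "Churn"
features <- setdiff(colnames(dat), c("customerID", response))
```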