Deep Learning
Vidyasagar Bhargava
Contents
1. Introduction
2. Why deep learning?
3. Fundamentals of deep learning
4. How deep learning works?
5. Activation Function
6. Train Neural Network
• How to minimize loss or cost?
• How to move in right direction?
• Stochastic Gradient Descent
• How to calculate gradient?
7. Adaptive learning
8. Overfitting
9. Regularization
10. H2O
• Introduction
• H2O’s Deep learning
• Features
• Parameters
• Demo
Introduction
• Deep learning is an enhanced and powerful form of neural network which is built on several hidden layers (more than 2).
• Since data comes in many forms, it is sometimes difficult for linear methods to detect non-linearity in the data. In fact, even non-linear algorithms such as GBM and decision trees often fail to learn from the data.
• In such cases, a multi-layered neural network, which creates non-linear interactions among the features, can give a better solution!
Why Deep Learning?
• Neural networks have been around for quite a long time, but only in the past few years have they become so popular.
• Deep learning is powerful because it is able to learn powerful feature representations in an unsupervised manner, which differs from traditional machine learning algorithms, where we have to handcraft features manually.
• Handcrafted features work in a lot of domains, but in some domains, like image classification, the data is very high-dimensional, which makes it difficult to craft features that are useful for prediction.
• So deep learning takes the approach of taking all the data and figuring out the best features itself.
A Perceptron
Fundamentals of Deep learning
• At the core of any neural network is the perceptron.
• In a neural network diagram, a circle represents a neuron and a line represents a synapse.
• Synapses have a very simple job: they take the value from a neuron, multiply it by a specific weight, and output the result.
• Neurons are a little more complicated: their job is to add together the outputs from all incoming synapses, add a bias term, and apply the activation function.
• The activation function allows the neural network to model complex non-linear patterns.
Perceptron Forward Pass
• Step 1 (Synapse): multiply the weights and inputs
• Step 2 (Neuron): sum them all together and add the bias term
• Step 3 (Activation function): apply the non-linearity
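As a rough illustration of these three steps, here is a minimal sketch of a single perceptron's forward pass in Python/NumPy; the weights, bias, and inputs are made-up example values, and sigmoid is chosen arbitrarily as the activation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_forward(inputs, weights, bias):
    weighted = inputs * weights          # Step 1 (Synapse): multiply weights and inputs
    z = np.sum(weighted) + bias          # Step 2 (Neuron): sum all together, add bias
    return sigmoid(z)                    # Step 3 (Activation): apply the non-linearity

# Hypothetical example values
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.1, -0.7])
b = 0.2
print(perceptron_forward(x, w, b))
```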
Why is bias added?
• Bias is similar to the intercept term in linear regression.
• It helps in achieving better predictions by shifting the decision boundary.
Activation function
• At the core of every activation function is a non-linearity, which transforms the output from a linear feature into a non-linear one.
• There are many, many activation functions.
• Some common activation functions are Sigmoid, TanH, and ReLU.
Importance of Activation function
• Activation functions add non-linearity to our network's function.
• Non-linearity is important because most real-world data is non-linear.
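As a quick sketch, the three common activation functions named above can be written in NumPy as follows:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes any real value into (-1, 1)
    return np.tanh(z)

def relu(z):
    # Passes positive values through, zeroes out negatives
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```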
How to build neural network with
perceptron?
• The perceptron is the most basic building block of a neural network. However, a single perceptron is only powerful enough to handle linearly separable data; it fails on data that is not linearly separable.
• This is why the Multi-Layer Perceptron came into existence.
• We can add a hidden layer between the input layer and the output layer, which gives rise to the Multi-Layer Perceptron (MLP).
• To extend an MLP to a deep neural network, simply add more layers.
Deep Learning Model
• The input layer consists of a number of neurons equal to the number of input variables in the data.
• The number of neurons in the hidden layers is up to the user.
• We can find the optimum number of neurons in the hidden layers using a cross-validation strategy.
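To make the layer structure concrete, here is a minimal sketch of a forward pass through an MLP with two hidden layers; the layer sizes and random weights are purely illustrative (in practice the hidden sizes would be chosen by cross-validation as noted above):

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_hidden1, n_hidden2, n_outputs = 4, 8, 8, 1   # illustrative sizes

# One weight matrix and bias vector per layer
W1, b1 = rng.normal(size=(n_inputs, n_hidden1)), np.zeros(n_hidden1)
W2, b2 = rng.normal(size=(n_hidden1, n_hidden2)), np.zeros(n_hidden2)
W3, b3 = rng.normal(size=(n_hidden2, n_outputs)), np.zeros(n_outputs)

def relu(z):
    return np.maximum(0.0, z)

def forward(x):
    h1 = relu(x @ W1 + b1)     # hidden layer 1
    h2 = relu(h1 @ W2 + b2)    # hidden layer 2
    return h2 @ W3 + b3        # output layer (no activation, e.g. for regression)

x = rng.normal(size=n_inputs)  # one example with n_inputs features
print(forward(x))
```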
Applying Neural Network
• To quantify how good our neural network is, we calculate a loss, i.e. a measure of the difference between the actual output and the predicted output, averaged over the examples.
• There are lots of loss functions, such as cross-entropy loss, mean squared error, etc.
• The loss is represented as J(Θ).
• Our goal is to minimize the loss so that the network can predict the output more accurately.
• Note: Θ = W1, W2, ..., Wn
J(Θ) = (1/N) ∑_{i=1}^{N} loss(f(x(i); Θ), y(i))
arg min_Θ (1/N) ∑_{i=1}^{N} loss(f(x(i); Θ), y(i))
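A minimal sketch of how such a loss can be computed for a batch of predictions, using mean squared error and binary cross-entropy as the per-example loss (the prediction and target values are hypothetical):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    # Mean squared error averaged over the N examples
    return np.mean((y_pred - y_true) ** 2)

def cross_entropy_loss(p_pred, y_true, eps=1e-12):
    # Binary cross-entropy averaged over the N examples
    p = np.clip(p_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))

y_true = np.array([1.0, 0.0, 1.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7, 0.4])
print(mse_loss(y_pred, y_true), cross_entropy_loss(y_pred, y_true))
```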
Train Neural Network
• Now that we have expressed our loss as J(Θ), we will train our neural network to minimize it.
• So the objective is to find the Θ that minimizes the loss function.
• Θ is just the weights of our network.
• So the loss is a function of the model's parameters.
• To minimize the loss we need to find its lowest point.
How to minimize loss or cost?
• Once the predicted value is computed, the error propagates back layer by layer and the weights associated with each neuron are recalculated.
• This is known as back propagation.
• The back propagation algorithm optimizes the network's performance using a cost function.
• This cost function is minimized using an iterative sequence of steps called the gradient descent algorithm.
Gradient Descent: How to move in right direction?
• Start at a random point; the goal is to get to the bottom.
• To reach the bottom, we calculate the gradient at this point, which points in the direction of maximum ascent. But we want to go downhill, so we multiply by negative one, move in the opposite (downward) direction, and form a new point based on that.
• This way we update our parameters and obtain a new loss.
• We can do this over and over again until we reach the minimum loss (until we reach convergence).
Stochastic Gradient Descent algorithm
• Initialize Θ randomly
• For N epochs:
  o For each training example (x, y):
    • Compute the loss gradient ∂J(Θ)/∂Θ
    • Update Θ with the update rule: Θ := Θ – η * ∂J(Θ)/∂Θ
Note: Θ = W1, W2, ..., Wn
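As a minimal sketch of this loop, assuming a toy one-parameter linear model with a squared-error loss and a made-up learning rate (none of these specifics come from the deck):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: y is roughly 3*x plus noise (purely illustrative)
X = rng.normal(size=100)
Y = 3.0 * X + 0.1 * rng.normal(size=100)

theta = rng.normal()   # initialize Θ randomly
eta = 0.05             # learning rate η (assumed value)

for epoch in range(10):                      # for N epochs
    for x, y in zip(X, Y):                   # for each training example (x, y)
        grad = 2.0 * (theta * x - y) * x     # compute loss gradient ∂J(Θ)/∂Θ
        theta = theta - eta * grad           # Θ := Θ - η * ∂J(Θ)/∂Θ

print(theta)   # should end up close to 3.0
```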
• Next: how do we calculate the gradient, i.e. ∂J(Θ)/∂Θ?
How to calculate Gradient?
• Let's say we have a simple neural network with just three nodes: an input node X0, a hidden node h0, and an output node O0, connected by weight W1 (X0 → h0) and weight W2 (h0 → O0), with loss J(Θ).
• Let's look at W2. We want to see how the loss changes as W2 changes, i.e. the derivative of J(Θ) with respect to W2. To compute it we apply the chain rule: take the derivative of J(Θ) with respect to O0 and multiply it by the derivative of O0 with respect to W2.
• Similarly for W1: the derivative of J(Θ) with respect to W1 is the derivative of J(Θ) w.r.t. O0, multiplied by the derivative of O0 w.r.t. h0, multiplied by the derivative of h0 w.r.t. W1.
• This is what is meant by back-propagating gradients: the gradient of one parameter often depends on parameters in later layers, so the derivatives form a chain. This is the idea of back propagation.

∂J(Θ)/∂W2 = ∂J(Θ)/∂O0 * ∂O0/∂W2
∂J(Θ)/∂W1 = ∂J(Θ)/∂O0 * ∂O0/∂h0 * ∂h0/∂W1
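To make the chain rule concrete, here is a minimal sketch for this three-node network, assuming a sigmoid hidden unit, a linear output, and a squared-error loss (the slide does not specify these choices, so they are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical values for a single example and the two weights
x0, y = 1.5, 0.8
W1, W2 = 0.4, -0.3

# Forward pass: X0 -> h0 -> O0 -> J(Θ)
h0 = sigmoid(W1 * x0)
O0 = W2 * h0
J = 0.5 * (O0 - y) ** 2

# Backward pass via the chain rule
dJ_dO0 = O0 - y                    # ∂J/∂O0
dO0_dW2 = h0                       # ∂O0/∂W2
dO0_dh0 = W2                       # ∂O0/∂h0
dh0_dW1 = h0 * (1.0 - h0) * x0     # ∂h0/∂W1 (sigmoid derivative times input)

dJ_dW2 = dJ_dO0 * dO0_dW2              # ∂J/∂W2 = ∂J/∂O0 * ∂O0/∂W2
dJ_dW1 = dJ_dO0 * dO0_dh0 * dh0_dW1    # ∂J/∂W1 = ∂J/∂O0 * ∂O0/∂h0 * ∂h0/∂W1
print(dJ_dW2, dJ_dW1)
```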
Recap
Now we have a good idea of:
• 1. How to calculate the gradient?
• 2. How to move in the right direction?
• 3. How to minimize our loss?
Loss function can be difficult to optimize
• Update rule: Θ := Θ – η * ∂J(Θ)/∂Θ
• The learning rate (η) represents the step size, i.e. how large a step we take with each gradient update.
• Next: how do we choose the learning rate?
How to choose learning rate (η)?
• A small learning rate takes a long time to reach the minimum and may get stuck in a local minimum rather than the global minimum.
• A large learning rate can lead to divergence or an increasing loss.
• We need to find the Goldilocks value in the middle.
• One way is guessing: try a whole bunch of different values and see what gives the best result. This is very time-consuming and not the best use of our resources.
• Do something smarter: use an adaptive learning rate that adapts and changes as learning progresses.
• We can adapt and change the learning rate based on:
  o How fast is learning happening?
  o How large are the gradients?
  o How large are the weights?
  o We can also have a different learning rate for each parameter.
Adaptive learning Rate Algorithms
• ADAM
• Momentum
• NAG
• Adagrad
• Adadelta (H2O)
• RMSProp
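As one small example from this family, here is a sketch of the classic momentum update; the hyperparameter values are purely illustrative:

```python
# Momentum keeps a running "velocity" so that consistent gradient directions
# accelerate and noisy directions partly cancel out.
def momentum_update(theta, grad, velocity, eta=0.01, beta=0.9):
    velocity = beta * velocity - eta * grad   # decaying average of past gradients
    theta = theta + velocity                  # move by the velocity, not the raw gradient
    return theta, velocity
```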
For more info on these methods, please check the URL below:
• http://ruder.io/optimizing-gradient-descent/
Overfitting
• Neural networks are really powerful models that are capable of learning all sorts of features and functions.
• Sometimes they can be too powerful, i.e. they can overfit or memorize the training examples.
• Overfitting means the model performs very well on the training set, but what it learnt is so specific to the training set that it does not generalize to real-world examples or to the test set.
Regularization
• Regularization is how we prevent overfitting in machine learning or neural networks.
• Regularization techniques:
• Dropout
• Early stopping
• Weight regularization (L1/L2)
• …Others
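For instance, weight regularization simply adds a penalty on the size of the weights to the loss. A minimal sketch of an L2 penalty follows; the λ value is an assumption:

```python
import numpy as np

def l2_regularized_loss(base_loss, weights, lam=1e-4):
    # Total loss = data loss + λ * sum of squared weights
    return base_loss + lam * np.sum(weights ** 2)

def l2_regularized_grad(base_grad, weights, lam=1e-4):
    # The gradient gains an extra 2*λ*w term, shrinking weights toward zero
    return base_grad + 2.0 * lam * weights
```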
Intro to H2O
• H2O is a fast, scalable, open-source machine learning and deep learning platform for building smarter applications.
• Using in-memory compression, H2O handles billions of data
rows in-memory, even with a small cluster.
• H2O includes many common machine learning algorithms,
such as generalized linear modeling (linear regression, logistic
regression, etc.), Naive Bayes, principal components analysis,
time series, k-means clustering, and others.
H2O’s Deep Learning
• H2O's Deep Learning is based on a multi-layer feed-forward artificial neural network that is trained with stochastic gradient descent using back propagation.
• A feed-forward artificial neural network (ANN), also known as a deep neural network (DNN) or multi-layer perceptron (MLP), is the most common type of deep neural network and the only type that is supported natively in H2O-3.
• Other types of DNNs, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), are popular as well.
• MLPs work well on transactional (tabular) data; CNNs are a great choice particularly for image classification; and RNNs suit sequential data (e.g. text, audio, time series).
• H2O's Deep Water project supports CNNs and RNNs through third-party integration of deep learning libraries such as TensorFlow, Caffe, and MXNet.
Features
Features of H2O's deep learning are:
• Multi-threaded, distributed parallel computation
• Adaptive learning rate for convergence
• Regularization options such as L1 and L2
• Automatic missing value imputation
• Hyperparameter optimization using grid/random search
• For optimization it uses the Hogwild! method, which is a parallelized version of SGD.
Parameters
• Hidden – specifies the number of hidden layers and the number of neurons in each layer.
• Epochs – specifies the number of training iterations (passes over the data) to be done.
• Rate – specifies the learning rate.
• Activation – specifies the type of activation function to use.
(In H2O the major activation functions are Tanh, Rectifier, and Maxout.)
H2O Deep Learning Demo
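The demo itself is not included in the deck. As a rough sketch of what it typically looks like with H2O's Python API, using the parameters described above (the file path and column names are placeholders, not from the original demo):

```python
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()

# Placeholder dataset; the original demo's data is not shown in the deck
frame = h2o.import_file("path/to/your_data.csv")
train, valid = frame.split_frame(ratios=[0.8], seed=42)

response = "target"                                   # placeholder response column
predictors = [c for c in frame.columns if c != response]

model = H2ODeepLearningEstimator(
    hidden=[64, 64],          # two hidden layers of 64 neurons each
    epochs=10,                # number of passes over the training data
    activation="Rectifier",   # Tanh and Maxout are the other main options
    l1=1e-5, l2=1e-5,         # L1/L2 regularization
    adaptive_rate=True        # Adadelta-based adaptive learning rate
)
model.train(x=predictors, y=response, training_frame=train, validation_frame=valid)
print(model.model_performance(valid))
```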
Thank You!