ML Algorithms
Draft V1.0
Do you really understand...
Outline
Supervised Learning:
Linear regression, Logistic regression
Linear Discriminant Analysis
Principal Component Analysis
Neural network
Support vector machines
K-nearest neighbor
Gradient Boosting Decision Tree
Decision trees (ID3, C4.5, CART), Random Forests
Kernels, Kernel-based PCA
Optimization Methods
Linear Regression
Use the dataset below to predict the price of a house.
The target is to find a line that fits these points well.
The equation of the line is shown below.
There are an infinite number of lines, so how can we evaluate which one is best?
We introduce the concept of a cost function.
The most commonly used cost function is the squared-error cost function.
The closer it is to zero, the better the line fits the dataset.
Tuning of Parameters of Linear Regression
The line is determined by theta0 and theta1, so the cost function is a function of (theta0, theta1).
The target is to find the theta0 and theta1 at which the cost function reaches its global (or a local) minimum.
To find them, we introduce the concept of gradient descent.
Gradient Descent: optimization in Linear
Regression
Starting from a simple case where theta0 = 0, the cost function J looks like this, and we use the equation below to find the optimal theta1.
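As a rough illustration of this update rule, here is a minimal NumPy sketch of batch gradient descent for one feature; the toy house data, the learning rate, and the iteration count are made-up values for illustration only.

```python
import numpy as np

# Toy dataset: house sizes (x) and prices (y); the numbers are made up.
x = np.array([50.0, 80.0, 100.0, 120.0, 150.0])
y = np.array([150.0, 220.0, 280.0, 310.0, 400.0])

theta0, theta1 = 0.0, 0.0   # intercept and slope
alpha = 0.0001              # learning rate
m = len(x)

for _ in range(10000):
    pred = theta0 + theta1 * x
    # Gradients of the squared-error cost J = (1/2m) * sum((pred - y)^2)
    grad0 = (pred - y).sum() / m          # dJ/dtheta0
    grad1 = ((pred - y) * x).sum() / m    # dJ/dtheta1
    theta0 -= alpha * grad0
    theta1 -= alpha * grad1

print(theta0, theta1)   # fitted line: price = theta0 + theta1 * size
```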
Sigmoid Functions: Activation Functions
erf: Error Function
“S” shaped functions
Logistic Function
Logistic Function
Helpful in:
Logistic Regression
Neural Networks
Logistic Function
Features:
Smooth, monotonic, sigmoid ("S"-shaped)...
Logistic Regression: Motivation
Target function: f(x) = P(+1|x) lies within [0, 1]
Logistic Regression: binary classification
Risk Score
Logistic Function
Converts a score into an estimated probability.
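A minimal sketch of that conversion; the weight vector and feature values below are arbitrary, purely to show the mapping from a score to a probability.

```python
import numpy as np

def sigmoid(s):
    # Logistic function: maps any real-valued score into (0, 1).
    return 1.0 / (1.0 + np.exp(-s))

w = np.array([0.8, -0.4, 1.2])   # hypothetical weights
x = np.array([1.0, 2.0, 0.5])    # hypothetical feature vector
score = np.dot(w, x)             # risk score
print(sigmoid(score))            # estimated probability P(+1 | x)
```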
Logistic Regression: Likelihood Estimation
Logistic hypothesis:
Error Function: Likelihood Estimation
h(x) = theta(x)
1-h(x) = h(-x)
Logistic Regression: Likelihood Estimation
y: -1 or +1
Logistic Regression: Optimization (details go here).
Logistic Regression: Comparison
Some methods
erf: Error Function
Screenshots from here.
Logistic regression & SVM: in practice
1. PCA analysis:
components range < 25
Logistic regression
2. Logistic regression with sklearn.linear_model.LogisticRegression and sklearn.linear_model.SGDClassifier
3. Conclusion (discussion)
(1) Logistic regression does not work well on this dataset because of its high dimensionality, even with regularization. I assume an SVM might do better.
(2) Logistic regression uses the same likelihood-based fitting routine as the SVM: sklearn.svm.base._fit_liblinear.
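A minimal sketch of the pipeline in steps 1 and 2, on stand-in data rather than the original dataset; the number of components (20 < 25), the synthetic data, and the solver settings are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Stand-in high-dimensional data; replace with the real dataset.
X, y = make_classification(n_samples=500, n_features=200, n_informative=30,
                           random_state=0)

# 1. PCA with fewer than 25 components.
# 2. Logistic regression via LogisticRegression or SGDClassifier with logistic loss
#    (older sklearn versions spell the loss "log" instead of "log_loss").
for clf in (LogisticRegression(max_iter=1000),
            SGDClassifier(loss="log_loss", max_iter=1000)):
    pipe = make_pipeline(PCA(n_components=20), clf)
    print(type(clf).__name__, cross_val_score(pipe, X, y, cv=5).mean())
```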
Random Forest
Bagging (bootstrap aggregating)
function bag(D, A): for t = 1, 2, ..., T
1. Request a size-N' dataset D't by bootstrapping from D
2. Obtain base hypothesis gt by A(D't)
Return G = Uniform({gt})
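A minimal runnable version of this pseudocode, with an sklearn decision tree standing in for the base algorithm A; the toy data and T = 25 are arbitrary choices.

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def bag(X, y, A, T=25, seed=0):
    """function bag(D, A): train T base hypotheses g_t on bootstrap samples of D."""
    rng = np.random.default_rng(seed)
    N = len(X)
    g = []
    for _ in range(T):
        idx = rng.integers(0, N, size=N)          # 1. size-N' bootstrap sample D'
        g.append(clone(A).fit(X[idx], y[idx]))    # 2. base hypothesis g_t = A(D')
    return g                                      # G = Uniform({g_t})

def G(g, X):
    """Uniform (majority) vote over the base hypotheses; assumes integer labels."""
    votes = np.stack([gt.predict(X) for gt in g]).astype(int)
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
print(G(bag(X, y, DecisionTreeClassifier()), X))
```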
Random Forest
Decision Tree
function Tree(D):
if the termination criterion is met, return base gt; else
1. Learn branching criterion b(x), then split D into subsets Dc by b(x)
2. Build sub-trees Gc <- Tree(Dc)
3. Return G(x) = ∑c [b(x) = c] Gc(x)
Bagging: reduces variance by voting/averaging
Decision tree: large variance, especially for a fully-grown tree
Random Forest
Putting them together?
1 - ((t/n)^2 + (x/n)^2) ...
Random Forest (RF) = bagging + fully-grown C&RT decision tree
function bag(D, A): for t = 1, 2, ..., T
1. Request a size-N' dataset D't by bootstrapping from D
2. Obtain tree gt by DTree(D't)
Advantages:
1. Highly efficient / easy to parallelize
2. Inherits the pros of the C&RT tree
3. Eliminates the cons of a fully-grown C&RT tree
Random Forest
Diversifying by Feature Projection
Recall: data randomness gives diversity in bagging: randomly sample N' examples from D.
Another possibility for diversity: randomly sample d' features from X.
Namely, the new dataset is a random d'-dimensional subspace of the d original features.
Often d' << d, which is efficient when d is large.
Re-sample a new subspace for each b(x) in C&RT.
RF = bagging + random-subspace C&RT
Random Forest
Projection (combination) with a random matrix P, so that the new features are x' = Px.
Often consider a low-dimensional projection: only d'' non-zero components in P.
This includes the random subspace as a special case: d'' = 1 and P is the natural basis.
RF = bagging + (random combination) C&RT
Random Forest
Decision Tree
•The major Decision Tree implementations are:
•ID3, or Iterative Dichotomiser, was the first of three Decision Tree implementations developed by Ross Quinlan (Quinlan, J. R. 1986. Induction of Decision Trees. Mach. Learn. 1, 1 (Mar. 1986), 81-106.)
•CART, or Classification And Regression Trees is often used as a generic acronym for the term Decision Tree,
though it apparently has a more specific meaning. In sum, the CART implementation is very similar to C4.5;
the one notable difference is that CART constructs the tree based on a numerical splitting criterion
recursively applied to the data, whereas C4.5 includes the intermediate step of constructing *rule set*s.
•C4.5, Quinlan's next iteration. The new features (versus ID3) are: (i) accepts both continuous and discrete features; (ii) handles incomplete data points; (iii) solves the over-fitting problem by a (very clever) bottom-up technique usually known as "pruning"; and (iv) different weights can be applied to the features that comprise the training data. Of these, the first three are very important, and I would suggest that any DT implementation you choose have all three. The fourth (differential weighting) is much less important.
Decision Tree
•ID3 and C4.5 use Shannon Entropy to pick features with the greatest information gain as
nodes. As an example, let's say we would like to classify animals. You would probably ask more
general questions (like "Is it a mammal") first and once confirmed continue with more specific
questions (like "is it a monkey"). In terms of information gain the general questions of our toy
example gives you more information in addition to what you already know (that it is an
animal).
•CART uses Gini Impurity instead. Gini Impurity is a measure of the homogeneity (or "purity")
of the nodes. If all datapoints at one node belong to the same class then this node is
considered "pure". So by minimising the Gini Impurity the decision tree finds the features
that separate the data best.
Decision Tree
•Ensemble
•Build many “base” decision trees, using different subsets of the data.
•Trees can vote on the class of a new input example.
•Accuracy of the ensemble should be better than that of the individual trees.
•Bagging
•Randomly draw a “bootstrap” sample from training data with replacement.
•Apply a classifier to each sample independently.
•Combine the outputs of the classifiers (e.g. majority voting).
•Random Forests
•Ensemble built from multiple tree models, generated using both bagging and subspace
sampling strategies.
Random Forests
•Forest-RI
•Forest-RC
Random Forests - Job Seeker Shane
•When building each decision tree, two points matter: sampling and full splitting. First come two random-sampling steps: a random forest samples the input data by rows and by columns. Row sampling is done with replacement, so the sampled set may contain duplicate examples; if there are N input examples, N examples are sampled. As a result, each tree is trained on only part of the data, which makes over-fitting relatively unlikely. Then column sampling selects m features out of the M available (m << M). Finally, a decision tree is built on the sampled data using full splitting, so each leaf either cannot be split further or contains samples that all belong to the same class. Many decision tree algorithms include an important pruning step, but it is skipped here: the two random-sampling steps already guarantee enough randomness, so even without pruning, over-fitting does not occur.
Decision Tree: In Practice
•Main parameters:
•max_features: The number of features to consider when looking for the best split:
If int, then consider max_features features at each split
If float, then max_features is a percentage and int(max_features * n_features) features are considered at each split.
if “auto”, then max_features=sqrt(n_features).
If “sqrt”, then max_features=sqrt(n_features).
If “log2”, then max_features=log2(n_features).
If None, then max_features=n_features.
•max_depth: (default=None) The maximum depth of the tree. If None, then nodes are expanded
until all leaves are pure or until all leaves contain less than min_samples_split samples.
•n_estimators=10:The number of trees in the forest.
sklearn.ensemble.RandomForestClassifier
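A minimal usage sketch of these parameters with sklearn.ensemble.RandomForestClassifier; the Iris data and the specific values chosen here are only an example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(
    n_estimators=100,     # number of trees in the forest
    max_features="sqrt",  # features considered when looking for the best split
    max_depth=None,       # grow each tree until its leaves are pure
    random_state=0,
)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```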
Principal Component Analysis
Suppose the training samples are x-dimensional.
PCA tries to find a set of y vectors (y < x) that capture the maximum amount of variance in the original training data.
Principal Component Analysis
How do we define "variance"?
If we can define V(v), it is relatively easy to find the maximum variance.
Principal Component Analysis
Using the cosine rule we can deduce that the length of the projection is X^T V, where X is the original data and V is the target vector (direction).
Principal Component Analysis
C = covariance matrix
Principal Component Analysis
Find v that maximizes the variance sigma^2 = v^T C v.
f(v) = v^T C v
g(v) = v^T v
F(v, lambda) = f(v) - lambda (g(v) - 1)
Principal Component Analysis
Take the eigenvectors with the top n largest lambda values until their total variance (sum) meets the requirement.
The corresponding set of v vectors and the projections onto them are the result.
From the graph we can tell that the variance along each component equals its eigenvalue.
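A minimal NumPy sketch of this procedure (center the data, form the covariance matrix C, take the eigenvectors with the largest eigenvalues, project); the random toy data and the choice of k = 2 components are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # toy training data, 5-dimensional

Xc = X - X.mean(axis=0)                  # center the data
C = np.cov(Xc, rowvar=False)             # covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)     # eigen-decomposition (C is symmetric)
order = np.argsort(eigvals)[::-1]        # sort by decreasing eigenvalue (variance)
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                    # keep the top-k components
explained = eigvals[:k].sum() / eigvals.sum()   # total variance retained
Z = Xc @ eigvecs[:, :k]                  # projections onto the top-k eigenvectors
print(explained, Z.shape)
```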
KPCA (Kernel-Based PCA)
Calculate the eigenvalues and eigenvectors of the covariance matrix in the virtual (feature) space?
φ is implicit, so we cannot directly calculate the result.
KPCA (Kernel-Based PCA)
k(xi, xj) gives the similarity (inner product) of xi and xj in the higher-dimensional space, and it can be calculated directly.
KPCA (Kernel-Based PCA)
where u and lambda are the eigenvectors and eigenvalues of K.
We have to make sure that ...
KPCA (Kernel-Based PCA)
Note that ||u|| = 1.
But we still don't know the value of ...
KPCA (Kernel-Based PCA)
Never mind: we only need the projection of the data in the virtual space, which is ...
KPCA (Kernel-Based PCA)
In this case, KPCA works better than PCA when the data is not linear.
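A minimal sketch comparing plain PCA with kernel PCA on data that is not linearly structured, using sklearn; the circles dataset and the RBF gamma value are illustrative choices.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA

# Two concentric circles: no linear direction separates them.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

Z_pca = PCA(n_components=2).fit_transform(X)
# Kernel PCA works in the implicit feature space, using only k(xi, xj).
Z_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)

print(Z_pca.shape, Z_kpca.shape)
```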
Kernel: Kernel Motivation
Non-linear problem
->
Linear problem
Kernel Trick: VIPs in Kernel
1) Feature mapping:
2) Feature/Original space: X/H
3) Kernel Function: ?
H: New space
Kernel Trick: Kernel Function
x = (x1, x2) → (z1, z2, z3)
y = (y1, y2) → (v1, v2, v3)
Dot product in H = (dot product in X)^2
→ Kernel function!
Kernel Trick: Squared distance of 2 points in H
||φ(x) - φ(x')||^2 = k(x, x) - 2k(x, x') + k(x', x')
Kernel Trick: Angle of 2 points in H
Kernel Trick: A deep understanding
1) The mapping function is not necessary.
2) Only the kernel function k(·,·) matters.
3) What can k be? A finitely positive semi-definite function.
The kernel matrix (Gram matrix) is a finitely positive semi-definite matrix.
Kernel Trick: FPSD Matrix
Is k(x, y) = <x, y> an FPSD function?
M is an FPSD matrix if and only if every non-zero vector x satisfies:
Kernel Trick: More..
If k(x, y) is an FPSD function, then there exists at least one feature mapping function phi, and vice versa.
H: New space
Kernel Trick: Some Kernel Functions
Common kernel functions for vector based data.
Linear kernel: K(x, y) = x · y
Polynomial kernel: K(x, y) = (x · y + 1)^d
Radial Basis Function:
(The bandwidth sigma can be estimated via kernel smoothing.)
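A small NumPy sketch of these kernels and the resulting Gram matrix; the degree d, the bandwidth sigma, and the standard exp(-||x - y||^2 / (2 sigma^2)) form assumed for the RBF kernel are not taken from the slides.

```python
import numpy as np

def linear_kernel(x, y):
    return x @ y

def polynomial_kernel(x, y, d=3):
    return (x @ y + 1) ** d

def rbf_kernel(x, y, sigma=1.0):
    # Assumed standard RBF form; sigma is the bandwidth.
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

X = np.random.default_rng(0).normal(size=(5, 3))   # 5 points in 3 dimensions
# Gram (kernel) matrix: K[i, j] = k(x_i, x_j); it should be positive semi-definite.
K = np.array([[rbf_kernel(xi, xj) for xj in X] for xi in X])
print(np.all(np.linalg.eigvalsh(K) >= -1e-10))     # numerical PSD check
```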
SVM
In general, an SVM is a linear classifier that uses the maximum margin in feature space to perform binary classification.
First part of SVM (finding the maximum margin):
1. A simple example of linear classification
2. Margin (functional & geometric)
3. Maximum margin classifier
SVM: A Toy Example
Introduce a classification function (it will be proved in Part III or IV).
SVM: Margin (Functional & Geometrical)
SVM: Margin (Functional & Geometrical)
Functional margin
Numeric distance:
where x0 = 0
Decide the margin's sign by whether the point falls on the positive or negative side.
Weakness: the functional margin is only a scalar value; rescaling the hyperplane's parameters changes it even though the hyperplane (its direction) does not change, so it is not a true distance.
Geometric margin
In vector space:
because
means:
Then the geometric distance:
Similar to the functional margin, we get the geometric margin as follows:
SVM: Margin (Functional & Geometrical)
SVM: Maximum Margin Classifier
Since the goal of the SVM, by definition, is to find the maximum margin, that is:
Owing to:
Then let
Because it is a constant in the calculation, any point that achieves:
will not be a point on the margin.
http://www.36dsj.com/archives/24596
SVM: Maximum Margin Classifier
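As a practical counterpart to the formulation above, a minimal sketch with sklearn's SVC; the toy points and the very large C (to approximate a hard margin) are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

# Linearly separable toy data.
X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # very large C approximates a hard margin
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, "b =", b)
print("support vectors:", clf.support_vectors_)    # points that sit on the margin
print("geometric margin:", 1.0 / np.linalg.norm(w))
```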
KNN: Introduction
Instance-based
Distance-based
Supervised learning
Lazy learning: keep all the training data!
Classification
KNN: Main Idea
• Set a distance threshold and calculate the distances from the given data point to the others.
• Get the nearest k neighbours (k is usually an odd number).
• Use majority voting to determine the class label.
KNN: Distance Metrics
Euclidean distance
Cosine similarity
Manhattan distance / taxicab geometry
(Minkowski)
Others: Pearson correlation (Karl Pearson), Kullback–Leibler (KL) divergence, etc.
KNN: Disadvantages
- Sensitive to the threshold.
- Majority voting: is more always better?
---> Weight = 1/distance
- Only works well with a proper k.
KNN: Advantages
- Easy & lazy.
- Generalization performance: better than the Naive Bayes classifier.
NBC: best results
Nearest neighbours: Z
Given point: x
Probability of KNN making errors:
1+p <= 2
KNN: Iris dataset in practice
KNN: Code with sklearn
Load directly….
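A minimal sketch of loading the Iris dataset directly and fitting sklearn's KNN classifier; k = 5, the distance weighting, and the split are arbitrary choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# weights="distance" implements the 1/distance weighting mentioned earlier.
knn = KNeighborsClassifier(n_neighbors=5, weights="distance")
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
```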
KNN: Practice
Which one is a possible result of KNN?
GBDT
What’s GBDT?
Gradient Boosting Decision Tree
or GBRT
Gradient Boosting Regression Tree
A regression tree instead of a classification tree
Difference between Classification Tree and Regression Tree
Main difference:
Classification trees, as the name implies, are used to separate the dataset into classes belonging to the response variable. Usually the
response variable has two classes: Yes or No (1 or 0). If the target variable has more than 2 categories, then a variant of the algorithm,
called C4.5, is used. For binary splits, however, the standard CART procedure is used. Thus classification trees are used when the
response or target variable is categorical in nature.
Regression trees are needed when the response variable is numeric or continuous. For example, the predicted price of a consumer
good. Thus regression trees are applicable for prediction type of problems as opposed to classification.
Boosting
Boosting (not AdaBoost):
Figure from Machine Learning: A Probabilistic Perspective
GB-Gradient Boosting
Boosting means iteration: multiple trees jointly make the final decision. How is this achieved? Is each tree trained independently, e.g. for person A the first tree says age 10, the second says 0, the third says 20, and we take the average, 10, as the final answer? Of course not! Not only would that be voting rather than GBDT, but with an unchanged training set, three independently trained trees would be identical, so it would be pointless. As noted earlier, GBDT sums the conclusions of all trees to form the final prediction, so each tree does not learn the age itself but an increment of the age. The core of GBDT is that each tree learns the residual of the sum of all previous trees' conclusions, i.e. the amount that, added to the current prediction, gives the true value. For example, A's true age is 18, but the first tree predicts 12, a residual of 6. In the second tree we set A's target to 6; if the second tree really puts A in the leaf for 6, the sum of the two trees equals A's true age. If the second tree predicts 5, A still has a residual of 1, so in the third tree A's target becomes 1, and learning continues. That is what "gradient boosting" means in GBDT. Simple, isn't it?
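A minimal sketch of this residual-fitting idea: two shallow regression trees built by hand, then sklearn's GradientBoostingRegressor for comparison; the toy "ages" and tree depths are made up.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # toy features
y = np.array([14.0, 16.0, 24.0, 26.0])       # toy target "ages"

# Tree 1 fits the ages; tree 2 fits the residuals left by tree 1.
t1 = DecisionTreeRegressor(max_depth=1).fit(X, y)
residual = y - t1.predict(X)
t2 = DecisionTreeRegressor(max_depth=1).fit(X, residual)
print(t1.predict(X) + t2.predict(X))         # sum of the trees' conclusions

# The same idea inside sklearn (which starts from an initial constant prediction).
gbrt = GradientBoostingRegressor(n_estimators=2, learning_rate=1.0, max_depth=1)
print(gbrt.fit(X, y).predict(X))
```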
GBDT: Algorithm
A worked example of the GBDT process:
http://blog.csdn.net/w28971023/article/details/8240756
Quiz: regarding random forests and gradient boosted trees, choose the correct statement(s).
1. In a random forest the intermediate trees are not independent of each other, whereas in gradient boosted trees they are independent.
2. Both use random subsets of features to build the intermediate trees.
3. With gradient boosted trees we can build the trees in parallel because they are independent of each other.
4. Gradient boosted trees perform better than random forests on any dataset.
Blending in Practice
- Blending: take advantage of the whole dataset.
OldLee Sharing (2): A Simple NN Model
- 192 Features (Input)
- 99 Classes (Output)
- 1 hidden layer
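A minimal sketch of that architecture (192 inputs, one hidden layer, 99 output classes) using sklearn's MLPClassifier; the hidden-layer size, the random stand-in data, and the training settings are assumptions for illustration only.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 192))        # 192 input features (random stand-in data)
y = rng.integers(0, 99, size=1000)      # 99 output classes

clf = MLPClassifier(hidden_layer_sizes=(128,),   # 1 hidden layer; size is a guess
                    max_iter=200)
clf.fit(X, y)
print(clf.predict(X[:5]))
```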
Neural Network (forward & backward)
Demonstrated at this link:
https://sixunouyang.wordpress.com/2016/11/09/backpropagation-neural-networkformula/
Linear Discriminant Analysis
Searches for a linear combination of variables (predictors) that best separates two classes.
Works well on multi-class tasks (more than 2 classes) when compared to SVM.
Experiment result:
Linear Discriminant Analysis
Linear Discriminant Analysis
Linear Discriminant Analysis
Classification equation:
Linear Discriminant Analysis
For multi-class classification task:
https://users.cs.fiu.edu/~taoli/pub/Li-discrimant.pdf
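A minimal sketch of LDA on a multi-class dataset with sklearn; the Iris data and the train/test split are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lda = LinearDiscriminantAnalysis()
lda.fit(X_train, y_train)
print(lda.score(X_test, y_test))     # classification accuracy
print(lda.transform(X_test).shape)   # projection onto at most (n_classes - 1) axes
```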
Optimization: Motivation
What is optimization? Recall gradient descent.
Optimization: How?
No constraints: gradient descent, Newton's method, quasi-Newton methods (improved versions of Newton's method).
With constraints: KKT conditions.
A generalized description of optimization follows...
Optimization: Unconstrained Optimization
Description:
x* is the optimum.
"Newton's method" finds the roots of f(x) = 0.
In optimization, we apply it to the derivative f' of a twice-differentiable function f to find the roots of the derivative (solutions of f'(x) = 0), also known as the stationary points of f.
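A minimal sketch of Newton's method applied to the derivative of a twice-differentiable function; the example function f(x) = x^4 - 3x^3 + 2 and the starting point are arbitrary.

```python
def newton_minimize(f_prime, f_double_prime, x0, iters=20):
    # Newton's method on f'(x) = 0: x_{k+1} = x_k - f'(x_k) / f''(x_k)
    x = x0
    for _ in range(iters):
        x = x - f_prime(x) / f_double_prime(x)
    return x

# f(x) = x**4 - 3*x**3 + 2, so f'(x) = 4x^3 - 9x^2 and f''(x) = 12x^2 - 18x.
x_star = newton_minimize(lambda x: 4 * x**3 - 9 * x**2,
                         lambda x: 12 * x**2 - 18 * x,
                         x0=3.0)
print(x_star)   # converges to the stationary point x* = 9/4
```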
Optimization: Unconstrained Optimization
Description:
x* is the optimum.
"Newton's method"
"Gradient descent"
Both need iterations.
Optimization: Equality Optimization
min f(x, y)
s.t. g(x, y) = c
Introduce a Lagrange multiplier lambda (one constraint, one multiplier).
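For concreteness, here is a small worked example of this setup (the particular f and g are chosen only for illustration):
min f(x, y) = x^2 + y^2 s.t. g(x, y) = x + y = 1
L(x, y, lambda) = x^2 + y^2 - lambda (x + y - 1)
Setting the partial derivatives to zero gives 2x - lambda = 0, 2y - lambda = 0, x + y - 1 = 0,
so x = y = 1/2, lambda = 1, and the constrained minimum is f = 1/2.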
Optimization: Why?
Optimization: Equality Optimization E.P.
Optimization: Generalized Optimization Problem
f(x): objective function, loss function, or cost function
h(x): equality constraint
g(x): inequality constraint
Generalized Lagrange function
Optimization: Generalized Optimization Problem
alpha, beta: Lagrange multipliers, with alpha >= 0.
Treat L as a function of alpha and beta, with x held constant:
If x does not satisfy the constraints:
If x satisfies them:
Optimization: Generalized Optimization Problem
Without constraints!
Optimization: Dual Problem
Treat L as a function of x, with alpha and beta held constant.
Optimization: How to Solve?
Optimization: Inequality Optimization
Karush-Kuhn-Tucker (KKT) Conditions:
Nonlinear Programming
Optimization: Inequality Optimization
Karush-Kuhn-Tucker (KKT) conditions, s.t. alpha >= 0
KKT conditions:
1. The partial derivative of L(alpha, beta, x) with respect to x is 0;
2. h(x) = 0;
3. alpha * g(x) = 0.
Optimization: KKT Conditions
Karush-Kuhn-Tucker (KKT) Conditions
Thank you all!
Editor's Notes

  • #2: https://www.quora.com/Which-statistical-test-to-use-to-quantify-the-similarity-between-two-distributions-when-they-are-not-normal
  • #3: https://en.wikipedia.org/wiki/List_of_machine_learning_concepts#Artificial_neural_network
  • #32: max_features: the number of features considered when splitting on the best attribute must not exceed this value. When it is an integer, it is the maximum feature count; when it is a fraction, the count is training-set features * fraction. If "auto", then max_features=sqrt(n_features). If "sqrt", then max_features=sqrt(n_features). If "log2", then max_features=log2(n_features). If None, then max_features=n_features. max_depth (default=None): sets the maximum depth of the tree; with the default None, the tree is grown until each leaf contains a single class or reaches min_samples_split. n_estimators=10: the number of decision trees; more is better for accuracy but worse for speed, and roughly 100 or so (I forget exactly where that figure came from) gives acceptable performance and error rates.
  • #44: https://www.youtube.com/watch?v=G2NRnh7W4NQ&index=2&list=PLt0SBi1p7xrRKE2us8doqryRou6eDYEOy
  • #50: https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%AE%9A%E7%9F%A9%E9%98%B5
  • #51: https://zh.wikipedia.org/wiki/%E6%AD%A3%E5%AE%9A%E7%9F%A9%E9%98%B5
  • #52: https://www.youtube.com/watch?v=p4t6O9uRX-U&t=3542s&list=PLt0SBi1p7xrRKE2us8doqryRou6eDYEOy&index=1
  • #61: Video: https://www.youtube.com/watch?v=NSu3l4wxcak&list=PLO5e_-yXpYLARtW5NPHTFVYY-xpgwuNNH&index=8 https://www.youtube.com/watch?v=D5elADTz1vk
  • #63: Minkowski: A t Pearson: PCA.. statistics
  • #66: Iris species: Iris setosa, Iris versicolor, and Iris virginica
  • #67: Regarding random forests and gradient boosted trees, choose the correct statement(s). 1. In a random forest the intermediate trees are not independent of each other, whereas in gradient boosted trees they are independent. 2. Both use random subsets of features to build the intermediate trees. 3. With gradient boosted trees we can build the trees in parallel because they are independent of each other. 4. Gradient boosted trees perform better than random forests on any dataset. A. 2  B. 1 and 2  C. 1, 3 and 4  D. 2 and 4
  • #75: https://www.youtube.com/watch?v=A-GxGCCAIrg&list=PLXVfgk9fNX2IQOYPmqjqWsNUFl2kpk1U2
  • #78: kernel trick
  • #80: How to calculate covariance matrix: http://blog.sina.com.cn/s/blog_6b7d710b0101l1s7.html
  • #83: https://github.com/IreneZihuiLi/text_pdf (《文本上的算法》) http://www.cnblogs.com/90zeng/p/Lagrange_duality.html 《统计学习方法》李航
  • #84: https://github.com/IreneZihuiLi/text_pdf (《文本上的算法》)
  • #85: https://github.com/IreneZihuiLi/text_pdf (《文本上的算法》)
  • #86: https://github.com/IreneZihuiLi/text_pdf (《文本上的算法》)
  • #95: http://blog.csdn.net/xianlingmao/article/details/7919597 http://www.cnblogs.com/maybe2030/p/4946256.html