Data-Applied.com: Decision Trees
Introduction
• Decision trees let you construct decision models.
• They can be used for forecasting, classification, or decision making.
• At each branch, the data is split based on a particular field of the data.
• Decision trees are constructed using a divide-and-conquer technique.
Divide-and-Conquer: Constructing Decision Trees
Steps to construct a decision tree recursively (see the sketch below):
• Select an attribute to be placed at the root node and make one branch for each of its possible values.
• Repeat the process recursively at each branch, using only those instances that actually reach the branch.
• If at any time all instances at a node have the same classification, stop developing that part of the tree.
Problem: how do we decide which attribute to split on?
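A minimal Python sketch of this recursion, assuming instances are represented as dicts; the data layout and the choose_attribute helper (sketched in the next section) are illustrative assumptions, not part of the original slides:

```python
from collections import Counter

def build_tree(instances, attributes, class_key="class"):
    """Divide and conquer: grow one subtree per attribute value."""
    classes = [inst[class_key] for inst in instances]
    # Stop: every instance reaching this node has the same classification.
    if len(set(classes)) == 1:
        return classes[0]
    # Sketch-only fallback: no attributes left, return the majority class.
    if not attributes:
        return Counter(classes).most_common(1)[0][0]
    # Pick the attribute to split on (maximum information gain, see below).
    best = choose_attribute(instances, attributes, class_key)
    tree = {best: {}}
    # One branch per value, recursing only on the instances that reach it.
    for value in {inst[best] for inst in instances}:
        subset = [inst for inst in instances if inst[best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = build_tree(subset, rest, class_key)
    return tree
```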
Divide-and-Conquer: Constructing Decision Trees
Steps to find the attribute to split on:
• Consider every available attribute as a candidate, and branch on each of its possible values.
• For each candidate attribute, calculate the Information of the resulting branches and then the Information Gain of the split.
• Select for the division the attribute that gives the maximum Information Gain (a selection sketch follows).
• Repeat until each branch terminates at a node whose Information = 0.
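As a sketch, the selection step is just an argmax over the candidate attributes; information_gain is an assumed helper, defined alongside the formulas on the next slide:

```python
def choose_attribute(instances, attributes, class_key="class"):
    # Keep the attribute whose split yields the maximum information gain.
    return max(attributes,
               key=lambda a: information_gain(instances, a, class_key))
```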
Divide-and-Conquer: Constructing Decision Trees
Calculation of Information and Gain:
For a class distribution (P1, P2, ..., Pn) such that P1 + P2 + ... + Pn = 1:
Information(P1, P2, ..., Pn) = -P1 log2(P1) - P2 log2(P2) - ... - Pn log2(Pn)
(logarithms are base 2, so Information is measured in bits)
Gain = Information before division - Information after division
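These two formulas translate directly into Python; a minimal sketch, using math.log2 for the base-2 logarithm (the c > 0 guard skips empty classes, for which p·log p is taken as 0):

```python
import math
from collections import Counter

def information(counts):
    """Information (entropy) in bits of a class distribution given as
    raw counts, e.g. information([9, 5]) ≈ 0.940."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total)
                for c in counts if c > 0)

def information_gain(instances, attribute, class_key="class"):
    """Information before the split minus the weighted information
    of the subsets after the split."""
    total = len(instances)
    before = information(Counter(i[class_key] for i in instances).values())
    after = 0.0
    for value in {i[attribute] for i in instances}:
        subset = [i for i in instances if i[attribute] == value]
        dist = Counter(i[class_key] for i in subset).values()
        after += (len(subset) / total) * information(dist)
    return before - after
```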
Divide-and-Conquer: Constructing Decision Trees
Example:
• Here we consider each attribute individually.
• Each is divided into branches according to its different possible values.
• Below each branch, the number of instances of each class is marked.
Divide-and-Conquer: Constructing Decision Trees
Calculations:
Using the formula for Information, initially we have:
• Number of instances with class = Yes: 9
• Number of instances with class = No: 5
So we have P1 = 9/14 and P2 = 5/14:
Info([9/14, 5/14]) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940 bits
Now, for example, let's consider the Outlook attribute; its branches split the instances as [2 yes, 3 no] for Sunny, [4 yes, 0 no] for Overcast, and [3 yes, 2 no] for Rainy.
Divide-and-Conquer: Constructing Decision Trees
Example contd.:
Gain from using Outlook for division = info([9,5]) - info([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 bits
• Gain(Outlook) = 0.247 bits
• Gain(Temperature) = 0.029 bits
• Gain(Humidity) = 0.152 bits
• Gain(Windy) = 0.048 bits
Since Outlook gives the maximum gain, we use it for the division.
We then repeat the steps for Outlook = Sunny and Outlook = Rainy, and stop for Overcast, since its Information is already 0.
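The slide's numbers can be reproduced with the information() helper sketched above; the after-split value is the information of each branch weighted by the fraction of instances reaching it (values rounded to three decimals):

```python
info_before = information([9, 5])             # ≈ 0.940 bits
info_after = (5/14) * information([2, 3]) \
           + (4/14) * information([4, 0]) \
           + (5/14) * information([3, 2])     # ≈ 0.693 bits
gain_outlook = info_before - info_after       # ≈ 0.247 bits
```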
Divide-and-Conquer: Constructing Decision Trees
Highly branching attributes: the problem
• If we follow the previously described method, it will always favor the attribute with the largest number of branches.
• In the extreme case it will favor an attribute that has a different value for every instance, such as an identification code.
Divide-and-Conquer: Constructing Decision Trees
Highly branching attributes: the problem
• The Information remaining after splitting on such an attribute is 0:
info([0,1]) + info([0,1]) + info([0,1]) + ... + info([0,1]) = 0
• It will therefore have the maximum gain and will be chosen for branching.
• But such an attribute is no good for predicting the class of an unknown instance, nor does it tell us anything about the structure of the division.
• So we use the gain ratio to compensate for this.
Divide-and-Conquer: Constructing Decision Trees
Highly branching attributes: gain ratio
Gain ratio = gain / split info
• To calculate the split info, we consider only the number of instances covered by each attribute value, irrespective of class.
• For an identification code with 14 different values:
split info = info([1,1,1,...,1]) = 14 × (-(1/14) log2(1/14)) = 3.807
• For Outlook we have the split info:
split info = info([5,4,5]) = -(5/14) log2(5/14) - (4/14) log2(4/14) - (5/14) log2(5/14) = 1.577
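Checked with the same information() helper: the split info ignores the class labels and looks only at the sizes of the branches themselves; the final gain-ratio line for Outlook is an illustrative extra step, not from the original slides:

```python
split_id      = information([1] * 14)    # 14 one-instance branches ≈ 3.807
split_outlook = information([5, 4, 5])   # Sunny/Overcast/Rainy     ≈ 1.577

gain_ratio_outlook = 0.247 / split_outlook   # gain / split info ≈ 0.157
```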
Decision trees using Data Applied’s web interface
Step 1: Selection of data
Step 2: Selecting Decision
Step 3: Result
Visit more self-help tutorials
Pick a tutorial of your choice and browse through it at your own pace.

The tutorials section is free, self-guiding, and will not involve any additional support.
Visit us at www.dataminingtools.net