Microsoft Decision Trees Algorithm
Overview
- Decision Trees Algorithm
- DMX Queries
- Data Mining Using Decision Trees
- Model Content for a Decision Trees Model
- Decision Tree Parameters
- Decision Tree Stored Procedures
Decision Trees Algorithm
The Microsoft Decision Trees algorithm is a classification and regression algorithm provided by Microsoft SQL Server Analysis Services for predictive modeling of both discrete and continuous attributes.
For discrete attributes, the algorithm makes predictions based on the relationships between input columns in a dataset. It uses the values, known as states, of those columns to predict the states of a column that you designate as predictable. For example, in a scenario to predict which customers are likely to purchase a motorbike, if nine out of ten younger customers buy one but only two out of ten older customers do, the algorithm infers that age is a good predictor of motorbike purchase.
Decision Trees Algorithm
For continuous attributes, the algorithm uses linear regression to determine where a decision tree splits.
If more than one column is set to predictable, or if the input data contains a nested table that is set to predictable, the algorithm builds a separate decision tree for each predictable column.
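To make the multiple-tree behavior concrete, here is a minimal DMX sketch (a hypothetical CustomerProfile model, not part of the School Plans example that follows) with two predictable columns; the algorithm builds one tree for Income and another for BikeBuyer:

CREATE MINING MODEL CustomerProfile
(
  ID LONG KEY,
  Age LONG CONTINUOUS REGRESSOR,   // regressor input for the continuous target
  Income LONG CONTINUOUS PREDICT,  // first predictable column: a regression tree
  BikeBuyer TEXT DISCRETE PREDICT  // second predictable column: a classification tree
)
USING Microsoft_Decision_Trees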
DMX Queries
Let's understand how to use DMX queries by creating a simple tree model based on the School Plans data set. The School Plans table contains data about 500,000 high school students, including ParentSupport, ParentIncome, Sex, IQ, and whether or not the student plans to attend school. Using the Decision Trees algorithm, you can create a mining model that predicts the SchoolPlans attribute from the four other attributes.
DMX Queries (Classification)
Model creation:

CREATE MINING STRUCTURE SchoolPlans
(
  ID LONG KEY,
  Sex TEXT DISCRETE,
  ParentIncome LONG CONTINUOUS,
  IQ LONG CONTINUOUS,
  ParentSupport TEXT DISCRETE,
  SchoolPlans TEXT DISCRETE
)
WITH HOLDOUT (10 PERCENT)

ALTER MINING STRUCTURE SchoolPlans
ADD MINING MODEL SchoolPlan
(
  ID,
  Sex,
  ParentIncome,
  IQ,
  ParentSupport,
  SchoolPlans PREDICT
)
USING Microsoft_Decision_Trees
DMX Queries (Classification)
Training the SchoolPlan model:

INSERT INTO SchoolPlans
  (ID, Sex, IQ, ParentSupport, ParentIncome, SchoolPlans)
OPENQUERY(SchoolPlans,
  'SELECT ID, Sex, IQ, ParentSupport, ParentIncome, SchoolPlans
   FROM SchoolPlans')
DMX Queries (Classification)
Predicting SchoolPlans for new students. This query returns ID, SchoolPlans, and Probability (it queries the SchoolPlan model defined above):

SELECT t.ID, SchoolPlan.SchoolPlans,
       PredictProbability(SchoolPlans) AS [Probability]
FROM SchoolPlan
PREDICTION JOIN
  OPENQUERY(SchoolPlans,
    'SELECT ID, Sex, IQ, ParentSupport, ParentIncome
     FROM NewStudents') AS t
ON SchoolPlan.ParentIncome = t.ParentIncome AND
   SchoolPlan.IQ = t.IQ AND
   SchoolPlan.Sex = t.Sex AND
   SchoolPlan.ParentSupport = t.ParentSupport
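As a variant, a singleton query supplies the new case inline instead of reading it through OPENQUERY; NATURAL PREDICTION JOIN matches the input columns to the model columns by name. This is a sketch with made-up attribute values, including a hypothetical 'Encouraged' state for ParentSupport:

SELECT Predict(SchoolPlans) AS PredictedPlan,
       PredictProbability(SchoolPlans) AS Probability
FROM SchoolPlan
NATURAL PREDICTION JOIN
(SELECT 'Male' AS Sex, 115 AS IQ,
        'Encouraged' AS ParentSupport,
        40000 AS ParentIncome) AS t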
DMX Queries (Classification)
This query returns the histogram of the SchoolPlans predictions in the form of a nested table (result described below):

SELECT t.ID,
       PredictHistogram(SchoolPlans) AS [SchoolPlans]
FROM SchoolPlan
PREDICTION JOIN
  OPENQUERY(SchoolPlans,
    'SELECT ID, Sex, IQ, ParentSupport, ParentIncome
     FROM NewStudents') AS t
ON SchoolPlan.ParentIncome = t.ParentIncome AND
   SchoolPlan.IQ = t.IQ AND
   SchoolPlan.Sex = t.Sex AND
   SchoolPlan.ParentSupport = t.ParentSupport
DMX Queries (Classification)
(Result slide: for each student, PredictHistogram returns a nested table listing every SchoolPlans state with statistics such as its support and probability.)
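Because the histogram comes back as a nested table, a FLATTENED sub-select can unnest it and keep only the likely states. This is a sketch: the $PROBABILITY column is part of the PredictHistogram output, while the 0.10 cutoff is an illustrative assumption:

SELECT FLATTENED t.ID,
  (SELECT SchoolPlans, $PROBABILITY
   FROM PredictHistogram(SchoolPlans)
   WHERE $PROBABILITY > 0.10) AS LikelyPlans
FROM SchoolPlan
PREDICTION JOIN
  OPENQUERY(SchoolPlans,
    'SELECT ID, Sex, IQ, ParentSupport, ParentIncome
     FROM NewStudents') AS t
ON SchoolPlan.ParentIncome = t.ParentIncome AND
   SchoolPlan.IQ = t.IQ AND
   SchoolPlan.Sex = t.Sex AND
   SchoolPlan.ParentSupport = t.ParentSupport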
DMX Queries (Regression)
Regression means predicting continuous variables using linear regression formulas based on regressors that you specify. The following creates and trains a regression model to predict ParentIncome from IQ, Sex, ParentSupport, and SchoolPlans; IQ is used as a regressor:

ALTER MINING STRUCTURE SchoolPlans
ADD MINING MODEL ParentIncome
(
  ID,
  Sex,
  ParentIncome PREDICT,
  IQ REGRESSOR,
  ParentSupport,
  SchoolPlans
)
USING Microsoft_Decision_Trees

INSERT INTO ParentIncome
DMX Queries (Regression)
Continuous prediction using a decision tree: predict ParentIncome for new students, along with the estimated standard deviation for each prediction:

SELECT t.ID, ParentIncome.ParentIncome,
       PredictStdev(ParentIncome) AS Deviation
FROM ParentIncome
PREDICTION JOIN
  OPENQUERY(SchoolPlans,
    'SELECT ID, Sex, IQ, ParentSupport, SchoolPlans
     FROM NewStudents') AS t
ON ParentIncome.SchoolPlans = t.SchoolPlans AND
   ParentIncome.IQ = t.IQ AND
   ParentIncome.Sex = t.Sex AND
   ParentIncome.ParentSupport = t.ParentSupport
DMX Queries (Association)
An example of an associative trees model built on a Dances data set:

CREATE MINING MODEL DanceAssociation
(
  ID LONG KEY,
  Gender TEXT DISCRETE,
  MaritalStatus TEXT DISCRETE,
  Shows TABLE PREDICT
  (
    Show TEXT KEY
  )
)
USING Microsoft_Decision_Trees
Each Show is considered an attribute with two states: existing or missing.
DMX Queries (Association)
Training the associative trees model. Because the model contains a nested table, the training statement uses the SHAPE statement:

INSERT INTO DanceAssociation
  (ID, Gender, MaritalStatus, Shows (SKIP, Show))
SHAPE
{
  OPENQUERY (DanceSurvey,
    'SELECT ID, Gender, [Marital Status]
     FROM Customers ORDER BY ID')
}
APPEND
(
  {OPENQUERY (DanceSurvey,
    'SELECT ID, Show
     FROM Shows ORDER BY ID')}
  RELATE ID TO ID
) AS Shows
DMX Queries (Association)
Suppose that there is a married male customer who likes the Michael Jackson show. This query returns the five other shows this customer is most likely to find appealing:

SELECT t.ID,
       Predict(DanceAssociation.Shows, 5, $AdjustedProbability)
       AS Recommendation
FROM DanceAssociation
NATURAL PREDICTION JOIN
(SELECT '101' AS ID, 'Male' AS Gender,
        'Married' AS MaritalStatus,
        (SELECT 'Michael Jackson' AS Show) AS Shows) AS t
Data Mining Using Decision Trees
The most common data mining task for a decision tree is classification, i.e., determining whether or not a set of data belongs to a specific type, or class. The principal idea of a decision tree is to split the data recursively into subsets; the process of evaluating all inputs is then repeated on each subset. When this recursive process is completed, a decision tree has been formed.
Data Mining Using Decision Trees
Decision trees offer several advantages over other data mining algorithms. Trees are quick to build and easy to interpret: each node in the tree is clearly labeled in terms of the input attributes, and each path from the root to a leaf forms a rule about the target variable. Prediction based on decision trees is also efficient.
Model Content for a Decision Trees Model
The top level is the model node. The children of the model node are its tree root nodes. If a tree model contains a single tree, there is only one node in the second level. The nodes of the other levels are either intermediate nodes or leaf nodes of the tree. The probabilities of each predictable attribute state are stored in the distribution rowsets.
Model Content for a Decision Trees Model
(Diagram slide: the model node at the top, tree root nodes beneath it, then intermediate and leaf nodes.)
Interpreting the Mining Model Content
A decision trees model has a single parent node that represents the model and its metadata, underneath which are independent trees representing the predictable attributes you select. For example, if you set up your decision tree model to predict whether customers will purchase something, and provide inputs for gender and income, the model would create a single tree for the purchasing attribute, with many branches that divide on conditions related to gender and income. However, if you then add a separate predictable attribute for participation in a customer rewards program, the algorithm creates two separate trees under the parent node: one tree contains the analysis for purchasing, and the other contains the analysis for the customer rewards program.
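One way to see this structure is a DMX content query against the SchoolPlan model created earlier. This is a sketch that assumes NODE_TYPE = 2 marks a tree root in the decision trees content schema, so it should return one row per predictable attribute:

SELECT NODE_CAPTION, NODE_TYPE, NODE_SUPPORT
FROM SchoolPlan.CONTENT
WHERE NODE_TYPE = 2  // tree root nodes, one per tree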
Decision Tree Parameters
Tree growth, tree shape, and the input/output attribute settings are controlled using these parameters. You can fine-tune your model's accuracy by adjusting these parameter settings.
Decision Tree Parameters
COMPLEXITY_PENALTY is a floating-point number in the range [0,1] that controls how much penalty the algorithm applies to complex trees. When the value is set close to 0, there is a lower penalty for tree growth, and you may see large trees. When it is set close to 1, tree growth is penalized heavily, and the resulting trees are relatively small. By default, if there are fewer than 10 input attributes, the value is set to 0.5; if there are between 10 and 100 input attributes, the value is set to 0.9; if there are more than 100 attributes, the value is set to 0.99.
Decision Tree Parameters
MINIMUM_SUPPORT specifies the minimum size of each node in a tree. For example, if this value is set to 25, any split that would produce a child node containing fewer than 25 cases is not accepted. The default value for MINIMUM_SUPPORT is 10.
SCORE_METHOD specifies the method for determining a split score during tree growth. The three possible values for SCORE_METHOD are:
- SCORE_METHOD = 1: use an entropy score for tree growth.
- SCORE_METHOD = 2: use the Bayesian with K2 Prior method, which adds a constant for each state of the predictable attribute in a tree node, regardless of the node's level in the tree.
- SCORE_METHOD = 3: use the Bayesian Dirichlet Equivalent with Uniform Prior (BDEU) method.
Decision Tree Parameters
SPLIT_METHOD specifies the tree shape (binary or bushy):
- SPLIT_METHOD = 1 means the tree is split only in a binary way.
- SPLIT_METHOD = 2 indicates that the tree should always split completely on each attribute.
- SPLIT_METHOD = 3, the default, lets the decision tree automatically choose the better of the previous two methods.
MAXIMUM_INPUT_ATTRIBUTES is a threshold parameter for feature selection. When the number of input attributes is greater than this value, feature selection is invoked implicitly to select the most significant input attributes. (A sketch of setting these parameters follows below.)
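These parameters are set in the USING clause when a model is defined. A minimal sketch against the SchoolPlans structure (the model name SchoolPlanTuned and the specific values are illustrative assumptions, not from the deck):

ALTER MINING STRUCTURE SchoolPlans
ADD MINING MODEL SchoolPlanTuned
( ID, Sex, ParentIncome, IQ, ParentSupport, SchoolPlans PREDICT )
USING Microsoft_Decision_Trees
  (COMPLEXITY_PENALTY = 0.8,  // penalize growth more than the default
   MINIMUM_SUPPORT = 25,      // reject splits producing nodes under 25 cases
   SPLIT_METHOD = 1)          // force binary splits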
Decision Tree Parameters
MAXIMUM_OUTPUT_ATTRIBUTES is another threshold parameter for feature selection. When the number of predictable attributes is greater than this value, feature selection is invoked implicitly to select the most significant attributes.
FORCE_REGRESSOR lets you override the regressor selection logic of the decision tree algorithm and always use a specified regressor in the regression equations of regression trees. This parameter is typically used in price elasticity models. For example, suppose that you have a model to predict Sales using Price and other attributes. If you specify FORCE_REGRESSOR = Price, you get regression formulas using Price and other significant attributes for each node of the tree.
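A sketch of the price elasticity example (the Sales structure and its columns are hypothetical, introduced only to illustrate the parameter):

ALTER MINING STRUCTURE Sales
ADD MINING MODEL SalesElasticity
( ID, Price REGRESSOR, Region, Sales PREDICT )
USING Microsoft_Decision_Trees (FORCE_REGRESSOR = [Price])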
Decision Tree Stored Procedures
The Decision Tree viewer uses a set of system stored procedures:

CALL System.GetTreeScores('MovieAssociation')
CALL System.DTGetNodes('MovieAssociation')
CALL System.DTGetNodeGraph('MovieAssociation', 60)
CALL System.DTAddNodes('MovieAssociation', '36;34',
     '99;282;20;261;26;201;33;269;30;187')
Decision Tree Stored Procedures
GetTreeScores is the procedure that the Decision Tree viewer uses to populate the drop-down tree selector. It takes the name of a decision tree model as a parameter and returns a table containing a row for every tree in the model and the following three columns:
- ATTRIBUTE_NAME is the name of the tree.
- NODE_UNIQUE_NAME is the content node representing the root of the tree.
- MSOLAP_NODE_SCORE is a number representing the amount of information (number of nodes) in the tree.
Decision Tree Stored Procedures
DTGetNodes is used by the decision tree Dependency Network viewer when you click the Add Nodes button. It returns a row for each potential node in the dependency network and has the following two columns:
- NODE_UNIQUE_NAME1 is an identifier that is unique for the dependency network.
- NODE_CAPTION is the name of the node.
Decision Tree Stored Procedures
The DTGetNodeGraph procedure returns four columns: NODE_TYPE, NODE_UNIQUE_NAME1, NODE_UNIQUE_NAME2, and MSOLAP_NODE_SCORE.
When a row has NODE_TYPE = 1, it contains a description of a node, and the remaining columns have the following interpretation:
- NODE_UNIQUE_NAME1 contains a unique identifier for the node.
- NODE_UNIQUE_NAME2 contains the node caption.
When a row has NODE_TYPE = 2, it represents a directed edge in the graph, and the remaining columns have these interpretations:
- NODE_UNIQUE_NAME1 contains the node name of the starting point of the edge.
- NODE_UNIQUE_NAME2 contains the node name of the ending point of the edge.
- MSOLAP_NODE_SCORE contains the relative weight of the edge.
Decision Tree Stored Procedures
DTAddNodes allows you to add new nodes to an existing graph. It takes a model name, a semicolon-separated list of the IDs of nodes you want to add to the graph, and a semicolon-separated list of the IDs of nodes already in the graph. This procedure returns a table similar to the NODE_TYPE = 2 section of DTGetNodeGraph, but without the NODE_TYPE column. The rows in the result set contain all the edges between the added nodes, and all the edges between the added nodes and the nodes specified as already in the graph.
Summary
- Decision Trees Algorithm Overview
- DMX Queries
- Data Mining Using Decision Trees
- Interpreting the Model Content for a Decision Trees Model
- Decision Tree Parameters
- Decision Tree Stored Procedures
Visit More Self-Help Tutorials
Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free and self-guiding and does not involve any additional support. Visit us at www.dataminingtools.net