EFFICIENT DATA EXTRACTION
USING
ARTIFICIAL INTELLIGENCE
Deepak D. Upadhyay
Outlines
● Introduction to Web Mining For Extraction
● Purpose
● Method 1- Supervised Learning
● Method 2- Unsupervised Learning
● Comparison
Introduction to Web Mining for
Extraction
● Web mining describes the application of conventional
data mining techniques to web resources
and has facilitated the further development of these
techniques to consider the specific structures of web
data.
● The analysed web resources comprise the actual web
sites, the hyperlinks connecting these sites, and the
paths that online users take on the web to reach a
particular site.
Continue..
● Web usage mining then refers to the derivation of
useful knowledge from the data inputs. While the input
data are mostly web server logs and other primarily
technically oriented data, the expected output is an
understanding of user behaviour in domains such as
online data search, online shopping, online
learning, etc.
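As a rough illustration of turning web server logs into user behaviour data, the sketch below parses log lines and groups each visitor's requests into sessions. It assumes the logs are in Common Log Format and that a 30-minute inactivity gap ends a session; the sample lines, the `sessionize` helper and the gap value are illustrative assumptions, not part of the original slides.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed format: host ident user [time] "METHOD path PROTO" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def parse_line(line):
    """Return (host, timestamp, path), or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("host"), ts, m.group("path")

def sessionize(lines, gap=timedelta(minutes=30)):
    """Group requests per host into sessions, split at inactivity gaps."""
    by_host = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec is not None:
            host, ts, path = rec
            by_host[host].append((ts, path))
    sessions = []
    for host, hits in by_host.items():
        hits.sort()
        current = [hits[0][1]]
        for (prev_ts, _), (ts, path) in zip(hits, hits[1:]):
            if ts - prev_ts > gap:
                sessions.append((host, current))
                current = []
            current.append(path)
        sessions.append((host, current))
    return sessions

# Hypothetical sample log lines
SAMPLE_LOG = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /home HTTP/1.1" 200 512',
    '10.0.0.1 - - [01/Jan/2024:10:05:00 +0000] "GET /shop HTTP/1.1" 200 1024',
    '10.0.0.2 - - [01/Jan/2024:10:06:00 +0000] "GET /learn HTTP/1.1" 200 256',
    '10.0.0.1 - - [01/Jan/2024:12:00:00 +0000] "GET /home HTTP/1.1" 200 512',
]

sessions = sessionize(SAMPLE_LOG)
for host, paths in sessions:
    print(host, paths)
```

Each session is the path a user took through the site, which is the kind of input the later learning methods would consume.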
Purpose
● Web usage mining helps to deal with certain web
scaling problems, such as user trend analysis of
surfing, traffic flow analysis, distributed control
handling, web traffic management and many more.
● Needs such as session tracking, website reorganization,
and traffic sharing across distributed servers can be
identified, and analysis based on web data becomes
possible, using neural network concepts.
Continue..
● A neural network is far different from a static network:
each node is self-intelligent, hence the
network as a whole becomes intelligent. So, web users can use
this network more and more.
Method 1- Supervised Learning
● In supervised learning the task is to automatically
induce a model based on a set of N instances, called
training data.
● This model will then be used to assign labels to new
instances with unknown labels using only the values of
their predictor variables.
● An artificial neural network is based on simulating the
structure and behaviour of biological neural
networks.
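The idea of inducing a model from N labelled instances and then labelling new ones can be sketched with a single artificial neuron (a perceptron). This is a minimal sketch, not the method used in the slides; the predictor variables (pages viewed, minutes on site) and the purchase label are hypothetical.

```python
def train_perceptron(instances, labels, epochs=20, lr=0.1):
    """Induce weights and a bias from N labelled training instances."""
    weights = [0.0] * len(instances[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(instances, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            predicted = 1 if activation > 0 else 0
            error = y - predicted  # zero when the prediction is already right
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias

def predict(model, x):
    """Assign a label to a new instance using only its predictor variables."""
    weights, bias = model
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

# Hypothetical training data: (pages viewed, minutes on site), scaled to [0, 1]
# Label 1 = session ended in a purchase, 0 = it did not
train_x = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.7)]
train_y = [0, 0, 1, 1]

model = train_perceptron(train_x, train_y)
new_labels = [predict(model, x) for x in [(0.15, 0.15), (0.85, 0.8)]]
print(new_labels)
```

A perceptron only separates classes with a straight line; multi-layer networks trained with backpropagation relax that limit.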
Two Approaches for Web Mining in AI
Approach 1- Neuro-Fuzzy Approach for Web Mining
Approach 2- Reduction of Stages on Neuro-Fuzzy after
Backpropagation implementation
Neuro-Fuzzy Approach for Web
Mining
Backpropagation Implementation
Continue..
● If web-mining researchers apply backpropagation,
they can obtain better results than with other
implemented web mining techniques, because of its
top-down and bottom-up weights.
● Backpropagation is also beneficial in that it
reduces the number of steps in web mining
compared to the neuro-fuzzy approach.
Continue..
● The neuro-fuzzy approach uses five major steps to
produce the web-usage pattern forecast and the web-usage
data analyzer: web-log data collection,
data pre-processing, self-organizing map, web-usage
data clustering, and fuzzy inference system.
● Backpropagation uses only three steps: web-log
data collection, data pre-processing, and
backpropagation itself.
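The three-step pipeline can be sketched as follows. This is an illustrative sketch under stated assumptions: the records stand in for already-collected web-log data, the features (`pages`, `seconds`) and the `returned` label are hypothetical, and the network is a small one-hidden-layer model trained with plain stochastic gradient descent on squared error.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Step 1: web-log data collection (hypothetical, already-parsed records)
records = [
    {"pages": 2, "seconds": 30, "returned": 0},
    {"pages": 3, "seconds": 45, "returned": 0},
    {"pages": 12, "seconds": 600, "returned": 1},
    {"pages": 9, "seconds": 480, "returned": 1},
]

# Step 2: data pre-processing, scaling each feature to [0, 1]
def preprocess(recs):
    max_pages = max(r["pages"] for r in recs)
    max_secs = max(r["seconds"] for r in recs)
    xs = [(r["pages"] / max_pages, r["seconds"] / max_secs) for r in recs]
    ys = [r["returned"] for r in recs]
    return xs, ys

# Step 3: backpropagation on a network with one hidden layer
def init_network(n_inputs, n_hidden, seed=0):
    rng = random.Random(seed)
    return {
        "w_h": [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                for _ in range(n_hidden)],
        "b_h": [0.0] * n_hidden,
        "w_o": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_o": 0.0,
    }

def forward(net, x):
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(net["w_h"], net["b_h"])]
    out = sigmoid(sum(w * h for w, h in zip(net["w_o"], hidden)) + net["b_o"])
    return hidden, out

def mean_squared_error(net, xs, ys):
    return sum((forward(net, x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train(net, xs, ys, epochs=2000, lr=0.5):
    n_hidden = len(net["w_o"])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            hidden, out = forward(net, x)
            # Output delta for squared error with a sigmoid output
            delta_o = (out - y) * out * (1 - out)
            # Hidden deltas: propagate the output delta backwards
            delta_h = [delta_o * net["w_o"][j] * hidden[j] * (1 - hidden[j])
                       for j in range(n_hidden)]
            for j in range(n_hidden):
                net["w_o"][j] -= lr * delta_o * hidden[j]
                net["w_h"][j] = [w - lr * delta_h[j] * xi
                                 for w, xi in zip(net["w_h"][j], x)]
                net["b_h"][j] -= lr * delta_h[j]
            net["b_o"] -= lr * delta_o
    return net

xs, ys = preprocess(records)
net = init_network(n_inputs=2, n_hidden=3)
loss_before = mean_squared_error(net, xs, ys)
train(net, xs, ys)
loss_after = mean_squared_error(net, xs, ys)
print(loss_before, loss_after)
```

The training error should fall as the weights are adjusted; how well such a model generalises depends on having far more log data than this toy set.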
Method 2- Unsupervised Learning
❖ Clustering using SOM
Self-organizing maps (SOM) are deemed
highly effective as a sophisticated
visualization tool for high-dimensional,
complex data with inherent relationships between the
various features comprising the data.
The SOM's output emphasizes the salient features of the
data and subsequently leads to the automatic
formation of clusters of similar data items.
Continue..
● The Self-Organizing Map (SOM) has proven to be one
of the most powerful algorithms in data visualization
and exploration. Application areas include various
fields of science and technology, e.g., complex
industrial processes, telecommunications systems,
document and image databases, and even financial
applications.
● The SOM maps the high-dimensional input vectors
onto a two-dimensional grid of prototype vectors and
orders them.
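The mapping onto a two-dimensional grid of prototype vectors can be sketched as below. This is a minimal sketch of a basic SOM, assuming a Gaussian neighbourhood and linearly decaying learning rate and radius; the grid size, decay schedule and sample data are illustrative choices, not taken from the slides.

```python
import math
import random

def best_matching_unit(grid, x):
    """Return the (row, col) of the prototype vector closest to x."""
    return min(
        ((r, c) for r in range(len(grid)) for c in range(len(grid[0]))),
        key=lambda rc: sum((p - xi) ** 2
                           for p, xi in zip(grid[rc[0]][rc[1]], x)),
    )

def som_update(grid, x, lr, sigma):
    """Pull the best matching unit and its grid neighbours towards x."""
    br, bc = best_matching_unit(grid, x)
    for r in range(len(grid)):
        for c in range(len(grid[0])):
            dist2 = (r - br) ** 2 + (c - bc) ** 2  # distance on the 2-D grid
            h = math.exp(-dist2 / (2 * sigma ** 2))
            grid[r][c] = [p + lr * h * (xi - p) for p, xi in zip(grid[r][c], x)]

def train_som(data, rows=4, cols=4, epochs=100, lr0=0.5, sigma0=2.0, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    grid = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
            for _ in range(rows)]
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius shrink over time
        frac = 1.0 - epoch / epochs
        for x in data:
            som_update(grid, x, lr=lr0 * frac, sigma=max(sigma0 * frac, 0.5))
    return grid

# Hypothetical 3-dimensional input vectors forming two loose groups
data = [(0.1, 0.1, 0.2), (0.2, 0.1, 0.1), (0.9, 0.8, 0.9), (0.8, 0.9, 0.9)]
som = train_som(data)
for x in data:
    print(x, "->", best_matching_unit(som, x))
```

Each input vector is represented by the grid coordinates of its best matching unit, so similar inputs tend to land on the same or neighbouring cells.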
Continue..
● For a human interpreter, the ordered prototype
vectors are easier to visualize and explore than the
original data. The SOM has been widely implemented
in various software tools. Post-processing the SOM
extracts qualitative or quantitative information from the
data.
Fig - Applying SOM in Data Mining
Table 1. Comparison with respect
to SSE with Different clusters and
cases of K-Means and SOM
Continue..
● K-Means covers more URLs, but SOM works
better for a larger number of cases. As the amount of
data increases, the learning process of SOM
becomes more accurate and we can consider
a larger number of clusters. SOM is also more efficient
in time than K-Means. Thus we can
conclude that SOM has better performance
than K-Means.
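The SSE (sum of squared errors) measure used in this comparison can be computed as in the sketch below, shown here for K-Means. The points and initial centroids are hypothetical, and the K-Means routine is a plain Lloyd's iteration given only to produce clusters to score; it does not reproduce the figures in Table 1.

```python
def sse(points, centroids, assignment):
    """Sum of squared Euclidean distances from each point to its centroid."""
    return sum(
        sum((pi - ci) ** 2 for pi, ci in zip(p, centroids[k]))
        for p, k in zip(points, assignment)
    )

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm starting from the given initial centroids."""
    centroids = [list(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid
        assignment = [
            min(range(len(centroids)),
                key=lambda k: sum((pi - ci) ** 2
                                  for pi, ci in zip(p, centroids[k])))
            for p in points
        ]
        # Move each centroid to the mean of its members
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment

# Hypothetical 2-D points and starting centroids
points = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (5.0, 6.0)]
centroids, assignment = kmeans(points, centroids=[(0.0, 0.0), (6.0, 6.0)])
total_sse = sse(points, centroids, assignment)
print(assignment, total_sse)
```

The same `sse` function can score a SOM by treating each point's best matching prototype as its centroid, which gives a like-for-like comparison at a fixed number of clusters.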
Comparison
● Supervised learning is much more effective than
unsupervised learning. Previously, unsupervised
extraction used extraction patterns that make
assumptions about the regularity of the structure in
the data. We relax this assumption by exploiting
reference sets to aid the extraction.
Continue..
● SOM used for clustering is much faster and more
accurate, which helps further in artificial neural
network mining: the network analyses the patterns defined
in the training set, which are then compared against many
unorganised testing sets. The comparison goes through
the processes of pre-processing, classification, clustering
and analysing.
