Visvesvaraya Technological University
“Jnana Sangam”, Belagavi, Karnataka, India
SECAB Institute of Engineering & Technology, Vijayapur
Department of Master of Computer Applications, 2022-2023
A Seminar on
Web Scraping and Numerical Analysis
By
Mohammad Azeem Maniyar (2SA22MC013)
Course Co-ordinator
Prof. Nazeera Madabhavi
Web Scraping
• Web scraping is a technique for extracting data from websites, and Python is a common language for it. It is a valuable skill in data analytics because it lets you collect large amounts of data from the web for analysis.
Several Python libraries are commonly used for web scraping. Some of the most popular are:
• Beautiful Soup
• lxml
• Requests
• Scrapy
• Selenium
• html5lib
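As a rough illustration of what extracting data from a page can look like, here is a minimal sketch using Beautiful Soup. The HTML snippet, the class names (`listing`, `rooms`, `price`) and the helper `extract_listings` are all hypothetical and chosen only for this example; on a real site the markup would differ, and the page text would typically be downloaded first (for instance with Requests).

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
# In practice this text might come from requests.get(url).text
HTML = """
<html><body>
  <h1>Property listings</h1>
  <ul>
    <li class="listing"><span class="rooms">1</span> <span class="price">7020000</span></li>
    <li class="listing"><span class="rooms">3</span> <span class="price">10000000</span></li>
  </ul>
</body></html>
"""


def extract_listings(html):
    """Return one dict per <li class="listing"> element (hypothetical markup)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("li", class_="listing"):
        rows.append({
            "rooms": int(item.find("span", class_="rooms").get_text()),
            "price": int(item.find("span", class_="price").get_text()),
        })
    return rows


listings = extract_listings(HTML)
print(listings)
```

The resulting list of dictionaries can be passed straight to `pandas.DataFrame` for analysis.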
Parsing XML with lxml.objectify
<?xml version="1.0" encoding="UTF-8"?>
<root>
<room>
<n35237 type="number">1.0</n35237>
<n32238 type="number">3.0</n32238>
<n44699 type="number">nan</n44699>
</room>
<price>
<n35237 type="number">7020000.0</n35237>
<n32238 type="number">10000000.0</n32238>
<n44699 type="number">4128000.0</n44699>
</price>
<property_id>
<n35237 type="number">35237.0</n35237>
<n32238 type="number">32238.0</n32238>
<n44699 type="number">44699.0</n44699>
</property_id>
</root>
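Before the full program, a small sketch of how `lxml.objectify` exposes such a document may help: child elements can be reached as attributes of their parent. The XML here is a shortened copy of the file above, embedded as bytes purely for illustration.

```python
from lxml import objectify

# Shortened copy of the XML above; bytes are used because the
# document carries an encoding declaration
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
<room>
<n35237 type="number">1.0</n35237>
</room>
<price>
<n35237 type="number">7020000.0</n35237>
</price>
</root>"""

root = objectify.fromstring(XML)

# Child elements are reachable as attributes of their parent element
price = root.price.n35237

print(root.tag)            # name of the root element
print(price.text)          # raw text content of the element
print(price.pyval)         # value converted to a Python type by objectify
print(price.get("type"))   # value of the XML attribute "type"

# Tags of the top-level children, in document order
top_level = [child.tag for child in root.iterchildren()]
print(top_level)
```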
Program
from lxml import objectify
import pandas as pd

# Parse the XML file and get the root element
xml_data = objectify.parse('properties.xml')
root = xml_data.getroot()

# Each child of the root (room, price, property_id) becomes one column
data = []
cols = []
for child in root.iterchildren():
    # Collect the text of every sub-element as one list of values
    data.append([subchild.text for subchild in child.iterchildren()])
    cols.append(child.tag)

# Build the DataFrame and transpose it so that each tag is a column
df = pd.DataFrame(data).T

# Set column names
df.columns = cols

# Print DataFrame (values are still strings at this point)
print(df)
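Output
The program prints a DataFrame with three rows (one per property) and three columns: room, price and property_id.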
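Since the seminar also concerns numerical analysis, one possible next step is to turn the text values into numbers. The sketch below is a variation on the program, not the original method: it embeds the XML as bytes so it runs without `properties.xml`, uses the sub-element tags as row labels, and converts every value with `float`, so that "nan" becomes a missing value.

```python
from lxml import objectify
import pandas as pd

# Same data as properties.xml, embedded so the example is self-contained
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
<room>
<n35237 type="number">1.0</n35237>
<n32238 type="number">3.0</n32238>
<n44699 type="number">nan</n44699>
</room>
<price>
<n35237 type="number">7020000.0</n35237>
<n32238 type="number">10000000.0</n32238>
<n44699 type="number">4128000.0</n44699>
</price>
<property_id>
<n35237 type="number">35237.0</n35237>
<n32238 type="number">32238.0</n32238>
<n44699 type="number">44699.0</n44699>
</property_id>
</root>"""

root = objectify.fromstring(XML)

# One column per top-level tag; row labels come from the sub-element tags.
# float() turns the text into numbers, and float("nan") yields NaN.
columns = {
    child.tag: {sub.tag: float(sub.text) for sub in child.iterchildren()}
    for child in root.iterchildren()
}
df = pd.DataFrame(columns)

print(df)
print(df["price"].mean())         # average price over the three properties
print(df["room"].isna().sum())    # number of missing room counts
```

With numeric columns, the usual pandas operations such as `mean`, `describe` and `dropna` apply directly.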
