Data Engineering and Analytics using Python
PURNA CHANDER RAO. KATHULA
Topics
• Jupyter Notebook
• About me
• Python modules for data science
• Anaconda
• Pandas
  • About pandas
  • Data munging / data preparation
  • Demo
• Seaborn
  • About Seaborn
• Machine learning
  • Linear regression
About me
• Job title: QA Architect
• Builds Python tools for QA test automation
• Currently learning
Python modules for Data Science
• Packages used for data analysis and analytics:
  • Jupyter Notebook
  • pandas
  • NumPy
  • SciPy
  • Matplotlib
  • Seaborn
  • scikit-learn
Anaconda
Anaconda Distribution
What is Anaconda?
• Essentially a large (~400 MB) Python installation.
• It bundles most of what you need for data analysis.
• Unless you have a specific reason not to, just install and use it.
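A quick way to check which of the packages listed above a given installation (Anaconda or otherwise) actually provides is to ask Python's import system. This is a minimal sketch using only the standard library; the import names are assumptions about how each package is imported (scikit-learn as `sklearn`, Jupyter Notebook as `notebook`).

```python
import importlib.util

# Import names for the packages listed in this talk (assumed mapping):
# scikit-learn is imported as "sklearn", Jupyter Notebook as "notebook".
PACKAGES = ["notebook", "pandas", "numpy", "scipy", "matplotlib", "seaborn", "sklearn"]


def check_packages(names):
    """Return {import_name: True/False} depending on whether each top-level package can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(PACKAGES).items():
        print(f"{name:12s} {'OK' if found else 'missing'}")
```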
Pandas
About Pandas
• What is pandas?
pandas is a Python library for data analysis and data manipulation. Its DataFrame is modeled on R's data.frame.
• Key features of pandas
  • APIs for loading data from many file formats into memory (Excel, TSV, CSV, SQL databases, etc.).
  • Data is organized in labeled rows and columns.
  • SQL-like retrieval: group-by, joins, filtered views and similar operations.
  • Merging data from multiple datasets.
  • Extensive time-series functionality: time zones, business days, holiday calendars, etc.
  • Boolean indexing
  • Fancy indexing
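The loading and indexing features above can be sketched in a few lines. This assumes a small made-up CSV held in a string (`io.StringIO` stands in for a real file path); `pd.read_csv` accepts either.

```python
import io

import pandas as pd

# Made-up CSV content; in practice pd.read_csv("employees.csv") would read a file.
csv_text = """name,dept,salary
Asha,QA,72000
Ben,Dev,85000
Chen,QA,68000
Dana,Dev,91000
"""
df = pd.read_csv(io.StringIO(csv_text))

# Boolean indexing: keep only the rows where the condition is True.
qa_staff = df[df["dept"] == "QA"]

# Fancy indexing: select several columns (or rows) at once with a list.
names_and_salaries = df[["name", "salary"]]
first_and_third = df.iloc[[0, 2]]

print(qa_staff)
print(names_and_salaries)
```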
Core Data Structures of pandas
• DataFrame
• Series
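A minimal sketch of the two core structures, with made-up values: a `Series` is a one-dimensional labeled array, and a `DataFrame` is a table whose columns are `Series` sharing one index.

```python
import pandas as pd

# A Series: values plus an index of labels.
temps = pd.Series([21.5, 23.0, 19.8], index=["Mon", "Tue", "Wed"], name="temp_c")

# A DataFrame: a 2-D table; every column is a Series that shares the same index.
weather = pd.DataFrame(
    {"temp_c": [21.5, 23.0, 19.8], "rain_mm": [0.0, 2.4, 5.1]},
    index=["Mon", "Tue", "Wed"],
)

# Selecting one column from a DataFrame returns a Series.
col = weather["temp_c"]
print(type(col).__name__)  # Series
```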
Core Operations
Create Select Insert Map
Join Sort Clean ApplyMap
View Update Filter Append
Group Summarize Conform Rotate
Create (creating a DataFrame)
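Two common ways to create a DataFrame, sketched with made-up data: from a dict of columns and from a list of rows.

```python
import pandas as pd

# From a dict: column name -> list of values.
df_from_dict = pd.DataFrame({"product": ["pen", "notebook"], "qty": [10, 3]})

# From a list of rows (records), naming the columns explicitly.
df_from_rows = pd.DataFrame([["pen", 10], ["notebook", 3]], columns=["product", "qty"])

print(df_from_dict.equals(df_from_rows))  # True
```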
View (viewing the rows and columns)
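A sketch of the usual viewing calls on a small made-up DataFrame; `loc` selects by label and `iloc` by position.

```python
import pandas as pd

df = pd.DataFrame({"id": range(1, 11), "score": [55, 72, 90, 64, 88, 47, 93, 70, 81, 66]})

print(df.head())            # first 5 rows (default n=5)
print(df.tail(3))           # last 3 rows
print(df.columns.tolist())  # column names
print(df.shape)             # (rows, columns)
df.info()                   # dtypes and non-null counts per column
print(df.loc[2, "score"])   # by label: row label 2, column "score"
print(df.iloc[0:2])         # by position: first two rows
```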
Insert (adding a new column to a DataFrame)
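A sketch of three ways to add a column, with made-up data: direct assignment (appends at the end), `insert()` (chosen position, modifies in place) and `assign()` (returns a new DataFrame).

```python
import pandas as pd

df = pd.DataFrame({"item": ["pen", "book", "bag"], "price": [1.5, 12.0, 30.0], "qty": [10, 2, 1]})

# Direct assignment: computed from existing columns, added as the last column.
df["total"] = df["price"] * df["qty"]

# insert(): place a column at a specific position (0 = first), in place.
df.insert(0, "sku", ["P-1", "B-7", "G-3"])

# assign(): returns a new DataFrame and leaves df unchanged.
df2 = df.assign(discounted=df["total"] * 0.9)
```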
Filter (slicing and dicing the DataFrame)
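Slicing and dicing usually combines boolean masks with `loc`/`iloc`. A minimal sketch on made-up sales data; conditions are combined with `&` and `|`, each wrapped in parentheses, not with `and`/`or`.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "west", "east", "north", "west"],
    "sales": [120, 340, 90, 210, 50],
})

big_east = df[(df["region"] == "east") & (df["sales"] > 100)]  # both conditions
west_or_north = df[df["region"].isin(["west", "north"])]       # membership test
via_query = df.query("sales >= 200")                           # string expression
rows_1_to_3 = df.iloc[1:4]                                     # positions 1..3 (end excluded)
subset = df.loc[df["sales"] < 100, ["region"]]                 # rows by mask, columns by label
```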
Map (map() and applymap())
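A sketch of element-wise mapping, assuming made-up grade data. One version caveat: `DataFrame.applymap()` was deprecated in pandas 2.1 in favor of `DataFrame.map()`, so the code picks whichever is available.

```python
import pandas as pd

df = pd.DataFrame({"grade": ["A", "B", "C"], "score": [91.234, 78.5, 64.0]})

# Series.map: element-wise on one column, using a dict lookup (or a function).
df["points"] = df["grade"].map({"A": 4, "B": 3, "C": 2})

# Series.apply: element-wise function on one column.
df["passed"] = df["score"].apply(lambda s: s >= 70)

# Element-wise over a whole DataFrame: DataFrame.map (pandas >= 2.1)
# or DataFrame.applymap (older versions).
numeric = df[["score", "points"]]
elementwise = numeric.map if hasattr(pd.DataFrame, "map") else numeric.applymap
rounded = elementwise(lambda v: round(v, 1))
```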
Append (stacking DataFrames row-wise, axis=0)
Concat (combining DataFrames along axis=0 or axis=1)
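`DataFrame.append()` was deprecated in pandas 1.4 and removed in 2.0; `pd.concat()` covers both the append (axis=0) and side-by-side (axis=1) cases. A minimal sketch with made-up monthly data:

```python
import pandas as pd

jan = pd.DataFrame({"id": [1, 2], "amount": [100, 250]})
feb = pd.DataFrame({"id": [3], "amount": [75]})

# axis=0: stack rows ("append"); ignore_index=True renumbers rows 0..n-1.
all_rows = pd.concat([jan, feb], axis=0, ignore_index=True)

# axis=1: place DataFrames side by side, aligned on their index.
notes = pd.DataFrame({"note": ["new", "repeat"]})
side_by_side = pd.concat([jan, notes], axis=1)
```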
Join (inner, left, right, outer)
Join (inner)
Join (outer)
Join (left)
Join (right)
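The four join types map onto the `how` parameter of `DataFrame.merge()`. A sketch with made-up customer and order tables (`DataFrame.join()` does the same but joins on the index by default):

```python
import pandas as pd

customers = pd.DataFrame({"cust_id": [1, 2, 3], "name": ["Ann", "Raj", "Li"]})
orders = pd.DataFrame({"cust_id": [2, 3, 4], "total": [50, 20, 99]})

# how= works like SQL INNER / LEFT / RIGHT / FULL OUTER JOIN on the key column.
inner = customers.merge(orders, on="cust_id", how="inner")  # keys in both: 2, 3
left = customers.merge(orders, on="cust_id", how="left")    # all customers: 1, 2, 3
right = customers.merge(orders, on="cust_id", how="right")  # all orders: 2, 3, 4
outer = customers.merge(orders, on="cust_id", how="outer")  # union of keys: 1, 2, 3, 4
```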
Group (groupby())
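A minimal group-by sketch on made-up sales data: split rows by a key, aggregate each group and combine the results.

```python
import pandas as pd

sales = pd.DataFrame({
    "store": ["A", "B", "A", "B", "A"],
    "amount": [10, 20, 30, 40, 50],
})

# One aggregate per group.
per_store = sales.groupby("store")["amount"].sum()

# Several aggregates per group, one column each.
summary = sales.groupby("store")["amount"].agg(["count", "mean", "max"])
```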
Sort (by one or more columns, ascending=True or False)
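Sorting by one or several columns, sketched with made-up data; `ascending` takes a single bool or one bool per sort key.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Kim", "Ana", "Bo", "Cy"],
    "dept": ["QA", "Dev", "QA", "Dev"],
    "age": [34, 29, 41, 29],
})

by_age_desc = df.sort_values("age", ascending=False)

# Several keys, each with its own direction: dept A..Z, then age high..low.
by_dept_then_age = df.sort_values(["dept", "age"], ascending=[True, False])

# sort_index() restores the original row order.
by_index = by_age_desc.sort_index()
```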
Clean (drop, fillna, drop_duplicates)
Clean (drop)
Clean (fillna(method='ffill' / 'bfill'))
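A cleaning sketch on made-up readings. Note that `fillna(method='ffill'/'bfill')` is deprecated since pandas 2.1; the dedicated `ffill()` and `bfill()` methods do the same thing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "day": [1, 2, 2, 3, 4, 5],
    "temp": [20.0, np.nan, np.nan, 22.0, np.nan, 25.0],
    "notes": [None] * 6,
})

df = df.drop(columns=["notes"])  # drop an unused column
df = df.drop_duplicates()        # the two identical day-2 rows collapse to one

filled_const = df.fillna(0)      # replace NaN with a constant
forward = df.ffill()             # carry the last valid value forward
backward = df.bfill()            # carry the next valid value backward
no_missing = df.dropna()         # drop rows that contain any NaN
```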
Conform (reindex() / resample(), dropping or filling NaN as needed)
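`reindex()` conforms data to a new set of labels, inserting NaN (or a fill value) for new labels and dropping labels that are not requested. A minimal sketch with made-up prices:

```python
import pandas as pd

prices = pd.Series([10.0, 12.0, 11.0], index=["a", "c", "d"])

# New labels ("b", "e") get NaN.
conformed = prices.reindex(["a", "b", "c", "d", "e"])

# Or fill the new labels with a constant instead.
conformed_zero = prices.reindex(["a", "b", "c", "d", "e"], fill_value=0.0)

# Labels left out of the new index ("d") are dropped; order follows the new index.
subset = prices.reindex(["c", "a"])
```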
Resample (resample())
Resample (monthly, weekly, yearly)
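A resampling sketch on made-up daily values. Frequency aliases vary by pandas version: monthly and yearly bins are `"ME"` and `"YE"` in pandas 2.2 and later, but `"M"` and `"Y"` (or `"A"`) in earlier versions, so the example sticks to weekly (`"W"`), which is the same across versions.

```python
import pandas as pd

# 14 days of made-up readings (values 1..14) starting Monday 2024-01-01.
daily = pd.Series(range(1, 15), index=pd.date_range("2024-01-01", periods=14, freq="D"))

# "W" = weekly bins ending on Sunday; each bin is labeled with its end date.
weekly = daily.resample("W").sum()

# The same bins with a different aggregate.
weekly_mean = daily.resample("W").mean()
```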
Rotate (transpose)
Rotate (pivot_table)
Rotate (stack)
Rotate (unstack)
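A sketch of the four rotation operations on a small made-up long-format table:

```python
import pandas as pd

long = pd.DataFrame({
    "city": ["Pune", "Pune", "Oslo", "Oslo"],
    "year": [2022, 2023, 2022, 2023],
    "sales": [100, 120, 80, 95],
})

# Transpose: rows become columns and vice versa.
flipped = long.T

# pivot_table: long -> wide; aggfunc decides how duplicate cells are combined.
wide = long.pivot_table(index="city", columns="year", values="sales", aggfunc="sum")

# stack: column labels move into the innermost index level (wide -> long Series).
stacked = wide.stack()

# unstack: innermost index level moves back into the columns (long -> wide).
unstacked = stacked.unstack()
```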
Seaborn Analytics
What is Seaborn?
• Seaborn is built on top of matplotlib and provides a high-level interface for drawing attractive statistical graphics.
Demo (restaurant dataset visualization)
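The slides do not include the demo code. Seaborn ships a `tips` dataset of restaurant bills that `sns.load_dataset("tips")` downloads on first use, and it may be what the demo used, but that is an assumption. To stay offline and self-contained, this sketch builds a tiny made-up table with tips-style column names (`total_bill`, `tip`, `day`) and draws one scatter plot.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; not needed in Jupyter, where plots show inline
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up bills with tips-style column names.
bills = pd.DataFrame({
    "total_bill": [12.0, 18.5, 25.0, 9.75, 31.2, 20.0],
    "tip": [2.0, 3.0, 4.5, 1.5, 5.0, 3.5],
    "day": ["Sun", "Sun", "Sat", "Sat", "Fri", "Fri"],
})

# One call maps columns to x, y and colour; seaborn labels the axes and adds a legend.
ax = sns.scatterplot(data=bills, x="total_bill", y="tip", hue="day")
ax.set_title("Tip vs. total bill")
```

The high-level part is the single `scatterplot` call: with plain matplotlib the per-day colouring, axis labels and legend would each need separate code.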
Machine Learning (Linear Regression)
Demo
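The demo code is not in the slides. As a hedged stand-in, this sketch fits an ordinary least-squares line with NumPy; scikit-learn's `sklearn.linear_model.LinearRegression` (`fit(X, y)`, then `coef_` and `intercept_`) fits the same model. The data are made up and lie exactly on y = 3x + 2 so the result is easy to check; real data would scatter around the line.

```python
import numpy as np

# Made-up, noise-free data on the line y = 3x + 2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x + 2.0

# Ordinary least squares: find w = [slope, intercept] minimising ||A @ w - y||^2.
A = np.column_stack([x, np.ones_like(x)])
(slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)


def predict(x_new):
    """Predict y for new x values with the fitted line."""
    return slope * np.asarray(x_new) + intercept


# Coefficient of determination (R^2): 1.0 means a perfect fit.
r2 = 1 - np.sum((y - predict(x)) ** 2) / np.sum((y - y.mean()) ** 2)
```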