Python for Statistical Analysis
AND ITS DIFFERENT PACKAGES
18SE02CE011 : URJA DIYORA
SUBMITTED TO:
DR. JASLEEN KAUR
OUTLINE
• Introduction to Pandas
• Data Wrangling with Pandas
• Plotting and Visualization
• NumPy Basics: Arrays and Vectorized Computation
• Statistical Data Modeling
• Data Loading, Storage, and File Formats
• Packages For Statistical Analysis
Introduction to Pandas
• Importing data
• Series and DataFrame objects
• Indexing, data selection and subsetting
• Hierarchical indexing
• Reading and writing files
• Sorting and ranking
• Missing data
• Data summarization
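The topics above can be sketched in a few lines of pandas. This is a minimal illustration, not material from the slides: the data and all variable names are made up, and only a handful of the available methods are shown.

```python
import numpy as np
import pandas as pd

# Hypothetical data, made up for illustration only.
scores = pd.Series([82.0, np.nan, 91.0], index=["ann", "bob", "cat"], name="score")

df = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b"],
        "year": [2019, 2020, 2019, 2020],
        "value": [1.5, np.nan, 3.0, 4.5],
    }
)

# Indexing, data selection and subsetting.
ann_score = scores.loc["ann"]          # label-based lookup on a Series
group_b = df[df["group"] == "b"]       # boolean subsetting of a DataFrame

# Hierarchical indexing: build a two-level index from two columns.
indexed = df.set_index(["group", "year"])
b_2020 = indexed.loc[("b", 2020), "value"]

# Sorting and ranking.
by_value = df.sort_values("value", ascending=False)
ranks = df["value"].rank()

# Missing data: count the gaps, then fill them with the column mean.
n_missing = int(df["value"].isna().sum())
filled = df["value"].fillna(df["value"].mean())

# Data summarization.
summary = df["value"].describe()
print(summary)
```

Filling with the mean is only one possible choice; dropping rows with dropna() is an equally common alternative.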
Data Wrangling with Pandas
• Date/time types
• Merging and joining DataFrame objects
• Concatenation
• Reshaping DataFrame objects
• Pivoting
• Data transformation
• Permutation and sampling
• Data aggregation and GroupBy operation
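As a rough sketch of the wrangling operations listed above, the following uses two small hypothetical tables (sales and stores, both invented here) to show one call per topic.

```python
import pandas as pd

# Hypothetical tables, made up for illustration only.
sales = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"]
        ),
        "store": ["north", "south", "north", "south"],
        "units": [10, 7, 12, 5],
    }
)
stores = pd.DataFrame({"store": ["north", "south"], "region": ["R1", "R2"]})

# Date/time types: the .dt accessor exposes datetime fields.
weekday = sales["date"].dt.day_name()

# Merging and joining on a shared key column.
merged = sales.merge(stores, on="store", how="left")

# Concatenation: stack two frames row-wise.
extra = pd.DataFrame(
    {"date": pd.to_datetime(["2020-01-03"]), "store": ["north"], "units": [9]}
)
combined = pd.concat([sales, extra], ignore_index=True)

# Reshaping and pivoting: long format to one column per store.
wide = sales.pivot(index="date", columns="store", values="units")

# Data transformation: derive a new column from an existing one.
merged["units_doubled"] = merged["units"] * 2

# Permutation and sampling: shuffle the rows reproducibly.
shuffled = sales.sample(frac=1.0, random_state=0)

# Data aggregation with GroupBy.
totals = sales.groupby("store")["units"].sum()
print(totals)
```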
Plotting and Visualization
• Plotting in Pandas vs Matplotlib
• Bar plots
• Histograms
• Box plots
• Grouped plots
• Scatterplots
• Trellis plots
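A minimal sketch of several of these plot types follows, mixing the pandas plotting wrappers with direct Matplotlib calls. The data is randomly generated for illustration, the column names are invented, and trellis plots are not covered. The figure is written to an in-memory buffer so the sketch runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated data, for illustration only.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "group": np.repeat(["a", "b"], 50),
        "x": rng.normal(size=100),
        "y": rng.normal(size=100),
    }
)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

# Bar plot of group means through the pandas wrapper.
df.groupby("group")["x"].mean().plot(
    kind="bar", ax=axes[0, 0], title="Mean of x by group"
)

# Histogram drawn directly with Matplotlib.
axes[0, 1].hist(df["x"], bins=15)
axes[0, 1].set_title("Histogram of x")

# Grouped box plot: one box per value of the "group" column.
df.boxplot(column="x", by="group", ax=axes[1, 0])

# Scatterplot through the pandas wrapper.
df.plot(kind="scatter", x="x", y="y", ax=axes[1, 1], title="x vs y")

fig.tight_layout()

# Render to memory; pass a file name to savefig instead to keep the image.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

The pandas wrappers are convenient for quick looks at a DataFrame, while direct Matplotlib calls give finer control over each axis.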
Statistical Data Modeling
• Statistical modeling
• Fitting data to probability distributions
• Fitting regression models
• Model selection
• Bootstrapping
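The modeling topics above can be illustrated with NumPy and scipy.stats. This is a sketch on simulated data, not the method used in the slides: the true parameters, the AIC formula (Gaussian errors, stated up to an additive constant) and the choice of a percentile bootstrap are all assumptions made here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Fitting data to a probability distribution: maximum-likelihood normal fit.
sample = rng.normal(loc=5.0, scale=2.0, size=500)
mu_hat, sigma_hat = stats.norm.fit(sample)

# Fitting a regression model: least squares for y = a + b * x.
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 3.0 * x + rng.normal(scale=0.5, size=x.size)
fit = stats.linregress(x, y)


def aic(degree):
    """AIC of a polynomial fit, assuming Gaussian errors (constant dropped)."""
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    n_params = degree + 1
    return x.size * np.log(rss / x.size) + 2 * n_params


# Model selection: compare candidate polynomial degrees; lower AIC is better.
aic_by_degree = {degree: aic(degree) for degree in (1, 2, 3)}

# Bootstrapping: percentile interval for the sample mean.
boot_means = np.array(
    [
        rng.choice(sample, size=sample.size, replace=True).mean()
        for _ in range(1000)
    ]
)
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])

print(f"normal fit: mu={mu_hat:.2f}, sigma={sigma_hat:.2f}")
print(f"regression: intercept={fit.intercept:.2f}, slope={fit.slope:.2f}")
print(f"95% bootstrap interval for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```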
Data Loading, Storage, and File Formats
• Indexing: Can treat one or more columns as the index of the returned
DataFrame, and can take column names from the file, from the user, or not at all.
• Type inference and data conversion: Includes user-defined value
conversions and a custom list of missing-value markers.
• Datetime parsing: Includes the ability to combine date and time
information spread over multiple columns into a single column in the
result.
• Iterating: Support for iterating over chunks of very large files.
• Unclean data issues: Skipping rows or a footer, comments, or other
minor things like numeric data with thousands separated by commas.
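The parsing options above map onto keyword arguments of pandas.read_csv. The sketch below reads a small hypothetical file held in a string; the file contents and column names are invented. The date and time columns are combined explicitly after loading, which is one way to do it and sidesteps differences between pandas versions.

```python
import io

import pandas as pd

# Hypothetical file contents, made up for illustration only.
raw = """# exported by a hypothetical tool
id,date,time,amount,status
1,2020-01-01,09:30,"1,250",ok
2,2020-01-02,10:15,"2,000",late
3,2020-01-03,11:00,unknown,ok
"""

df = pd.read_csv(
    io.StringIO(raw),
    comment="#",                       # unclean data: ignore comment lines
    index_col="id",                    # indexing: use a column as the row index
    thousands=",",                     # numbers written with thousands separators
    na_values=["unknown"],             # custom missing-value marker
    converters={"status": str.upper},  # user-defined value conversion
)

# Datetime parsing: combine separate date and time columns into one.
df["timestamp"] = pd.to_datetime(df["date"] + " " + df["time"])

# Iterating: read the same data in chunks of two rows at a time.
chunk_sizes = [
    len(chunk)
    for chunk in pd.read_csv(io.StringIO(raw), comment="#", chunksize=2)
]

print(df)
print(chunk_sizes)
```

For a real file, replace io.StringIO(raw) with the file path; chunksize is mainly useful when the file does not fit comfortably in memory.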
Packages For Statistical Analysis
• pandas >= 0.11.1 and its dependencies
• NumPy >= 1.6.1
• matplotlib >= 1.0.0
• pytz
• IPython >= 0.12
• pyzmq
• Tornado
• Optional: statsmodels, xlrd and openpyxl
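One way to see which of these packages are present is to query the installed distributions with the standard library. This is a small sketch: the helper name installed_versions is invented here, the distribution names are assumed to match the package names above, and minimum versions are reported but not enforced.

```python
from importlib import metadata

# Distribution names assumed to match the package list above.
PACKAGES = [
    "pandas", "numpy", "matplotlib", "pytz", "ipython",
    "pyzmq", "tornado", "statsmodels", "xlrd", "openpyxl",
]


def installed_versions(names):
    """Return {name: version string, or None if the package is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(PACKAGES)
for name, version in versions.items():
    print(f"{name}: {version or 'not installed'}")
```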
NumPy Basics: Arrays and Vectorized Computation
• Fast vectorized array operations for data munging and cleaning,
subsetting and filtering, transformation, and any other kinds of
computations
• Common array algorithms like sorting, unique, and set operations
• Efficient descriptive statistics and aggregating/summarizing data
• Data alignment and relational data manipulations for merging and
joining together heterogeneous data sets
• Expressing conditional logic as array expressions instead of loops with
if-elif-else branches
• Group-wise data manipulations (aggregation, transformation, function
application).
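A short sketch of the array-level features above, on a small made-up array. It covers only what NumPy does directly; alignment, merging and group-wise operations are typically done in pandas, which builds on these arrays.

```python
import numpy as np

# Made-up values, for illustration only.
values = np.array([3.0, -1.0, 4.0, -1.0, 5.0, 9.0])

# Vectorized arithmetic and boolean filtering, with no explicit Python loop.
scaled = values * 2.0
positives = values[values > 0]

# Conditional logic as an array expression instead of if-elif-else in a loop.
labels = np.where(values < 0, "neg", np.where(values < 5, "small", "large"))

# Common array algorithms: sorting, unique values, set operations.
ordered = np.sort(values)
distinct = np.unique(values)
shared = np.intersect1d(values, np.array([4.0, 9.0, 10.0]))

# Descriptive statistics and aggregation along an axis.
mean, std = values.mean(), values.std()
col_sums = np.arange(6).reshape(2, 3).sum(axis=0)

print(labels)
print(distinct, shared, col_sums)
```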
SciPy
SciPy is a collection of packages addressing a number of different standard
problem domains in scientific computing.
• scipy.integrate: numerical integration routines and differential equation
solvers
• scipy.linalg: linear algebra routines and matrix decompositions extending
beyond those provided in numpy.linalg
• scipy.optimize: function optimizers (minimizers) and root finding
algorithms
• scipy.signal: signal processing tools
• scipy.sparse: sparse matrices and sparse linear system solvers
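To make the subpackages above concrete, here is one small call from each. The problems are toy examples chosen for this sketch, not taken from the slides.

```python
import numpy as np
from scipy import integrate, linalg, optimize, signal, sparse
from scipy.sparse.linalg import spsolve

# scipy.integrate: integral of sin(x) over [0, pi]; the exact value is 2.
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# scipy.linalg: LU decomposition of a small matrix, A = P @ L @ U.
A = np.array([[4.0, 3.0], [6.0, 3.0]])
P, L, U = linalg.lu(A)

# scipy.optimize: minimize a quadratic, then find a root of cos(x) - x.
minimum = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)
root = optimize.brentq(lambda x: np.cos(x) - x, 0.0, 1.0)

# scipy.signal: remove a single spike with a median filter.
smoothed = signal.medfilt(np.array([1.0, 1.0, 9.0, 1.0, 1.0]), kernel_size=3)

# scipy.sparse: solve the sparse linear system M @ x = b.
M = sparse.csc_matrix(np.array([[2.0, 0.0], [0.0, 4.0]]))
x = spsolve(M, np.array([2.0, 8.0]))

print(area, minimum.x, root)
print(smoothed, x)
```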