Introduction to Python
In this lecture
Data science
Tools for data science
History of Python
Python IDEs
Python for Data Science 2
Introduction
We live in a world that’s drowning in
data
Data is generated from various sources
◦ Websites track every user’s every click
◦ Your smartphone is building up a record of
your location
◦ Sensors from electronic devices record
real time information
◦ E-commerce websites collect purchasing
habits
Data science
Interdisciplinary field that brings
together computer science,
statistics and mathematics to
extract useful insights from data
Analyzing and generating insights
from data aids in arriving at
better business decisions
Popular tools used in data science
Data pre-processing and analysis
◦ Python, R, Microsoft Excel, SAS, SPSS
Data exploration and visualization
◦ Tableau, QlikView, Microsoft Excel
Parallel and distributed computing
in case of big data
◦ Apache Spark, Apache Hadoop
Evolution of Python
Python was developed by
Guido van Rossum in the late
1980s at the ‘National
Research Institute for
Mathematics and Computer
Science’ (CWI) in the Netherlands
Python versions
◦ Python 0.9.0 (first public release) - 1991
◦ Python 1.0 - 1994
◦ Python 2.0 - 2000
◦ Python 3.0 - 2008 (Python 3.7 was the
latest release when these slides were prepared)
Advantages of using Python
Python has several features that
make it well suited for data
science
Open source and community
development
◦ Developed under an Open Source
Initiative (OSI) approved license, making it
free to use and distribute, even
commercially
Advantages of using Python
Syntax is simple to read
and write
Libraries designed for specific data
science tasks
Integrates well with most
cloud platform service providers
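As a small illustration of the points above, the sketch below summarizes a handful of purchase amounts using only the standard library's `statistics` module. The data values and variable names are made up for illustration; for real data science work, dedicated libraries such as NumPy or pandas would more typically be used.

```python
import statistics

# Hypothetical purchase amounts (made-up values for illustration only)
purchases = [12.5, 7.0, 22.5, 7.0, 11.0]

mean_amount = statistics.mean(purchases)      # arithmetic mean
median_amount = statistics.median(purchases)  # middle value after sorting
most_common = statistics.mode(purchases)      # most frequently occurring value

print(f"mean={mean_amount}, median={median_amount}, mode={most_common}")
```

The whole summary takes a few readable lines with no type declarations or boilerplate, which is the kind of simplicity the slide refers to.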
Integrated development
environment (IDE)
Software application consisting of a
cohesive unit of tools required for
development
Designed to simplify software
development
Utilities provided by IDEs include
tools for managing, compiling,
deploying and debugging software
Features of IDE
An IDE should centralize three key
tools that form the crux of
software development
◦ Source code editor
◦ Compiler
◦ Debugger
Additional features
◦ Syntax and error highlighting
◦ Code completion
◦ Version control
Commonly used IDEs
Spyder
PyCharm
Jupyter Notebook
Atom
Spyder
Supported across Linux, Mac
OS X and Windows platforms
Available as open source
software
Bundled with the Anaconda
distribution, which comes with
many commonly used Python libraries
Developed for Python,
specifically for data science
Spyder
Features include
◦ Code editor with robust syntax
and error highlighting
◦ Code completion and navigation
◦ Debugger
◦ Integrated documentation
Interface similar to MATLAB
and RStudio
PyCharm
Supported across Linux, Mac OS
X and Windows platforms
Available in Community (free,
open source) and Professional
(paid) editions
Supports only Python
Bundled with the Anaconda
distribution, which comes with
many commonly used Python libraries
◦ Can also be installed separately
PyCharm
Features include
◦ Code editor provides syntax
and error highlighting
◦ Code completion and
navigation
◦ Unit testing
◦ Debugger
◦ Version control
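To make the "unit testing" feature concrete, here is a minimal sketch using Python's built-in `unittest` module. The `average` function and the `TestAverage` class are hypothetical names invented for this example; an IDE's test runner would typically discover and run tests of this shape and report the results in its interface.

```python
import unittest


def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


class TestAverage(unittest.TestCase):
    def test_typical_values(self):
        self.assertEqual(average([2, 4, 6]), 4)

    def test_empty_list_raises(self):
        # Dividing by len([]) == 0 raises ZeroDivisionError
        with self.assertRaises(ZeroDivisionError):
            average([])


# Build and run the suite programmatically instead of via an IDE button
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAverage)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Running the suite in code, as done here, is roughly what an IDE does behind its "run tests" action; `result.wasSuccessful()` tells you whether every test passed.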
Jupyter Notebook
Web application that allows
creation and manipulation of
documents called
‘notebooks’
Supported across Linux, Mac OS
X and Windows platforms
Available as open source software
Source-https://jupyter.org/
Jupyter Notebook
Bundled with Anaconda
distribution or can be installed
separately
Supports Julia, Python, R and
Scala
Consists of an ordered collection of
input and output cells that contain
code, text, plots etc.
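The cell structure described above can be sketched in plain Python. A notebook file is stored as JSON; the dictionary below is a simplified, hand-written example in the style of notebook format version 4, with made-up cell contents. It is meant only to show the ordered list of cells, not to serve as a complete or validated notebook file.

```python
import json

# A minimal, hand-written notebook: one text cell followed by one code cell.
# Cell contents are invented for illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",  # narrative text
            "metadata": {},
            "source": ["# Sales summary\n", "Narrative text goes here."],
        },
        {
            "cell_type": "code",      # input code plus its stored output
            "execution_count": 1,
            "metadata": {},
            "source": ["print(1 + 1)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
}

# Cells are kept in order, so listing their types shows the document flow.
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)

# Serialize to the JSON text that would be written to a notebook file.
notebook_text = json.dumps(notebook, indent=1)
```

Because code cells store their outputs alongside their input, a saved notebook can be shared and read without re-running the code.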
Jupyter Notebook
Allows sharing of code and
narrative text through output
formats like PDF, HTML etc.
◦ Education and presentation
tool
Lacks most of the features of
a good IDE
Atom
Open source text and source code
editor
Supported across Linux, Mac OS X
and Windows platforms
Supports Python, PHP, Java etc.
Well suited for developers
Enables users to install plug-ins or
packages
◦ Packages can be installed for code
completion and debugging
Source-https://atom.io/
How to choose the best IDE?
Requirements
Working with different IDEs
helps us understand our own
requirements
In this course, Spyder will be
used
Summary
Popular tools used in data science
Evolution of Python
Integrated development environment
◦ Spyder
◦ PyCharm
◦ Jupyter Notebook
◦ Atom
THANK YOU