Data Science with Python
Scientific Computing with Python (SciPy)
Learning Objectives
By the end of this lesson, you will be able to:
Explain the importance of SciPy
List the characteristics of SciPy
Explain sub-packages of SciPy
Discuss SciPy sub-packages, such as optimization,
integration, linear algebra, statistics, weave, and IO
SciPy and Its Characteristics
Multiple Scientific Domains
How to handle multiple scientific domains? The solution is SciPy.
Scientific domains that SciPy helps with include:
• Statistics
• Space science
• Optimization
• Image science
• Signal processing
• Platform integration
• Mathematical equations
SciPy has built-in packages that help in handling the scientific domains.
• Mathematics: integration
• Statistics: normal distribution
• Linear algebra
• Multidimensional image processing
• Mathematics: constants
• Language integration
SciPy and Its Characteristics
1. Built-in mathematical libraries and functions
2. High-level commands for data manipulation and visualization
3. Efficient and fast data processing
4. Integrates well with multiple systems and environments
5. Large collection of sub-packages for different scientific domains
6. Simplifies scientific application development
SciPy Packages
Some widely used packages are:
• Integration
• IO
• Linear Algebra
• Optimize
• Statistics
• Weave
Introduction of SciPy Sub-Package
SciPy Sub-Package
SciPy has multiple sub-packages which handle different scientific domains.
• cluster: Clustering algorithms
• constants: Physical and mathematical constants
• fftpack: Fast Fourier Transform routines
• integrate: Integration and ordinary differential equation solvers
• spatial: Spatial data structures and algorithms
• interpolate: Interpolation and smoothing splines
• io: Input and output
• linalg: Linear algebra
• ndimage: N-dimensional image processing
• odr: Orthogonal distance regression
• optimize: Optimization and root-finding routines
• signal: Signal processing
• sparse: Sparse matrices and associated routines
• weave: C/C++ integration
• stats: Statistical distributions and functions
• special: Special functions
SciPy Sub-Package: Integration
SciPy provides numerical integration techniques for evaluating definite integrals and for
solving ordinary differential equations.
General integration (quad):
• integrate.quad(f, a, b)
General multiple integration (dblquad, tplquad, nquad):
• integrate.dblquad()
• integrate.tplquad()
• integrate.nquad()
The limits of all inner integrals need to be defined as functions.
SciPy Sub-Package: Integration
This example shows how to perform quad integration.
• Import quad from the integrate sub-package
• Define a function for the integration of x
• Perform quad integration of the function of x for the limits 0 to 1
• Define a function for ax + b
• Declare the values of a and b
• Perform quad integration, passing the function and its arguments
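The steps above can be sketched as follows. The original code is not shown in the text, so the function names, the values a = 3 and b = 2, and the limits used for the second integral are assumptions chosen for illustration.

```python
from scipy.integrate import quad


def integrand_x(x):
    """Integrand f(x) = x."""
    return x


# quad returns the integral value and an estimate of the absolute error
value_x, error_x = quad(integrand_x, 0, 1)
print(value_x)  # the integral of x from 0 to 1 is 0.5


def integrand_linear(x, a, b):
    """Integrand f(x) = a*x + b with two extra parameters."""
    return a * x + b


# Assumed illustrative values for a and b
a = 3
b = 2

# Extra parameters are passed to the integrand through args
value_linear, error_linear = quad(integrand_linear, 0, 1, args=(a, b))
print(value_linear)  # analytically 3/2 + 2 = 3.5
```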
SciPy Sub-Package: Integration
This example shows you how to perform multiple integration.
• Import the integrate sub-package
• Define a function for x + y
• Perform multiple integration, using lambda functions to define the limits of the inner integral
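A minimal sketch of these steps, assuming the outer variable x runs from 0 to 1 and the inner variable y runs from 0 to 2 (the original limits are not shown). Note that dblquad expects the integrand to take the inner variable first.

```python
from scipy import integrate


def f(y, x):
    """Integrand x + y; dblquad passes the inner variable (y) first."""
    return x + y


# Outer limits for x are plain numbers; inner limits for y are functions of x
value, error = integrate.dblquad(f, 0, 1, lambda x: 0, lambda x: 2)
print(value)  # analytically: integral of (2x + 2) over [0, 1] = 3
```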
SciPy Sub-Package: Optimization
Optimization is a process to improve performance of a system mathematically by fine-tuning the process
parameters.
SciPy provides several optimization algorithms, such as BFGS, Nelder-Mead simplex, Newton Conjugate
Gradient, COBYLA, and SLSQP.
• Minimization functions (find the lowest value of a function in a given range):
optimize.minimize(f, x0, method='BFGS')
• Root finding:
optimize.root(f, x0, method='hybr')
• Curve fitting:
optimize.curve_fit(f, xdata, ydata)
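As an illustration of curve fitting, the sketch below fits a straight line to synthetic, noise-free data. The model function, the data, and the limits passed through the bounds argument are assumptions made for this example only.

```python
import numpy as np
from scipy import optimize


def model(x, a, b):
    """Hypothetical model: a straight line a*x + b."""
    return a * x + b


# Synthetic data generated from known parameters a = 2, b = 1
xdata = np.linspace(0, 10, 20)
ydata = model(xdata, 2.0, 1.0)

# bounds is a pair (lower limits, upper limits), one entry per parameter
popt, pcov = optimize.curve_fit(model, xdata, ydata, bounds=([0, 0], [5, 5]))
print(popt)  # fitted values of a and b
```

The bounds argument is optional; when it is omitted, the parameters are unconstrained.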
SciPy Sub-Package: Optimization
• Import numpy and optimize from SciPy
• Define a function for x^2 + 5 sin x
• Perform optimize.minimize using the BFGS method with options
• Perform optimize.minimize using the BFGS method without options
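These steps might look like the sketch below. The starting point x0 = 2 and the option values are assumptions, since the original code is not shown.

```python
import numpy as np
from scipy import optimize


def f(x):
    """Objective function x^2 + 5 sin x."""
    return x**2 + 5 * np.sin(x)


# BFGS with explicit options (assumed values): iteration cap and progress report
result_with_options = optimize.minimize(
    f, x0=2, method='BFGS', options={'maxiter': 2000, 'disp': True}
)

# BFGS with default options
result_default = optimize.minimize(f, x0=2, method='BFGS')

print(result_default.x)    # location of the minimum, as an array
print(result_default.fun)  # function value at the minimum
```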
SciPy Sub-Package: Optimization
• Define a function for x + 3.5 cos x
• Pass an initial x value as the argument for root
• The result contains the function value and the array of root values
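A minimal sketch of root finding for this function. The starting value x0 = -1 is an assumption; the function has more than one root, so a different starting value may lead to a different root.

```python
import numpy as np
from scipy import optimize


def f(x):
    """Function x + 3.5 cos x whose root is sought."""
    return x + 3.5 * np.cos(x)


# 'hybr' is the default method of optimize.root
solution = optimize.root(f, x0=-1, method='hybr')

print(solution.x)    # array holding the root that was found
print(solution.fun)  # function value at the root, close to zero
```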
SciPy Sub-Package: Linear Algebra
SciPy provides rapid linear algebra capabilities and contains advanced algebraic functions.
Topics covered: inverse of a matrix, finding the determinant, solving linear systems, and
Singular Value Decomposition (SVD).
The inv function computes the inverse of a given matrix. Let’s look at the inverse matrix
operation.
• Import linalg and define a NumPy matrix or array
• View the type
• Use the inv function to invert the matrix
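The steps can be sketched as follows; the matrix values are assumptions chosen so that the matrix is invertible.

```python
import numpy as np
from scipy import linalg

# Assumed example matrix (its determinant is non-zero, so an inverse exists)
matrix = np.array([[10, 6], [2, 7]])
print(type(matrix))  # <class 'numpy.ndarray'>

# Compute the inverse
inverse = linalg.inv(matrix)
print(inverse)

# Multiplying a matrix by its inverse gives the identity matrix
print(np.allclose(matrix @ inverse, np.eye(2)))
```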
SciPy Sub-Package: Linear Algebra
The det function computes the determinant of a given matrix.
• Import linalg and define a NumPy matrix or array
• Use the det function to find the determinant of the matrix
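A short sketch of the determinant calculation, using an assumed 2x2 matrix.

```python
import numpy as np
from scipy import linalg

# Assumed example matrix
matrix = np.array([[10, 6], [2, 7]])

# Determinant: 10*7 - 6*2 = 58, up to floating-point rounding
determinant = linalg.det(matrix)
print(determinant)
```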
SciPy Sub-Package: Linear Algebra
Linear equations:
2x + 3y + z = 21
-x + 5y + 4z = 9
3x + 2y + 9z = 6
• Import linalg
• Use the solve method
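The system above can be written as a coefficient matrix and a right-hand-side vector and passed to solve. The variable names in this sketch are assumptions.

```python
import numpy as np
from scipy import linalg

# Coefficients of x, y, z in each equation
coefficients = np.array([[2, 3, 1],
                         [-1, 5, 4],
                         [3, 2, 9]])

# Right-hand side of each equation
constants = np.array([21, 9, 6])

# Solve coefficients @ [x, y, z] = constants
solution = linalg.solve(coefficients, constants)
print(solution)

# Check: substituting the solution reproduces the right-hand side
print(np.allclose(coefficients @ solution, constants))
```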
SciPy Sub-Package: Linear Algebra
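• Import linalg
• Define the matrix
• Find the shape of the ndarray, which is a 2x3 matrix
• Use the svd function, which returns:
o U: a unitary matrix
o Sigma: the singular values (the square roots of the eigenvalues of the matrix multiplied by its conjugate transpose)
o VH: a unitary matrix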
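A minimal sketch of the decomposition for an assumed 2x3 matrix. The diagsvd helper is used here only to rebuild the original matrix as a check.

```python
import numpy as np
from scipy import linalg

# Assumed example 2x3 matrix
matrix = np.array([[3, 5, 1],
                   [9, 5, 7]])
print(matrix.shape)  # (2, 3)

# U is 2x2, sigma holds the 2 singular values, VH is 3x3
U, sigma, VH = linalg.svd(matrix)
print(U)
print(sigma)
print(VH)

# Rebuild the matrix: U @ Sigma @ VH, with Sigma as a 2x3 diagonal matrix
rebuilt = U @ linalg.diagsvd(sigma, 2, 3) @ VH
print(np.allclose(rebuilt, matrix))
```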
Calculate Eigenvalues and Eigenvectors
Problem Statement: Demonstrate how to calculate eigenvalues and eigenvectors
Access: Click on the Practice Labs tab on the left side panel of the LMS. Copy or note the
username and password that is generated. Click on the Launch Lab button. On the page that
appears, enter the username and password in the respective fields, and click Login.
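One possible approach to this exercise uses the eig function from linalg. The matrix below is an assumption chosen for illustration; it is not taken from the lab.

```python
import numpy as np
from scipy import linalg

# Assumed example matrix with eigenvalues 2 and 5
matrix = np.array([[4, 1],
                   [2, 3]])

# eig returns the eigenvalues and a matrix whose columns are the eigenvectors
eigenvalues, eigenvectors = linalg.eig(matrix)
print(eigenvalues)
print(eigenvectors)

# Check the defining property A v = lambda v for every eigenpair
print(np.allclose(matrix @ eigenvectors, eigenvectors * eigenvalues))
```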
SciPy Sub-Package: Statistics
SciPy provides a rich set of statistical functions:
• The stats package contains distributions from which random variables can be
generated.
• The package allows new routines and distributions to be added. It
also offers convenience methods such as pdf() and cdf().
• The following statistical functions operate on a set of data:
o linear regression: linregress()
o describing data: describe(), normaltest()
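The functions listed above can be tried on synthetic data, as in this sketch. The data values and the random seed are assumptions for illustration.

```python
import numpy as np
from scipy import stats

# Synthetic data lying exactly on the line y = 2x + 1
x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0

# Linear regression: slope, intercept, and correlation coefficient
regression = stats.linregress(x, y)
print(regression.slope, regression.intercept, regression.rvalue)

# Summary statistics: count, min/max, mean, variance, skewness, kurtosis
summary = stats.describe(y)
print(summary)

# Test whether a sample differs from a normal distribution
sample = stats.norm.rvs(size=100, random_state=0)
statistic, p_value = stats.normaltest(sample)
print(statistic, p_value)
```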
SciPy Sub-Package: Statistics
The Cumulative Distribution Function (CDF) gives the cumulative probability that a random variable takes a value less than or equal to a given value.
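Age Range    Frequency    Cumulative Frequency
0-10         19           19
10-20        55           74
21-30        23           97
31-40        36           133
41-50        10           143
51-60        17           160
The cumulative frequency is the total number of persons up to and including that age range.
[Figure: normal distribution curve with the x-axis marked from -3 to 3 standard deviations. 68% of data lies within one standard deviation, 95% within two, and 99.7% within three.]
F(x) = P(X≤x), accumulated from negative infinity up to x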
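Both ideas on this slide can be reproduced numerically. The sketch below accumulates the frequencies from the table and uses the normal CDF to recover the 68%, 95%, and 99.7% figures; the variable names are assumptions.

```python
import numpy as np
from scipy.stats import norm

# Frequencies from the age-range table
frequency = np.array([19, 55, 23, 36, 10, 17])

# Running total gives the cumulative frequency column
cumulative = np.cumsum(frequency)
print(cumulative)  # [ 19  74  97 133 143 160]

# Share of a standard normal distribution within k standard deviations
shares = [norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)]
print(shares)  # roughly 0.68, 0.95, and 0.997
```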
SciPy Sub-Package: Statistics
The Probability Density Function, or PDF, of a continuous random variable is the derivative of its Cumulative Distribution
Function, or CDF:
f(x) = dF(x)/dx
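This relationship can be checked numerically for the standard normal distribution. The sketch below differentiates the CDF on a grid with finite differences and compares the result with the PDF; the grid and the tolerance are assumptions.

```python
import numpy as np
from scipy.stats import norm

# Evenly spaced grid from -3 to 3 with step 0.01
x = np.linspace(-3, 3, 601)

# Numerical derivative of the CDF using finite differences
cdf_derivative = np.gradient(norm.cdf(x), x)

# The derivative matches the PDF up to a small discretization error
matches = np.allclose(cdf_derivative, norm.pdf(x), atol=1e-3)
print(matches)
```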
SciPy Sub-Package: Statistics
Shown here are the functions used to work with the normal distribution:
• Import norm for the normal distribution
• rvs generates random variables
• cdf evaluates the Cumulative Distribution Function
• pdf evaluates the Probability Density Function
loc and scale are used to adjust the location and scale of the data distribution.
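A minimal sketch of these functions. The sample size, the seed, and the loc and scale values are assumptions for illustration.

```python
from scipy.stats import norm

# Draw 10 random values from a normal distribution with mean 0 and std 1
samples = norm.rvs(loc=0, scale=1, size=10, random_state=42)
print(samples)

# Cumulative probability and density at each sampled value
cdf_values = norm.cdf(samples, loc=0, scale=1)
pdf_values = norm.pdf(samples, loc=0, scale=1)
print(cdf_values)
print(pdf_values)

# loc shifts the mean and scale sets the standard deviation:
# half of the probability lies below the mean
half = norm.cdf(5, loc=5, scale=2)
print(half)  # 0.5
```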
SciPy Sub-Package: Weave and IO
SciPy Sub-Package: Weave
The weave package provides ways to include C/C++ code within Python code. Note that weave was removed from SciPy
in later releases and lives on as a separate package that supports only Python 2.
Features of the weave package:
• Includes C/C++ code within Python code
• Speedups of 1.5x to 30x compared to algorithms written in pure Python
Two main functions of weave:
• inline() compiles and executes C/C++ code on the fly
• blitz() compiles NumPy Python expressions for fast execution
SciPy Sub-Package: IO
The IO package provides a set of functions to deal with several kinds of file formats, including:
• MATLAB files
• IDL files
• Matrix Market files
• WAV sound files
• ARFF files
• NetCDF files
NumPy provides additional methods for reading and writing files, such as:
• numpy.loadtxt()/numpy.savetxt()
• numpy.genfromtxt()/numpy.recfromcsv()
• numpy.save()/numpy.load()
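As a sketch of how these functions fit together, the example below writes an array to a MATLAB file and to a text file in a temporary folder, then reads both back. The file names and the array are assumptions.

```python
import os
import tempfile

import numpy as np
from scipy import io

data = np.arange(6).reshape(2, 3)

with tempfile.TemporaryDirectory() as folder:
    # MATLAB format: save a dictionary of named variables, then load it back
    mat_path = os.path.join(folder, "example.mat")
    io.savemat(mat_path, {"data": data})
    variables = io.whosmat(mat_path)       # lists the variables in the file
    loaded = io.loadmat(mat_path)["data"]  # loadmat returns a dictionary

    # Plain text format through NumPy
    txt_path = os.path.join(folder, "example.txt")
    np.savetxt(txt_path, data)
    from_text = np.loadtxt(txt_path)

print(variables)
print(loaded)
print(from_text)
```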
Using SciPy to Solve a Linear Algebra Problem
Problem Statement:
There is a test with 30 questions worth 150 marks. The test has two types of questions:
1. True or false – carries 4 marks each
2. Multiple choice – carries 9 marks each
Find the number of true or false and multiple-choice questions.
Common instructions:
• If you are new to Python, download the “Anaconda Installation Instructions” document
from the “Resources” tab to view the steps for installing Anaconda and the Jupyter
notebook.
• Download the “Assignment 01” notebook and upload it to the Jupyter notebook to
access it.
• Follow the cues provided to complete the assignment.
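One way to set up this problem, offered as a sketch rather than the assignment's official solution: let t be the number of true or false questions and m the number of multiple-choice questions. Then t + m = 30 and 4t + 9m = 150, which can be passed to linalg.solve. The variable names are assumptions.

```python
import numpy as np
from scipy import linalg

# Row 1: question count (t + m = 30); row 2: marks (4t + 9m = 150)
coefficients = np.array([[1, 1],
                         [4, 9]])
totals = np.array([30, 150])

# Solve for [t, m]
true_false, multiple_choice = linalg.solve(coefficients, totals)
print(true_false, multiple_choice)  # approximately 24 and 6
```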
Using SciPy to Declare Random Values
Problem Statement:
Use SciPy to generate 20 random values and perform the following:
1. CDF – Cumulative Distribution Function for 10 random variables.
2. PDF – Probability Density Function for 14 random variables.
Common instructions:
• If you are new to Python, download the “Anaconda Installation Instructions” document from the
“Resources” tab to view the steps for installing Anaconda and the Jupyter notebook.
• Download the “Assignment 02” notebook and upload it to the Jupyter notebook to access it.
• Follow the cues provided to complete the assignment.
Key Takeaways
You are now able to:
Explain the importance of SciPy
List the characteristics of SciPy
Explain sub-packages of SciPy
Discuss SciPy sub-packages, such as optimization,
integration, linear algebra, statistics, weave, and IO
Knowledge Check
Knowledge Check 1
What are the specification limits provided for the curve fitting function (optimize.curve_fit)
during the optimization process?
a. Upper limit value
b. Lower limit value
c. Upper and lower limit values
d. Only the optimization method
The correct answer is c
When limits are specified for the optimize.curve_fit function, they are given as a pair of lower and upper limit values through its bounds argument.
Knowledge Check 2
Which of the following sub-packages is used to invert a matrix?
a. scipy.special
b. scipy.linalg
c. scipy.signal
d. scipy.stats
The correct answer is b
scipy.linalg provides the inv function, which is used to invert a matrix.
Knowledge Check 3
Which of the following is performed using SciPy?
a. Website
b. Plot data
c. Scientific calculations
d. System administration
The correct answer is c
SciPy is designed specifically for scientific calculations. Python as a programming
language has other libraries for the remaining activities listed.
Knowledge Check 4
Which of the following functions is used to calculate minima?
a. optimize.minimize()
b. integrate.quad()
c. stats.linregress()
d. linalg.solve()
The correct answer is a
The function optimize.minimize() is used to calculate minima. integrate.quad() is used for integral
calculation, stats.linregress() is used for linear regression, and linalg.solve() is used to solve a linear system.
Knowledge Check 5
Which of the following syntaxes is used to generate 100 random variables from a
t-distribution with df = 10?
a. stats.t.pmf(df=10, size=100)
b. stats.t.pdf(df=10, size=100)
c. stats.t.rvs(df=10, size=100)
d. stats.t.rand(df=10, size=100)
The correct answer is c
The stats.t.rvs() function is used to generate random variables. A pmf() method evaluates a
probability mass function, which is defined only for discrete distributions, and stats.t.pdf() evaluates the
probability density function. Note that stats.t.rand() does not exist.
Knowledge Check 6
Which of the following functions is used to run C or C++ code in SciPy?
a. io.loadmat()
b. weave.inline()
c. weave.blitz()
d. io.whosmat()
The correct answer is b
The inline() function accepts C code as a string and compiles it for later use. loadmat() loads variables from a
.mat file. whosmat() lists the variables inside a .mat file. blitz() compiles NumPy expressions for
faster running, but it can’t accept C code.
Thank You