Introduction to Business Analytics
Copyright © LEARNXT
Data Wrangling and Manipulation in Python
NumPy Package and Arrays
Objectives
After completing this session, you will be able to:
Demonstrate an understanding of the basics of the NumPy package
Explain fundamentals of NumPy Arrays with examples
Apply built-in functions and perform arithmetic operations on NumPy arrays
Explain the process of saving and loading arrays with NumPy
NumPy
Scientific Python
Extra features required:
Fast, multidimensional arrays
Libraries of reliable, tested scientific functions
Plotting tools
NumPy is at the core of nearly every scientific Python application or module
It provides a fast N-d array datatype that can be manipulated in a vectorized form
NumPy Package
The fundamental library needed for scientific computing with Python is called NumPy
This Open-Source library contains:
A powerful N-dimensional array object
Advanced array slicing methods (to select array elements)
Convenient array reshaping methods
NumPy Package
NumPy also contains three libraries of numerical routines:
Basic linear algebra functions
Basic Fourier transforms
Sophisticated random number capabilities
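A minimal sketch of the three routine families listed above, assuming the standard submodules np.linalg, np.fft, and np.random; the matrix, signal, and seed values are illustrative and not from the slides:

```python
import numpy as np

# Linear algebra: solve the 2x2 system A @ x = b
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(A, b)

# Fourier transform: the FFT of a constant signal is non-zero only in bin 0
signal = np.ones(4)
spectrum = np.fft.fft(signal)

# Random numbers: a seeded generator gives reproducible draws
rng = np.random.default_rng(seed=0)
draws = rng.random(3)  # three floats in [0, 1)

print(x)
print(spectrum.real)
print(draws)
```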
Install NumPy
Ensure that the NumPy package is installed on your laptop/computer
You can use Anaconda command prompt terminal or Jupyter notebook to install the package:
conda install numpy
pip install numpy
Import NumPy
Import the NumPy package into the Python session
import numpy as np
NumPy Arrays
NumPy Arrays
Lists are useful for storing small amounts of one-dimensional data
Indexing and slicing lists:
>>> a = [1,3,5,7,9]
>>> print(a[2:4])
[5, 7]
>>> b = [[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]
>>> print(b[0])
[1, 3, 5, 7, 9]
>>> print(b[1][2:4])
[6, 8]
Adding two lists concatenates them:
>>> a = [1,3,5,7,9]
>>> b = [3,5,6,7,9]
>>> c = a + b
>>> print(c)
[1, 3, 5, 7, 9, 3, 5, 6, 7, 9]
But lists do not support element-wise arithmetic: + concatenates, * repeats, and - and / between two lists raise a TypeError
Need efficient arrays with arithmetic and better multidimensional tools
NumPy Arrays:
Like lists, but much more capable, except fixed size
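A short sketch contrasting the two behaviours, reusing the lists a and b from the example above; the names list_sum and array_sum are illustrative:

```python
import numpy as np

a = [1, 3, 5, 7, 9]
b = [3, 5, 6, 7, 9]

list_sum = a + b                        # concatenation: 10 elements
array_sum = np.array(a) + np.array(b)   # element-wise addition: 5 elements

print(list_sum)   # [1, 3, 5, 7, 9, 3, 5, 6, 7, 9]
print(array_sum)  # [ 4  8 11 14 18]
```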
Similarities Between Lists and Arrays
Both are used for storing data
Both are mutable
Both can be indexed and iterated through
Both can be sliced
Differences Between Lists and Arrays
Arrays are specially optimized for arithmetic computations so if you’re going to perform similar
operations you should consider using an array instead of a list
E.g. dividing each element in an array by number 2 is possible without a loop
Lists are containers for elements having differing data types, but arrays are used as containers
for elements of the same data type
NumPy arrays are faster and more compact than Python lists
An array consumes less memory and is convenient to use
NumPy uses much less memory to store data and it provides a mechanism of specifying the
data types. This allows the code to be optimized even further
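The points above can be sketched as follows. The array of 1000 integers and the choice of int16 are assumptions made for illustration:

```python
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

halved_list = [v / 2 for v in values]  # a list needs an explicit loop
halved_arr = arr / 2                   # an array divides element-wise, no loop

# Every element has the same dtype, so the memory footprint is predictable
print(arr.dtype, arr.itemsize, arr.nbytes)  # int64 8 8000

# Choosing a smaller dtype reduces memory when the values fit in it
small = arr.astype(np.int16)  # 0..999 fits in 16 bits
print(small.nbytes)           # 2000
```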
Arrays from Data
Demographic data:
Country Name           Birth rate   Internet users   Income Group
Aruba                  10.244       78.9             High income
Afghanistan            35.253       5.9              Low income
Angola                 45.985       19.1             Upper middle income
Albania                12.877       57.2             Upper middle income
United Arab Emirates   11.044       88               High income
Steps shown on the slide:
Extract birth rate as a pandas Series
Convert to a data frame
Convert the data frame to a NumPy array
Extract birth rate as a NumPy array
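A minimal sketch of moving from tabular data to arrays, assuming pandas is installed and the table is held in a DataFrame. The variable names are hypothetical, and to_numpy() is one common way to do the conversion, not necessarily the method used in the session:

```python
import numpy as np
import pandas as pd

demographics = pd.DataFrame({
    "Country Name": ["Aruba", "Afghanistan", "Angola", "Albania",
                     "United Arab Emirates"],
    "Birth rate": [10.244, 35.253, 45.985, 12.877, 11.044],
    "Internet users": [78.9, 5.9, 19.1, 57.2, 88.0],
})

birth_series = demographics["Birth rate"]   # a pandas Series
birth_array = birth_series.to_numpy()       # a 1-D NumPy array

# Converting several numeric columns gives a 2-D array
numeric_array = demographics[["Birth rate", "Internet users"]].to_numpy()

print(type(birth_array), birth_array.shape)
print(numeric_array.shape)
```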
NumPy Array
NumPy arrays are one of the most widely used data structures
An array is a central data structure of the NumPy library
An array is a grid of values and it contains information about the raw data, how to locate an
element, and how to interpret an element
It has a grid of elements that can be indexed in various ways
The elements are all of the same type, referred to as the array dtype
NumPy Array
NumPy arrays are of two types:
Vectors (1-dimensional arrays)
A vector is an array with a single dimension - there’s no difference between row and column vectors
Matrices (2-dimensional arrays)
A matrix refers to an array with two dimensions
A matrix can still possess a single row or a column
For 3-D or higher dimensional arrays, the term tensor is also commonly used
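The distinction can be checked with the ndim and shape attributes. This is a small sketch with illustrative values:

```python
import numpy as np

vector = np.array([1, 2, 3])           # 1 dimension, shape (3,)
row_matrix = vector.reshape(1, 3)      # 2 dimensions, a single row
column_matrix = vector.reshape(3, 1)   # 2 dimensions, a single column
tensor = np.zeros((2, 3, 4))           # 3 dimensions

print(vector.ndim, row_matrix.ndim, tensor.ndim)  # 1 2 3
print(vector.T.shape)  # (3,)  transposing a 1-D array changes nothing
```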
NumPy Array - Attributes
An array is usually a fixed-size container of items of the same type and size
The number of dimensions and items in an array is defined by its shape
The shape of an array is a tuple of non-negative integers that specify the sizes of each
dimension
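A short sketch of the attributes described above on an illustrative 2 x 3 array; the default integer dtype depends on the platform and NumPy version:

```python
import numpy as np

grid = np.array([[10, 11, 12], [20, 21, 22]])

print(grid.shape)  # (2, 3)  sizes of each dimension
print(grid.ndim)   # 2       number of dimensions
print(grid.size)   # 6       total number of items
print(grid.dtype)  # default integer type, typically int64
```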
NumPy – Creating Arrays
There are several ways to initialize new NumPy arrays, for example from
A Python list, list of lists, or tuples
Using functions that are dedicated to generating NumPy arrays, such as arange(), linspace(), etc.
Reading data from files
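The list-based route is covered on the following slides. As a sketch of the other two routes, the example below builds an array from a tuple and reads comma-separated text with np.loadtxt(). An in-memory io.StringIO buffer stands in for a real file, which is an assumption made to keep the example self-contained:

```python
import io
import numpy as np

# From a tuple
from_tuple = np.array((1.5, 2.5, 3.5))

# From a file: loadtxt accepts a filename or a file-like object
csv_text = io.StringIO("1,2,3\n4,5,6")
from_file = np.loadtxt(csv_text, delimiter=",")

print(from_tuple)
print(from_file.shape)  # (2, 3)
```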
Creating NumPy Arrays – Examples
simple_list = [101,102,103,104,105,106,107,108,109,110]
simple_list
[101, 102, 103, 104, 105, 106, 107, 108, 109, 110]
# NumPy array from list
array1 = np.array(simple_list)
array1
array([101, 102, 103, 104, 105, 106, 107, 108, 109, 110])
# Type of the array
type(array1)
numpy.ndarray
Creating NumPy Arrays – Examples
list_of_lists = [[10,11,12],[20,21,22],[30,31,32]]
list_of_lists
[[10, 11, 12], [20, 21, 22], [30, 31, 32]]
# create an array
array2 = np.array(list_of_lists)
array2
array([[10, 11, 12],
[20, 21, 22],
[30, 31, 32]])
Creating NumPy Arrays – Built-in Functions
arange(): Returns evenly spaced values within a given interval
# Array using built-in function arange()
np.arange(0,20)
# Returns values 0 to 19. Start value is 0 (included). Stop value is 20 (not included)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19])
# Array with arange() and including step argument
np.arange(0,21,4)
array([ 0, 4, 8, 12, 16, 20])
Generate Arrays of 0's
# Generate Array of 0's
array3 = np.zeros(50)
array3
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0.,0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Generate Arrays of 1's
# Generate Array of 1's
array4 = np.ones((4,5))
array4
array([[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1.]])
Useful when we need a placeholder array of a known shape
Example: we create the array first and progressively fill it with results from a loop
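The fill-in-a-loop pattern mentioned above could look like this. It is a minimal sketch using np.zeros(), with squares as an illustrative result:

```python
import numpy as np

n = 5
squares = np.zeros(n)      # placeholder array of known size

for i in range(n):
    squares[i] = i ** 2    # overwrite each slot with a computed result

print(squares)  # [ 0.  1.  4.  9. 16.]
```

np.empty() also allocates an array of a given shape but leaves its contents uninitialized, so every element must be written before the array is read.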
Arrays using linspace()
Evenly spaced values over a specified interval - create numeric sequences
# linspace() - create numeric sequence
array5 = np.linspace(0,20,10)
array5
array([ 0., 2.22222222, 4.44444444, 6.66666667, 8.88888889,11.11111111, 13.33333333,
15.55555556, 17.77777778, 20. ])
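Unlike arange(), which takes a step size, linspace() takes the number of points and includes the stop value by default. A small sketch with illustrative arguments, using the retstep and endpoint parameters:

```python
import numpy as np

# retstep=True also returns the spacing between consecutive points
points, step = np.linspace(0, 20, 5, retstep=True)
print(points)  # [ 0.  5. 10. 15. 20.]
print(step)    # 5.0

# endpoint=False leaves the stop value out
open_ended = np.linspace(0, 20, 5, endpoint=False)
print(open_ended)  # [ 0.  4.  8. 12. 16.]
```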
Arrays using eye()
# Create an Identity Matrix with eye()
array6 = np.eye(5)
array6
array([[1., 0., 0., 0., 0.],
[0., 1., 0., 0., 0.],
[0., 0., 1., 0., 0.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]])
Random Numbered Arrays
Create random number arrays using rand(), randn(), randint()
Uniform distribution:
# Array - uniform distribution with rand()
# Every run generates a new set of numbers
array7 = np.random.rand(3,2)
array7
array([[0.48341811, 0.94935455],
[0.86604955, 0.29532457],
[0.79461142, 0.28140248]])
Random Numbered Arrays
Normal distribution:
# Array - Normal distribution with randn()
array8 = np.random.randn(3,2)
array8
array([[-0.05195311, 0.14081327],
[ 0.57633652, -0.42966707],
[ 1.03544668, -0.81755038]])
Random Numbered Arrays
Integers:
# Array - Integers with randint()
array9 = np.random.randint(5,20,10)
array9
array([15, 16, 14, 15, 12, 17, 14, 11, 18, 12])
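When results need to be reproducible, a seeded generator can be used. The sketch below uses np.random.default_rng(), NumPy's newer Generator interface, as an alternative to the rand(), randn(), and randint() calls shown on the slides; the seed value 42 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

uniform = rng.random((3, 2))              # uniform on [0, 1), like rand(3, 2)
normal = rng.standard_normal((3, 2))      # standard normal, like randn(3, 2)
integers = rng.integers(5, 20, size=10)   # 5 included, 20 excluded, like randint

# Re-creating the generator with the same seed repeats the same numbers
repeat = np.random.default_rng(seed=42).random((3, 2))
print(np.array_equal(uniform, repeat))  # True
```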
Functions & Arithmetic Operations on Arrays
Functions on Arrays
Create an array and reshape into a 5 by 6 matrix
# Create an Array of 30 elements with arange()
sample_array = np.arange(30)
sample_array
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29])
Functions on Arrays
# Reshape the array into a 5 x 6 matrix using reshape()
matrix2 = sample_array.reshape(5,6)
matrix2
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29]])
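reshape() requires the new shape to hold exactly the same number of elements. A brief sketch of two related behaviours: passing -1 to let NumPy infer one dimension, and the error raised when the sizes do not match:

```python
import numpy as np

sample = np.arange(30)

inferred = sample.reshape(5, -1)   # -1 is inferred as 6 because 30 / 5 = 6
print(inferred.shape)              # (5, 6)

try:
    sample.reshape(4, 8)           # 4 * 8 = 32 does not equal 30
except ValueError as err:
    message = str(err)
    print("reshape failed:", message)
```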
Functions on Arrays
Get the min and max values in an array
# Create an array of integers using randint()
array9 = np.random.randint(5,20,10)
array9
array([ 9, 7, 11, 12, 9, 14, 18, 9, 6, 11])
# Get the minimum number in the array
array9.min()
6
# Get the maximum number in the array
array9.max()
18
Functions on Arrays
Get the position of the minimum value and the shape of an array
# Get the position of the minimum value in the array
array9.argmin()
8
# Get the shape of the array
array9.shape
(10,)
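A small sketch that collects the value and position lookups in one place, using the array values shown on the slide, and extends them to a 2-D array with the axis argument; the 5 x 6 matrix is illustrative:

```python
import numpy as np

array9 = np.array([9, 7, 11, 12, 9, 14, 18, 9, 6, 11])

print(array9.min(), array9.argmin())  # 6 8   value and its position
print(array9.max(), array9.argmax())  # 18 6

matrix = np.arange(30).reshape(5, 6)
print(matrix.max(axis=0))  # [24 25 26 27 28 29]  maximum of each column
print(matrix.min(axis=1))  # [ 0  6 12 18 24]     minimum of each row
```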
Universal Array Functions
# Create an array and find the variance
sample_array = np.arange(30)
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29])
# Variance
np.var(sample_array)
74.91666666666667
Universal Array Functions
# Square root
Arr = np.sqrt(sample_array)
Arr
array([0. , 1. , 1.41421356, 1.73205081, 2. ,
2.23606798, 2.44948974, 2.64575131, 2.82842712, 3. ,
3.16227766, 3.31662479, 3.46410162, 3.60555128, 3.74165739,
3.87298335, 4. , 4.12310563, 4.24264069, 4.35889894,
4.47213595, 4.58257569, 4.69041576, 4.79583152, 4.89897949,
5. , 5.09901951, 5.19615242, 5.29150262, 5.38516481])
Universal Array Functions
# log
np.log(sample_array)
array([ -inf, 0. , 0.69314718, 1.09861229, 1.38629436,
1.60943791, 1.79175947, 1.94591015, 2.07944154, 2.19722458,
2.30258509, 2.39789527, 2.48490665, 2.56494936, 2.63905733,
2.7080502 , 2.77258872, 2.83321334, 2.89037176, 2.94443898,
2.99573227, 3.04452244, 3.09104245, 3.13549422, 3.17805383,
3.21887582, 3.25809654, 3.29583687, 3.33220451, 3.36729583])
# Maximum value in the array
np.max(sample_array)
29
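The -inf in the log output comes from log(0), for which NumPy also emits a RuntimeWarning. A hedged sketch of two ways to deal with it: np.errstate() to silence the warning where -inf is acceptable, or np.log1p(), which computes log(1 + x) and is defined at 0. Whether log1p is appropriate depends on the analysis:

```python
import numpy as np

sample = np.arange(5)

# Suppress the divide-by-zero warning; the first element is still -inf
with np.errstate(divide="ignore"):
    logs = np.log(sample)

# log(1 + x) is finite for x = 0
safe_logs = np.log1p(sample)

print(logs[0])       # -inf
print(safe_logs[0])  # 0.0
```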
Universal Array Functions
Round the array values to 2 decimal places
# Round to 2 decimal places
np.round(Arr, decimals = 2)
array([0. , 1. , 1.41, 1.73, 2. , 2.24, 2.45, 2.65, 2.83, 3. , 3.16,
3.32, 3.46, 3.61, 3.74, 3.87, 4. , 4.12, 4.24, 4.36, 4.47, 4.58,
4.69, 4.8 , 4.9 , 5. , 5.1 , 5.2 , 5.29, 5.39])
# Standard deviation
np.std(Arr)
1.3683899139885065
# Mean
np.mean(Arr)
3.553520654688042
Universal Array Functions - Strings
# Create an array of string values
sports = np.array(['golf', 'cric', 'fball', 'cric', 'Cric', 'fooseball'])
# Fetch unique values from the string-based array
np.unique(sports)
array(['Cric', 'cric', 'fball', 'fooseball', 'golf'], dtype='<U9')
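np.unique() is case-sensitive, which is why 'Cric' and 'cric' appear separately. A short sketch that counts occurrences with return_counts and, as an optional step not taken on the slide, lowercases the values first with np.char.lower():

```python
import numpy as np

sports = np.array(['golf', 'cric', 'fball', 'cric', 'Cric', 'fooseball'])

# Unique values together with how often each occurs
values, counts = np.unique(sports, return_counts=True)
print(dict(zip(values.tolist(), counts.tolist())))

# Normalizing case merges 'Cric' and 'cric'
lowered = np.char.lower(sports)
print(np.unique(lowered))
```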
Arithmetic Operations
# View the sample array
sample_array
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29])
# Addition of arrays
sample_array + sample_array
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32,34, 36, 38, 40, 42, 44, 46, 48,
50, 52, 54, 56, 58])
Arithmetic Operations
# Division of arrays (0 / 0 gives nan and NumPy emits a RuntimeWarning)
sample_array / sample_array
array([nan,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,
        1.,  1.,  1.,  1.])
# Addition of a fixed value to the array
sample_array + 1
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,18, 19, 20, 21, 22, 23, 24, 25, 26,
27, 28, 29, 30])
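Adding a fixed value works because NumPy broadcasts the scalar across every element. The sketch below shows that, plus one possible way to avoid nan when dividing by an array that contains zeros, using the out and where arguments of np.divide(); filling those positions with 0 is an assumption and may not suit every analysis:

```python
import numpy as np

sample = np.arange(5)

# The scalar is broadcast to every element
doubled = sample * 2
print(doubled)  # [0 2 4 6 8]

# Divide only where the denominator is non-zero; other slots keep the 0 from `out`
safe = np.divide(sample, sample, out=np.zeros(sample.shape), where=sample != 0)
print(safe)  # [0. 1. 1. 1. 1.]
```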
Saving and Loading Arrays with NumPy
Saving Arrays with NumPy
save() function - saves one array in the working directory as a *.npy file
np.save('S2_sample_array', sample_array)
Create a new array called simple_array
simple_array = np.array(['golf', 'cric', 'fball', 'cric', 'Cric', 'fooseball'])
savez() function - saves multiple arrays in a zip archive (*.npz file)
np.savez('S2_arrays.npz', a=sample_array, b=simple_array)
Loading Arrays with NumPy
# Load the saved file S2_sample_array
np.load('S2_sample_array.npy')
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,17, 18, 19, 20, 21, 22, 23, 24, 25,
26, 27, 28, 29])
# Load the saved zip file with multiple arrays
archive = np.load('S2_arrays.npz')
# Load the second array (saved under the key 'b') from the zip file
archive['b']
array(['golf', 'cric', 'fball', 'cric', 'Cric', 'fooseball'], dtype='<U9')
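A self-contained sketch of the save and load round trip. It writes to a temporary folder instead of the working directory, which is an assumption made so the example leaves no files behind:

```python
import os
import tempfile
import numpy as np

sample_array = np.arange(30)
simple_array = np.array(['golf', 'cric', 'fball', 'cric', 'Cric', 'fooseball'])

with tempfile.TemporaryDirectory() as folder:
    npy_path = os.path.join(folder, "S2_sample_array.npy")
    npz_path = os.path.join(folder, "S2_arrays.npz")

    np.save(npy_path, sample_array)                      # one array per .npy file
    np.savez(npz_path, a=sample_array, b=simple_array)   # several arrays per .npz

    restored = np.load(npy_path)

    # The archive behaves like a dictionary keyed by the names given to savez()
    with np.load(npz_path) as archive:
        names = sorted(archive.files)
        restored_b = archive["b"]

print(names)  # ['a', 'b']
print(np.array_equal(restored, sample_array))  # True
```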
Summary
NumPy provides a fast N-d array datatype that can be manipulated in a vectorized form
This open-source library contains a powerful N-dimensional array object, advanced array slicing methods, and convenient array reshaping methods
To import the NumPy package into a Python session, use: import numpy as np
An array is a grid of values and it contains information about the raw data, how to locate an
element, and how to interpret an element
Two types of NumPy arrays: vectors (1-dimensional) and matrices (2-dimensional)
Built-in functions can be applied on NumPy arrays for faster processing
Additional Resources
McKinney, W. (2013). Python for data analysis. O'Reilly Media.
Lutz, M. (2013). Learning Python: Powerful object-oriented programming. O'Reilly Media.
Summerfield, M. (2010). Programming in Python 3: A complete introduction to the Python
language. Pearson Education India.
Matthes, E. (2019). Python crash course: A hands-on, project-based introduction to
programming (2nd ed.). No Starch Press.
Beazley, D., & Jones, B. K. (2013). Python cookbook: Recipes for mastering Python 3. O'Reilly
Media.
e-References
Welcome to Python.org. (n.d.). Python.org. https://www.python.org
Introduction to Python. (n.d.). W3Schools Online Web
Tutorials. https://www.w3schools.com/python/python_intro.asp
Any Questions?
Thank you