Numpy in python, Array operations using numpy and so on

SHERIN RAPPAI
Unit 3: Basics of Numpy
21BCA2T452 : Python Programming
Prof. Sherin Rappai
Assistant Professor Dept. of Computer Science

NUMPY BASICS: ARRAYS AND VECTORIZED COMPUTATION
NumPy (Numerical Python) is a fundamental library in Python for numerical and scientific computing. It
provides support for arrays (multi-dimensional, homogeneous data structures) and a wide range of
mathematical functions to perform vectorized computations efficiently.
Installing NumPy
Before using NumPy, make sure it is installed. You can install it using pip:
pip install numpy

Importing NumPy
To use NumPy in your Python code, you should import it:
import numpy as np
By convention, it's common to import NumPy as np for brevity.
Why Use Arrays?
Arrays are more efficient than lists for numerical operations. For example, to add 2 to
every element of a list in plain Python you need a loop or a list comprehension. With NumPy, you can do this in a
single expression:
arr = np.array([1, 2, 3, 4, 5])
new_arr = arr + 2 # Adds 2 to every element in the array
print(new_arr)
Output: [3 4 5 6 7]

Creating NumPy Arrays
You can create NumPy arrays using various methods:
1. From Python Lists:
arr = np.array([1, 2, 3, 4, 5])
2. Using NumPy Functions:
zeros_arr = np.zeros(5) # Creates an array of zeros with 5 elements
ones_arr = np.ones(3) # Creates an array of ones with 3 elements
rand_arr = np.random.rand(3, 3) # Creates a 3x3 array with random values between 0 and 1
3. Using NumPy's Range Function:
range_arr = np.arange(0, 10, 2) # Creates an array with values [0, 2, 4, 6, 8]

BASIC ARRAY OPERATIONS
Once you have NumPy arrays, you can perform various operations on them:
1. Element-wise Operations:
NumPy allows you to perform element-wise operations, like addition, subtraction, multiplication, and
division:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b # Element-wise addition: [5, 7, 9]
d = a * b # Element-wise multiplication: [4, 10, 18]

2. Indexing and Slicing:
Indexing means accessing a specific element in an array by its position (index). In NumPy,
indices start from 0.
arr = np.array([0, 1, 2, 3, 4, 5])
element = arr[2] # Access element at index 2 (value: 2)
sub_array = arr[2:5] # Slice from index 2 to 4 (values: [2, 3, 4])

Slicing: Slicing allows you to access a range or subset of elements from an array. It is done
using the syntax arr[start:end], where start is the index where the slice begins (inclusive),
and end is where it stops (exclusive).
arr = np.array([10, 20, 30, 40, 50])
# Getting a slice of elements from index 1 to 3 (exclusive of 3)
print(arr[1:3]) # Output: [20 30]
# Getting a slice from the start till the third element
print(arr[:3]) # Output: [10 20 30]
# Getting a slice from index 2 to the end of the array
print(arr[2:]) # Output: [30 40 50]

Negative Indexing:
You can also use negative indices to access elements from the end of the array. For example, -1 refers to
the last element, -2 refers to the second last element, and so on.
Example:
arr = np.array([10, 20, 30, 40, 50])
# Accessing the last element
print(arr[-1]) # Output: 50
# Accessing the second last element
print(arr[-2]) # Output: 40

Slicing with Steps: You can also specify a step value, which tells NumPy how far to move between
selected elements (a step of 2 takes every second element). The syntax is arr[start:end:step].
Example:
arr = np.array([10, 20, 30, 40, 50, 60])
# Getting every second element from index 1 to 5
print(arr[1:5:2]) # Output: [20 40]
# Reversing the array using negative step
print(arr[::-1]) # Output: [60 50 40 30 20 10]
• The array is [10, 20, 30, 40, 50, 60].
• Index positions: [0, 1, 2, 3, 4, 5].
• In arr[1:5:2], the slice starts at index 1, which is 20.
• 2 is the step value, which means "take every second element".
• It skips the element at index 2 and picks the
element at index 3, which is 40.
• The slice stops before reaching index 5.
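One point the slicing examples do not show is that a basic slice is a view onto the original array, not a copy. The sketch below is a minimal illustration; the variable names view and independent are chosen here for the example.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

# Basic slicing returns a view that shares memory with arr.
view = arr[1:5:2]        # elements at index 1 and 3
view[0] = 99             # this also changes arr[1]

# .copy() gives an independent array.
independent = arr[::-1].copy()
independent[0] = -1      # arr is not affected

print(arr)               # [10 99 30 40 50 60]
print(independent)       # [-1 50 40 30 99 10]
```

If you need to modify a slice without touching the original data, call .copy() on it first.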

3. Array Shape and Reshaping:
The shape of an array tells us how many elements it contains along each dimension (or axis).
You can check the shape of an array using the .shape attribute and change it with the .reshape() method:
arr = np.array([[1, 2, 3], [4, 5, 6]])
shape = arr.shape # Get the shape (2, 3)
reshaped = arr.reshape(3, 2) # Reshape the array to (3, 2)
Reshaping:
Reshaping allows you to change the shape of an array without changing its data. You can
convert a 1D array to a 2D array, or a 2D array to a 3D array, etc., as long as the total
number of elements stays the same.
Example:

# Creating a 1D array with 6 elements
arr = np.array([1, 2, 3, 4, 5, 6])
# Reshaping the 1D array into a 2D array (2 rows, 3 columns)
reshaped_arr = arr.reshape(2, 3)
print(reshaped_arr)
Reshape Rules:
When reshaping an array, the new shape must contain the same total number of elements as the original array. For
example, if you have an array with 12 elements, you could reshape it to:
• A 2x6 array (2 rows x 6 columns)
• A 3x4 array (3 rows x 4 columns)
• A 4x3 array (4 rows x 3 columns)
Example
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
# Reshaping into 3 rows and 4 columns
reshaped_arr = arr.reshape(3, 4)
print(reshaped_arr)
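The reshape rule can be checked directly. The following is a small sketch using the same 12 values; it also uses -1 as a dimension, which asks NumPy to infer that size from the total number of elements. The names inferred, cube and reshape_failed are illustrative.

```python
import numpy as np

arr = np.arange(1, 13)           # 12 elements: 1..12

# -1 lets NumPy work out the missing dimension: 12 / 2 = 6.
inferred = arr.reshape(2, -1)    # shape (2, 6)

# 1D to 3D works as long as 2 * 3 * 2 == 12.
cube = arr.reshape(2, 3, 2)

# A shape whose product is not 12 is rejected with a ValueError.
try:
    arr.reshape(5, 3)
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(inferred.shape, cube.shape, reshape_failed)   # (2, 6) (2, 3, 2) True
```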

Flattening an Array: If you want to convert a multi-dimensional array back into a 1D array, you can flatten it using
the .flatten() method, which returns a new 1D copy of the data.
Example
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
# Flattening the 2D array into a 1D array
flat_arr = arr_2d.flatten()
print(flat_arr)
Output:
[1 2 3 4 5 6]
Shape: Tells you the dimensions of an array (rows, columns, etc.).
Reshaping: Lets you change the shape of an array while keeping the same number of elements.
Flattening: Converts a multi-dimensional array back into a 1D array.

4. Aggregation Functions:
Aggregation functions are used to perform calculations on an entire array or along a specific axis (e.g., summing all
elements, finding the maximum, etc.). These functions are essential for data analysis and numerical computations.
Common Aggregation Functions:
Here are some of the most commonly used aggregation functions in NumPy:
1. Sum: The sum() function adds all the elements of an array.
2. Mean: The mean() function calculates the average of the elements.
3. Maximum and Minimum: max() gives the maximum value in the array. min() gives the minimum value in the array.
4. Product: The prod() function returns the product of all elements in the array (i.e., multiplies all elements together).
5. Standard Deviation and Variance: std() calculates the standard deviation (how spread out the numbers are).
var() calculates the variance (the square of the standard deviation).
6. Cumulative Sum and Product: cumsum() gives the cumulative sum (the sum of the elements up to each
index). cumprod() gives the cumulative product (the product of elements up to each index).
NumPy provides functions to compute statistics on arrays:
arr = np.array([1, 2, 3, 4, 5])
mean = np.mean(arr) # Calculate the mean (average)
max_val = np.max(arr) # Find the maximum value
min_val = np.min(arr) # Find the minimum value
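The aggregation functions listed above can also work along one axis of a 2D array. The sketch below assumes a small 2x3 array chosen for illustration; axis=0 aggregates down the columns and axis=1 across the rows.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

total = arr.sum()                    # 21, all elements
col_sums = arr.sum(axis=0)           # [5 7 9], one sum per column
row_means = arr.mean(axis=1)         # [2. 5.], one mean per row
product = arr.prod()                 # 720

running_sum = np.cumsum(arr[0])      # [1 3 6]
running_prod = np.cumprod(arr[0])    # [1 2 6]

variance = np.var(arr[0])            # population variance of [1, 2, 3] = 2/3
std_dev = np.std(arr[0])             # square root of the variance
```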

VECTORIZED COMPUTATION
Vectorized computation in Python refers to performing operations on entire arrays or sequences of data
without the need for explicit loops. This approach leverages highly optimized, low-level code to achieve
faster and more efficient computations. The primary library for vectorized computation in Python is
NumPy.
Traditional Loop-Based Computation
In traditional Python programming, you might use explicit loops to perform operations on arrays or lists.
For example:
# Using loops to add two lists element-wise
list1 = [1, 2, 3]
list2 = [4, 5, 6]
result = []
for i in range(len(list1)):
    result.append(list1[i] + list2[i])
# Result: [5, 7, 9]

Vectorized Computation with NumPy
NumPy allows you to perform operations on entire arrays, making code more concise and efficient. Here's how you can
achieve the same result using NumPy:
import numpy as np
# Using NumPy for element-wise addition
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 + arr2
# Result: array([5, 7, 9])
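To see the difference on a larger input, the sketch below squares 10,000 numbers with a loop and with a vectorized expression, then times both with the standard timeit module. The helper loop_square and the array size are assumptions made for this illustration; actual timings depend on the machine, so no figures are quoted.

```python
import timeit
import numpy as np

values = list(range(10_000))
array = np.arange(10_000)

def loop_square(data):
    """Square every element with an explicit Python loop."""
    out = []
    for v in data:
        out.append(v * v)
    return out

loop_result = loop_square(values)
vector_result = array * array        # one vectorized expression

# Run each version 10 times and report the total elapsed time.
loop_time = timeit.timeit(lambda: loop_square(values), number=10)
vector_time = timeit.timeit(lambda: array * array, number=10)
print(f"loop: {loop_time:.4f}s  vectorized: {vector_time:.4f}s")
```

Both versions produce the same values; only the way the work is expressed differs.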

INTRODUCTION TO PANDAS DATA STRUCTURES
Pandas is a popular Python library for data manipulation and analysis. It provides two primary data structures: the
DataFrame and the Series. These data structures are designed to handle structured data, making it easier to work
with datasets in a tabular format.
DataFrame:
• A DataFrame is a 2-dimensional, labeled data structure that resembles a spreadsheet or SQL table.
• It consists of rows and columns, where each column can have a different data type (e.g., integers, floats, strings, or
even custom data types).
• You can think of a DataFrame as a collection of Series objects, where each Series is a column.
• DataFrames are highly versatile and are used for a wide range of data analysis tasks, including data cleaning,
exploration, and transformation.

Here's a basic example of how to create a DataFrame using Pandas:
import pandas as pd
# Creating a DataFrame from a dictionary of data
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
# Displaying the DataFrame
print(df)

Series:
• A Series is a one-dimensional labeled array that can hold data of any data type.
• It is like a column in a DataFrame or a single variable in statistics.
• Series objects are commonly used for time series data, as well as other one-dimensional data.
Key characteristics of a Pandas Series:
• Homogeneous Data: Like a NumPy array, and unlike a Python list, a Pandas Series has a single data type (dtype) for all
of its values. For example, if you create a Series with integer values, it gets an integer dtype such as int64. If you mix
types, such as numbers and strings, Pandas stores the values with the generic object dtype.
• Labeled Data: Series have two parts: the data itself and an associated index. The index provides labels or names for
each data point in the Series. By default, Series have a numeric index starting from 0, but you can specify custom
labels if needed.
• Size and Shape: A Series has a size (the number of elements) and a one-dimensional shape, such as (5,), but does not have columns
or rows like a DataFrame.

import pandas as pd
# Create a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
# Display the Series
print(series)
Output:
0 10
1 20
2 30
3 40
4 50
dtype: int64
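The index and dtype behaviour described for a Series can be tried out with custom labels. This is a minimal sketch; the Series names prices and mixed and their values are made up for the example.

```python
import pandas as pd

# A Series with custom string labels instead of the default 0, 1, 2.
prices = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

by_label = prices['b']           # 20, looked up by index label
by_position = prices.iloc[0]     # 10, looked up by integer position

# Mixing numbers and strings falls back to the generic object dtype.
mixed = pd.Series([1, 'two', 3.0])

print(prices.size, prices.shape)   # 3 (3,)
print(mixed.dtype)                 # object
```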

Some common tasks you can perform with Pandas:
• Data Loading: Pandas can read data from various sources, including CSV files, Excel spreadsheets, SQL databases,
and more.
• Data Cleaning: You can clean and preprocess data by handling missing values, removing duplicates, and
transforming data types.
• Data Selection: Easily select specific rows and columns of interest using various indexing techniques.
• Data Aggregation: Perform groupby operations, calculate statistics, and aggregate data based on specific criteria.
• Data Visualization: You can use Pandas in conjunction with visualization libraries like Matplotlib and Seaborn to
create informative plots and charts.

A DataFrame in Python typically refers to a two-dimensional, size-mutable, and potentially heterogeneous tabular data
structure provided by the popular library called Pandas. It is a fundamental data structure for data manipulation and
analysis in Python.
Here's how you can work with DataFrames in Python using Pandas:
1. Import Pandas:
First, you need to import the Pandas library.
import pandas as pd
2. Creating a DataFrame:
You can create a DataFrame in several ways. Here are a few
common methods:
• From a dictionary:
data = {'Column1': [value1, value2, ...],
        'Column2': [value1, value2, ...]}
df = pd.DataFrame(data)

• From a list of lists:
data = [[value1, value2],
[value3, value4]]
df = pd.DataFrame(data, columns=['Column1', 'Column2'])
• From a CSV file:
df = pd.read_csv('file.csv')
3. Viewing Data:
You can use various methods to view and explore your DataFrame:
df.head(): Displays the first few rows of the DataFrame.
df.tail(): Displays the last few rows of the DataFrame.
df.shape: Returns the number of rows and columns.
df.columns: Returns the column names.
df.info(): Provides information about the DataFrame, including data types and non-null counts.

4. Selecting Data:
You can select specific columns or rows from a DataFrame using indexing or filtering. For example:
df['Column1'] # Select a specific column
df[['Column1', 'Column2']] # Select multiple columns
df[df['Column1'] > 5] # Filter rows based on a condition
5. Modifying Data:
You can modify the DataFrame by adding or modifying columns, updating values, or appending rows. For example:
df['NewColumn'] = [new_value1, new_value2, ...] # Add a new column
df.at[index, 'Column1'] = new_value # Update a specific value
# DataFrame.append() was removed in pandas 2.0; use pd.concat() to append a new row
new_row = pd.DataFrame([{'Column1': value1, 'Column2': value2}])
df = pd.concat([df, new_row], ignore_index=True)
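The three modifications can be tried on a concrete DataFrame. The sketch below uses made-up names and values, and appends the row with pd.concat() because DataFrame.append() is not available in pandas 2.0 and later.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

df['City'] = ['NY', 'LA']        # add a new column
df.at[0, 'Age'] = 26             # update one value by row label and column name

# Append a row by concatenating a one-row DataFrame.
new_row = pd.DataFrame([{'Name': 'Charlie', 'Age': 35, 'City': 'SF'}])
df = pd.concat([df, new_row], ignore_index=True)

print(df)
```

ignore_index=True renumbers the rows 0, 1, 2 so the appended row does not keep its own index label 0.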

6. Data Analysis:
Pandas provides various functions for data analysis, such as
describe(), groupby(), agg(), and more.
7. Saving Data:
You can save the DataFrame to a CSV file or other formats:
df.to_csv('output.csv', index=False)
df.to_excel('output.xlsx', index=False)
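As an illustration of the analysis functions named in step 6, the sketch below runs describe(), groupby() and agg() on a small sales table. The column names City and Sales and all values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'City': ['NY', 'LA', 'NY', 'LA'],
                   'Sales': [100, 150, 200, 50]})

# Count, mean, std, min, quartiles and max for one column.
summary = df['Sales'].describe()

# Total sales per city: LA 200, NY 300.
per_city = df.groupby('City')['Sales'].sum()

# Several aggregations at once, one column per statistic.
stats = df.groupby('City')['Sales'].agg(['mean', 'max'])

print(summary['mean'])   # 125.0
print(per_city)
print(stats)
```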

INDEX OBJECTS: INDEXING, SELECTION, AND FILTERING
In Pandas, the Index object is a fundamental component of both Series and DataFrame data structures. It
provides the labels or names for the rows or columns of your data. You can use indexing, selection, and
filtering techniques with these indexes to access specific data points or subsets of your data. Here's how
you can work with index objects in Pandas:
1. Indexing:
Indexing allows you to access specific elements or rows in your data using labels. You can use .loc[] for label-based
indexing and .iloc[] for integer-based indexing.
• Label-based indexing:
df.loc['label'] # Access a specific row by its label
df.loc['label', 'column_name'] # Access a specific element by label and column name

EXAMPLE
import pandas as pd
# Create a DataFrame with custom labels
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['NY', 'LA', 'SF']}
df = pd.DataFrame(data, index=['A1', 'B2', 'C3'])
# Access the row with label 'B2'
print(df.loc['B2'])
# Access the value in the row with label 'B2' and the column 'City'
print(df.loc['B2', 'City'])

• Integer-based indexing:
df.iloc[0] # Access the first row
df.iloc[0, 1] # Access an element by row and column index
2. Selection:
Selection refers to choosing specific columns or rows from a DataFrame based on their labels or positions. You
use selection when you want to extract specific columns or rows without applying any condition.
It's about choosing specific data (columns/rows) directly.
No conditional logic is applied.
df['Column1'] # Select 'Column1' from the DataFrame
df[['Column1', 'Column2']] # Select 'Column1' and 'Column2'
df.loc[0] # Select the row whose index label is 0 (the first row with the default index)
df.iloc[2] # Select the third row by integer position

3. Filtering:
Filtering selects rows that meet a condition. You create a boolean mask from the condition and then apply that
mask to your DataFrame to select the rows meeting the condition.
• Select rows based on a condition:
df[df['Column'] > 5] # Select rows where 'Column' is greater than 5
• Select rows by multiple conditions (wrap each condition in parentheses when combining with & or |):
df[(df['Column1'] > 5) & (df['Column2'] < 10)] # Rows where 'Column1' > 5 and 'Column2' < 10
• Create a boolean mask:
condition = df['Column'] > 5
• Apply the mask to the DataFrame:
filtered_df = df[condition]
4. Setting a New Index:
You can set a specific column as the index of your DataFrame using the .set_index() method.
df.set_index('Column_Name', inplace=True)

5. Resetting the Index:
If you've set a column as the index and want to revert to the default integer-based index, you can use the .reset_index()
method.
df.reset_index(inplace=True)
6. Multi-level Indexing:
You can create DataFrames with multi-level indexes, allowing you to work with more complex hierarchical data
structures.
df.set_index(['Index1', 'Index2'], inplace=True)
Index objects in Pandas are versatile and powerful for working with data because they enable you to
access and manipulate your data in various ways, whether it's for data retrieval, filtering, or
restructuring.
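The filtering and index operations in this section can be combined on one small table. The following sketch uses hypothetical data, and it uses the non-inplace form of set_index() and reset_index(), which return new DataFrames and leave df unchanged.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['NY', 'LA', 'NY']})

# Filtering: a boolean mask built from two conditions.
adults_in_ny = df[(df['Age'] > 26) & (df['City'] == 'NY')]

# Setting a new index: 'Name' becomes the row labels.
by_name = df.set_index('Name')
bob_age = by_name.loc['Bob', 'Age']      # label-based lookup: 30

# Resetting the index: back to the default 0..n-1 labels.
restored = by_name.reset_index()

# Multi-level index on two columns, sorted so label lookups are efficient.
multi = df.set_index(['City', 'Name']).sort_index()
ny_rows = multi.loc['NY']                # all rows under the 'NY' label
```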

ARITHMETIC AND DATA ALIGNMENT IN PANDAS
Arithmetic and data alignment in Pandas refer to how mathematical operations are performed between Series and
DataFrames when they have different shapes or indices. Pandas automatically aligns data based on the labels of the objects
involved in the operation, which ensures that the result of the operation maintains data integrity and is aligned correctly.
Here are some key aspects of arithmetic and data alignment in Pandas:
1. Automatic Alignment:
When you perform mathematical operations (e.g., addition, subtraction, multiplication, division) between two Series or
DataFrames, Pandas aligns the data based on their labels (index or column names). The operation is performed only
where labels match; a label that is present in only one operand produces NaN in the result.
series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])
result = series1 + series2
In this example, the result Series will have NaN values for the 'A' and 'D' labels because those labels don't match between
series1 and series2.
A NaN
B 6.0
C 8.0
D NaN
dtype: float64
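If NaN is not wanted for unmatched labels, the add() method accepts a fill_value. This sketch reuses series1 and series2 from the example; the name total is chosen here.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])

# A label missing from one side is treated as 0 instead of producing NaN.
total = series1.add(series2, fill_value=0)
print(total)
# A    1.0
# B    6.0
# C    8.0
# D    6.0
# dtype: float64
```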

2. Missing Data (NaN):
When labels don't match, Pandas fills in the result with NaN (Not-a-Number) to indicate missing values.
3. DataFrame Alignment:
The same principles apply to DataFrames when performing operations between them. The alignment occurs both for rows
(based on the index) and columns (based on column names).
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['Y', 'Z'])
result = df1 + df2
In this case, result has rows 'X', 'Y', 'Z' and columns 'A', 'B', 'C'. Columns 'A' and 'C' are entirely NaN because each
exists in only one of the DataFrames, and rows 'X' and 'Z' are NaN because each label appears in only one index. The
only non-missing value is at row 'Y', column 'B' (4 + 5 = 9).
4. Handling Missing Data:
You can use methods like .fillna() to replace NaN values with a specific value or use .dropna() to remove rows or columns
with missing data.
result_filled = result.fillna(0) # Replace NaN with 0
result_dropped = result.dropna() # Remove rows that contain any NaN (use axis=1 to remove columns instead)
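The effect of dropna() depends on its axis and how arguments. The sketch below rebuilds the df1 + df2 result and shows three variants; the names rows_kept and cols_kept are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['Y', 'Z'])
result = df1 + df2                  # only ('Y', 'B') is not NaN

# Default: drop every row that contains at least one NaN (here, all of them).
strict = result.dropna()

# how='all' drops only rows, or columns, that are entirely NaN.
rows_kept = result.dropna(how='all')            # keeps row 'Y'
cols_kept = result.dropna(axis=1, how='all')    # keeps column 'B'
```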

5. Alignment with Broadcasting:
Pandas allows you to perform operations between a Series and a scalar value, and it broadcasts the scalar to match the
shape of the Series.
series = pd.Series([1, 2, 3])
scalar = 2
result = series * scalar
In this example, result will be a Series with values [2, 4, 6].
Automatic alignment in Pandas is a powerful feature that simplifies data manipulation and allows you to work with
datasets of different shapes without needing to manually align them. It ensures that operations are performed in a way
that maintains the integrity and structure of your data.

ARITHMETIC AND DATA ALIGNMENT IN NUMPY
NumPy also performs arithmetic between arrays, but unlike Pandas it has no labels to align on. NumPy
is primarily focused on numerical computations with homogeneous arrays (arrays of the same data type), and it
matches elements by position and shape. Here's how arithmetic works in NumPy:
Automatic Alignment (Broadcasting):
NumPy arrays perform element-wise operations. If you perform an operation between two NumPy arrays of different
but compatible shapes, NumPy will broadcast the smaller array to match the shape of the larger one.
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4])
result = arr1 + arr2
In this example, NumPy will automatically broadcast arr2 to match the shape of arr1, resulting in [5, 6, 7].

Broadcasting Rules:
NumPy follows specific rules when broadcasting arrays:
1. If the arrays have a different number of dimensions, pad the smaller shape with 1 on the left side.
For example, shapes (3, 5) and (5,) become (3, 5) and (1, 5). NumPy adds a 1 on the left to make both arrays 2D.
2. Compare the shapes dimension by dimension, starting from the right. If the dimensions are equal or one of them is 1,
they are compatible.
For example, with shapes (3, 5) and (1, 5), the last dimensions (5 and 5) are the same, and the first dimensions (3 and 1)
are compatible because 1 can be stretched to 3.
3. If the dimensions are incompatible, NumPy raises a "ValueError: operands could not be broadcast together" error.
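The rules can be observed by checking result shapes. This is a minimal sketch; the arrays a, row and col are chosen here only to produce the shapes discussed above.

```python
import numpy as np

a = np.ones((3, 5))
row = np.arange(5)                   # shape (5,), treated as (1, 5)
col = np.arange(3).reshape(3, 1)     # shape (3, 1)

sum_rows = a + row                   # (3, 5) with (1, 5) gives (3, 5)
outer = row + col                    # (1, 5) with (3, 1) gives (3, 5)

# (3, 5) with (3,): the trailing dimensions 5 and 3 are incompatible.
try:
    a + np.arange(3)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(sum_rows.shape, outer.shape)   # (3, 5) (3, 5)
print(error_message)
```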
Handling Missing Data:
Unlike Pandas alignment, NumPy arithmetic never inserts NaN for positions that do not match. If you perform
operations between arrays with mismatched shapes, NumPy will either broadcast or raise an error, depending on
whether broadcasting is possible.
Element-Wise Operations:
NumPy performs arithmetic operations element-wise by default. This means that each element in the resulting array is the
result of applying the operation to the corresponding elements in the input arrays.
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 * arr2

APPLYING FUNCTIONS AND MAPPING
In NumPy, you can apply functions and perform element-wise operations on arrays using various techniques, including
vectorized functions, np.apply_along_axis(), and np.vectorize(), which can also be used for mapping operations.
Here's an overview of these approaches:
Vectorized Functions:
NumPy is designed to work efficiently with vectorized operations, meaning you can apply functions to entire arrays or
elements of arrays without the need for explicit loops. NumPy provides built-in functions that can be applied
element-wise to arrays.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Applying a function element-wise
result = np.square(arr) # Square each element
In this example, the np.square() function is applied element-wise to the arr array.

np.apply_along_axis():
You can use the np.apply_along_axis() function to apply a function along a specified axis of a multi-dimensional array. This
is useful when you want to apply a function to each row or column of a 2D array.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Apply a function along the rows (axis=1)
def sum_of_row(row):
    return np.sum(row)
result = np.apply_along_axis(sum_of_row, axis=1, arr=arr)
In this example, sum_of_row is applied to each row along axis=1, resulting in a new 1D array: [6, 15].
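np.apply_along_axis() is most useful for functions that have no axis argument of their own. The sketch below applies a custom function, here called value_range (a name invented for this example), to rows and to columns, and shows that a simple sum can use the built-in axis argument instead.

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

def value_range(values):
    """Difference between the largest and smallest value of a 1D slice."""
    return values.max() - values.min()

row_ranges = np.apply_along_axis(value_range, axis=1, arr=arr)   # [2 2]
col_ranges = np.apply_along_axis(value_range, axis=0, arr=arr)   # [3 3 3]

# For built-in aggregations the axis argument does the same job directly.
row_sums = arr.sum(axis=1)                                       # [ 6 15]
```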

np.vectorize():
The np.vectorize() function allows you to create a vectorized version of a Python function, which can then be applied
element-wise to NumPy arrays.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Define a Python function
def my_function(x):
    return x * 2
# Create a vectorized version of the function
vectorized_func = np.vectorize(my_function)
# Apply the vectorized function to the array
result = vectorized_func(arr) # [2, 4, 6, 8]
This approach is useful when you have a custom function that you want to apply to an array. Note that np.vectorize() is
provided for convenience rather than speed: internally it is essentially a loop over the elements.

Mapping with np.vectorize():
You can use np.vectorize() to map a function to each element of an array.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Define a Python function
def my_function(x):
    return x * 2
# Create a vectorized version of the function
vectorized_func = np.vectorize(my_function)
# Map the function to each element
result = vectorized_func(arr)
This approach is the same as applying a function element-wise, and it can be used for more complex mapping
operations that have no built-in NumPy equivalent.
These methods allow you to apply functions and perform mapping operations on NumPy arrays. Built-in vectorized
functions are the most efficient of them, which is what makes NumPy a powerful library for numerical and scientific computing tasks.
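A quick comparison of the options: the sketch below maps a small Python function with np.vectorize() and then gets the same result from a built-in NumPy function. The function cap and its limit parameter are hypothetical, chosen only to have branching logic.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

def cap(x, limit=3):
    """Return x, but never more than limit."""
    return x if x < limit else limit

# Mapping a Python function with branching logic over every element.
capped = np.vectorize(cap)(arr)          # [1 2 3 3]

# The same result from a built-in vectorized function.
capped_builtin = np.minimum(arr, 3)      # [1 2 3 3]

# Simple arithmetic needs no wrapper at all.
doubled = arr * 2                        # [2 4 6 8]
```

When a built-in equivalent exists, it is usually the better choice, because np.vectorize() still calls the Python function for each element.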

SORTING AND RANKING
Sorting and ranking are common data manipulation operations in data analysis and are widely supported in Python
through libraries like NumPy and Pandas. These operations help organize data in a desired order or rank elements
based on specific criteria. Here's how to perform sorting and ranking in both libraries:
Sorting in NumPy:
In NumPy, you can sort arrays using the np.sort() and np.argsort() functions.
np.sort(): This function returns a new sorted array without modifying the original array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
sorted_arr = np.sort(arr)

np.argsort(): This function returns the indices that would sort the array. You can use these indices to sort the original
array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
# kind='stable' keeps tied values in their original order
indices = np.argsort(arr, kind='stable')
print("Indices of sorted array:", indices)
sorted_arr = arr[indices]
print("Sorted array:", sorted_arr)
Output:
Indices of sorted array: [1 3 6 0 9 2 4 8 7 5]
Sorted array: [1 1 2 3 3 4 5 5 6 9]
Sorting in Pandas:
In Pandas, you can sort Series and DataFrames using the sort_values() method. You can specify the column(s) to sort by
and the sorting order.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 35]}
df = pd.DataFrame(data)
# Sort by 'Age' column in ascending order
sorted_df = df.sort_values(by='Age', ascending=True)

Ranking in NumPy:
NumPy doesn't have a built-in ranking function, but you can apply np.argsort() twice to get the rank of each element.
Tied values receive different ranks, ordered by their position in the array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
indices = np.argsort(arr, kind='stable')
ranked_arr = np.argsort(indices) + 1 # Add 1 to start ranking from 1 instead of 0
Ranking in Pandas:
In Pandas, you can rank data using the rank() method. You can specify the sorting order and how to handle ties (e.g.,
assigning the average rank to tied values).
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 30]}
df = pd.DataFrame(data)
# Rank by 'Age' column in descending order and assign average rank to tied values
df['Rank'] = df['Age'].rank(ascending=False, method='average')
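The tie-handling options are easiest to compare side by side. The sketch below ranks the same four ages with several rank() methods and checks that method='first' agrees with the double np.argsort() approach in NumPy. The Series name ages and the result names are illustrative.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, 30, 22, 30], index=['Alice', 'Bob', 'Charlie', 'David'])

# Descending ranks: Bob and David are tied for the top spot.
average = ages.rank(ascending=False, method='average')   # ties share 1.5
lowest = ages.rank(ascending=False, method='min')        # ties both get 1
dense = ages.rank(ascending=False, method='dense')       # no gap after the tie

# Ascending ranks with ties broken by position, as in the NumPy approach.
first = ages.rank(method='first')
numpy_ranks = np.argsort(np.argsort(ages.to_numpy(), kind='stable')) + 1

print(average.tolist())      # [3.0, 1.5, 4.0, 1.5]
print(numpy_ranks.tolist())  # [2, 3, 1, 4]
```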

SUMMARIZING AND COMPUTING DESCRIPTIVE STATISTICS
1. Summary Statistics:
NumPy provides functions to compute summary statistics directly on arrays.
import numpy as np
data = np.array([25, 30, 22, 35, 28])
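mean = np.mean(data) # Arithmetic mean
median = np.median(data) # Middle value of the sorted data
std_dev = np.std(data) # Population standard deviation (ddof=0 by default)
variance = np.var(data) # Population variance (ddof=0 by default)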

2. Percentiles and Quartiles:
You can compute specific percentiles and quartiles using the np.percentile() function.
percentile_25 = np.percentile(data, 25)
percentile_75 = np.percentile(data, 75)
3. Correlation and Covariance:
You can compute correlation and covariance between arrays using np.corrcoef() and np.cov().
# data1 and data2 stand for two 1D arrays of equal length
correlation_matrix = np.corrcoef(data1, data2)
covariance_matrix = np.cov(data1, data2)
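For the five-value data array used above, the statistics work out as follows. The sketch also shows the ddof argument, which switches np.var() from the population formula (divide by n) to the sample formula (divide by n - 1); the variable names are chosen for this example.

```python
import numpy as np

data = np.array([25, 30, 22, 35, 28])

mean = np.mean(data)                      # 28.0
population_var = np.var(data)             # 98 / 5 = 19.6
sample_var = np.var(data, ddof=1)         # 98 / 4 = 24.5

# Several percentiles in one call; 25, 50 and 75 are the quartiles.
q1, median, q3 = np.percentile(data, [25, 50, 75])   # 25.0, 28.0, 30.0
iqr = q3 - q1                             # interquartile range: 5.0
```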

CORRELATION AND COVARIANCE
In NumPy, you can compute correlation and covariance between arrays using the np.corrcoef() and np.cov() functions,
respectively. These functions are useful for analyzing relationships and dependencies between variables. Here's how to
use them:
Computing Correlation Coefficient (Correlation):
The correlation coefficient measures the strength and direction of a linear relationship between two variables. It ranges
from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear correlation.
import numpy as np
# Create two arrays representing variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])

# Compute the correlation coefficient between x and y
correlation_matrix = np.corrcoef(x, y)
# The correlation coefficient is in the (0, 1) element of the matrix
correlation_coefficient = correlation_matrix[0, 1]
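In this example, correlation_coefficient will contain the Pearson correlation coefficient between x and y. Because
y = x + 1 is an exact linear function of x, the value is 1.0 (up to floating-point rounding).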

Computing Covariance:
Covariance measures the degree to which two variables change together. Positive values indicate a positive relationship
(both variables increase or decrease together), while negative values indicate an inverse relationship (one variable
increases as the other decreases).
import numpy as np
# Create two arrays representing variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
# Compute the covariance between x and y
covariance_matrix = np.cov(x, y)
# The covariance is in the (0, 1) element of the matrix
covariance = covariance_matrix[0, 1]
In this example, covariance will contain the sample covariance between x and y, which is 2.5 (np.cov() divides by
n - 1 by default).
Both np.corrcoef() and np.cov() can also accept a single 2D array in which each row is one variable, allowing you to
compute correlations and covariances for multiple variables simultaneously. For example, if you have a dataset with
multiple columns, you can compute the correlation matrix or covariance matrix for all pairs of variables.
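A sketch of the multi-variable case, assuming a third variable z (added here for illustration) that decreases as x increases. Stacking the arrays as rows gives a 3x3 matrix for all pairs.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
z = np.array([5, 4, 3, 2, 1])     # hypothetical variable that falls as x rises

data = np.vstack([x, y, z])       # shape (3, 5): each row is one variable

corr = np.corrcoef(data)          # 3x3 correlation matrix
cov = np.cov(data)                # 3x3 sample covariance matrix

print(np.round(corr, 2))
# corr[0, 1] is about 1.0 (x and y rise together),
# corr[0, 2] is about -1.0 (z falls as x rises).
```

If the variables are stored as columns instead, np.cov() and np.corrcoef() accept rowvar=False.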

HANDLING MISSING DATA
Handling missing data in NumPy is an important aspect of data analysis and manipulation. NumPy provides several ways
to work with missing or undefined values, typically represented as NaN (Not-a-Number). Here are some common
techniques for handling missing data in NumPy:
Using np.nan: NumPy represents missing data using np.nan, which is a floating-point value, so arrays that contain it have a float dtype. You can create arrays with missing values like this:
import numpy as np
arr = np.array([1.0, 2.0, np.nan, 4.0])
Now, arr contains a missing value represented as np.nan.

Checking for Missing Data: You can check for missing values using the np.isnan() function. For example:
np.isnan(arr) # Returns a boolean array indicating which elements are NaN.
Filtering Missing Data: To filter out missing values from an array, you can use boolean indexing. For example:
arr[~np.isnan(arr)] # Returns an array without NaN values.
Replacing Missing Data: You can replace missing values with a specific value using boolean indexing or
np.nan_to_num(), which replaces NaN with 0 by default. For example:
arr[np.isnan(arr)] = 0 # Replace NaN with 0
Or, to replace NaN with the mean of the non-missing values, computed with np.nanmean():
mean = np.nanmean(arr)
arr[np.isnan(arr)] = mean

Ignoring Missing Data: Sometimes, you may want to perform operations while ignoring missing values. You can use
functions like np.nanmax(), np.nanmin(), np.nansum(), etc., which ignore NaN values when computing the result.
Interpolation: If you have a time series or ordered data, you can use interpolation methods to fill missing values.
NumPy provides functions like np.interp() for this purpose.
Masked Arrays: NumPy also supports masked arrays (numpy.ma) that allow you to work with missing data more
explicitly by creating a mask that specifies which values are missing. This can be useful for certain computations.
import numpy as np
import numpy.ma as ma
arr = np.array([1, 2, np.nan, 4])
masked_arr = ma.masked_array(arr, np.isnan(arr)) # Mask NaN values
mean_val = masked_arr.mean() # Calculates mean ignoring NaNs
Handling Missing Data in Multidimensional Arrays: If you're working with multidimensional arrays, you can apply
the above techniques along a specific axis. For example, np.nanmean(arr, axis=0) ignores NaN values within each
column, and np.isnan(arr).any(axis=1) flags the rows that contain missing data.
Keep in mind that the specific method you choose to handle missing data depends on your data
analysis goals and the context of your data. Some methods may be more appropriate than others,
depending on your use case.
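Two of the techniques above, interpolation and axis-wise handling, are sketched below. The arrays and the names filled, col_means and rows_with_nan are made up for the example, and the interpolation assumes the values are evenly spaced in order.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
missing = np.isnan(arr)
positions = np.arange(arr.size)

# Linear interpolation: estimate each missing value from its known neighbours.
filled = arr.copy()
filled[missing] = np.interp(positions[missing],
                            positions[~missing],
                            arr[~missing])
print(filled)                                  # [1. 2. 3. 4. 5. 6.]

# Axis-wise handling in a 2D array.
grid = np.array([[1.0, np.nan],
                 [3.0, 4.0]])
col_means = np.nanmean(grid, axis=0)           # [2. 4.], NaN ignored per column
rows_with_nan = np.isnan(grid).any(axis=1)     # [ True False]
```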

HIERARCHICAL INDEXING
Hierarchical indexing, often referred to as "MultiIndexing", is a feature of Pandas rather than NumPy. It allows you to
work with data whose index has multiple levels of labels. This is particularly useful when you want to represent
higher-dimensional data in a one- or two-dimensional Series or DataFrame.
You can create a MultiIndex using the pd.MultiIndex class, with NumPy supplying the data. Here's a basic example:
import numpy as np
import pandas as pd # Import pandas
# Create a MultiIndex with three levels
index = pd.MultiIndex.from_arrays([['A', 'A', 'B', 'B'], [1, 2, 1, 2], ['X', 'Y', 'X', 'Y']],
                                  names=['Level1', 'Level2', 'Level3'])
# Create a random data array
data = np.random.rand(4, 3)
# Create a DataFrame with MultiIndex
df = pd.DataFrame(data, index=index, columns=['Value1', 'Value2', 'Value3'])
print(df)
Output (the values are random and will differ on each run):
                        Value1    Value2    Value3
Level1 Level2 Level3
A      1      X       0.654321  0.123456  0.987654
       2      Y       0.234567  0.345678  0.456789
B      1      X       0.987654  0.876543  0.765432
       2      Y       0.123456  0.234567  0.345678

In this example, we've created a MultiIndex with three levels: 'A' and 'B' as the first level, 1 and 2 as the second level,
and 'X' and 'Y' as the third level.
Then, we've created a DataFrame with this MultiIndex and some random data.
You can access data from this DataFrame using hierarchical indexing. For example:
# Accessing data using hierarchical indexing
value_A1_X = df.loc[('A', 1, 'X'), 'Value1'] # Access Value1 for 'A', 1, 'X'

Some common operations with hierarchical indexing include:
Slicing: You can perform slices at each level of the index, allowing you to select specific subsets of the data.
Stacking and Unstacking: Stacking converts columns into a new level of the index. Unstacking moves one level of the
index back into columns.
Swapping Levels: You can swap levels to change the order of the levels in the index. For a DataFrame whose index
levels are named 'Letter' and 'Number':
# Swap 'Letter' and 'Number' levels
print(df.swaplevel('Letter', 'Number'))
Output:
               Value1  Value2
Number Letter
1      A           10     100
2      A           20     200
1      B           30     300
2      B           40     400
Grouping and Aggregating: You can group data based on levels of the index and perform aggregation functions like
mean, sum, etc.
Reordering Levels: You can change the order of levels in the index.
Resetting Index: You can reset the index to move the hierarchical index levels back to columns.
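The operations listed above can be sketched on a small DataFrame. The construction below is an assumption: it builds a table with index levels named 'Letter' and 'Number' and the values 10 to 40 and 100 to 400 so that it matches the swaplevel() output shown; the result names are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_product([['A', 'B'], [1, 2]],
                                   names=['Letter', 'Number'])
df = pd.DataFrame({'Value1': [10, 20, 30, 40],
                   'Value2': [100, 200, 300, 400]}, index=index)

a_rows = df.loc['A']                               # slicing on the first level
ones = df.xs(1, level='Number')                    # cross-section on the second level
swapped = df.swaplevel('Letter', 'Number')         # swap the two levels
letter_totals = df.groupby(level='Letter').sum()   # aggregate per first-level label
wide = df['Value1'].unstack('Number')              # 'Number' level becomes columns
flat = df.reset_index()                            # index levels back to columns

print(swapped)
print(wide)
```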

Hierarchical indexing is especially valuable when dealing with multi-dimensional data, such as panel
data or data with multiple categorical variables. It allows for more expressive data organization and
manipulation. The pd.MultiIndex class from the pandas library provides more
advanced functionality for working with hierarchical data structures, including various methods for
creating and manipulating MultiIndex objects.
