SHERIN RAPPAI
Unit 3: Basics of Numpy
21BCA2T452 : Python Programming
Prof. Sherin Rappai
Assistant Professor Dept. of Computer Science
NUMPY BASICS: ARRAYS AND VECTORIZED COMPUTATION
NumPy (Numerical Python) is a fundamental library in Python for numerical and scientific computing. It
provides support for arrays (multi-dimensional, homogeneous data structures) and a wide range of
mathematical functions to perform vectorized computations efficiently.
Installing NumPy
Before using NumPy, you need to make sure it's installed. You can install it using pip:
pip install numpy
Importing NumPy
To use NumPy in your Python code, you should import it:
import numpy as np
By convention, it's common to import NumPy as np for brevity.
Why Use Arrays?
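NumPy arrays are usually faster and more memory-efficient than Python lists for numerical work, because all
elements share one data type and operations run in compiled code. For example, to add 2 to every element of a
list you would need a loop or a list comprehension in plain Python. With NumPy, you can do this in a single
expression: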
arr = np.array([1, 2, 3, 4, 5])
new_arr = arr + 2 # Adds 2 to every element in the array
print(new_arr)
Output: [3 4 5 6 7]
Creating NumPy Arrays
You can create NumPy arrays using various methods:
1. From Python Lists:
arr = np.array([1, 2, 3, 4, 5])
2. Using NumPy Functions:
zeros_arr = np.zeros(5) # Creates an array of zeros with 5 elements
ones_arr = np.ones(3) # Creates an array of ones with 3 elements
rand_arr = np.random.rand(3, 3) # Creates a 3x3 array with random values between 0 and 1
3. Using NumPy's Range Function:
range_arr = np.arange(0, 10, 2) # Creates an array with values [0, 2, 4, 6, 8]
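To see what these constructors return, it can help to inspect each array's shape and dtype. The following is a small sketch that reuses the calls above; the printed values in the comments assume NumPy's default settings.

```python
import numpy as np

zeros_arr = np.zeros(5)          # floating-point zeros by default
rand_arr = np.random.rand(3, 3)  # uniform samples from the interval [0, 1)
range_arr = np.arange(0, 10, 2)  # the stop value 10 is excluded

print(zeros_arr.shape, zeros_arr.dtype)  # (5,) float64
print(rand_arr.shape)                    # (3, 3)
print(range_arr)                         # [0 2 4 6 8]
```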
BASIC ARRAY OPERATIONS
Once you have NumPy arrays, you can perform various operations on them:
1. Element-wise Operations:
NumPy allows you to perform element-wise operations, like addition, subtraction, multiplication, and
division:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = a + b # Element-wise addition: [5, 7, 9]
d = a * b # Element-wise multiplication: [4, 10, 18]
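Subtraction and division follow the same element-wise pattern. A minimal sketch using the same two arrays (the variable names e and f are only for illustration):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

e = a - b  # Element-wise subtraction: [-3, -3, -3]
f = a / b  # Element-wise true division: 0.25, 0.4, 0.5 (always floats)
print(e)
print(f)
```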
2. Indexing and Slicing:
Indexing means accessing a specific element in an array by its position (index). In NumPy,
indices start from 0.
arr = np.array([0, 1, 2, 3, 4, 5])
element = arr[2] # Access element at index 2 (value: 2)
sub_array = arr[2:5] # Slice from index 2 to 4 (values: [2, 3, 4])
Slicing: Slicing allows you to access a range or subset of elements from an array. It is done
using the syntax arr[start:end], where start is the index where the slice begins (inclusive),
and end is where it stops (exclusive).
arr = np.array([10, 20, 30, 40, 50])
# Getting a slice of elements from index 1 to 3 (exclusive of 3)
print(arr[1:3]) # Output: [20 30]
# Getting a slice from the start till the third element
print(arr[:3]) # Output: [10 20 30]
# Getting a slice from index 2 to the end of the array
print(arr[2:]) # Output: [30 40 50]
Negative Indexing:
You can also use negative indices to access elements from the end of the array. For example, -1 refers to
the last element, -2 refers to the second last element, and so on.
Example:
arr = np.array([10, 20, 30, 40, 50])
# Accessing the last element
print(arr[-1]) # Output: 50
# Accessing the second last element
print(arr[-2]) # Output: 40
Slicing with Steps: You can also specify a step value, which sets the stride between selected
indices (a step of 2 takes every second element). The syntax is arr[start:end:step].
Example:
arr = np.array([10, 20, 30, 40, 50, 60])
# Getting every second element from index 1 up to (but not including) index 5
print(arr[1:5:2]) # Output: [20 40]
# Reversing the array using negative step
print(arr[::-1]) # Output: [60 50 40 30 20 10]
How arr[1:5:2] is evaluated:
• The array is [10, 20, 30, 40, 50, 60].
• Index positions: [0, 1, 2, 3, 4, 5].
• The slice starts at index 1, which is 20.
• 2 is the step value, which means "take every second element".
• It skips index 2 and picks the element at index 3, which is 40.
• The next candidate would be index 5, but the slice stops before reaching index 5, so the result is [20 40].
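One detail worth knowing about slices like these: basic slicing in NumPy returns a view of the original data, not a copy. The sketch below illustrates this with the same array; the names view and independent are hypothetical.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50, 60])

view = arr[1:5:2]  # a view onto arr: elements 20 and 40
view[0] = 99       # writes through to arr[1]
print(arr)         # [10 99 30 40 50 60]

independent = arr[1:5:2].copy()  # .copy() detaches the data
independent[0] = 0
print(arr[1])      # still 99
```

Call .copy() on a slice whenever you need to modify it without touching the original array.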
3. Array Shape and Reshaping:
The shape of an array tells us how many elements it contains along each dimension (or axis).
You can check the shape with the .shape attribute and change it with the .reshape() method:
arr = np.array([[1, 2, 3], [4, 5, 6]])
shape = arr.shape # Get the shape (2, 3)
reshaped = arr.reshape(3, 2) # Reshape the array to (3, 2)
Reshaping:
Reshaping allows you to change the shape of an array without changing its data. You can
convert a 1D array to a 2D array, or a 2D array to a 3D array, etc., as long as the total
number of elements stays the same.
Example:
# Creating a 1D array with 6 elements
arr = np.array([1, 2, 3, 4, 5, 6])
# Reshaping the 1D array into a 2D array (2 rows, 3 columns)
reshaped_arr = arr.reshape(2, 3)
print(reshaped_arr)
Reshape Rules:
When reshaping an array, the new shape must contain the same total number of elements as the original array. For
example, if you have an array with 12 elements, you could reshape it to:
• A 2x6 array (2 rows x 6 columns)
• A 3x4 array (3 rows x 4 columns)
• A 4x3 array (4 rows x 3 columns)
Example
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
# Reshaping into 3 rows and 4 columns
reshaped_arr = arr.reshape(3, 4)
print(reshaped_arr)
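The rule about matching element counts can be checked directly. In this sketch, -1 asks NumPy to infer one dimension, and an incompatible shape raises ValueError; the variable name auto is only for illustration.

```python
import numpy as np

arr = np.arange(1, 13)     # 12 elements: 1 through 12

auto = arr.reshape(3, -1)  # NumPy infers 4 columns from the 12 elements
print(auto.shape)          # (3, 4)

try:
    arr.reshape(5, 3)      # 5 * 3 = 15 does not equal 12, so this fails
except ValueError as exc:
    print("reshape failed:", exc)
```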
Flattening an Array: If you want to convert a multi-dimensional array back into a 1D array, you can flatten it using
the .flatten() method.
Example
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
# Flattening the 2D array into a 1D array
flat_arr = arr_2d.flatten()
print(flat_arr)
Output:
[1 2 3 4 5 6]
Shape: Tells you the dimensions of an array (rows, columns, etc.).
Reshaping: Lets you change the shape of an array while keeping the same number of elements.
Flattening: Converts a multi-dimensional array back into a 1D array.
4. Aggregation Functions:
Aggregation functions are used to perform calculations on an entire array or along a specific axis (e.g., summing all
elements, finding the maximum, etc.). These functions are essential for data analysis and numerical computations.
Common Aggregation Functions:
Here are some of the most commonly used aggregation functions in NumPy:
1. Sum: The sum() function adds all the elements of an array.
2. Mean: The mean() function calculates the average of the elements.
3. Maximum and Minimum: max() gives the maximum value in the array. min() gives the minimum value in the array.
4. Product: The prod() function returns the product of all elements in the array (i.e., multiplies all elements together).
5. Standard Deviation and Variance: std() calculates the standard deviation (how spread out the numbers are).
var() calculates the variance (the square of the standard deviation).
6. Cumulative Sum and Product: cumsum() gives the cumulative sum (the sum of the elements up to each
index). cumprod() gives the cumulative product (the product of elements up to each index).
NumPy provides functions to compute statistics on arrays:
arr = np.array([1, 2, 3, 4, 5])
mean = np.mean(arr) # Calculate the mean (average)
max_val = np.max(arr) # Find the maximum value
min_val = np.min(arr) # Find the minimum value
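The remaining functions from the list, and the axis argument, can be sketched the same way. The 2D array named grid is a hypothetical example added here to show per-column and per-row totals.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
print(arr.sum())      # 15
print(arr.prod())     # 120
print(arr.var())      # 2.0 (population variance)
print(arr.cumsum())   # running totals: 1, 3, 6, 10, 15
print(arr.cumprod())  # running products: 1, 2, 6, 24, 120

grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.sum(axis=0))  # column totals: 5, 7, 9
print(grid.sum(axis=1))  # row totals: 6, 15
```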
VECTORIZED COMPUTATION
Vectorized computation in Python refers to performing operations on entire arrays or sequences of data
without the need for explicit loops. This approach leverages highly optimized, low-level code to achieve
faster and more efficient computations. The primary library for vectorized computation in Python is
NumPy.
Traditional Loop-Based Computation
In traditional Python programming, you might use explicit loops to perform operations on arrays or lists.
For example:
# Using loops to add two lists element-wise
list1 = [1, 2, 3]
list2 = [4, 5, 6]
result = []
for i in range(len(list1)):
    result.append(list1[i] + list2[i])
# Result: [5, 7, 9]
Vectorized Computation with NumPy
NumPy allows you to perform operations on entire arrays, making code more concise and efficient. Here's how you can
achieve the same result using NumPy:
import numpy as np
# Using NumPy for element-wise addition
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 + arr2
# Result: array([5, 7, 9])
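The efficiency claim can be checked on your own machine with the standard timeit module. This is only a rough sketch: the array size n and the repeat count are arbitrary choices made for illustration, and actual timings depend on hardware and library versions.

```python
import timeit
import numpy as np

n = 100_000  # hypothetical size, chosen only for illustration
list1, list2 = list(range(n)), list(range(n))
arr1, arr2 = np.arange(n), np.arange(n)

# Time 20 repetitions of each approach
loop_time = timeit.timeit(lambda: [x + y for x, y in zip(list1, list2)], number=20)
numpy_time = timeit.timeit(lambda: arr1 + arr2, number=20)
print(f"pure Python: {loop_time:.4f}s, NumPy: {numpy_time:.4f}s")
```

On typical setups the NumPy version finishes noticeably faster, but treat any single measurement as indicative rather than exact.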
INTRODUCTION TO PANDAS DATA STRUCTURES
Pandas is a popular Python library for data manipulation and analysis. It provides two primary data structures: the
DataFrame and the Series. These data structures are designed to handle structured data, making it easier to work
with datasets in a tabular format.
DataFrame:
• A DataFrame is a 2-dimensional, labeled data structure that resembles a spreadsheet or SQL table.
• It consists of rows and columns, where each column can have a different data type (e.g., integers, floats, strings, or
even custom data types).
• You can think of a DataFrame as a collection of Series objects, where each Series is a column.
• DataFrames are highly versatile and are used for a wide range of data analysis tasks, including data cleaning,
exploration, and transformation.
Here's a basic example of how to create a DataFrame using Pandas:
import pandas as pd
# Creating a DataFrame from a dictionary of data
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
# Displaying the DataFrame
print(df)
Series:
• A Series is a one-dimensional labeled array that can hold data of any data type.
• It is like a column in a DataFrame or a single variable in statistics.
• Series objects are commonly used for time series data, as well as other one-dimensional data.
Key characteristics of a Pandas Series:
• Single Data Type: Like a NumPy array and unlike a Python list, a Pandas Series has one data type (dtype) for all of
its values. For example, a Series created from integers has dtype int64. If you mix types (say, numbers and strings),
Pandas falls back to the generic object dtype rather than raising an error.
• Labeled Data: Series have two parts: the data itself and an associated index. The index provides labels or names for
each data point in the Series. By default, Series have a numeric index starting from 0, but you can specify custom
labels if needed.
• Size and Shape: A Series has a size (the number of elements) and shape (1-dimensional) but does not have columns
or rows like a DataFrame.
import pandas as pd
# Create a Series from a list
data = [10, 20, 30, 40, 50]
series = pd.Series(data)
# Display the Series
print(series)
0 10
1 20
2 30
3 40
4 50
dtype: int64
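The index labels and the dtype can also be set or inspected explicitly. A minimal sketch, where the Series names scores and mixed and their values are hypothetical:

```python
import pandas as pd

# Custom string labels instead of the default 0, 1, 2, ...
scores = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
print(scores['b'])   # 20, looked up by label
print(scores.dtype)  # int64

# Mixing types yields the generic object dtype
mixed = pd.Series([1, 'two', 3.0])
print(mixed.dtype)   # object
```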
Some common tasks you can perform with Pandas:
• Data Loading: Pandas can read data from various sources, including CSV files, Excel spreadsheets, SQL databases,
and more.
• Data Cleaning: You can clean and preprocess data by handling missing values, removing duplicates, and
transforming data types.
• Data Selection: Easily select specific rows and columns of interest using various indexing techniques.
• Data Aggregation: Perform groupby operations, calculate statistics, and aggregate data based on specific criteria.
• Data Visualization: You can use Pandas in conjunction with visualization libraries like Matplotlib and Seaborn to
create informative plots and charts.
A DataFrame in Python typically refers to a two-dimensional, size-mutable, and potentially heterogeneous tabular data
structure provided by the popular library called Pandas. It is a fundamental data structure for data manipulation and
analysis in Python.
Here's how you can work with DataFrames in Python using Pandas:
1. Import Pandas:
First, you need to import the Pandas library.
import pandas as pd
2. Creating a DataFrame:
You can create a DataFrame in several ways. Here are a few
common methods:
• From a dictionary:
data = {'Column1': [value1, value2, ...],
        'Column2': [value1, value2, ...]}
df = pd.DataFrame(data)
• From a list of lists:
data = [[value1, value2],
[value3, value4]]
df = pd.DataFrame(data, columns=['Column1', 'Column2'])
• From a CSV file:
df = pd.read_csv('file.csv')
3. Viewing Data:
You can use various methods to view and explore your DataFrame:
df.head(): Displays the first few rows of the DataFrame.
df.tail(): Displays the last few rows of the DataFrame.
df.shape: Returns the number of rows and columns.
df.columns: Returns the column names.
df.info(): Provides information about the DataFrame, including data types and non-null counts.
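These inspection tools can be tried on a small DataFrame. The sketch below uses hypothetical sample data with two columns.

```python
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

print(df.head(2))        # first two rows
print(df.tail(1))        # last row
print(df.shape)          # (3, 2): 3 rows, 2 columns
print(list(df.columns))  # ['Name', 'Age']
df.info()                # prints dtypes and non-null counts
```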
4. Selecting Data:
You can select specific columns or rows from a DataFrame using indexing or filtering. For example:
df['Column1'] # Select a specific column
df[['Column1', 'Column2']] # Select multiple columns
df[df['Column1'] > 5] # Filter rows based on a condition
5. Modifying Data:
You can modify the DataFrame by adding or modifying columns, updating values, or appending rows. For example:
df['NewColumn'] = [new_value1, new_value2, ...] # Add a new column
df.at[index, 'Column1'] = new_value # Update a specific value
df = pd.concat([df, pd.DataFrame([{'Column1': value1, 'Column2': value2}])], ignore_index=True) # Append a new row (DataFrame.append() was removed in pandas 2.0)
6. Data Analysis:
Pandas provides various functions for data analysis, such as
describe(), groupby(), agg(), and more.
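As a rough illustration of these three functions, the sketch below summarizes a small table of sales figures. The column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical sales data for illustration
df = pd.DataFrame({'City': ['NY', 'LA', 'NY', 'LA'],
                   'Sales': [100, 200, 300, 400]})

print(df['Sales'].describe())  # count, mean, std, min, quartiles, max

totals = df.groupby('City')['Sales'].sum()  # one total per city
print(totals)                               # LA 600, NY 400

summary = df.groupby('City')['Sales'].agg(['mean', 'max'])  # several statistics at once
print(summary)
```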
7. Saving Data:
You can save the DataFrame to a CSV file or other formats:
df.to_csv('output.csv', index=False)
df.to_excel('output.xlsx', index=False) # Requires an Excel engine such as openpyxl
INDEX OBJECTS: INDEXING, SELECTION, AND FILTERING
In Pandas, the Index object is a fundamental component of both Series and DataFrame data structures. It
provides the labels or names for the rows or columns of your data. You can use indexing, selection, and
filtering techniques with these indexes to access specific data points or subsets of your data. Here's how
you can work with index objects in Pandas:
1. Indexing:
Indexing allows you to access specific elements or rows in your data using labels. You can use .loc[] for label-based
indexing and .iloc[] for integer-based indexing.
• Label-based indexing:
df.loc['label'] # Access a specific row by its label
df.loc['label', 'column_name'] # Access a specific element by label and column name
EXAMPLE
import pandas as pd
# Create a DataFrame with custom labels
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['NY', 'LA', 'SF']}
df = pd.DataFrame(data, index=['A1', 'B2', 'C3'])
# Access the row with label 'B2'
print(df.loc['B2'])
# Access the value in the row with label 'B2' and the column 'City'
print(df.loc['B2', 'City'])
• Integer-based indexing:
df.iloc[0] # Access the first row
df.iloc[0, 1] # Access an element by row and column index
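Applied to the labeled DataFrame from the example above, positional indexing looks like this. The sketch also shows one difference worth remembering: .iloc slices exclude the end position, while .loc label slices include both endpoints.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['NY', 'LA', 'SF']}
df = pd.DataFrame(data, index=['A1', 'B2', 'C3'])

print(df.iloc[0])         # first row (the one labeled 'A1')
print(df.iloc[0, 1])      # 25: first row, second column ('Age')
print(df.iloc[0:2])       # positions 0 and 1; position 2 is excluded
print(df.loc['A1':'B2'])  # label slice: both 'A1' and 'B2' are included
```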
2. Selection:
Selection refers to choosing specific columns or rows from a DataFrame based on their labels or positions. You
use selection when you want to extract specific columns or rows directly, without applying any condition.
df['Column1'] # Select 'Column1' from the DataFrame
df[['Column1', 'Column2']] # Select 'Column1' and 'Column2'
df.loc[0] # Select the row whose index label is 0 (the first row under the default index)
df.iloc[2] # Select the third row by integer position
3. Filtering:
You can use various methods to select specific data based on conditions or criteria.
• Select rows based on a condition:
df[df['Column'] > 5] # Select rows where 'Column' is greater than 5
• Select rows by multiple conditions:
df[(df['Column1'] > 5) & (df['Column2'] < 10)] # Rows where 'Column1' > 5 and 'Column2' < 10
Filtering allows you to create a boolean mask based on a condition and then apply that mask to your DataFrame to
select rows meeting the condition.
Create a boolean mask:
condition = df['Column'] > 5
Apply the mask to the DataFrame:
filtered_df = df[condition]
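The mask-then-apply pattern can be sketched end to end on hypothetical data. Note that each condition needs its own parentheses and that conditions are combined with & (and) or | (or) rather than the Python keywords.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'Column1': [3, 6, 9, 12],
                   'Column2': [2, 8, 15, 4]})

# Boolean mask: True where both conditions hold
mask = (df['Column1'] > 5) & (df['Column2'] < 10)
print(mask.tolist())  # [False, True, False, True]

filtered_df = df[mask]  # keeps only the rows where the mask is True
print(filtered_df)
```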
4. Setting a New Index:
You can set a specific column as the index of your DataFrame using the .set_index() method.
df.set_index('Column_Name', inplace=True)
5. Resetting the Index:
If you've set a column as the index and want to revert to the default integer-based index, you can use the .reset_index()
method.
df.reset_index(inplace=True)
6. Multi-level Indexing:
You can create DataFrames with multi-level indexes, allowing you to work with more complex hierarchical data
structures.
df.set_index(['Index1', 'Index2'], inplace=True)
Index objects in Pandas are versatile and powerful for working with data because they enable you to
access and manipulate your data in various ways, whether it's for data retrieval, filtering, or
restructuring.
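Setting, combining, and resetting indexes can be tried together on a small table. This sketch uses hypothetical column names and omits inplace=True, so each call returns a new DataFrame and the original stays unchanged.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'Region': ['East', 'East', 'West'],
                   'Year': [2023, 2024, 2023],
                   'Sales': [10, 20, 30]})

by_region = df.set_index('Region')        # single-level index
print(by_region.loc['East'])              # both 'East' rows

multi = df.set_index(['Region', 'Year'])  # two-level (hierarchical) index
print(multi.loc[('East', 2024), 'Sales']) # 20

restored = multi.reset_index()            # index levels become columns again
print(restored.columns.tolist())          # ['Region', 'Year', 'Sales']
```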
ARITHMETIC AND DATA ALIGNMENT IN PANDAS
Arithmetic and data alignment in Pandas refer to how mathematical operations are performed between Series and
DataFrames when they have different shapes or indices. Pandas automatically aligns data based on the labels of the objects
involved in the operation, which ensures that the result of the operation maintains data integrity and is aligned correctly.
Here are some key aspects of arithmetic and data alignment in Pandas:
1. Automatic Alignment:
When you perform mathematical operations (e.g., addition, subtraction, multiplication, division) between two Series or
DataFrames, Pandas aligns the data based on their labels (index or column names). It aligns the data based on common labels
and performs the operation only on matching labels.
series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])
result = series1 + series2
In this example, the result Series will have NaN values for the 'A' and 'D' labels because those labels don't match between
series1 and series2.
A NaN
B 6.0
C 8.0
D NaN
dtype: float64
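If NaN for unmatched labels is not what you want, the arithmetic methods accept a fill_value. The sketch below uses Series.add() with fill_value=0, which treats a label missing from one side as 0 instead of producing NaN.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])

# A label missing from one Series is treated as 0 before adding
result = series1.add(series2, fill_value=0)
print(result)  # A 1.0, B 6.0, C 8.0, D 6.0
```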
2. Missing Data (NaN):
When labels don't match, Pandas fills in the result with NaN (Not-a-Number) to indicate missing values.
3. DataFrame Alignment:
The same principles apply to DataFrames when performing operations between them. The alignment occurs both for rows
(based on the index) and columns (based on column names).
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['Y', 'Z'])
result = df1 + df2
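In this case, result has rows 'X', 'Y', 'Z' and columns 'A', 'B', 'C'. Only the cell at row 'Y', column 'B' holds a value
(4 + 5 = 9.0), because it is the only row and column label pair present in both df1 and df2. Every other cell is NaN.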
4. Handling Missing Data:
You can use methods like .fillna() to replace NaN values with a specific value or use .dropna() to remove rows or columns
with missing data.
result_filled = result.fillna(0) # Replace NaN with 0
result_dropped = result.dropna() # Remove rows containing NaN values (use axis=1 to drop columns instead)
5. Alignment with Broadcasting:
Pandas allows you to perform operations between a Series and a scalar value, and it broadcasts the scalar to match the
shape of the Series.
series = pd.Series([1, 2, 3])
scalar = 2
result = series * scalar
In this example, result will be a Series with values [2, 4, 6].
Automatic alignment in Pandas is a powerful feature that simplifies data manipulation and allows you to work with
datasets of different shapes without needing to manually align them. It ensures that operations are performed in a way
that maintains the integrity and structure of your data.
ARITHMETIC AND DATA ALIGNMENT IN NUMPY
NumPy arrays have no index labels, so NumPy does not align data the way Pandas does. Arithmetic is positional: elements
are matched by their position in the array. NumPy is primarily focused on numerical computations with homogeneous
arrays (arrays of the same data type). Here's how arithmetic works in NumPy:
Shape-Based Broadcasting:
NumPy arrays perform element-wise operations. If you perform an operation between two NumPy arrays of different
but compatible shapes, NumPy will broadcast the smaller array to match the shape of the larger one.
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4])
result = arr1 + arr2
In this example, NumPy will automatically broadcast arr2 to match the shape of arr1, resulting in [5, 6, 7].
Broadcasting Rules:
NumPy follows specific rules when broadcasting arrays:
1. If the arrays have a different number of dimensions, pad the smaller shape with 1 on the left side.
For example, shapes (3, 5) and (5,) become (3, 5) and (1, 5): NumPy adds a 1 on the left so that both arrays are 2D.
2. Compare the shapes dimension by dimension, starting from the right. Two dimensions are compatible if they are
equal or one of them is 1.
For example, with shapes (3, 5) and (1, 5), the second dimensions (5 and 5) are the same, and the first dimensions
(3 and 1) are compatible because 1 can be stretched to 3.
3. If any pair of dimensions is incompatible, NumPy raises a "ValueError: operands could not be broadcast together" error.
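The rules can be verified with a few shape checks. In this sketch the array names (matrix, row, col) are hypothetical; the last case shows an incompatible pair.

```python
import numpy as np

matrix = np.ones((3, 5))

row = np.arange(5)                # shape (5,) is treated as (1, 5)
print((matrix + row).shape)       # (3, 5)

col = np.arange(3).reshape(3, 1)  # shape (3, 1): the 1 stretches to 5
print((matrix + col).shape)       # (3, 5)

try:
    matrix + np.arange(3)         # (3, 5) vs (3,): trailing 5 and 3 differ
except ValueError as exc:
    print("broadcast failed:", exc)
```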
Handling Missing Data:
Unlike Pandas, NumPy never introduces NaN to fill in for unmatched labels, because arrays have no labels to match. If
you perform operations between arrays with mismatched shapes, NumPy will either broadcast or raise an error,
depending on whether broadcasting is possible.
Element-Wise Operations:
NumPy performs arithmetic operations element-wise by default. This means that each element in the resulting array is the
result of applying the operation to the corresponding elements in the input arrays.
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 * arr2
APPLYING FUNCTIONS AND MAPPING
In NumPy, you can apply functions and perform element-wise operations on arrays using various techniques, including
built-in vectorized functions, np.apply_along_axis(), and np.vectorize(), which can also be used for mapping
operations. Here's an overview of these approaches:
Vectorized Functions:
NumPy is designed to work efficiently with vectorized operations, meaning you can apply functions to entire arrays or
elements of arrays without the need for explicit loops. NumPy provides built-in functions that can be applied element-
wise to arrays.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Applying a function element-wise
result = np.square(arr) # Square each element
In this example, the np.square() function is applied element-wise to the arr array.
np.apply_along_axis():
You can use the np.apply_along_axis() function to apply a function along a specified axis of a multi-dimensional array. This
is useful when you want to apply a function to each row or column of a 2D array.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Apply a function along the rows (axis=1)
def sum_of_row(row):
    return np.sum(row)
result = np.apply_along_axis(sum_of_row, axis=1, arr=arr)
In this example, sum_of_row is applied to each row along axis=1, resulting in a new 1D array.
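For a simple reduction like this one, the same result is available directly through the axis argument of the built-in aggregation methods, which is usually the more idiomatic choice. A small sketch comparing the two (the variable names are only for illustration):

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

via_apply = np.apply_along_axis(np.sum, axis=1, arr=arr)  # calls np.sum once per row
via_method = arr.sum(axis=1)                              # built-in reduction along axis 1

print(via_apply)   # row sums: 6 and 15
print(via_method)  # same values
```

np.apply_along_axis() remains useful when the per-row function has no built-in equivalent.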
np.vectorize():
The np.vectorize() function allows you to create a vectorized version of a Python function, which can then be applied
element-wise to NumPy arrays.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Define a Python function
def my_function(x):
    return x * 2
# Create a vectorized version of the function
vectorized_func = np.vectorize(my_function)
# Apply the vectorized function to the array
result = vectorized_func(arr)
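This approach is useful when you have a custom function that you want to apply to an array. Note that np.vectorize() is
provided mainly for convenience: it still calls the Python function once per element, so it is usually slower than
built-in vectorized operations such as arr * 2.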
Mapping with np.vectorize():
You can use np.vectorize() to map a function to each element of an array.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Define a Python function
def my_function(x):
    return x * 2
# Create a vectorized version of the function
vectorized_func = np.vectorize(my_function)
# Map the function to each element
result = vectorized_func(arr)
This approach is similar to applying a function element-wise but can be used for more complex mapping
operations.
These methods allow you to apply functions and perform mapping operations efficiently on NumPy
arrays, making it a powerful library for numerical and scientific computing tasks.
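A mapping that involves a condition shows where np.vectorize() becomes convenient. The sketch below uses a hypothetical function, clip_negative, that replaces negative values with 0, and compares it with the equivalent built-in np.where() call.

```python
import numpy as np

arr = np.array([-2, 3, -1, 5])

# Hypothetical per-element rule with a branch
def clip_negative(x):
    return x if x > 0 else 0

mapped = np.vectorize(clip_negative)(arr)  # applies the rule to each element
builtin = np.where(arr > 0, arr, 0)        # same mapping with a built-in function

print(mapped)   # values: 0, 3, 0, 5
print(builtin)  # same values
```

When a built-in such as np.where() expresses the rule, it is generally the better option; np.vectorize() is a fallback for logic that has no array-level equivalent.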
SORTING AND RANKING
Sorting and ranking are common data manipulation operations in data analysis and are widely supported in Python
through libraries like NumPy and Pandas. These operations help organize data in a desired order or rank elements
based on specific criteria. Here's how to perform sorting and ranking in both libraries:
Sorting in NumPy:
In NumPy, you can sort NumPy arrays using the np.sort() and np.argsort() functions.
np.sort(): This function returns a new sorted array without modifying the original array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
sorted_arr = np.sort(arr)
np.argsort(): This function returns the indices that would sort the array. You can use these indices to sort the original
array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
indices = np.argsort(arr, kind='stable') # 'stable' keeps tied values in their original order
print("Indices of sorted array:", indices)
sorted_arr = arr[indices]
print("Sorted array:", sorted_arr)
Output:
Indices of sorted array: [1 3 6 0 9 2 4 8 7 5]
Sorted array: [1 1 2 3 3 4 5 5 6 9]
Sorting in Pandas:
In Pandas, you can sort Series and DataFrames using the sort_values() method. You can specify the column(s) to sort by
and the sorting order.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 30, 22, 35]}
df = pd.DataFrame(data)
# Sort by 'Age' column in ascending order
sorted_df = df.sort_values(by='Age', ascending=True)
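To see the effect of the sort, and of reversing it, the row order can be printed. This sketch reuses the same data; the names youngest_first and oldest_first are only for illustration.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 35]}
df = pd.DataFrame(data)

youngest_first = df.sort_values(by='Age')  # ascending is the default
print(youngest_first['Name'].tolist())     # ['Charlie', 'Alice', 'Bob', 'David']

# Descending order, with the index renumbered from 0
oldest_first = df.sort_values(by='Age', ascending=False).reset_index(drop=True)
print(oldest_first['Name'].tolist())       # ['David', 'Bob', 'Alice', 'Charlie']
```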
Ranking in NumPy:
NumPy doesn't have a built-in ranking function, but applying np.argsort() twice gives the rank of each element. Note
that with this approach tied values receive different ranks.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
indices = np.argsort(arr)
ranked_arr = np.argsort(indices) + 1 # Add 1 to start ranking from 1 instead of 0
Ranking in Pandas:
In Pandas, you can rank data using the rank() method. You can specify the sorting order and how to handle ties (e.g.,
assigning the average rank to tied values).
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 30]}
df = pd.DataFrame(data)
# Rank by 'Age' column in descending order and assign average rank to tied values
df['Rank'] = df['Age'].rank(ascending=False, method='average')
SUMMARIZING AND COMPUTING DESCRIPTIVE STATISTICS
1. Summary Statistics:
NumPy provides functions to compute summary statistics directly on arrays.
import numpy as np
data = np.array([25, 30, 22, 35, 28])
mean = np.mean(data)
median = np.median(data)
std_dev = np.std(data)
variance = np.var(data)
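By default np.std() and np.var() compute the population versions (dividing by n). Passing ddof=1 gives the sample versions (dividing by n - 1). A short sketch on the same data:

```python
import numpy as np

data = np.array([25, 30, 22, 35, 28])

print(np.mean(data))         # 28.0
print(np.median(data))       # 28.0
print(np.var(data))          # 19.6, population variance (divides by n)
print(np.var(data, ddof=1))  # 24.5, sample variance (divides by n - 1)
```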
2. Percentiles and Quartiles:
You can compute specific percentiles and quartiles using the np.percentile() function.
percentile_25 = np.percentile(data, 25)
percentile_75 = np.percentile(data, 75)
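Several percentiles can be requested in one call by passing a list. The sketch below uses the data array from the summary statistics example and NumPy's default linear interpolation.

```python
import numpy as np

data = np.array([25, 30, 22, 35, 28])

q1, q2, q3 = np.percentile(data, [25, 50, 75])
print(q1, q2, q3)  # 25.0 28.0 30.0
print(q3 - q1)     # 5.0, the interquartile range
```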
3. Correlation and Covariance:
You can compute correlation and covariance between arrays using np.corrcoef() and np.cov().
# data1 and data2 stand for any two 1D arrays of equal length
correlation_matrix = np.corrcoef(data1, data2)
covariance_matrix = np.cov(data1, data2)
CORRELATION AND COVARIANCE
In NumPy, you can compute correlation and covariance between arrays using the np.corrcoef() and np.cov() functions,
respectively. These functions are useful for analyzing relationships and dependencies between variables. Here's how to
use them:
Computing Correlation Coefficient (Correlation):
The correlation coefficient measures the strength and direction of a linear relationship between two variables. It ranges
from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear correlation.
import numpy as np
# Create two arrays representing variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
# Compute the correlation coefficient between x and y
correlation_matrix = np.corrcoef(x, y)
# The correlation coefficient is in the (0, 1) element of the matrix
correlation_coefficient = correlation_matrix[0, 1]
In this example, correlation_coefficient will contain the Pearson correlation coefficient between x and y.
Computing Covariance:
Covariance measures the degree to which two variables change together. Positive values indicate a positive relationship
(both variables increase or decrease together), while negative values indicate an inverse relationship (one variable
increases as the other decreases).
import numpy as np
# Create two arrays representing variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])
# Compute the covariance between x and y
covariance_matrix = np.cov(x, y)
# The covariance is in the (0, 1) element of the matrix
covariance = covariance_matrix[0, 1]
In this example, covariance will contain the covariance between x and y.
np.corrcoef(x, y) and np.cov(x, y) each take two variables. To handle more variables at once, pass a single 2D array in
which each row is one variable (or set rowvar=False if each column is a variable). The result is the correlation matrix
or covariance matrix for all pairs of variables.
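A sketch of the multi-variable case, stacking three variables as the rows of one 2D array. The third variable z is a hypothetical addition that decreases as x increases.

```python
import numpy as np

x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6]
z = [5, 4, 3, 2, 1]  # hypothetical third variable, moving opposite to x

data = np.array([x, y, z])  # each row is one variable

corr = np.corrcoef(data)  # 3x3 correlation matrix
cov = np.cov(data)        # 3x3 covariance matrix (sample covariance by default)

print(corr.shape)  # (3, 3)
print(corr[0, 1])  # approximately 1.0: x and y rise together
print(corr[0, 2])  # approximately -1.0: z falls as x rises
print(cov[0, 1])   # 2.5
```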
HANDLING MISSING DATA
Handling missing data in NumPy is an important aspect of data analysis and manipulation. NumPy provides several ways
to work with missing or undefined values, typically represented as NaN (Not-a-Number). Here are some common
techniques for handling missing data in NumPy:
Using np.nan: NumPy represents missing data using np.nan. You can create arrays with missing values like this:
import numpy as np
arr = np.array([1.0, 2.0, np.nan, 4.0])
Now, arr contains a missing value represented as np.nan.
Checking for Missing Data: You can check for missing values using the np.isnan() function. For example:
np.isnan(arr) # Returns a boolean array indicating which elements are NaN.
Filtering Missing Data: To filter out missing values from an array, you can use boolean indexing. For example:
arr[~np.isnan(arr)] # Returns an array without NaN values.
Replacing Missing Data: You can replace missing values with a specific value using boolean indexing or np.nan_to_num(). For
example:
arr[np.isnan(arr)] = 0 # Replace NaN with 0
Or, starting again from the original array, to replace NaN with the mean of the non-missing values (computed with np.nanmean()):
mean = np.nanmean(arr)
arr[np.isnan(arr)] = mean
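Both replacements can also be written so that they return a new array and leave the original untouched. This is a minimal sketch; the names zero_filled and mean_filled are hypothetical.

```python
import numpy as np

arr = np.array([1.0, 2.0, np.nan, 4.0])

zero_filled = np.nan_to_num(arr, nan=0.0)                    # NaN becomes 0.0
mean_filled = np.where(np.isnan(arr), np.nanmean(arr), arr)  # NaN becomes the mean of 1, 2, 4

print(zero_filled)  # values: 1, 2, 0, 4
print(mean_filled)  # NaN replaced by 7/3, about 2.33
print(arr)          # the original still contains NaN
```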
Ignoring Missing Data: Sometimes, you may want to perform operations while ignoring missing values. You can use
functions like np.nanmax(), np.nanmin(), np.nansum(), etc., which ignore NaN values when computing the result.
Interpolation: If you have a time series or ordered data, you can use interpolation methods to fill missing values.
NumPy provides functions like np.interp() for this purpose.
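One way to fill gaps with np.interp() is to treat the element positions as x-coordinates and interpolate linearly between the known values. The sketch below assumes evenly spaced, ordered data; the variable names are hypothetical.

```python
import numpy as np

arr = np.array([1.0, np.nan, 3.0, np.nan, 5.0])

positions = np.arange(len(arr))  # x-coordinates: 0, 1, 2, 3, 4
known = ~np.isnan(arr)           # True where a value is present

# Linearly interpolate the missing positions from the known points
filled = np.interp(positions, positions[known], arr[known])
print(filled)  # values: 1, 2, 3, 4, 5
```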
Masked Arrays: NumPy also supports masked arrays (numpy.ma) that allow you to work with missing data more
explicitly by creating a mask that specifies which values are missing. This can be useful for certain computations.
import numpy as np
import numpy.ma as ma
arr = np.array([1, 2, np.nan, 4])
masked_arr = ma.masked_array(arr, np.isnan(arr)) # Mask NaN values
mean_val = masked_arr.mean() # Calculates mean ignoring NaNs
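Handling Missing Data in Multidimensional Arrays: If you're working with multidimensional arrays, you can apply
the above techniques along a specific axis. np.isnan() itself has no axis parameter, but the NaN-aware functions such as
np.nanmean() and np.nansum() accept axis, and the boolean result of np.isnan() can be reduced with .any(axis=...) to
find rows or columns that contain missing data.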
Keep in mind that the specific method you choose to handle missing data depends on your data
analysis goals and the context of your data. Some methods may be more appropriate than others,
depending on your use case.
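For the multidimensional case mentioned above, a short sketch on a hypothetical 2x3 array shows per-column statistics that skip NaN and a per-row check for missing values.

```python
import numpy as np

# Hypothetical 2x3 array with two missing entries
grid = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, np.nan]])

col_means = np.nanmean(grid, axis=0)     # column means ignoring NaN: 2.5, 5.0, 3.0
rows_with_nan = np.isnan(grid).any(axis=1)  # True for rows containing NaN
nan_per_col = np.isnan(grid).sum(axis=0)    # count of NaN in each column: 0, 1, 1

print(col_means)
print(rows_with_nan)
print(nan_per_col)
```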
HIERARCHICAL INDEXING
Hierarchical indexing, also called "MultiIndexing", is a Pandas feature (NumPy itself has no MultiIndex class). It lets
a Series or DataFrame index have multiple levels of labels. This is particularly useful when you want to represent
higher-dimensional data with more complex hierarchical structures in a two-dimensional table.
You can create a MultiIndex using the pd.MultiIndex class. Here's a basic example:
import numpy as np
import pandas as pd # Import pandas
# Create a MultiIndex with three levels
index = pd.MultiIndex.from_arrays([['A', 'A', 'B', 'B'], [1, 2, 1, 2], ['X', 'Y', 'X', 'Y']],
names=['Level1', 'Level2', 'Level3'])
# Create a random data array
data = np.random.rand(4, 3)
# Create a DataFrame with MultiIndex
df = pd.DataFrame(data, index=index, columns=['Value1', 'Value2', 'Value3'])
print(df)
Output (the random values will differ on each run):
                        Value1    Value2    Value3
Level1 Level2 Level3
A      1      X       0.654321  0.123456  0.987654
       2      Y       0.234567  0.345678  0.456789
B      1      X       0.987654  0.876543  0.765432
       2      Y       0.123456  0.234567  0.345678
In this example, we've created a MultiIndex with three levels: 'A' and 'B' as the first level (Level1), 1 and 2 as the
second level (Level2), and 'X' and 'Y' as the third level (Level3).
Then, we've created a DataFrame with this MultiIndex and some random data.
You can access data from this DataFrame using hierarchical indexing. For example:
# Accessing data using hierarchical indexing
value_A1_X = df.loc[('A', 1, 'X'), 'Value1'] # Access Value1 for 'A', 1, 'X' in a single .loc call
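Beyond a full key, a MultiIndex also supports partial keys and cross-sections. The sketch below rebuilds the same three-level index, but uses the integers 0 to 11 as stand-in data (an assumption made here so the results are reproducible, in place of the random values above).

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], [1, 2, 1, 2], ['X', 'Y', 'X', 'Y']],
    names=['Level1', 'Level2', 'Level3'])

# Deterministic stand-in data (0..11) instead of random values
df = pd.DataFrame(np.arange(12).reshape(4, 3), index=index,
                  columns=['Value1', 'Value2', 'Value3'])

# Full key plus column label: returns a single value
single = df.loc[('A', 1, 'X'), 'Value1']

# Partial key: every row whose first level is 'A'
# (the matched level is dropped from the result's index)
rows_a = df.loc['A']

# Cross-section on an inner level, selected by level name
level2_is_1 = df.xs(1, level='Level2')

print(single)
print(rows_a)
print(level2_is_1)
```

Passing the row key and the column label in one .loc call avoids the intermediate object created by chained indexing such as df.loc[key]['Value1'].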
Some common operations with hierarchical indexing include:
Slicing: You can slice at each level of the index to select specific subsets of the data.
Stacking and Unstacking: Stacking converts columns into a new (innermost) level of the index. Unstacking moves one
level of the index back into columns.
Swapping Levels: You can swap two levels to change their order in the index. The example below assumes a DataFrame
whose index has the two levels 'Letter' and 'Number':
# Swap the 'Letter' and 'Number' levels
print(df.swaplevel('Letter', 'Number'))
Output:
               Value1  Value2
Number Letter
1      A           10     100
2      A           20     200
1      B           30     300
2      B           40     400
Grouping and Aggregating: You can group data based on levels of the index and apply aggregation functions like
mean, sum, etc.
Reordering Levels: You can change the order of levels in the index.
Resetting Index: You can reset the index to move the hierarchical index levels back to columns.
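The operations listed above can be sketched on one small DataFrame. The frame below is reconstructed from the printed swaplevel output (levels 'Letter' and 'Number', values 10 to 40 and 100 to 400). Its construction is an assumption, since the slides do not show how it was built.

```python
import pandas as pd

# Assumed construction of the two-level DataFrame used in the swaplevel example
index = pd.MultiIndex.from_arrays([['A', 'A', 'B', 'B'], [1, 2, 1, 2]],
                                  names=['Letter', 'Number'])
df = pd.DataFrame({'Value1': [10, 20, 30, 40],
                   'Value2': [100, 200, 300, 400]}, index=index)

# Slicing: label-based slices include both endpoints
sliced = df.loc[('A', 1):('A', 2)]

# Unstacking: move the 'Number' level from the index into the columns
wide = df.unstack('Number')

# Stacking: move the column labels into a new innermost index level
long = df.stack()

# Swapping and reordering levels (equivalent for a two-level index)
swapped = df.swaplevel('Letter', 'Number')
reordered = df.reorder_levels(['Number', 'Letter'])

# Grouping and aggregating by one index level
sums = df.groupby(level='Letter').sum()

# Resetting the index: the levels become ordinary columns
flat = df.reset_index()

print(sums)
print(flat)
```

Label-based slicing on a MultiIndex generally expects the index to be sorted. The index here already is, and df.sort_index() can be used otherwise.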
Hierarchical indexing is especially valuable when dealing with multi-dimensional data, such as panel
data or data with multiple categorical variables. It allows for more expressive data organization and
manipulation. This functionality comes from the pd.MultiIndex class in the pandas library (NumPy itself
has no labeled hierarchical index), which offers various methods for creating and manipulating
MultiIndex objects.

Numpy in python, Array operations using numpy and so on

  • 1. SHERIN RAPPAI Unit 3: Basics of Numpy 21BCA2T452 : Python Programming Prof. Sherin Rappai Assistant Professor Dept. of Computer Science
  • 2. SHERIN RAPPAI NUMPY BASICS:ARRAYS ANDVECTORIZED COMPUTATION NumPy (Numerical Python) is a fundamental library in Python for numerical and scientific computing. It provides support for arrays (multi-dimensional, homogeneous data structures) and a wide range of mathematical functions to perform vectorized computations efficiently. Installing NumPy Before using NumPy, you need to make sure it's installed.You can install it using pip: pip install numpy
  • 3. SHERIN RAPPAI Importing NumPy To use NumPy in your Python code, you should import it: import numpy as np By convention, it's common to import NumPy as np for brevity. Why Use Arrays? Arrays are more efficient than lists when performing operations. For example, if you want to add 2 to every element in the list, you would need a loop in plain Python. But with NumPy, you can do this in a single line: arr = np.array([1, 2, 3, 4, 5]) new_arr = arr + 2 # Adds 2 to every element in the array print(new_arr) Output: [3 4 5 6 7]
  • 4. SHERIN RAPPAI Creating NumPy Arrays You can create NumPy arrays using various methods: 1. From Python Lists: arr = np.array([1, 2, 3, 4, 5]) 2. Using NumPy Functions: zeros_arr = np.zeros(5) # Creates an array of zeros with 5 elements ones_arr = np.ones(3) # Creates an array of ones with 3 elements rand_arr = np.random.rand(3, 3) # Creates a 3x3 array with random values between 0 and 1 3. Using NumPy's Range Function: range_arr = np.arange(0, 10, 2) # Creates an array with values [0, 2, 4, 6, 8]
  • 5. SHERIN RAPPAI BASIC ARRAY OPERATIONS Once you have NumPy arrays, you can perform various operations on them: 1. Element-wise Operations: NumPy allows you to perform element-wise operations, like addition, subtraction, multiplication, and division: a = np.array([1, 2, 3]) b = np.array([4, 5, 6]) c = a + b # Element-wise addition: [5, 7, 9] d = a * b # Element-wise multiplication: [4, 10, 18]
  • 6. SHERIN RAPPAI 2. Indexing and Slicing: Indexing means accessing a specific element in an array by its position (index). In NumPy, indices start from 0. arr = np.array([0, 1, 2, 3, 4, 5]) element = arr[2] # Access element at index 2 (value: 2) sub_array = arr[2:5] # Slice from index 2 to 4 (values: [2, 3, 4])
  • 7. SHERIN RAPPAI Slicing:Slicing allows you to access a range or subset of elements from an array. It is done using the syntax arr[start:end], where start is the index where the slice begins (inclusive), and end is where it stops (exclusive). arr = np.array([10, 20, 30, 40, 50]) # Getting a slice of elements from index 1 to 3 (exclusive of 3) print(arr[1:3]) # Output: [20 30] # Getting a slice from the start till the third element print(arr[:3]) # Output: [10 20 30] # Getting a slice from index 2 to the end of the array print(arr[2:]) # Output: [30 40 50]
  • 8. SHERIN RAPPAI Negative Indexing: You can also use negative indices to access elements from the end of the array. For example, -1 refers to the last element, -2 refers to the second last element, and so on. Example: arr = np.array([10, 20, 30, 40, 50]) # Accessing the last element print(arr[-1]) # Output: 50 # Accessing the second last element print(arr[-2]) # Output: 40
  • 9. SHERIN RAPPAI Slicing with Steps:You can also specify a step value, which tells how many elements to skip in the slice.The syntax is arr[start:end:step]. Example: arr = np.array([10, 20, 30, 40, 50, 60]) # Getting every second element from index 1 to 5 print(arr[1:5:2]) # Output: [20 40] # Reversing the array using negative step print(arr[::-1]) # Output: [60 50 40 30 20 10] •The array is [10, 20, 30, 40, 50, 60]. •Index positions: [0, 1, 2, 3, 4, 5]. •The slice starts at index 1, which is 20. •2 is the step value, which means "skip every second element. •It skips the next element and picks the element at index 3, which is 40. •The slice stops before reaching index 5.
  • 10. SHERIN RAPPAI 3.Array Shape and Reshaping: The shape of an array tells us how many elements it contains along each dimension (or axis). You can check the shape of an array using the .shape attribute. You can check and change the shape of NumPy arrays: arr = np.array([[1, 2, 3], [4, 5, 6]]) shape = arr.shape # Get the shape (2, 3) reshaped = arr.reshape(3, 2) # Reshape the array to (3, 2) Reshaping: Reshaping allows you to change the shape of an array without changing its data.You can convert a 1D array to a 2D array, or a 2D array to a 3D array, etc., as long as the total number of elements stays the same. Example:
  • 11. SHERIN RAPPAI # Creating a 1D array with 6 elements arr = np.array([1, 2, 3, 4, 5, 6]) # Reshaping the 1D array into a 2D array (2 rows, 3 columns) reshaped_arr = arr.reshape(2, 3) print(reshaped_arr) Reshape Rules: When reshaping an array, the new shape must contain the same total number of elements as the original array. For example, if you have an array with 12 elements, you could reshape it to:A 2x6 array (2 rows x 6 columns)A 3x4 array (3 rows x 4 columns)A 4x3 array (4 rows x 3 columns) Example arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]) # Reshaping into 3 rows and 4 columns reshaped_arr = arr.reshape(3, 4) print(reshaped_arr)
  • 12. SHERIN RAPPAI Flattening an Array:If you want to convert a multi-dimensional array back into a 1D array, you can flatten it using the .flatten() method. Example arr_2d = np.array([[1, 2, 3], [4, 5, 6]]) # Flattening the 2D array into a 1D array flat_arr = arr_2d.flatten() print(flat_arr) O/P [1 2 3 4 5 6] Shape:Tells you the dimensions of an array (rows, columns, etc.). Reshaping: Lets you change the shape of an array while keeping the same number of elements. Flattening: Converts a multi-dimensional array back into a 1D array.
  • 13. SHERIN RAPPAI 4.Aggregation Functions: Agregation functions are used to perform calculations on an entire array or along a specific axis (e.g., summing all elements, finding the maximum, etc.).These functions are essential for data analysis and numerical computations. Common Aggregation Functions: Here are some of the most commonly used aggregation functions in NumPy: 1. Sum:The sum() function adds all the elements of an array. 2. Mean:The mean() function calculates the average of the elements. 3. Maximum and Minimum:max() gives the maximum value in the array. min() gives the minimum value in the array. 4. Product:The prod() function returns the product of all elements in the array (i.e., multiplies all elements together). 5. Standard Deviation andVariance: std() calculates the standard deviation (how spread out the numbers are). 6. var() calculates the variance (the square of the standard deviation). 7. Cumulative Sum and Product : cumsum() gives the cumulative sum (the sum of the elements up to each index).cumprod() gives the cumulative product (the product of elements up to each index). NumPy provides functions to compute statistics on arrays: arr = np.array([1, 2, 3, 4, 5]) mean = np.mean(arr) # Calculate the mean (average) max_val = np.max(arr) # Find the maximum value min_val = np.min(arr) # Find the minimum value
  • 14. SHERIN RAPPAI VECTORIZED COMPUTATION Vectorized computation in Python refers to performing operations on entire arrays or sequences of data without the need for explicit loops.This approach leverages highly optimized, low-level code to achieve faster and more efficient computations.The primary library for vectorized computation in Python is NumPy. Traditional Loop-Based Computation In traditional Python programming, you might use explicit loops to perform operations on arrays or lists. For example: # Using loops to add two lists element-wise list1 = [1, 2, 3] list2 = [4, 5, 6] result = [] for i in range(len(list1)): result.append(list1[i] + list2[i]) # Result: [5, 7, 9]
  • 15. SHERIN RAPPAI Vectorized Computation with NumPy NumPy allows you to perform operations on entire arrays, making code more concise and efficient. Here's how you can achieve the same result using NumPy: import numpy as np # Using NumPy for element-wise addition arr1 = np.array([1, 2, 3]) arr2 = np.array([4, 5, 6]) result = arr1 + arr2 # Result: array([5, 7, 9])
  • 16. SHERIN RAPPAI INTRODUCTION TO PANDAS DATA STRUCTURES Pandas is a popular Python library for data manipulation and analysis. It provides two primary data structures: the DataFrame and the Series.These data structures are designed to handle structured data, making it easier to work with datasets in a tabular format. DataFrame:  A DataFrame is a 2-dimensional, labeled data structure that resembles a spreadsheet or SQL table.  It consists of rows and columns, where each column can have a different data type (e.g., integers, floats, strings, or even custom data types).  You can think of a DataFrame as a collection of Series objects, where each Series is a column.  DataFrames are highly versatile and are used for a wide range of data analysis tasks, including data cleaning, exploration, and transformation.
  • 17. Here's a basic example of how to create a DataFrame using Pandas:
import pandas as pd
# Creating a DataFrame from a dictionary of data
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['New York', 'San Francisco', 'Los Angeles']}
df = pd.DataFrame(data)
# Displaying the DataFrame
print(df)
  • 18. Series:
 A Series is a one-dimensional labeled array that can hold data of any data type.
 It is like a column in a DataFrame or a single variable in statistics.
 Series objects are commonly used for time series data, as well as other one-dimensional data.
Key characteristics of a Pandas Series:
 Homogeneous Data: Like a NumPy array, and unlike a Python list, a Pandas Series has a single data type (dtype) for all of its values. For example, a Series created from integer values has an integer dtype. If the values are of mixed types, Pandas falls back to a more general dtype such as float64 or object.
 Labeled Data: A Series has two parts: the data itself and an associated index. The index provides labels or names for each data point in the Series. By default, a Series has a numeric index starting from 0, but you can specify custom labels if needed.
 Size and Shape: A Series has a size (the number of elements) and a one-dimensional shape, but it does not have separate rows and columns like a DataFrame.
  • 19. SHERIN RAPPAI import pandas as pd # Create a Series from a list data = [10, 20, 30, 40, 50] series = pd.Series(data) # Display the Series print(series) 0 10 1 20 2 30 3 40 4 50 dtype: int64
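The Series above uses the default 0-based index. As a complementary sketch, with weekday labels and temperatures that are invented for illustration, a Series can also carry custom labels, and a value can then be read either by label or by position.

```python
import pandas as pd

# Hypothetical data: temperatures labeled by weekday
temps = pd.Series([22.5, 24.0, 19.8], index=['Mon', 'Tue', 'Wed'])

by_label = temps['Tue']        # label-based access: 24.0
by_position = temps.iloc[0]    # position-based access: 22.5

print(temps.size, temps.shape, temps.dtype)   # 3 (3,) float64
```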
  • 20. SHERIN RAPPAI Some common tasks you can perform with Pandas:  Data Loading: Pandas can read data from various sources, including CSV files, Excel spreadsheets, SQL databases, and more.  Data Cleaning: You can clean and preprocess data by handling missing values, removing duplicates, and transforming data types.  Data Selection: Easily select specific rows and columns of interest using various indexing techniques.  Data Aggregation: Perform groupby operations, calculate statistics, and aggregate data based on specific criteria.  Data Visualization: You can use Pandas in conjunction with visualization libraries like Matplotlib and Seaborn to create informative plots and charts.
  • 21. SHERIN RAPPAI A DataFrame in Python typically refers to a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure provided by the popular library called Pandas. It is a fundamental data structure for data manipulation and analysis in Python. Here's how you can work with DataFrames in Python using Pandas: 1. Import Pandas: First, you need to import the Pandas library. import pandas as pd 2. Creating a DataFrame: You can create a DataFrame in several ways. Here are a few common methods: From a dictionary: data = {'Column1': [value1, value2, ...], 'Column2': [value1, value2, ...]} df = pd.DataFrame(data) DataFrame
  • 22. SHERIN RAPPAI • From a list of lists: data = [[value1, value2], [value3, value4]] df = pd.DataFrame(data, columns=['Column1', 'Column2']) • From a CSV file: df = pd.read_csv('file.csv') 3.Viewing Data: You can use various methods to view and explore your DataFrame: df.head(): Displays the first few rows of the DataFrame. df.tail(): Displays the last few rows of the DataFrame. df.shape: Returns the number of rows and columns. df.columns: Returns the column names. df.info(): Provides information about the DataFrame, including data types and non-null counts.
  • 23. 4. Selecting Data: You can select specific columns or rows from a DataFrame using indexing or filtering. For example:
df['Column1'] # Select a specific column
df[['Column1', 'Column2']] # Select multiple columns
df[df['Column1'] > 5] # Filter rows based on a condition
5. Modifying Data: You can modify the DataFrame by adding or modifying columns, updating values, or appending rows. For example:
df['NewColumn'] = [new_value1, new_value2, ...] # Add a new column
df.at[index, 'Column1'] = new_value # Update a specific value
df = pd.concat([df, pd.DataFrame([{'Column1': value1, 'Column2': value2}])], ignore_index=True) # Append a new row (DataFrame.append() was removed in pandas 2.0)
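The selection and modification steps on this slide use placeholder values. A concrete sketch with made-up data follows; it appends the row with pd.concat, since DataFrame.append() is no longer available in pandas 2.0 and later.

```python
import pandas as pd

# Illustrative data; the column names mirror the placeholders on the slide
df = pd.DataFrame({'Column1': [3, 7, 9], 'Column2': ['a', 'b', 'c']})

df['NewColumn'] = df['Column1'] * 2    # add a column derived from another one
df.at[0, 'Column1'] = 10               # update a single cell (row label 0)

# Append one row by concatenating a one-row DataFrame
new_row = pd.DataFrame([{'Column1': 1, 'Column2': 'd', 'NewColumn': 2}])
df = pd.concat([df, new_row], ignore_index=True)

filtered = df[df['Column1'] > 5]       # keep rows where Column1 exceeds 5
print(filtered)
```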
  • 24. SHERIN RAPPAI 6. Data Analysis: Pandas provides various functions for data analysis, such as describe(), groupby(), agg(), and more. 7. Saving Data: You can save the DataFrame to a CSV file or other formats: df.to_csv('output.csv', index=False) df.to_excel('output.xlsx', index=False)
  • 25. SHERIN RAPPAI INDEX OBJECTS-INDEXING, SELECTION,AND FILTERING In Pandas, the Index object is a fundamental component of both Series and DataFrame data structures. It provides the labels or names for the rows or columns of your data.You can use indexing, selection, and filtering techniques with these indexes to access specific data points or subsets of your data. Here's how you can work with index objects in Pandas: 1. Indexing: Indexing allows you to access specific elements or rows in your data using labels.You can use .loc[] for label-based indexing and .iloc[] for integer-based indexing. • Label-based indexing: df.loc['label'] # Access a specific row by its label df.loc['label', 'column_name'] # Access a specific element by label and column name
  • 26. EXAMPLE
import pandas as pd
# Create a DataFrame with custom labels
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['NY', 'LA', 'SF']}
df = pd.DataFrame(data, index=['A1', 'B2', 'C3'])
# Access the row with label 'B2'
print(df.loc['B2'])
# Access the value in the row with label 'B2' and the column 'City'
print(df.loc['B2', 'City'])
  • 27. SHERIN RAPPAI • Integer-based indexing: df.iloc[0] # Access the first row df.iloc[0, 1] # Access an element by row and column index 2. Selection: Selection refers to choosing specific columns or rows from a DataFrame based on their labels or positions.You use selection when you want to extract specific columns or rows without applying any condition. It’s about choosing specific data (columns/rows) directly. No conditional logic is applied df['Column1'] # Select 'Column1' from the DataFrame df[['Column1', 'Column2']] # Select 'Column1' and 'Column2' df.loc[0] # Select the first row by index label df.iloc[2] # Select the third row by integer position
  • 28. SHERIN RAPPAI 3. Filtering: You can use various methods to select specific data based on conditions or criteria. • Select rows based on a condition: • df[df['Column'] > 5] # Select rows where 'Column' is greater than 5 • Select rows by multiple conditions: • df[(df['Column1'] > 5) & (df['Column2'] < 10)] # Rows where 'Column1' > 5 and 'Column2' < 10 Filtering allows you to create a boolean mask based on a condition and then apply that mask to your DataFrame to select rows meeting the condition. Create a boolean mask: condition = df['Column'] > 5 Apply the mask to the DataFrame: filtered_df = df[condition] 4. Setting a New Index: You can set a specific column as the index of your DataFrame using the .set_index() method. df.set_index('Column_Name', inplace=True)
  • 29. SHERIN RAPPAI 5. Resetting the Index: If you've set a column as the index and want to revert to the default integer-based index, you can use the .reset_index() method. df.reset_index(inplace=True) 6. Multi-level Indexing: You can create DataFrames with multi-level indexes, allowing you to work with more complex hierarchical data structures. df.set_index(['Index1', 'Index2'], inplace=True) Index objects in Pandas are versatile and powerful for working with data because they enable you to access and manipulate your data in various ways, whether it's for data retrieval, filtering, or restructuring.
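To tie the indexing, filtering, and index-setting steps together, here is a short sketch that reuses the Name/Age/City data from the earlier slide. It avoids inplace=True and assigns the returned DataFrame instead, which is a stylistic choice of this example rather than something the slides require.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'City': ['NY', 'LA', 'SF']})

by_name = df.set_index('Name')          # 'Name' becomes the row labels
bob_age = by_name.loc['Bob', 'Age']     # label-based lookup: 30

# Combine two conditions with & and wrap each one in parentheses
mask = (by_name['Age'] > 25) & (by_name['City'] != 'SF')
selected = by_name[mask]                # only Bob satisfies both

restored = by_name.reset_index()        # 'Name' moves back to a column
print(selected)
```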
  • 30. ARITHMETIC AND DATA ALIGNMENT IN PANDAS
Arithmetic and data alignment in Pandas refer to how mathematical operations are performed between Series and DataFrames when they have different shapes or indices. Pandas automatically aligns data based on the labels of the objects involved in the operation, so each value is combined only with the value that carries the same label. Here are some key aspects of arithmetic and data alignment in Pandas:
1. Automatic Alignment: When you perform mathematical operations (e.g., addition, subtraction, multiplication, division) between two Series or DataFrames, Pandas aligns the data based on their labels (index or column names) and performs the operation only on matching labels.
series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])
result = series1 + series2
In this example, the result Series will have NaN values for the 'A' and 'D' labels because those labels don't match between series1 and series2.
A    NaN
B    6.0
C    8.0
D    NaN
dtype: float64
  • 31. 2. Missing Data (NaN): When labels don't match, Pandas fills in the result with NaN (Not-a-Number) to indicate missing values.
3. DataFrame Alignment: The same principles apply to DataFrames when performing operations between them. The alignment occurs both for rows (based on the index) and columns (based on column names).
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['X', 'Y'])
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['Y', 'Z'])
result = df1 + df2
In this case, result has rows 'X', 'Y', 'Z' and columns 'A', 'B', 'C'. Only the cell at row 'Y', column 'B' holds a value (4 + 5 = 9.0). Every other cell is NaN, because columns 'A' and 'C' each exist in only one DataFrame and rows 'X' and 'Z' each exist in only one index.
4. Handling Missing Data: You can use methods like .fillna() to replace NaN values with a specific value or use .dropna() to remove rows or columns with missing data.
result_filled = result.fillna(0) # Replace NaN with 0
result_dropped = result.dropna() # Remove rows that contain NaN (pass axis=1 to drop columns instead)
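Calling .fillna(0) after the addition replaces the NaN results, but the unmatched values are already lost at that point. An alternative worth knowing is the add() method with fill_value, which substitutes a value for a missing operand before the arithmetic happens. The sketch below reuses the two Series from the previous slide.

```python
import pandas as pd

series1 = pd.Series([1, 2, 3], index=['A', 'B', 'C'])
series2 = pd.Series([4, 5, 6], index=['B', 'C', 'D'])

plain = series1 + series2                     # 'A' and 'D' become NaN
filled = series1.add(series2, fill_value=0)   # a missing operand counts as 0

print(plain)
print(filled)   # A 1.0, B 6.0, C 8.0, D 6.0
```

With fill_value=0, label 'A' keeps its value from series1 (1 + 0) and label 'D' keeps its value from series2 (0 + 6).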
  • 32. SHERIN RAPPAI 5.Alignment with Broadcasting: Pandas allows you to perform operations between a Series and a scalar value, and it broadcasts the scalar to match the shape of the Series. series = pd.Series([1, 2, 3]) scalar = 2 result = series * scalar In this example, result will be a Series with values [2, 4, 6]. Automatic alignment in Pandas is a powerful feature that simplifies data manipulation and allows you to work with datasets of different shapes without needing to manually align them. It ensures that operations are performed in a way that maintains the integrity and structure of your data.
  • 33. ARITHMETIC AND DATA ALIGNMENT IN NUMPY
NumPy also has rules for combining arrays in arithmetic. Unlike Pandas, which aligns values by label, NumPy aligns values by position and shape, and it is primarily focused on numerical computations with homogeneous arrays (arrays of a single data type). Here's how arithmetic and data alignment work in NumPy:
Automatic Alignment: NumPy arrays perform element-wise operations, and they align data based on the shape of the arrays being operated on. If you perform an operation between two NumPy arrays of different but compatible shapes, NumPy will broadcast the smaller array to match the shape of the larger one.
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4])
result = arr1 + arr2
In this example, NumPy will automatically broadcast arr2 to match the shape of arr1, resulting in [5, 6, 7].
  • 34. Broadcasting Rules: NumPy follows specific rules when broadcasting arrays:
If the arrays have a different number of dimensions, pad the smaller shape with 1 on the left side. For example, shape (3, 5) and shape (5,) become (3, 5) and (1, 5): NumPy adds a 1 on the left to make both arrays 2D.
Compare the shapes dimension by dimension, starting from the right. If the two sizes are equal or one of them is 1, they are compatible.
If the dimensions are incompatible, NumPy raises a "ValueError: operands could not be broadcast together" error.
Shape (3, 5) and (1, 5): The second dimensions (5 and 5) are the same, and the first dimensions (3 and 1) are compatible because 1 can be stretched to 3.
Handling Missing Data: Because NumPy arrays have no labels, NumPy never inserts NaN for non-matching entries the way Pandas does. If you perform operations between arrays with mismatched shapes, NumPy will either broadcast or raise an error, depending on whether broadcasting is possible.
Element-Wise Operations: NumPy performs arithmetic operations element-wise by default. This means that each element in the resulting array is the result of applying the operation to the corresponding elements in the input arrays.
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 * arr2 # [4, 10, 18]
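The broadcasting rules above can be tried on the (3, 5) and (5,) shapes used in the slide. This is a minimal sketch with illustrative array contents; it also shows the error case and one common way around it, which is to reshape the smaller array so that its size-1 dimension lines up.

```python
import numpy as np

a = np.ones((3, 5))
b = np.arange(5)            # shape (5,), treated as (1, 5)
ok = a + b                  # compatible: result has shape (3, 5)

c = np.arange(3)            # shape (3,), treated as (1, 3): 3 vs 5 clash
try:
    a + c
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Reshaping c to (3, 1) makes it a column that stretches across 5 columns
ok_column = a + c.reshape(3, 1)

print(ok.shape, broadcast_failed, ok_column.shape)
```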
  • 35. APPLYING FUNCTIONS AND MAPPING
In NumPy, you can apply functions and perform element-wise operations on arrays using various techniques, including built-in vectorized functions, np.apply_along_axis(), and np.vectorize(), which can also be used for mapping operations. Here's an overview of these approaches:
Vectorized Functions: NumPy is designed to work efficiently with vectorized operations, meaning you can apply functions to entire arrays without the need for explicit loops. NumPy provides built-in functions that can be applied element-wise to arrays.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Applying a function element-wise
result = np.square(arr) # Square each element: [1, 4, 9, 16]
In this example, the np.square() function is applied element-wise to the arr array.
  • 36. np.apply_along_axis(): You can use the np.apply_along_axis() function to apply a function along a specified axis of a multi-dimensional array. This is useful when you want to apply a function to each row or column of a 2D array.
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
# Apply a function along the rows (axis=1)
def sum_of_row(row):
    return np.sum(row)
result = np.apply_along_axis(sum_of_row, axis=1, arr=arr)
In this example, sum_of_row is applied to each row along axis=1, resulting in a new 1D array, [6, 15].
  • 37. SHERIN RAPPAI np.vectorize(): The np.vectorize() function allows you to create a vectorized version of a Python function, which can then be applied element-wise to NumPy arrays. import numpy as np arr = np.array([1, 2, 3, 4]) # Define a Python function def my_function(x): return x * 2 # Create a vectorized version of the function vectorized_func = np.vectorize(my_function) # Apply the vectorized function to the array result = vectorized_func(arr) This approach is useful when you have a custom function that you want to apply to an array.
  • 38. Mapping with np.vectorize(): You can use np.vectorize() to map a function to each element of an array.
import numpy as np
arr = np.array([1, 2, 3, 4])
# Define a Python function
def my_function(x):
    return x * 2
# Create a vectorized version of the function
vectorized_func = np.vectorize(my_function)
# Map the function to each element
result = vectorized_func(arr) # [2, 4, 6, 8]
This is the same mechanism as applying a function element-wise, and it is most useful when the mapping logic is hard to express with built-in array operations. Note that np.vectorize() is provided mainly for convenience: internally it still calls the Python function once per element, so it does not give the speed of built-in vectorized functions such as np.square().
These methods allow you to apply functions and perform mapping operations on NumPy arrays, making it a powerful library for numerical and scientific computing tasks.
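When a function contains a branch, it cannot be applied to a whole array directly, and this is where np.vectorize() helps. The sketch below uses a hypothetical function, clip_negative, and compares the np.vectorize() approach with an equivalent built-in expression using np.where(), which is usually the faster choice when such an expression exists.

```python
import numpy as np

arr = np.array([-2, -1, 0, 3])

# Hypothetical scalar function with a branch
def clip_negative(x):
    return x if x > 0 else 0

mapped = np.vectorize(clip_negative)(arr)   # calls clip_negative per element
built_in = np.where(arr > 0, arr, 0)        # same result, computed array-wide

print(mapped, built_in)   # both are [0 0 0 3]
```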
  • 39. SHERIN RAPPAI SORTING AND RANKING Sorting and ranking are common data manipulation operations in data analysis and are widely supported in Python through libraries like NumPy and Pandas.These operations help organize data in a desired order or rank elements based on specific criteria. Here's how to perform sorting and ranking in both libraries: Sorting in NumPy: In NumPy, you can sort NumPy arrays using the np.sort() and np.argsort() functions. np.sort():This function returns a new sorted array without modifying the original array. import numpy as np arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]) sorted_arr = np.sort(arr)
  • 40. np.argsort(): This function returns the indices that would sort the array. You can use these indices to sort the original array.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
indices = np.argsort(arr, kind='stable') # 'stable' keeps tied values in their original order
print("Indices of sorted array:", indices)
sorted_arr = arr[indices]
print("Sorted array:", sorted_arr)
Output:
Indices of sorted array: [1 3 6 0 9 2 4 8 7 5]
Sorted array: [1 1 2 3 3 4 5 5 6 9]
Sorting in Pandas: In Pandas, you can sort Series and DataFrames using the sort_values() method. You can specify the column(s) to sort by and the sorting order.
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 22, 35]}
df = pd.DataFrame(data)
# Sort by 'Age' column in ascending order
sorted_df = df.sort_values(by='Age', ascending=True)
  • 41. Ranking in NumPy: NumPy doesn't have a built-in ranking function, but you can use np.argsort() to get the ranking of elements. You can then use these rankings to create a ranked array. With this approach tied values receive different consecutive ranks rather than a shared rank.
import numpy as np
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
indices = np.argsort(arr)
ranked_arr = np.argsort(indices) + 1 # Add 1 to start ranking from 1 instead of 0
Ranking in Pandas: In Pandas, you can rank data using the rank() method. You can specify the sorting order and how to handle ties (e.g., assigning the average rank to tied values).
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'], 'Age': [25, 30, 22, 30]}
df = pd.DataFrame(data)
# Rank by 'Age' column in descending order and assign average rank to tied values
df['Rank'] = df['Age'].rank(ascending=False, method='average') # Ranks: 3.0, 1.5, 4.0, 1.5
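How ties are treated is the main practical difference between the two ranking approaches. The following sketch uses a short made-up array with one tie; it passes kind='stable' to np.argsort() so that the tie order is deterministic, which is an addition of this example.

```python
import numpy as np
import pandas as pd

arr = np.array([3, 1, 4, 1])      # the value 1 appears twice

# NumPy: argsort of argsort gives ordinal ranks, so ties get distinct ranks
order = np.argsort(arr, kind='stable')
ordinal = np.argsort(order, kind='stable') + 1        # [3 1 4 2]

# Pandas: the method argument chooses the tie rule
s = pd.Series(arr)
average = s.rank(method='average')    # ties share the mean rank: 1.5
minimum = s.rank(method='min')        # ties share the lowest rank: 1.0
first = s.rank(method='first')        # ties ranked by order of appearance

print(ordinal, average.tolist(), minimum.tolist(), first.tolist())
```

The NumPy result matches the Pandas method='first' result, which is one way to check the double argsort trick.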
  • 42. SHERIN RAPPAI SUMMARIZING AND COMPUTING DESCRIPTIVE STATISTICS 1. Summary Statistics: NumPy provides functions to compute summary statistics directly on arrays. import numpy as np data = np.array([25, 30, 22, 35, 28]) mean = np.mean(data) median = np.median(data) std_dev = np.std(data) variance = np.var(data)
  • 43. 2. Percentiles and Quartiles: You can compute specific percentiles and quartiles using the np.percentile() function.
percentile_25 = np.percentile(data, 25) # First quartile
percentile_75 = np.percentile(data, 75) # Third quartile
3. Correlation and Covariance: You can compute correlation and covariance between arrays using np.corrcoef() and np.cov(). Here data1 and data2 stand for any two numeric arrays of equal length.
correlation_matrix = np.corrcoef(data1, data2)
covariance_matrix = np.cov(data1, data2)
  • 44. SHERIN RAPPAI CORRELATION AND COVARIANCE In NumPy, you can compute correlation and covariance between arrays using the np.corrcoef() and np.cov() functions, respectively.These functions are useful for analyzing relationships and dependencies between variables. Here's how to use them: Computing Correlation Coefficient (Correlation): The correlation coefficient measures the strength and direction of a linear relationship between two variables. It ranges from -1 (perfect negative correlation) to 1 (perfect positive correlation), with 0 indicating no linear correlation. import numpy as np # Create two arrays representing variables x = np.array([1, 2, 3, 4, 5]) y = np.array([2, 3, 4, 5, 6])
  • 45. SHERIN RAPPAI # Compute the correlation coefficient between x and y correlation_matrix = np.corrcoef(x, y) # The correlation coefficient is in the (0, 1) element of the matrix correlation_coefficient = correlation_matrix[0, 1] In this example, correlation_coefficient will contain the Pearson correlation coefficient between x and y.
  • 46. SHERIN RAPPAI Computing Covariance: Covariance measures the degree to which two variables change together. Positive values indicate a positive relationship (both variables increase or decrease together), while negative values indicate an inverse relationship (one variable increases as the other decreases). import numpy as np # Create two arrays representing variables x = np.array([1, 2, 3, 4, 5]) y = np.array([2, 3, 4, 5, 6]) # Compute the covariance between x and y covariance_matrix = np.cov(x, y) # The covariance is in the (0, 1) element of the matrix covariance = covariance_matrix[0, 1] In this example, covariance will contain the covariance between x and y. Both np.corrcoef() and np.cov() can accept multiple arrays as input, allowing you to compute correlations and covariances for multiple variables simultaneously. For example, if you have a dataset with multiple columns, you can compute the correlation matrix or covariance matrix for all pairs of variables.
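For the x and y arrays used on these slides, the results can be worked out by hand, which makes them a useful check. Since y is exactly x + 1, the correlation is 1, and the covariance equals the sample variance of x, because np.cov() divides by n - 1 by default. The array z below is an extra, made-up variable added to show a negative relationship.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 3, 4, 5, 6])      # y = x + 1, a perfect linear relationship
z = np.array([10, 8, 6, 4, 2])     # decreases as x increases

corr_xy = np.corrcoef(x, y)[0, 1]  # 1.0
corr_xz = np.corrcoef(x, z)[0, 1]  # -1.0
cov_xy = np.cov(x, y)[0, 1]        # 2.5 (sum of squared deviations 10, divided by 4)

# Passing a 2D array treats each row as one variable
all_corr = np.corrcoef(np.vstack([x, y, z]))   # 3x3 correlation matrix

print(corr_xy, corr_xz, cov_xy, all_corr.shape)
```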
  • 47. SHERIN RAPPAI HANDLING MISSING DATA Handling missing data in NumPy is an important aspect of data analysis and manipulation. NumPy provides several ways to work with missing or undefined values, typically represented as NaN (Not-a-Number). Here are some common techniques for handling missing data in NumPy: Using np.nan: NumPy represents missing data using np.nan.You can create arrays with missing values like this: import numpy as np arr = np.array([1.0, 2.0, np.nan, 4.0]) Now, arr contains a missing value represented as np.nan.
  • 48. SHERIN RAPPAI Checking for Missing Data:You can check for missing values using the np.isnan() function. For example: np.isnan(arr) # Returns a boolean array indicating which elements are NaN. Filtering Missing Data:To filter out missing values from an array, you can use boolean indexing. For example: arr[~np.isnan(arr)] # Returns an array without NaN values. Replacing Missing Data:You can replace missing values with a specific value using np.nan_to_num() or np.nanmean(). For example: arr[np.isnan(arr)] = 0 # Replace NaN with 0 Or, to replace NaN with the mean of the non-missing values: mean = np.nanmean(arr) arr[np.isnan(arr)] = mean
  • 49. Ignoring Missing Data: Sometimes, you may want to perform operations while ignoring missing values. You can use functions like np.nanmax(), np.nanmin(), np.nansum(), etc., which ignore NaN values when computing the result.
Interpolation: If you have a time series or ordered data, you can use interpolation methods to fill missing values. NumPy provides functions like np.interp() for this purpose.
Masked Arrays: NumPy also supports masked arrays (numpy.ma) that allow you to work with missing data more explicitly by creating a mask that specifies which values are missing. This can be useful for certain computations.
import numpy as np
import numpy.ma as ma
arr = np.array([1, 2, np.nan, 4])
masked_arr = ma.masked_array(arr, np.isnan(arr)) # Mask NaN values
mean_val = masked_arr.mean() # Calculates mean ignoring NaNs
Handling Missing Data in Multidimensional Arrays: If you're working with multidimensional arrays, you can apply the above techniques along a specific axis. np.isnan() itself takes no axis parameter; instead, combine its boolean result with a reduction such as np.isnan(arr).any(axis=1), or pass axis to functions like np.nanmean() and np.nansum().
Keep in mind that the specific method you choose to handle missing data depends on your data analysis goals and the context of your data. Some methods may be more appropriate than others, depending on your use case.
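As a sketch of the multidimensional case, the example below uses a small made-up 2x3 array. It finds which rows contain NaN, computes column means that ignore NaN, and fills each NaN with the mean of its column. Filling with the column mean is only one possible strategy, chosen here for illustration.

```python
import numpy as np

data = np.array([[1.0, np.nan, 3.0],
                 [4.0, 5.0, np.nan]])

# Reduce the boolean NaN mask along axis 1 to flag rows with any NaN
rows_with_nan = np.isnan(data).any(axis=1)     # [ True  True]

# Column means that skip NaN entries
col_means = np.nanmean(data, axis=0)           # [2.5 5.  3. ]

# Replace each NaN with its column mean (col_means broadcasts across rows)
filled = np.where(np.isnan(data), col_means, data)

print(rows_with_nan, col_means)
print(filled)
```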
  • 50. HIERARCHICAL INDEXING
Hierarchical indexing, often referred to as "MultiIndexing", is a Pandas feature (NumPy itself has no MultiIndex class). It allows you to work with data whose row or column index has multiple levels of labels. This is particularly useful when you want to represent higher-dimensional data with more complex hierarchical structures in a tabular form. You can create a MultiIndex using the pd.MultiIndex class, with NumPy supplying the data. Here's a basic example:
import numpy as np
import pandas as pd
# Create a MultiIndex with three levels
index = pd.MultiIndex.from_arrays([['A', 'A', 'B', 'B'], [1, 2, 1, 2], ['X', 'Y', 'X', 'Y']], names=['Level1', 'Level2', 'Level3'])
# Create a random data array
data = np.random.rand(4, 3)
# Create a DataFrame with MultiIndex
df = pd.DataFrame(data, index=index, columns=['Value1', 'Value2', 'Value3'])
print(df)
Sample output (the values are random and will differ on each run):
                        Value1    Value2    Value3
Level1 Level2 Level3
A      1      X       0.654321  0.123456  0.987654
       2      Y       0.234567  0.345678  0.456789
B      1      X       0.987654  0.876543  0.765432
       2      Y       0.123456  0.234567  0.345678
  • 51. In this example, we've created a MultiIndex with three levels: 'A' and 'B' as the first level, 1 and 2 as the second level, and 'X' and 'Y' as the third level. Then, we've created a DataFrame with this MultiIndex and some random data. You can access data from this DataFrame using hierarchical indexing. For example:
# Accessing data using hierarchical indexing
value_A1_X = df.loc[('A', 1, 'X'), 'Value1'] # Access Value1 for 'A', 1, 'X'
  • 52. Some common operations with hierarchical indexing include:
Slicing: You can perform slices at each level of the index, allowing you to select specific subsets of the data.
Stacking and Unstacking: Stacking converts columns into a new level of the index. Unstacking moves one level of the index back into columns.
Swapping Levels: You can swap levels to change the order of the levels in the index. The example below assumes a DataFrame whose index levels are named 'Letter' and 'Number':
# Swap 'Letter' and 'Number' levels
print(df.swaplevel('Letter', 'Number'))
               Value1  Value2
Number Letter
1      A           10     100
2      A           20     200
1      B           30     300
2      B           40     400
Grouping and Aggregating: You can group data based on levels of the index and perform aggregation functions like mean, sum, etc.
Reordering Levels: You can change the order of levels in the index.
Resetting Index: You can reset the index to move the hierarchical index levels back to columns.
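The operations listed above can be tried on a small DataFrame. The sketch below builds one with the 'Letter' and 'Number' levels and the values 10 to 40 and 100 to 400 that appear in the slide's table; building it with pd.MultiIndex.from_product() is an assumption of this example, since the slide does not show how its DataFrame was created.

```python
import pandas as pd

index = pd.MultiIndex.from_product([['A', 'B'], [1, 2]],
                                   names=['Letter', 'Number'])
df = pd.DataFrame({'Value1': [10, 20, 30, 40],
                   'Value2': [100, 200, 300, 400]}, index=index)

a_rows = df.loc['A']                          # slicing: all rows under 'A'
swapped = df.swaplevel('Letter', 'Number')    # index becomes (Number, Letter)
wide = df['Value1'].unstack('Number')         # Number level moves to columns
by_letter = df.groupby(level='Letter').sum()  # aggregate over each letter
flat = df.reset_index()                       # levels become ordinary columns

print(swapped)
print(wide)
```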
  • 53. Hierarchical indexing is especially valuable when dealing with multi-dimensional data, such as panel data or data with multiple categorical variables. It allows for more expressive data organization and manipulation. The pd.MultiIndex class from the pandas library provides further functionality for working with hierarchical data structures, including various methods for creating and manipulating MultiIndex objects.

Editor's Notes

  • #9: By using :: and specifying -1 for the step, you're telling Python to:Start from the end of the array and move backward (step size of -1). This effectively reverses the array.
  • #16: Versatility: DataFrames are incredibly powerful and versatile. They can handle various tasks, such as: Data cleaning: Fixing or removing incorrect, incomplete, or duplicate data. Exploration: Summarizing data, performing statistical calculations, and visualizing trends. Transformation: Applying functions, aggregations, and pivoting data for further analysis
  • #18: Stock prices recorded every minute.Daily temperatures over a year.Monthly sales revenue of a company.Heartbeat measurements from a fitness tracker over time. Size: Series: The size refers to the total number of elements in the Series, similar to the length of a list. It counts how many data points the Series holds. Shape A Series is always 1-dimensional, which means it only has a single axis (the values), even if it looks like a column of data. The shape of a Series will be (n,), where n is the number of elements.
  • #20: Data Loading: pd.read_csv('file.csv'), pd.read_excel('file.xlsx'), pd.read_sql_query(). Data Cleaning: Handling missing values: Replace or remove missing data using fillna() or dropna(). Removing duplicates: Identify and remove duplicate rows with drop_duplicates(). Transforming data types: Convert data types using astype(). Data Selection: Pandas provides multiple ways to select specific rows and columns: Select columns using df['column_name'] or df[['col1', 'col2']]. Data Aggregation: You can group data by specific criteria and perform aggregations such as sum, mean, count, etc. The groupby() function is used to split the data into groups before applying an aggregation. Common aggregation methods include sum(), mean(), count(), max(), and min(). Data Visualization: While Pandas itself has basic plotting capabilities (df.plot()), it is commonly used alongside libraries like Matplotlib and Seaborn to create more sophisticated plots and charts.
  • #23: df['Column1'] > 5 creates a Boolean Series (a sequence of True or False values), where each value corresponds to whether the condition 'Column1' > 5 is satisfied for each row in the DataFrame df. For example, if df['Column1'] contains values [3, 7, 1, 9], the condition > 5 is applied to each element, resulting in the Boolean Series [False, True, False, True]. df[df['Column1'] > 5] filters the rows of the DataFrame based on that condition: df[...] returns the subset of rows where the condition is True. The result will be a pandas DataFrame with only the rows that meet the condition. df.at[index, 'Column1'] = new_value - Update a specific value: The .at[] accessor is used to access and update a specific cell in the DataFrame. index refers to the row label, and 'Column1' refers to the column. The value at the intersection of the given row and column is updated to new_value. df = df.append({'Column1': value1, 'Column2': value2}, ignore_index=True) - Append a new row: This appends a new row to the DataFrame df. A dictionary {'Column1': value1, 'Column2': value2} defines the data for the new row. ignore_index=True discards the existing index labels, so the result gets a new sequential index. The result is a new DataFrame with the appended row. Note that DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0; in current versions, pd.concat([df, pd.DataFrame([{'Column1': value1, 'Column2': value2}])], ignore_index=True) has the same effect.
  • #24: In pandas, the index=False argument is used to exclude the DataFrame's index (row labels) when saving it to a file (e.g., CSV, Excel, etc.). In pandas, the describe() function generates descriptive statistics for the DataFrame or a specific column. It provides a quick summary of the central tendency, dispersion, and shape of a dataset’s distribution, including: count: The number of non-null entries. mean: The average of the values. std: The standard deviation, a measure of the spread of the data. min: The minimum value. 25%: The 25th percentile (first quartile). 50%: The 50th percentile (median). 75%: The 75th percentile (third quartile). max: The maximum value.
  • #27: import pandas as pd # Create a DataFrame data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35], 'City': ['NY', 'LA', 'SF']} df = pd.DataFrame(data) # Access the first row print(df.iloc[0]) Name Alice Age 25 City NY Name: 0, dtype: object
  • #28: Purpose: It sets the values of the specified column ('Column_Name') as the new index of the DataFrame df. This operation replaces the default integer-based index (0, 1, 2, …) with the values from 'Column_Name'. inplace=True: This means the operation will be performed in-place, meaning the DataFrame df will be modified directly without needing to assign it to a new variable. If this parameter were set to False, the original DataFrame would remain unchanged, and a new DataFrame with the new index would be returned.
  • #29: Example:
    import pandas as pd
    # Create a sample DataFrame
    data = {'Country': ['USA', 'USA', 'Canada', 'Canada'],
            'State': ['New York', 'California', 'Ontario', 'Quebec'],
            'Population': [19.45, 39.51, 14.57, 8.43]}
    df = pd.DataFrame(data)
    # Set 'Country' and 'State' as a multi-level index
    df.set_index(['Country', 'State'], inplace=True)
    print(df)
    inplace=True: This parameter makes the operation modify the original DataFrame itself, rather than returning a new DataFrame with the new index. If inplace=False (the default), the method returns a new DataFrame and does not modify the original one.
  • #34: In NumPy, broadcasting is a powerful feature that allows operations on arrays of different shapes, as long as the shapes are compatible under specific rules. Broadcasting makes it easy to perform element-wise operations without having to reshape or replicate arrays manually. If the arrays have different numbers of dimensions, the shape of the smaller one is first padded with 1s on the left. After aligning the shapes this way, NumPy compares the dimensions one by one from right to left. For each dimension: If the dimensions are the same size, they are compatible. If one of the dimensions is 1, it is "stretched" to match the other dimension. If they are neither the same nor 1, the shapes are incompatible, and broadcasting fails with a ValueError.
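The rules above can be sketched with two small arrays. The values are illustrative only: the first addition broadcasts a shape (3,) array against a shape (2, 3) array, and the second shows an incompatible pair.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])      # shape (2, 3)
b = np.array([10, 20, 30])     # shape (3,), padded on the left to (1, 3)

result = a + b                 # b is stretched along the first axis to (2, 3)
print(result)

# Shapes (2, 3) and (2,) are incompatible: the last dimensions are 3 and 2
try:
    a + np.array([1, 2])
except ValueError as err:
    print("broadcasting failed:", err)
```

No copy of b is made for each row; NumPy only behaves as if b had been repeated.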
  • #36: With axis=1 the function is applied to each row: the first row [1, 2, 3] is passed to sum_of_row, which returns 1 + 2 + 3 = 6, and the second row [4, 5, 6] is passed to sum_of_row, which returns 4 + 5 + 6 = 15. If you wanted to apply the function along the columns (i.e., column-wise), you would set axis=0. In arr=arr, the first arr is the parameter name that apply_along_axis expects, telling it which array to process. The second arr is the variable name you defined earlier in your code (the actual array [[1, 2, 3], [4, 5, 6]]).
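A minimal sketch of the call described in this note, using the sum_of_row and arr names from the note. The function body is an assumption, since the slide's own definition is not reproduced here.

```python
import numpy as np

def sum_of_row(row):
    # Assumed body: add up the elements of the 1-D slice that is passed in
    return np.sum(row)

arr = np.array([[1, 2, 3], [4, 5, 6]])

# axis=1: each row is passed to the function in turn
row_sums = np.apply_along_axis(sum_of_row, axis=1, arr=arr)

# axis=0: each column is passed to the function in turn
col_sums = np.apply_along_axis(sum_of_row, axis=0, arr=arr)

print(row_sums)
print(col_sums)
```

For a plain sum, arr.sum(axis=1) gives the same result more directly; apply_along_axis is mainly useful when the function has no built-in axis argument.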
  • #37: Element-wise Operations: Vectorized functions apply an operation to every element of an array in a single call, enabling batch processing rather than one-at-a-time processing in your own code. np.vectorize() creates a new function vectorized_func that applies my_function element-wise to a NumPy array. Note that np.vectorize() is provided mainly for convenience: internally it is essentially a Python loop, so it does not give the speed of built-in array operations such as arr * 2. The vectorized function is applied to each element of arr: my_function(1) returns 1 * 2 = 2, my_function(2) returns 2 * 2 = 4, my_function(3) returns 3 * 2 = 6, my_function(4) returns 4 * 2 = 8.
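The steps in this note can be sketched as below, using the my_function, vectorized_func, and arr names from the note. The body of my_function is inferred from the "x * 2" results listed there.

```python
import numpy as np

def my_function(x):
    # Doubles a single value, as in the note's worked results
    return x * 2

vectorized_func = np.vectorize(my_function)

arr = np.array([1, 2, 3, 4])
result = vectorized_func(arr)   # calls my_function once per element
print(result)

# For simple arithmetic the built-in array operation gives the same values
same = arr * 2
print(same)
```

np.vectorize() is most useful for functions that contain Python-only logic, such as if/else branches on a scalar, which cannot be written directly as array arithmetic.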
  • #39: desc_sorted_arr = np.sort(arr)[::-1]
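  • #40: The np.argsort() function in NumPy returns the indices that would sort an array. Unlike np.sort(), which returns the sorted array itself, np.argsort() provides the indices of the sorted elements. This is particularly useful when you want to keep track of each element's original position after sorting, for example to reorder a second array in the same way. np.argsort() has no ascending parameter; for descending order, reverse the result with np.argsort(arr)[::-1]. The ascending=False argument belongs to the pandas sorting methods such as sort_values().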
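A short sketch of np.argsort() on made-up values, showing how the returned indices relate to the sorted array and how a descending order can be obtained by reversing them.

```python
import numpy as np

arr = np.array([30, 10, 20])       # illustrative values

idx = np.argsort(arr)              # positions of the elements in ascending order
sorted_arr = arr[idx]              # same values as np.sort(arr)

desc_idx = np.argsort(arr)[::-1]   # reversed indices give descending order
desc_arr = arr[desc_idx]

print(idx, sorted_arr)
print(desc_idx, desc_arr)
```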
  • #41: This line ranks the entries in the 'Age' column: ascending=False: Ranks are assigned in descending order, so the highest age receives rank 1. method='average': If there are ties (like Bob and David both being age 30), each tied entry is assigned the average of the ranks the group would otherwise occupy.
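The ranking described here might look like the sketch below. Only Bob and David (both age 30) come from the note; the other names and ages are assumptions added to make the example complete.

```python
import pandas as pd

# Bob and David tie at 30 as in the note; Alice and Charlie are hypothetical
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 30]})

df['Rank'] = df['Age'].rank(ascending=False, method='average')
print(df)
```

Charlie, the oldest, gets rank 1. Bob and David would occupy ranks 2 and 3, so each receives their average, 2.5, and Alice gets rank 4.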
  • #43: Percentiles are values below which a given percentage of observations in a group of observations falls. For example, the 25th percentile (also known as the first quartile) is the value below which 25% of the data points lie. Quartiles divide the data into four equal parts: Q1 (First Quartile): 25th percentile Q2 (Second Quartile): 50th percentile (median) Q3 (Third Quartile): 75th percentile Correlation measures the strength and direction of a linear relationship between two variables. It ranges from -1 to 1: A correlation of 1 indicates a perfect positive linear relationship. A correlation of -1 indicates a perfect negative linear relationship. A correlation of 0 indicates no linear relationship. Covariance measures the degree to which two variables change together.
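These measures can be computed with NumPy as in the sketch below. The two arrays are illustrative, with y chosen as exactly twice x so that the correlation is a perfect 1.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = 2 * x                                    # perfectly linear in x

q1, q2, q3 = np.percentile(x, [25, 50, 75])  # quartiles of x
corr = np.corrcoef(x, y)[0, 1]               # Pearson correlation of x and y
cov = np.cov(x, y)[0, 1]                     # sample covariance of x and y

print(q1, q2, q3)
print(corr, cov)
```

Both np.corrcoef() and np.cov() return a 2x2 matrix for two inputs; the off-diagonal entry [0, 1] is the value relating x to y. Unlike correlation, covariance is not scaled to the range -1 to 1, so its size depends on the units of the data.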
  • #49: Interpolation is a technique for estimating unknown values in a sequence based on surrounding data. This is useful in time series or other ordered data when you have gaps or missing values.
    np.interp(): It linearly interpolates between points to fill missing values. However, it requires you to provide the x-values (indices or times) and corresponding y-values (data) to interpolate.
    3. Masked Arrays: NumPy provides the numpy.ma module for creating masked arrays. This allows you to explicitly handle missing or invalid data by "masking" certain values (e.g., NaN), so they are ignored during calculations.
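Both approaches can be sketched on one small array with gaps. The data values are made up, and using the array positions as x-values is an assumption that fits evenly spaced observations.

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0, np.nan, 5.0])   # illustrative series with gaps
x = np.arange(len(data))                           # positions used as x-values
known = ~np.isnan(data)                            # True where a value is present

# Fill the gaps by linear interpolation between the known points
filled = data.copy()
filled[~known] = np.interp(x[~known], x[known], data[known])
print(filled)

# Alternatively, mask the invalid entries so calculations skip them
masked = np.ma.masked_invalid(data)
print(masked.mean())
```

Interpolation replaces the missing values with estimates, while masking leaves them missing and simply excludes them from the computation.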
  • #50: np.vstack(): Vertically stacks arrays, so now you're adding a third level: ['X', 'Y', 'X', 'Y']. This adds another layer of labeling. .T (Transpose): The T transposes the array so that each "row" is now a tuple of 3 labels (outer, inner, and sub-level). MultiIndexing or hierarchical indexing allows you to represent higher-dimensional data in a structured way by breaking down the indices into multiple levels. In this example, you created a MultiIndex with 3 levels and then created a pandas DataFrame with random data, indexed by this MultiIndex. This is useful for organizing complex data where each observation belongs to multiple categories, making it easier to analyze and manipulate.
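One way the construction described here could look is sketched below. Only the ['X', 'Y', 'X', 'Y'] level comes from the note; the outer and inner labels, the level names, and the column names are assumptions, and the slide's actual code may differ.

```python
import numpy as np
import pandas as pd

outer = np.array(['A', 'A', 'B', 'B'])   # hypothetical first level
inner = np.array(['1', '2', '1', '2'])   # hypothetical second level
sub = np.array(['X', 'Y', 'X', 'Y'])     # third level from the note

# Stack the three label arrays as rows, then transpose so that
# each row holds one (outer, inner, sub) combination
labels = np.vstack([outer, inner, sub]).T

index = pd.MultiIndex.from_tuples(
    [tuple(row) for row in labels.tolist()],
    names=['outer', 'inner', 'sub'],      # hypothetical level names
)

df = pd.DataFrame(np.random.rand(4, 2), index=index,
                  columns=['Value1', 'Value2'])
print(df)
```

pd.MultiIndex.from_arrays([outer, inner, sub]) would build the same index without the stack-and-transpose step.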
  • #52: Hierarchical indexing in pandas (also known as MultiIndexing) enables you to work with multi-level indexed data efficiently. With it, you can perform various operations that provide more flexibility when working with complex datasets. Here’s a brief explanation of common operations associated with hierarchical indexing:
    1. Slicing: You can slice the data at different levels of the index to retrieve specific subsets. df.loc['A'] # Slicing by the first level 'A' df.loc[('A', 1)] # Slicing by both the first and second levels. This helps in isolating parts of the data easily, depending on which levels you want to focus on.
    2. Stacking and Unstacking: Stacking turns columns into rows (long format), while unstacking moves rows into columns (wide format). This is useful for reshaping data. df_stacked = df.stack() # Stack columns into rows df_unstacked = df.unstack() # Unstack rows into columns. Stacking produces a longer, narrower result (often useful in time series data). Unstacking can help in widening the data for better readability.
    3. Swapping Levels: You can swap two levels in the index to change the hierarchy. df_swapped = df.swaplevel(0, 1) # Swap the first and second levels. This changes how pandas interprets your hierarchical structure, which can affect operations like slicing.
    4. Grouping and Aggregating: You can group data based on levels of the index and then apply aggregation functions like mean, sum, etc. df.groupby(level=0).sum() # Group by the first level and sum the values df.groupby(level=[0, 1]).mean() # Group by the first two levels and calculate the mean. This is useful for summarizing data across categories (levels).
    5. Reordering Levels: You can change the order of all the levels in the hierarchical index. df_reordered = df.reorder_levels([2, 0, 1]) # Reorder the levels (third, first, second); this call requires an index with three levels. Changing the level order can be useful for making slicing or other operations more intuitive based on your analysis needs.
    6. Resetting Index: You can "flatten" the hierarchical index and turn it back into regular columns. df_reset = df.reset_index() # Convert the multi-index back to columns. Resetting the index is helpful when you no longer need the hierarchical structure and prefer to work with simple columns.
    Swapping refers to exchanging the positions of two specific levels of a MultiIndex. This operation changes the order of the two levels you specify but leaves the other levels in their original order. Use case: When you want to switch the positions of exactly two levels in the index.
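Several of these operations can be tried on a small two-level DataFrame, as in the sketch below. The labels, level names, and values are made up for illustration; reorder_levels([2, 0, 1]) is left out because it needs a three-level index.

```python
import pandas as pd

# Hypothetical two-level index: outer in {'A', 'B'}, inner in {1, 2}
index = pd.MultiIndex.from_product([['A', 'B'], [1, 2]],
                                   names=['outer', 'inner'])
df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

a_rows = df.loc['A']                  # all rows under outer label 'A'
one_row = df.loc[('A', 1)]            # the single row for ('A', 1)
unstacked = df.unstack()              # inner level moves into the columns
swapped = df.swaplevel(0, 1)          # index order becomes (inner, outer)
totals = df.groupby(level=0).sum()    # sum of Value per outer label
flat = df.reset_index()               # index levels become ordinary columns

print(a_rows)
print(unstacked)
print(totals)
print(flat)
```

After swaplevel() the rows keep their original order, so calling sort_index() on the result is often a useful next step before slicing.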