Data Analysis using Python
Notes for 3rd Year B.Sc. Students
Unit 1: Introduction to Data Analysis and Python
• - Data Analysis: Inspecting, cleaning, transforming, and modeling data to find useful information
• - Types of Data: Qualitative (categorical), Quantitative (numerical)
• - Libraries: NumPy, Pandas, Matplotlib, Seaborn
• - Environment: Jupyter Notebook
Unit 2: NumPy for Data Analysis
• - Create arrays: np.array(), np.zeros(), np.arange()
• - Operations: Indexing, slicing, broadcasting
• - Functions: np.mean(), np.std(), np.sum()
Unit 2: NumPy Example
• import numpy as np
• a = np.array([1, 2, 3])
• print(np.mean(a)) # Output: 2.0
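The other Unit 2 topics (array creation, indexing, slicing, broadcasting, and the summary functions) can be sketched in the same way. This is a minimal illustration only; the variable names and values are made up for these notes. Note that np.std() returns the population standard deviation by default (ddof=0).

```python
import numpy as np

# Array creation
zeros = np.zeros(3)              # three floating-point zeros
steps = np.arange(0, 10, 2)      # 0, 2, 4, 6, 8 (the stop value 10 is excluded)

# Indexing and slicing
first = steps[0]                 # element at position 0
middle = steps[1:4]              # positions 1, 2, 3

# Broadcasting: the 1-D row [10, 20, 30] is added to every row of the 2-D grid
grid = np.array([[1, 2, 3], [4, 5, 6]])
shifted = grid + np.array([10, 20, 30])

# Summary functions
total = np.sum(steps)
spread = np.std(steps)           # population standard deviation (ddof=0)

print(middle)    # [2 4 6]
print(shifted)   # [[11 22 33]
                 #  [14 25 36]]
print(total)     # 20
```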
Unit 3: Pandas for Data Handling
• - Series and DataFrame structures
• - Read data: pd.read_csv(), pd.read_excel()
• - Select/Filter/Sort/Group: loc[], iloc[], sort_values(), groupby()
Unit 3: Pandas Example
• import pandas as pd
• df = pd.read_csv("data.csv")
• print(df.head())
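Selecting, filtering, sorting, and grouping can be tried without a CSV file. The sketch below builds a small DataFrame by hand; the column names and values are hypothetical and only stand in for whatever data.csv would contain.

```python
import pandas as pd

# Hypothetical data used in place of data.csv
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "John"],
    "dept": ["Physics", "Maths", "Physics", "Maths"],
    "marks": [82, 67, 91, 74],
})

# loc[]: label-based selection, here with a boolean filter on rows
high = df.loc[df["marks"] > 75, ["name", "marks"]]

# iloc[]: position-based selection (first two rows, first two columns)
top_left = df.iloc[:2, :2]

# Sort rows by marks, highest first
ranked = df.sort_values("marks", ascending=False)

# groupby(): mean marks per department
dept_mean = df.groupby("dept")["marks"].mean()

print(high)
print(dept_mean)   # Maths 70.5, Physics 86.5
```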
Unit 4: Data Cleaning and Preprocessing
• - Handle missing values: isnull(), dropna(), fillna()
• - Rename columns: df.rename(); convert types: df.astype()
• - Drop duplicates: df.drop_duplicates()
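Unit 4: Cleaning Example
The cleaning steps listed above can be chained on a small, made-up DataFrame. This is one possible order of operations, not the only correct one; the data, the column names, and the choice to fill missing scores with the mean are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: one duplicated row, one missing age, one missing score
raw = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Ravi", "Meena", "John"],
    "Age": ["21", "22", "22", None, "23"],
    "Score": [82.0, 67.0, 67.0, 91.0, np.nan],
})

# Count missing values per column
missing_per_column = raw.isnull().sum()

clean = (
    raw.drop_duplicates()                                            # remove the repeated Ravi row
    .rename(columns={"Name": "name", "Age": "age", "Score": "score"})
    .assign(score=lambda d: d["score"].fillna(d["score"].mean()))    # fill missing score with the mean
    .dropna(subset=["age"])                                          # drop rows that still lack an age
    .astype({"age": int})                                            # age was stored as text
    .reset_index(drop=True)
)

print(missing_per_column)
print(clean)
```

Here the missing score is filled before rows are dropped, so the mean is taken over all known scores (82, 67, and 91).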
Unit 5: Data Visualization
• - Matplotlib: plot(), bar(), hist(), scatter()
• - Seaborn: histplot(), boxplot(), heatmap()
Unit 5: Visualization Example
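• import pandas as pd
• import matplotlib.pyplot as plt
• import seaborn as sns
• df = pd.read_csv("data.csv") # same file as in the Unit 3 example
• sns.histplot(data=df, x='column_name') # replace 'column_name' with a column of df
• plt.show()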
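The four Matplotlib functions named in this unit can be compared side by side on one figure. The sketch below uses made-up data and saves the result to a hypothetical file name. The "Agg" backend line is an assumption so that the script also runs without a display; in Jupyter Notebook it can be removed and plt.show() used instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for running outside Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Made-up data for illustration
x = np.arange(1, 6)
y = x ** 2
rng = np.random.default_rng(0)
samples = rng.normal(size=200)

# One subplot per plot type
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
axes[0, 0].plot(x, y)
axes[0, 0].set_title("plot()")
axes[0, 1].bar(x, y)
axes[0, 1].set_title("bar()")
axes[1, 0].hist(samples, bins=20)
axes[1, 0].set_title("hist()")
axes[1, 1].scatter(x, y)
axes[1, 1].set_title("scatter()")

fig.tight_layout()
fig.savefig("matplotlib_overview.png")  # hypothetical output file name
```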
Unit 6: Basic Statistical Analysis
• - Descriptive stats: mean, median, standard deviation
• - Correlation: df.corr() (pass numeric_only=True if df has non-numeric columns)
• - Value counts: df['col'].value_counts()
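Unit 6: Statistics Example
The statistics above can be computed on a small, made-up DataFrame as a quick check. The column names and values are hypothetical. Note that the pandas std() returns the sample standard deviation (ddof=1), unlike np.std(), and that numeric_only=True assumes pandas 1.5 or newer.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "dept": ["Physics", "Maths", "Physics", "Maths", "Physics"],
    "hours": [2, 4, 6, 8, 10],
    "marks": [50, 60, 70, 80, 90],
})

# Descriptive statistics for one column
mean_marks = df["marks"].mean()
median_marks = df["marks"].median()
std_marks = df["marks"].std()          # sample standard deviation (ddof=1)

# Pairwise correlation between numeric columns only ("dept" is skipped)
corr = df.corr(numeric_only=True)

# Frequency of each category
counts = df["dept"].value_counts()

print(mean_marks, median_marks)   # 70.0 70.0
print(corr)
print(counts)                     # Physics 3, Maths 2
```

Because marks rises by exactly 5 for every extra hour in this data, the correlation between hours and marks comes out as 1.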