Data Analysis using Python - Notes for 3rd Year B.Sc. Students
Unit 1: Introduction to Data Analysis and Python
- Data Analysis: Process of inspecting, cleaning, transforming, and modeling data to discover useful information and support decisions.
- Types of Data:
  - Qualitative (categorical): nominal, ordinal
  - Quantitative (numerical): discrete, continuous
- Python Libraries:
  - NumPy (numerical arrays), Pandas (tabular data), Matplotlib and Seaborn (plotting)
- Jupyter Notebook: Interactive environment that keeps code, output, and notes in one document; well suited to exploratory data analysis
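Example (a minimal sketch of how the data types above might be represented in Pandas; the column names and values are made up for illustration, and the choice of which column stands for which type is an assumption of this example):

```python
import pandas as pd

# Hypothetical student records, made up for illustration only.
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune"],      # qualitative, nominal (no order)
    "grade": ["B", "A", "C"],               # qualitative, ordinal (ordered)
    "siblings": [1, 0, 2],                  # quantitative, discrete (counts)
    "height_cm": [162.5, 170.2, 158.0],     # quantitative, continuous
})

# Nominal data: categories without any ranking.
df["city"] = df["city"].astype("category")

# Ordinal data: categories with a meaningful order (here C < B < A).
df["grade"] = pd.Categorical(df["grade"], categories=["C", "B", "A"], ordered=True)

print(df.dtypes)
print(df["grade"].min())  # lowest grade according to the declared order
```

Declaring the order explicitly is what lets comparisons such as min() and max() work on ordinal data; for nominal data such comparisons have no meaning.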
Unit 2: NumPy for Data Analysis
- Array Creation: np.array(), np.zeros(), np.ones(), np.arange(), np.linspace()
- Array Operations: Indexing, slicing, reshaping, broadcasting
- Mathematical Functions: np.mean(), np.std(), np.sum(), np.max()
Example:
import numpy as np
a = np.array([1, 2, 3])
print(np.mean(a)) # Output: 2.0
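Example (a short sketch covering the creation functions and array operations listed above; the variable names and values are chosen only for illustration):

```python
import numpy as np

# Array creation
a = np.array([1, 2, 3, 4, 5, 6])    # from a Python list
z = np.zeros((2, 3))                # 2x3 array filled with 0.0
o = np.ones(3)                      # [1., 1., 1.]
r = np.arange(0, 10, 2)             # [0, 2, 4, 6, 8], stop value excluded
lin = np.linspace(0, 1, 5)          # 5 evenly spaced values, both ends included

# Indexing and slicing
first = a[0]                        # 1
middle = a[1:4]                     # [2, 3, 4]

# Reshaping: the same 6 values arranged as 2 rows x 3 columns
m = a.reshape(2, 3)                 # [[1, 2, 3], [4, 5, 6]]

# Broadcasting: the 1D array [10, 20, 30] is added to every row of m
b = m + np.array([10, 20, 30])      # [[11, 22, 33], [14, 25, 36]]

# Mathematical functions
print(np.sum(a), np.max(a), np.mean(a), np.std(a))
```

Note that np.std() computes the population standard deviation by default (ddof=0), while the Pandas .std() method uses the sample formula (ddof=1), so the two can give slightly different results on the same data.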
Unit 3: Pandas for Data Handling
- Data Structures:
  - Series: 1D labeled array
  - DataFrame: 2D labeled data structure
- Reading Data: pd.read_csv(), pd.read_excel()
- DataFrame Operations:
  - Selecting: .loc[], .iloc[]
  - Filtering: df[df['column'] > value]
  - Sorting: df.sort_values()
  - Grouping: df.groupby()
Example:
import pandas as pd
df = pd.read_csv("data.csv")
print(df.head())
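Example (a minimal sketch of the DataFrame operations listed above; it builds a small DataFrame in memory instead of reading a file, and all names and marks are hypothetical):

```python
import pandas as pd

# Hypothetical data, made up for illustration.
df = pd.DataFrame(
    {
        "name": ["Asha", "Ravi", "Meena", "John"],
        "dept": ["Physics", "Maths", "Physics", "Maths"],
        "marks": [82, 67, 91, 74],
    },
    index=["s1", "s2", "s3", "s4"],
)

# A single column of a DataFrame is a Series
marks = df["marks"]

# Selecting: .loc[] uses labels, .iloc[] uses integer positions
by_label = df.loc["s2", "marks"]       # 67
by_position = df.iloc[0, 0]            # "Asha"

# Filtering: keep rows where the condition is True
high = df[df["marks"] > 75]            # rows s1 and s3

# Sorting: highest marks first
ranked = df.sort_values("marks", ascending=False)

# Grouping: mean marks per department
dept_mean = df.groupby("dept")["marks"].mean()
print(dept_mean)
```

A common source of confusion is that .loc[] slices include the end label, whereas .iloc[] slices exclude the end position, as ordinary Python slices do.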
Unit 4: Data Cleaning and Preprocessing
- Handling Missing Values: df.isnull(), df.dropna(), df.fillna()
- Renaming Columns: df.rename()
- Data Type Conversion: df.astype()
- Dropping Duplicates: df.drop_duplicates()
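Example (a sketch of one possible cleaning sequence using the functions above; the data is hypothetical, and filling missing marks with the column mean is only one of several reasonable strategies):

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with missing values, a duplicate row, and numbers stored as text.
raw = pd.DataFrame({
    "Name": ["Asha", "Ravi", "Ravi", "Meena"],
    "Age": ["20", "21", "21", "22"],
    "Marks": [82.0, np.nan, np.nan, 91.0],
})

print(raw.isnull().sum())                          # number of missing values per column

clean = raw.drop_duplicates()                      # removes the repeated "Ravi" row
clean = clean.rename(columns={"Marks": "marks"})   # rename one column
clean = clean.astype({"Age": int})                 # text "20" becomes integer 20
clean["marks"] = clean["marks"].fillna(clean["marks"].mean())  # fill gap with column mean

dropped = raw.dropna()                             # alternative: discard rows with missing values
print(clean)
```

These methods return new objects rather than changing the DataFrame in place, which is why each result is assigned back to a variable.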
Unit 5: Data Visualization
- Matplotlib:
  - plt.plot(), plt.bar(), plt.hist(), plt.scatter()
- Seaborn:
  - sns.histplot(), sns.boxplot(), sns.heatmap()
Example:
import matplotlib.pyplot as plt
import seaborn as sns
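# df is the DataFrame loaded in Unit 3; 'column_name' is a placeholder for one of its numeric columns
sns.histplot(data=df, x='column_name')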
plt.show()
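Example (a sketch of the four Matplotlib plot types listed above, drawn in one figure; the data is randomly generated for illustration, the file name plots.png is hypothetical, and the "Agg" backend line assumes the script runs without a display, so it can be left out in Jupyter). The sketch uses the Axes methods ax.plot(), ax.bar(), ax.hist(), and ax.scatter(), which are the per-subplot counterparts of the plt functions:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; omit this line in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data generated for illustration.
rng = np.random.default_rng(0)
marks = rng.normal(loc=70, scale=10, size=100)         # continuous values
hours = marks / 10 + rng.normal(scale=0.5, size=100)   # loosely related to marks

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot(np.sort(marks))                  # line plot of sorted values
axes[0, 0].set_title("plot")

axes[0, 1].bar(["A", "B", "C"], [12, 30, 18])    # bar chart of category counts
axes[0, 1].set_title("bar")

axes[1, 0].hist(marks, bins=10)                  # histogram of a distribution
axes[1, 0].set_title("hist")

axes[1, 1].scatter(hours, marks)                 # relationship between two variables
axes[1, 1].set_title("scatter")

fig.tight_layout()
fig.savefig("plots.png")                         # hypothetical output file name
```

As a rough guide, histograms suit a single quantitative variable, bar charts suit counts per category, and scatter plots suit the relationship between two quantitative variables.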
Unit 6: Basic Statistical Analysis
- Descriptive Stats: Mean, Median, Mode, Variance, Standard Deviation (df.mean(), df.median(), df.mode(), df.var(), df.std(); df.describe() gives a summary)
- Correlation: df.corr() (pairwise Pearson correlation between numeric columns)
- Value Counts: df['column'].value_counts()
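Example (a minimal sketch of the statistics above on a small hypothetical DataFrame; the numeric_only=True argument assumes pandas 1.5 or later and is used here so that the text column is skipped):

```python
import pandas as pd

# Hypothetical data, made up for illustration.
df = pd.DataFrame({
    "hours": [2, 4, 6, 8, 10],
    "marks": [50, 60, 60, 80, 90],
    "grade": ["C", "B", "B", "A", "A"],
})

marks = df["marks"]
print(marks.mean())      # 68.0
print(marks.median())    # 60.0
print(marks.mode()[0])   # 60 (mode() returns a Series because ties are possible)
print(marks.var())       # sample variance (divides by n - 1)
print(marks.std())       # sample standard deviation

# Correlation between the numeric columns; the text column "grade" is skipped
corr = df.corr(numeric_only=True)
print(corr)

# Frequency of each category
counts = df["grade"].value_counts()
print(counts)
```

A correlation close to +1 or -1 indicates a strong linear relationship between two columns, while a value near 0 indicates little or no linear relationship.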