Slicing Column Values in Pandas
Last Updated: 11 Jul, 2024
Slicing column values is a routine step in data manipulation and analysis with Pandas. The library provides several ways to select parts of a DataFrame and to extract parts of the values stored in a column. This article covers positional and label-based indexing, string slicing, and a few more flexible techniques, with syntax and examples for each.
Introduction to Pandas DataFrame
A Pandas DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns). It is similar to a spreadsheet or SQL table and is one of the most commonly used data structures in data analysis.
To get started, let's create a simple DataFrame:
Python
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
Output:
Name Age City
0 Alice 25 New York
1 Bob 30 Los Angeles
2 Charlie 35 Chicago
Slicing Column Values using Indexing
1. Positional Indexing with iloc
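The iloc indexer is used for positional indexing: it selects rows and columns by integer position, written in square brackets. As with Python lists, the stop position of a slice is excluded, so :2 selects positions 0 and 1.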
Python
# Slicing the first two rows of the 'Name' column
names = df.iloc[:2, 0]
print(names)
Output:
0 Alice
1 Bob
Name: Name, dtype: object
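As a further sketch of positional slicing, the block below rebuilds the sample DataFrame so it runs on its own and shows a two-dimensional slice, negative positions, and a step. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Positional slices exclude the stop position, like Python lists:
# rows 0 and 1, columns 0 and 1
first_two = df.iloc[:2, :2]

# Negative positions count from the end: last row, last column
last_city = df.iloc[-1, -1]

# A step of 2 selects every second row of the 'Age' column (position 1)
every_other_age = df.iloc[::2, 1]

print(first_two)
print(last_city)
print(every_other_age.tolist())
```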
2. Label-based Indexing with loc
The loc indexer is used for label-based indexing: it selects data by row and column labels. Unlike iloc, a label slice includes both endpoints, so :1 below returns the rows labeled 0 and 1.
Python
# Slicing the 'Name' column for the first two rows
names = df.loc[:1, 'Name']
print(names)
Output:
0 Alice
1 Bob
Name: Name, dtype: object
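The difference between label and positional slices is easier to see when the index is not the default integer range. The sketch below assumes a hypothetical string index ('a', 'b', 'c') added purely for illustration, and also shows that loc accepts a boolean mask.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['a', 'b', 'c'],  # hypothetical string labels for illustration
)

# loc includes BOTH endpoints of a label slice: rows 'a' and 'b'
by_label = df.loc['a':'b', 'Name']

# iloc excludes the stop position: [:1] returns only the first row
by_position = df.iloc[:1, 0]

# loc also accepts a boolean mask to pick rows by condition
over_28 = df.loc[df['Age'] > 28, 'Name']

print(by_label.tolist())
print(by_position.tolist())
print(over_28.tolist())
```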
Slicing Column Values using String Methods
1. Accessing Substrings
You can access substrings of column values using the str accessor.
Python
# Extracting the first three characters of each name
df['Name_Short'] = df['Name'].str[:3]
print(df)
Output:
Name Age City Name_Short
0 Alice 25 New York Ali
1 Bob 30 Los Angeles Bob
2 Charlie 35 Chicago Cha
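The str accessor supports the full Python slice syntax, including negative positions and steps. The following sketch uses a standalone Series with one missing value, added here as an assumption to show that missing entries stay missing instead of raising an error.

```python
import pandas as pd

# Sample names with one missing value (added for illustration)
names = pd.Series(['Alice', 'Bob', None, 'Charlie'])

# Last two characters of each name
last_two = names.str[-2:]

# Every second character of each name
every_other = names.str[::2]

print(last_two)
print(every_other)
```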
2. Using Regular Expressions
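Regular expressions handle cases where the part you want is defined by a pattern and not by a fixed position. The str.extract method returns the first match of each capture group and NaN where the pattern does not match.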
Python
# Extracting the first run of digits from the 'City' column (in this case, there are none)
# A raw string (r'...') keeps Python from treating \d as a string escape
df['City_Digits'] = df['City'].str.extract(r'(\d+)', expand=False)
print(df)
Output:
Name Age City Name_Short City_Digits
0 Alice 25 New York Ali NaN
1 Bob 30 Los Angeles Bob NaN
2 Charlie 35 Chicago Cha NaN
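Because the sample cities contain no digits, the result above is all NaN. A minimal sketch with made-up address strings (hypothetical data, not part of the original dataset) shows what a successful match looks like, and how str.findall differs from str.extract.

```python
import pandas as pd

# Hypothetical address data, made up for illustration
addresses = pd.Series(['12 Main St', 'Apt 7, 450 Oak Ave', 'No number'])

# str.extract returns the FIRST match of the capture group per row
first_number = addresses.str.extract(r'(\d+)', expand=False)

# str.findall returns every match per row as a list
all_numbers = addresses.str.findall(r'\d+')

print(first_number)
print(all_numbers)
```

Note that the extracted values are strings; convert them with astype or pd.to_numeric if numbers are needed.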
Slicing Column Values in Pandas: Advanced Techniques
1. Slicing with apply and lambda
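The apply method combined with a lambda function runs arbitrary Python code on each value, which makes it a flexible way to slice column values. It is usually slower than the vectorized str accessor, and the lambda must handle missing values itself.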
Python
# Extracting the first letter of each city name
df['City_First_Letter'] = df['City'].apply(lambda x: x[0])
print(df)
Output:
Name Age City Name_Short City_Digits City_First_Letter
0 Alice 25 New York Ali NaN N
1 Bob 30 Los Angeles Bob NaN L
2 Charlie 35 Chicago Cha NaN C
2. Using str.split for Complex Slicing
The str.split method splits each string on a specified delimiter and returns a Series of lists. You can then index into these lists with the str accessor to extract specific parts.
Python
# Splitting the 'Name' column by the letter 'l' and taking the first part
df['Name_Split'] = df['Name'].str.split('l').str[0]
print(df)
Output:
Name Age City Name_Short City_Digits City_First_Letter \
0 Alice 25 New York Ali NaN N
1 Bob 30 Los Angeles Bob NaN L
2 Charlie 35 Chicago Cha NaN C
Name_Split
0 A
1 Bob
2 Char
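Splitting on a space is a more typical use. The sketch below uses the city names from the sample data; the n and expand parameters are part of str.split, while the column names 'First' and 'Rest' are hypothetical labels chosen for this example.

```python
import pandas as pd

cities = pd.Series(['New York', 'Los Angeles', 'Chicago'])

# n=1 limits the split to the first space; expand=True returns a DataFrame
parts = cities.str.split(' ', n=1, expand=True)
parts.columns = ['First', 'Rest']  # hypothetical column names

# .str[-1] takes the last piece of each split list
last_word = cities.str.split(' ').str[-1]

print(parts)
print(last_word.tolist())
```

Rows with no delimiter, such as 'Chicago', get a missing value in the second column.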
Practical Examples: Slicing Columns in a Real-World Dataset
Example 1: Analyzing Titanic Passenger Data
Let's consider a dataset of Titanic passengers:
Python
import pandas as pd
# Load the Titanic dataset
url = 'https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv'
df = pd.read_csv(url)
# Display the first few rows of the dataset
print(df.head())
Output:
PassengerId Survived Pclass ... Fare Cabin Embarked
0 1 0 3 ... 7.2500 NaN S
1 2 1 1 ... 71.2833 C85 C
2 3 1 3 ... 7.9250 NaN S
3 4 1 1 ... 53.1000 C123 S
4 5 0 3 ... 8.0500 NaN S
1. Slicing Specific Columns:
Python
# Slice columns 'Name', 'Age', and 'Sex'
df_sliced = df.loc[:, ['Name', 'Age', 'Sex']]
print(df_sliced.head())
Output:
Name Age Sex
0 Braund, Mr. Owen Harris 22.0 male
1 Cumings, Mrs. John Bradley (Florence Briggs Th... 38.0 female
2 Heikkinen, Miss. Laina 26.0 female
3 Futrelle, Mrs. Jacques Heath (Lily May Peel) 35.0 female
4 Allen, Mr. William Henry 35.0 male
2. Slicing Columns by Index:
Python
# Slice columns at positions 1 to 3 (the stop position 4 is excluded)
df_sliced = df.iloc[:, 1:4]
print(df_sliced.head())
Output:
Survived Pclass Name
0 0 3 Braund, Mr. Owen Harris
1 1 1 Cumings, Mrs. John Bradley (Florence Briggs Thayer)
2 1 3 Heikkinen, Miss. Laina
3 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel)
4 0 3 Allen, Mr. William Henry
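A contiguous range of columns can also be selected by label with loc. The sketch below uses a small stand-in frame with made-up values (not the Titanic data itself) so that it runs without a download, and compares a label slice with the equivalent positional slice.

```python
import pandas as pd

# Small stand-in frame; the values are made up for illustration
df = pd.DataFrame({
    'Survived': [0, 1],
    'Pclass': [3, 1],
    'Name': ['Passenger A', 'Passenger B'],
    'Sex': ['male', 'female'],
    'Age': [22.0, 38.0],
})

# Label slice: both 'Pclass' and 'Sex' are included
by_label = df.loc[:, 'Pclass':'Sex']

# Position slice: columns 1, 2 and 3; position 4 is excluded
by_position = df.iloc[:, 1:4]

print(list(by_label.columns))
print(list(by_position.columns))
```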
Example 2: Slicing Substrings in a Product Codes Dataset
Consider a dataset with product codes:
Python
import pandas as pd
# Create a DataFrame with product codes
data = {
'ProductCode': ['A12345', 'B67890', 'C54321', 'D98765'],
'Price': [100, 150, 200, 250]
}
df = pd.DataFrame(data)
print(df)
Output:
ProductCode Price
0 A12345 100
1 B67890 150
2 C54321 200
3 D98765 250
1. Extracting Product Category:
Python
# Slice the first character to get the product category
df['Category'] = df['ProductCode'].str.slice(0, 1)
print(df)
Output:
ProductCode Price Category
0 A12345 100 A
1 B67890 150 B
2 C54321 200 C
3 D98765 250 D
2. Extracting Product Number:
Python
# Slice the numeric part of the product code
df['ProductNumber'] = df['ProductCode'].str.slice(1)
print(df)
Output:
ProductCode Price Category ProductNumber
0 A12345 100 A 12345
1 B67890 150 B 67890
2 C54321 200 C 54321
3 D98765 250 D 98765
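The sliced product number is still a string. As a sketch of two possible follow-up steps, the block below converts it to an integer and, alternatively, pulls both parts out in one pass with named capture groups. The group names 'Category' and 'Number' and the assumption that every code is one uppercase letter followed by digits are choices made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'ProductCode': ['A12345', 'B67890', 'C54321', 'D98765'],
    'Price': [100, 150, 200, 250],
})

# String slices stay strings; convert explicitly for numeric work
df['ProductNumber'] = df['ProductCode'].str.slice(1).astype(int)

# Named groups extract both parts at once and name the result columns
parts = df['ProductCode'].str.extract(r'(?P<Category>[A-Z])(?P<Number>\d+)')

print(df)
print(parts)
```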
Conclusion
Slicing column values in Pandas is a fundamental skill for data manipulation and analysis. Use iloc and loc to select rows and columns, and the str accessor, str.extract, and str.split to pull substrings out of column values. With these techniques you can preprocess and analyze your data more efficiently.