
Create Subset Using Character Column with Multiple Matches in R
Subsetting is one of the most important aspects of data analysis. A common case is subsetting a data frame on a character column that must match any one of several values. For example, if a character column of an R data frame has 5 categories, we might want to extract only the rows that belong to 2, 3, or 4 of them. This can be done with the filter function of the dplyr package combined with the str_detect function of the stringr package, passing the wanted values as a single regular expression in which the values are separated by |.
Consider the below data frame −
Example
Group <- sample(LETTERS[1:6], 25, replace = TRUE)
Response <- rnorm(25, 3, 0.24)
df1 <- data.frame(Group, Response)
df1
Output
   Group Response
1      A 3.040870
2      F 2.921251
3      E 2.911820
4      E 3.188297
5      B 3.054424
6      D 2.691892
7      F 2.714302
8      F 3.154340
9      F 3.058324
10     C 2.814400
11     B 3.040255
12     D 3.270639
13     A 3.197537
14     E 2.646717
15     D 2.671441
16     C 3.233093
17     F 2.555055
18     E 2.670018
19     E 2.607526
20     F 2.952952
21     C 3.257484
22     B 3.009312
23     C 3.142553
24     B 3.355754
25     B 3.262376
Load the dplyr and stringr packages and filter df1 to keep the rows whose Group is A, C, or D −
Example
library(dplyr)
library(stringr)
df1 %>% filter(str_detect(Group, "A|C|D"))
Output
  Group Response
1     A 3.040870
2     D 2.691892
3     C 2.814400
4     D 3.270639
5     A 3.197537
6     D 2.671441
7     C 3.233093
8     C 3.257484
9     C 3.142553
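One point worth keeping in mind is that str_detect matches a regular expression anywhere inside the string, so a pattern such as "A|C" also matches longer values that merely contain A or C. The sketch below illustrates this on a small made-up data frame (demo and its values are assumptions for illustration, not part of the example above) and shows two possible ways to require exact matches: anchoring the pattern with ^ and $, or using base R's %in% operator.

```r
library(dplyr)
library(stringr)

# Hypothetical data: the value "AB" contains "A" as a substring
demo <- data.frame(Group = c("A", "AB", "C", "E"),
                   Response = c(1, 2, 3, 4))

# Unanchored regex: keeps "A", "AB", and "C", because "AB" contains "A"
partial <- demo %>% filter(str_detect(Group, "A|C"))

# Anchored regex: keeps only values that are exactly "A" or "C"
anchored <- demo %>% filter(str_detect(Group, "^(A|C)$"))

# Exact matching without regular expressions
exact <- demo %>% filter(Group %in% c("A", "C"))
```

In df1 every Group value is a single letter, so the unanchored pattern and the exact match give the same rows; the difference only appears when one category name is contained in another.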
Example
Region <- sample(c("Asia", "Oceania", "Africa", "America"), 25, replace = TRUE)
Y <- rpois(25, 5)
df2 <- data.frame(Region, Y)
df2
Output
    Region  Y
1   Africa  5
2  Oceania  4
3  Oceania  3
4  Oceania  3
5  Oceania  6
6  Oceania  2
7  Oceania  4
8  Oceania  6
9     Asia  1
10  Africa  4
11    Asia  7
12    Asia 10
13 Oceania  1
14 America  5
15 Oceania  3
16  Africa  8
17 Oceania  9
18    Asia 11
19  Africa  7
20  Africa  3
21  Africa  2
22    Asia  5
23 America  6
24 America  2
25 America  1
Filter df2 to keep the rows whose Region is Oceania, America, or Africa −
Example
df2 %>% filter(str_detect(Region, "Oceania|America|Africa"))
Output
    Region Y
1   Africa 5
2  Oceania 4
3  Oceania 3
4  Oceania 3
5  Oceania 6
6  Oceania 2
7  Oceania 4
8  Oceania 6
9   Africa 4
10 Oceania 1
11 America 5
12 Oceania 3
13  Africa 8
14 Oceania 9
15  Africa 7
16  Africa 3
17  Africa 2
18 America 6
19 America 2
20 America 1
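When the values to keep are already stored in a character vector, the pattern does not have to be typed by hand. A minimal sketch, assuming a small made-up data frame (df2_demo, keep, and pattern are illustrative names, not part of the example above), builds the same kind of pattern with paste and collapse = "|":

```r
library(dplyr)
library(stringr)

# Hypothetical data with the same columns as df2
df2_demo <- data.frame(Region = c("Asia", "Oceania", "Africa", "America", "Asia"),
                       Y = c(1, 4, 5, 2, 7))

# Values to keep, stored as a vector
keep <- c("Oceania", "America", "Africa")

# Join the values with "|" to form the pattern "Oceania|America|Africa"
pattern <- paste(keep, collapse = "|")

# Keep the rows whose Region matches any of the values
result <- df2_demo %>% filter(str_detect(Region, pattern))
result
```

This approach assumes the values contain no regular expression metacharacters such as . or (; if they might, matching with Region %in% keep avoids the issue.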