Create Subset Using Character Column with Multiple Matches in R



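Subsetting is one of the most common tasks in data analysis. One frequent case is subsetting a data frame by several values of a character column. For example, if a character column of an R data frame has 5 categories, we might want to keep only the rows that belong to 2, 3, or 4 of them. This can be done with the filter function of the dplyr package together with the str_detect function of the stringr package, passing the wanted values as a single regular expression separated by |. Note that str_detect returns TRUE when the pattern occurs anywhere in the string, so the match is not restricted to whole values.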

Consider the below data frame −

Example


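# No seed is set, so the values differ on each run
Group <- sample(LETTERS[1:6], 25, replace = TRUE)  # 25 random labels from A to F
Response <- rnorm(25, 3, 0.24)                     # 25 normal draws, mean 3, sd 0.24
df1 <- data.frame(Group, Response)
df1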

Output

   Group Response
1  A    3.040870
2  F    2.921251
3  E    2.911820
4  E    3.188297
5  B    3.054424
6  D    2.691892
7  F    2.714302
8  F    3.154340
9  F    3.058324
10 C    2.814400
11 B    3.040255
12 D    3.270639
13 A    3.197537
14 E    2.646717
15 D    2.671441
16 C    3.233093
17 F    2.555055
18 E    2.670018
19 E    2.607526
20 F    2.952952
21 C    3.257484
22 B    3.009312
23 C    3.142553
24 B    3.355754
25 B    3.262376

Load the dplyr and stringr packages, then filter df1 to keep the rows whose Group is A, C, or D −

Example

library(dplyr)
library(stringr)
df1 %>% filter(str_detect(Group, "A|C|D"))

Output

  Group  Response
1   A   3.040870
2   D   2.691892
3   C   2.814400
4   D   3.270639
5   A   3.197537
6   D   2.671441
7   C   3.233093
8   C   3.257484
9   C   3.142553
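
The pattern "A|C|D" works here because every Group value is a single letter. As a minimal sketch of how regular-expression matching differs from exact matching, the example below uses a small hypothetical data frame (not df1) that contains the value "AB", and base R only: grepl stands in for str_detect, and %in% performs the exact comparison. The names df, keep, exact, and partial are assumptions made for illustration.

```r
# Hypothetical data frame for illustration (not the article's df1)
df <- data.frame(
  Group = c("A", "AB", "C", "D", "E"),
  Response = c(3.1, 2.9, 3.0, 2.7, 3.2),
  stringsAsFactors = FALSE
)

keep <- c("A", "C", "D")

# Exact matching: keeps rows whose Group is exactly one of the values in keep
exact <- df[df$Group %in% keep, ]

# Regular-expression matching: keeps rows where the pattern occurs
# anywhere in Group, so "AB" is kept as well
partial <- df[grepl("A|C|D", df$Group), ]

exact$Group
partial$Group
```

With dplyr loaded, the exact comparison can also be written as filter(Group %in% keep), which may be the safer choice when the values are known in full.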

Example


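# No seed is set, so the values differ on each run
Region <- sample(c("Asia", "Oceania", "Africa", "America"), 25, replace = TRUE)  # 25 random regions
Y <- rpois(25, 5)                                                                # 25 Poisson counts, mean 5
df2 <- data.frame(Region, Y)
df2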

Output

   Region   Y
1  Africa   5
2  Oceania  4
3  Oceania  3
4  Oceania  3
5  Oceania  6
6  Oceania  2
7  Oceania  4
8  Oceania  6
9  Asia     1
10 Africa   4
11 Asia     7
12 Asia     10
13 Oceania  1
14 America  5
15 Oceania  3
16 Africa   8
17 Oceania  9
18 Asia     11
19 Africa   7
20 Africa   3
21 Africa   2
22 Asia     5
23 America  6
24 America  2
25 America  1

Filter df2 to keep the rows whose Region is Oceania, America, or Africa −

Example

df2 %>% filter(str_detect(Region, "Oceania|America|Africa"))

Output

    Region   Y
1  Africa    5
2  Oceania   4
3  Oceania   3
4  Oceania   3
5  Oceania   6
6  Oceania   2
7  Oceania   4
8  Oceania   6
9  Africa    4
10 Oceania   1
11 America   5
12 Oceania   3
13 Africa    8
14 Oceania   9
15 Africa    7
16 Africa    3
17 Africa    2
18 America   6
19 America   2
20 America   1
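
When the wanted values are stored in a vector, the pattern does not have to be typed by hand. The following is a rough sketch, assuming the values contain no regular-expression metacharacters: it joins them with paste and anchors the alternation with ^ and $ so that only whole values match. The data frame and the names wanted, pattern, anchored, and unanchored are hypothetical and not part of the example above; grepl from base R is used so that the block runs without extra packages, and the same pattern string can be passed to str_detect.

```r
# Hypothetical values for illustration (not the article's df2)
df <- data.frame(
  Region = c("Asia", "Oceania", "Africa", "America", "South America"),
  Y = c(1, 4, 5, 6, 2),
  stringsAsFactors = FALSE
)

wanted <- c("Oceania", "America", "Africa")

# Join the values with "|" and anchor the alternation to the whole string
pattern <- paste0("^(", paste(wanted, collapse = "|"), ")$")

# Anchored pattern: only values that are exactly one of wanted
anchored <- df[grepl(pattern, df$Region), ]

# Unanchored pattern: "South America" is kept too, because it contains "America"
unanchored <- df[grepl(paste(wanted, collapse = "|"), df$Region), ]

pattern
anchored$Region
unanchored$Region
```
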
Updated on: 2021-02-11T12:02:55+05:30
