Manipulating string data with
a pattern in R
Speaker: CHANG, Lun-Hsien
Affiliation: Genetic Epidemiology, QIMR Berghofer Medical Research Institute
Meeting: R user group meeting #9
Time: 1:10-2:30 PM, 20190828
Place: Level 7, Bancroft building, QIMR, Brisbane, Australia
Outline
Download the R script from my Google Drive:
20190828_R-user-group_string-manipulation.R
What is it like to manipulate strings?
What are special characters?
How to specify a pattern?
Scenarios in which you will handle strings
● Manipulating output from an R object
● Subsetting files through their names or paths
● Subsetting groups
Summary
Manipulating string data is like
hand sewing
My string data
R functions
Patterns
What are special characters?
Special characters are characters that carry a meaning inside a pattern. They are
interpreted unless they are escaped.
\ ^ $ . | ? * + ( ) [ ] { }
When specifying a pattern in R:
(1) Escape special characters with double backslashes (\\)
(2) Use the OR operator (pipe, |) to chain multiple patterns
patterns <- "\\(|factor\\(|\\)"
If you want to match the string 1+1=2, the correct pattern is "1\\+1=2"
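To see why the escape matters, the minimal sketch below tests the 1+1=2 example with base R's grepl(). The variable names are illustrative only and do not come from the talk's script.

```r
# The target string from the slide
target <- "1+1=2"

# Unescaped: "+" means "one or more of the preceding character",
# so this pattern looks for "11=2", "111=2", ... and misses the target
unescaped.hit <- grepl(pattern = "1+1=2", x = target)

# Escaped with a double backslash: "+" is matched literally
escaped.hit <- grepl(pattern = "1\\+1=2", x = target)

# Alternative: fixed = TRUE switches off regular-expression interpretation
fixed.hit <- grepl(pattern = "1+1=2", x = target, fixed = TRUE)

print(c(unescaped = unescaped.hit, escaped = escaped.hit, fixed = fixed.hit))
```

Two backslashes are needed because R itself consumes one when it reads the string literal, which leaves a single backslash for the regular-expression engine.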
Specifying patterns in R
● ^prefix Matches strings that begin with prefix
● suffix$ Matches strings that end with suffix
● .* Matches any character, repeated any number of times (the counterpart of * in a Linux shell)
● \\ Prevents a special character from being interpreted
● | Matches pattern 1 or pattern 2 or ...
A pattern can describe the beginning (^prefix), the part in between (.*) and the end (suffix$) of a string.
Is there an AND operator? It is neither & nor &&. See
https://stackoverflow.com/questions/13187414/r-grep-is-there-an-and-operator
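The operators listed above can be tried on a few toy strings. This is a sketch with made-up data, not code from the talk. The last line shows one common workaround for the missing AND operator, assumed here for illustration: run grepl() once per condition and combine the logical vectors outside the pattern.

```r
# Toy strings, made up for illustration
x <- c("apple_pie", "pineapple", "apple", "pie.chart")

starts.with.apple <- grepl("^apple", x)        # begins with "apple"
ends.with.pie     <- grepl("pie$", x)          # ends with "pie"
apple.then.pie    <- grepl("^apple.*pie$", x)  # "apple", anything, then "pie"
has.literal.dot   <- grepl("\\.", x)           # contains a real dot
pine.or.chart     <- grepl("^pine|chart$", x)  # either alternative matches

# AND workaround: the & acts on the logical results, not inside the pattern
apple.and.pie <- grepl("^apple", x) & grepl("pie$", x)

print(x[apple.and.pie])
```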
What my coefficients look like
linear.model.summary[["coefficients"]]
               Estimate Std. Error    t value     Pr(>|t|)
(Intercept)   46.458333   1.842243 25.2183502 1.921811e-63
factor(race)2 11.541667   3.286129  3.5122376 5.515272e-04
factor(race)3  1.741667   2.732488  0.6373922 5.246133e-01
factor(race)4  7.596839   1.988870  3.8196768 1.792682e-04
What I would like my output to look like
coefficients.dataFrame
  Predictor  Estimate       SE    t.value      p.value
1 Intercept 46.458333 1.842243 25.2183502 1.921811e-63
2     race2 11.541667 3.286129  3.5122376 5.515272e-04
3     race3  1.741667 2.732488  0.6373922 5.246133e-01
4     race4  7.596839 1.988870  3.8196768 1.792682e-04
Replace patterns in the Predictor column with nothing using `gsub()`
# Remove the unwanted strings "(", "factor(" and ")" from a column with gsub()
patterns <- "\\(|factor\\(|\\)"
temp1 <- coefficients.dataFrame
temp1$Predictor <- gsub( x=temp1$Predictor
                        ,pattern=patterns
                        ,replacement="")
Find full code under the heading Scenario 1
Replace patterns in the Predictor column with nothing using `str_replace_all()`
# Remove the unwanted strings "(", "factor(" and ")" from a column with stringr::str_replace_all()
patterns <- "\\(|factor\\(|\\)"
temp2 <- coefficients.dataFrame
temp2$Predictor <- stringr::str_replace_all( string=temp2$Predictor
                                            ,pattern=patterns
                                            ,replacement="")
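As a self-contained check of Scenario 1, the sketch below applies the same pattern to the four row names shown in the coefficient table. The vector predictors is typed in by hand for illustration, and the stringr call runs only if that package happens to be installed.

```r
# Row names copied from the coefficient table shown earlier
predictors <- c("(Intercept)", "factor(race)2", "factor(race)3", "factor(race)4")

# "\\(" and "\\)" match literal parentheses; "|" separates the alternatives
patterns <- "\\(|factor\\(|\\)"

# Base R: replace every match with an empty string
cleaned.base <- gsub(pattern = patterns, replacement = "", x = predictors)
print(cleaned.base)

# Same operation with stringr, skipped when the package is not available
if (requireNamespace("stringr", quietly = TRUE)) {
  cleaned.stringr <- stringr::str_replace_all(string = predictors,
                                              pattern = patterns,
                                              replacement = "")
  print(identical(cleaned.base, cleaned.stringr))
}
```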
What the files in my folder look like
TSV files that I want to import (.tsv: tab-separated values)
Getting full paths of TSV files with list.files() or Sys.glob()
# Subset TSV files (positive filtering) with list.files()
patterns <- "harmonised-data.*\\.tsv$"
tsv.files <- list.files( path=source.files.path
                        ,pattern = patterns
                        ,full.names = TRUE) # length(tsv.files) 220
# Subset TSV files with Sys.glob()
patterns <- "harmonised-data*.tsv"
tsv.files <- Sys.glob(file.path(source.files.path, patterns)) # length(tsv.files) 220
Find full code under the heading Scenario 2
Patterns in list.files() versus Sys.glob()
patterns <- "harmonised-data.*\\.tsv$"
list.files(pattern = ) takes an optional regular expression and returns only
the file names that match it
patterns <- "harmonised-data*.tsv"
Sys.glob(patterns) expands the wildcard (*) in file paths, as a Unix shell does
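The difference between the two pattern styles can be checked on a throwaway folder. In the sketch below the folder and file names are invented for illustration. utils::glob2rx() is a base R helper that translates a wildcard into the equivalent regular expression.

```r
# Build a temporary folder holding a few made-up files
demo.dir <- tempfile("string-demo")
dir.create(demo.dir)
demo.names <- c("harmonised-data_A.tsv", "harmonised-data_B.tsv",
                "harmonised-data_A.log", "notes.tsv")
invisible(file.create(file.path(demo.dir, demo.names)))

# Regular expression: ".*" is "anything" and "\\." is a literal dot
by.regex <- list.files(path = demo.dir,
                       pattern = "harmonised-data.*\\.tsv$",
                       full.names = TRUE)

# Wildcard: "*" alone is "anything" and "." is already literal
by.glob <- Sys.glob(file.path(demo.dir, "harmonised-data*.tsv"))

# Translate the wildcard into a regular expression
translated <- utils::glob2rx("harmonised-data*.tsv")

print(basename(by.regex))
print(basename(by.glob))
print(translated)
```

Both calls should select the same two .tsv files, which is a quick way to confirm that a wildcard and a regular expression describe the same set of names.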
Getting full paths of non-TSV files with grep()
# Subset non-TSV files (negative filtering)
patterns <- "harmonised-data.*\\.tsv$"
non.tsv.files <- grep( x=all.files
                      ,pattern = patterns
                      ,value = TRUE
                      ,invert = TRUE) # length(non.tsv.files) 163
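Negative filtering can be tried without touching the file system. The sketch below uses a hand-made vector of paths in place of all.files (the real vector is not shown on the slides) and confirms that invert = TRUE returns exactly the elements that the pattern does not match.

```r
# Made-up paths standing in for all.files
all.files <- c("data/harmonised-data_A.tsv", "data/harmonised-data_A.log",
               "data/README.txt", "data/harmonised-data_B.tsv")
patterns <- "harmonised-data.*\\.tsv$"

# Positive filtering: elements that match the pattern
tsv.files <- grep(pattern = patterns, x = all.files, value = TRUE)

# Negative filtering: elements that do not match the pattern
non.tsv.files <- grep(pattern = patterns, x = all.files, value = TRUE, invert = TRUE)

print(non.tsv.files)
# Every path lands in exactly one of the two sets
print(length(tsv.files) + length(non.tsv.files) == length(all.files))
```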
Suppose your data are stratified by state, age group and sex.
How do you subset groups?
States: NSW, ACT, VIC, QLD, SA, WA, TAS, NT
Age groups: 4-20, 21-40, 41-60, 61+
Sex: males, females, both sexes together
Total number of groups: 96 (8*4*3)
Creating all groups with data.table::CJ()
# Create subgroups
group.1 <- c("NSW","ACT","VIC","QLD","SA","WA","TAS","NT") # length(group.1) 8
group.2 <- paste0("age",c("4-20","21-40","41-60","61+"))
group.3 <- c("males","females","bothSexes")
# Create all combinations from the 3 vectors
## data.table::CJ() (cross join) returns a data.table with one row per combination
## The columns are named explicitly so that paste() uses them, not the global vectors
all.groups.subgroups <- data.table::CJ(state = group.1, age = group.2, sex = group.3,
                                       sorted = FALSE)[, paste(state, age, sex, sep = "_")] # length(all.groups.subgroups) 96
Find full code under the heading Scenario 3
Subsetting males with grep()
# Subset males
males <- grep(x=all.groups.subgroups, pattern = "_males$", value = TRUE) # length(males) 32
Subsetting females aged 61 and over from eastern states
# Specify patterns
pattern.1 <- "^NSW|^QLD|^VIC|^ACT|^TAS"
pattern.2 <- "_females$"
pattern.3 <- "61\\+" # escape + so that it is matched literally
# Subset data from females 61+ in eastern states
library(magrittr) # provides the pipe operator %>%
eastern.states.females.61plus <- grep(x=all.groups.subgroups, pattern = pattern.1, value = TRUE) %>%
  grep(., pattern = pattern.2, value = TRUE) %>%
  grep(., pattern = pattern.3, value = TRUE) # length(eastern.states.females.61plus) 5
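The chained grep() calls act as an AND across three conditions. The sketch below shows two other ways to express the same subset in base R. It rebuilds the 96 labels with expand.grid() as a stand-in for data.table::CJ() (the order of the labels differs, the content does not), and all object names are illustrative rather than taken from the talk's script.

```r
# Rebuild the 96 group labels with base R
combos <- expand.grid(state = c("NSW","ACT","VIC","QLD","SA","WA","TAS","NT"),
                      age   = paste0("age", c("4-20","21-40","41-60","61+")),
                      sex   = c("males","females","bothSexes"),
                      stringsAsFactors = FALSE)
all.groups <- paste(combos$state, combos$age, combos$sex, sep = "_")

# Option 1: test each condition separately, then AND the logical vectors
keep <- grepl("^NSW|^QLD|^VIC|^ACT|^TAS", all.groups) &
  grepl("_females$", all.groups) &
  grepl("61\\+", all.groups)
option.1 <- all.groups[keep]

# Option 2: one anchored pattern that describes the whole label
option.2 <- grep("^(NSW|QLD|VIC|ACT|TAS)_age61\\+_females$", all.groups, value = TRUE)

print(option.1)
print(setequal(option.1, option.2))
```

Option 2 only works because every label has the same state_age_sex layout. When the order of the parts can vary, combining separate grepl() results is the safer choice.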
My string data: R objects, file paths, vectors
R functions: gsub(pattern = ), str_replace_all(pattern = ), list.files(pattern = ), grep(pattern = ), Sys.glob()
Patterns: ^  $  .*  \\  |
Summary
Removing unwanted strings with gsub(), stringr::str_replace_all()
Selecting files with list.files(), Sys.glob() and grep(invert=TRUE)
Subsetting groups with grep()