Introduction to Data Mining with R 
and Data Import/Export in R 
Yanchang Zhao 
http://www.RDataMining.com 
30 September 2014 
1 / 25
Questions 
- Do you know data mining and its algorithms and techniques?
- Have you heard of R?
- Have you used R in your research or projects?
2 / 25
Outline 
Introduction to R 
R Packages and Functions for Data Mining 
Data Import and Export 
Online Resources 
3 / 25
What is R? 
- R[1] is a free software environment for statistical computing and graphics.
- R can be easily extended with 5,800+ packages available on CRAN[2] (as of 13 Sept 2014).
- Many other packages are available on Bioconductor[3], R-Forge[4], GitHub[5], etc.
- R manuals on CRAN[6]:
  - An Introduction to R
  - The R Language Definition
  - R Data Import/Export
  - ...
[1] http://www.r-project.org/
[2] http://cran.r-project.org/
[3] http://www.bioconductor.org/
[4] http://r-forge.r-project.org/
[5] https://github.com/
[6] http://cran.r-project.org/manuals.html
4 / 25
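Installing and attaching packages is the first step before using any of the packages named in these slides. The helper below is a minimal sketch, assuming a CRAN mirror is reachable: `load_packages()` is a hypothetical name (not part of base R), and the mirror URL `https://cloud.r-project.org` is an assumed default. It installs whichever requested packages are missing and then attaches all of them.

```r
# load_packages() is a hypothetical helper, not a base-R function.
# It installs any packages that cannot be loaded, then attaches every package.
load_packages <- function(pkgs, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE (quietly) when a package cannot be loaded
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) {
    install.packages(missing, repos = repos)
  }
  # character.only = TRUE lets library() take the package name from a variable
  for (p in pkgs) {
    library(p, character.only = TRUE)
  }
  invisible(pkgs)
}

# Usage (commented out because it may download packages):
# load_packages(c("rpart", "arules", "tm"))
```

Checking with `requireNamespace()` first avoids reinstalling packages that are already present, so the helper can be called at the top of every script.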
Why R? 
- R is widely used in both academia and industry.
- R was ranked no. 1 in the KDnuggets 2014 poll on Top Languages for analytics, data mining, data science[7] (R was also no. 1 in 2011, 2012 and 2013).
- The CRAN Task Views[8] provide collections of packages for different tasks, e.g.:
  - Machine learning & statistical learning
  - Cluster analysis & finite mixture models
  - Time series analysis
  - Multivariate statistics
  - Analysis of spatial data
  - ...
[7] http://www.kdnuggets.com/polls/2014/languages-analytics-data-mining-data-science.html
[8] http://cran.r-project.org/web/views/
5 / 25
Outline 
Introduction to R 
R Packages and Functions for Data Mining 
Data Import and Export 
Online Resources 
6 / 25
Classification with R
- Decision trees: rpart, party
- Random forest: randomForest, party
- SVM: e1071, kernlab
- Neural networks: nnet, neuralnet, RSNNS
- Performance evaluation: ROCR
7 / 25
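As a minimal sketch of the first bullet, the code below grows a decision tree with rpart on R's built-in iris data and scores it on a held-out set. The 100/50 split, the seed and all variable names are arbitrary choices for illustration; rpart is one of R's recommended packages, so it is normally installed alongside R.

```r
library(rpart)  # recursive partitioning trees; a recommended package bundled with R

set.seed(1234)                         # make the random split reproducible
train_idx <- sample(nrow(iris), 100)   # 100 rows for training, 50 held out
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# grow a classification tree predicting Species from the four measurements
tree <- rpart(Species ~ ., data = train, method = "class")

# predicted class labels for the held-out rows
pred <- predict(tree, newdata = test, type = "class")

# confusion matrix (rows: predicted, columns: actual) and overall accuracy
conf_mat <- table(predicted = pred, actual = test$Species)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)
print(accuracy)
```

Evaluating on rows the tree never saw gives a fairer accuracy estimate than scoring the training data; ROCR extends this to ROC curves for two-class problems.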
Clustering with R 
- k-means: kmeans(), kmeansruns()[9]
- k-medoids: pam(), pamk()
- Hierarchical clustering: hclust(), agnes(), diana()
- DBSCAN: fpc
- BIRCH: birch
[9] Names followed by () are functions; the others are packages.
8 / 25
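kmeans() and hclust() are both in base R's stats package. A minimal sketch on the iris measurements, standardised first so that no variable dominates the distances; the seed, nstart = 25 and average linkage are arbitrary illustrative choices, not recommendations.

```r
x <- scale(iris[, 1:4])   # standardise the four numeric measurements

# k-means with 3 clusters; nstart = 25 random starts, the best one is kept
set.seed(42)
km <- kmeans(x, centers = 3, nstart = 25)
print(table(cluster = km$cluster, species = iris$Species))

# agglomerative hierarchical clustering on Euclidean distances
hc <- hclust(dist(x), method = "average")
hc_groups <- cutree(hc, k = 3)   # cut the dendrogram into 3 groups
print(table(hc_groups))
# plot(hc) would draw the dendrogram
```

Cross-tabulating clusters against the known species is only a sanity check here; in real clustering tasks such labels are usually unavailable.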
Association Rule Mining with R 
- Association rules: apriori(), eclat() in package arules
- Sequential patterns: arulesSequences
- Visualisation of associations: arulesViz
9 / 25
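apriori() in arules searches for rules whose support and confidence exceed user-set thresholds. To show what those measures mean, here is a small base-R sketch (not the arules implementation) on five hypothetical shopping baskets; `support()`, `confidence()` and `lift()` are illustrative helper names.

```r
# Five hypothetical shopping baskets
baskets <- list(
  c("bread", "milk"),
  c("bread", "butter", "milk"),
  c("beer", "bread"),
  c("butter", "milk"),
  c("bread", "butter", "milk")
)
items <- sort(unique(unlist(baskets)))

# logical matrix: one row per basket, one column per item
tx <- t(vapply(baskets, function(b) items %in% b, logical(length(items))))
colnames(tx) <- items

# support: share of baskets containing every item in the itemset
support <- function(itemset) mean(apply(tx[, itemset, drop = FALSE], 1, all))

# confidence of rule lhs => rhs: support(lhs and rhs) / support(lhs)
confidence <- function(lhs, rhs) support(c(lhs, rhs)) / support(lhs)

# lift: confidence relative to the baseline frequency of rhs
lift <- function(lhs, rhs) confidence(lhs, rhs) / support(rhs)

support(c("bread", "milk"))     # 3 of 5 baskets -> 0.6
confidence("bread", "milk")     # 0.6 / 0.8      -> 0.75
lift("bread", "milk")           # 0.75 / 0.8     -> 0.9375
```

A lift below 1, as here, means buying bread makes milk slightly less likely than its baseline rate, even though the rule's confidence looks high.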
Text Mining with R 
- Text mining: tm
- Topic modelling: topicmodels, lda
- Word cloud: wordcloud
- Twitter data access: twitteR
10 / 25
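tm builds term-document matrices and wordcloud draws word frequencies. The core counting step can be sketched in base R; the three documents and the tiny stop-word list below are hypothetical, and real pipelines would use tm's own cleaning and stemming functions instead.

```r
docs <- c("R and Data Mining", "Data mining with R", "Text mining in R")
stop_words <- c("and", "with", "in")   # tiny illustrative stop-word list

# lower-case, then split on anything that is not a letter
tokens_per_doc <- strsplit(tolower(docs), "[^a-z]+")
tokens <- unlist(tokens_per_doc)
tokens <- tokens[nchar(tokens) > 0 & !(tokens %in% stop_words)]

# corpus-wide term frequencies (the input a word cloud needs)
freq <- sort(table(tokens), decreasing = TRUE)
freq

# term-document matrix: one row per term, one column per document;
# stop words are not in vocab, so factor() maps them to NA and table() drops them
vocab <- names(freq)
tdm <- sapply(tokens_per_doc, function(w) table(factor(w, levels = vocab)))
colnames(tdm) <- paste0("doc", seq_along(docs))
tdm
```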
Time Series Analysis with R 
- Time series decomposition: decomp(), decompose(), stl()
- Time series modelling and forecasting: arima(), forecast
- Time series clustering: TSclust
- Dynamic Time Warping (DTW): dtw
11 / 25
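stl() and arima() ship with base R (package stats). As a sketch on the built-in AirPassengers series: stl() splits the log series into seasonal, trend and remainder parts, and arima() fits a seasonal ARIMA(0,1,1)(0,1,1)[12] model (the classic "airline" model, chosen here only for illustration) whose predict() method gives a 12-month forecast. The forecast package offers richer tools for the same job.

```r
ap <- log(AirPassengers)   # monthly airline passengers 1949-1960, log scale

# STL decomposition: seasonal + trend + remainder (additive on the log scale)
ap_stl <- stl(ap, s.window = "periodic")
head(ap_stl$time.series)
# plot(ap_stl) would show the three components

# seasonal ARIMA(0,1,1)(0,1,1)[12], the "airline" model
fit <- arima(ap, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# forecast 12 months ahead and undo the log transform
fc <- predict(fit, n.ahead = 12)
round(exp(fc$pred))
```

The log transform turns the growing seasonal swings into roughly constant ones, which suits an additive decomposition and the ARIMA model alike.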
Social Network Analysis with R 
- Packages: igraph, sna
- Centrality measures: degree(), betweenness(), closeness(), transitivity()
- Clusters: clusters(), no.clusters()
- Cliques: cliques(), largest.cliques(), maximal.cliques(), clique.number()
- Community detection: fastgreedy.community(), spinglass.community()
12 / 25
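The functions above come from igraph and sna. The base-R sketch below computes two of the same quantities by hand on a hypothetical four-node graph so the definitions are visible: degree as the row sums of the adjacency matrix, and global transitivity as closed triples over connected triples, i.e. trace(A^3) / sum(d * (d - 1)). For a simple undirected graph, igraph's transitivity(g, type = "global") should give the same value.

```r
# Hypothetical undirected graph: a triangle a-b-c plus a pendant edge c-d
nodes <- c("a", "b", "c", "d")
edges <- rbind(c("a", "b"), c("a", "c"), c("b", "c"), c("c", "d"))

# symmetric 0/1 adjacency matrix, indexed by node name
A <- matrix(0, nrow = length(nodes), ncol = length(nodes),
            dimnames = list(nodes, nodes))
A[edges] <- 1          # a two-column character matrix indexes by dimnames
A[edges[, 2:1]] <- 1   # mirror each edge, since the graph is undirected

deg <- rowSums(A)      # degree centrality
A3 <- A %*% A %*% A    # diag(A3)[i] = 2 x (triangles through node i)
triangles <- sum(diag(A3)) / 6

# global transitivity = closed triples / connected triples
transitivity_global <- sum(diag(A3)) / sum(deg * (deg - 1))

deg                    # a = 2, b = 2, c = 3, d = 1
triangles              # 1
transitivity_global    # 0.6
```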
R and Big Data 
- Hadoop
  - Hadoop (or YARN): a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models
  - R packages: RHadoop, RHIPE
- Spark
  - Spark: a fast and general engine for large-scale data processing; its developers report speed-ups of up to 100 times over Hadoop MapReduce for some in-memory workloads
  - SparkR: R frontend for Spark
- H2O
  - H2O: an open-source in-memory prediction engine for big data science
  - R package: h2o
- MongoDB
  - MongoDB: an open-source document database
  - R packages: rmongodb, RMongo
13 / 25
R and Hadoop 
- Packages: RHadoop, RHive
- RHadoop[10] is a collection of R packages:
  - rmr2: perform data analysis with R via MapReduce on a Hadoop cluster
  - rhdfs: connect to the Hadoop Distributed File System (HDFS)
  - rhbase: connect to the NoSQL HBase database
  - ...
- You can experiment with it on a single PC (in standalone or pseudo-distributed mode), and code developed there should also run on a cluster of PCs (in fully distributed mode).
- Step-by-Step Guide to Setting Up an R-Hadoop System: http://www.rdatamining.com/big-data/r-hadoop-setup-guide
[10] https://github.com/RevolutionAnalytics/RHadoop/wiki
14 / 25
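rmr2 lets R functions act as the map and reduce steps of a Hadoop job. The sketch below only imitates that programming model locally in base R (no Hadoop and no rmr2 calls), using the classic word-count example on hypothetical input chunks: map emits (word, 1) pairs, the shuffle groups them by key, and reduce sums each group. `map_fn()` and `reduce_fn()` are illustrative names.

```r
# Hypothetical input, split into chunks as if stored as HDFS blocks
chunks <- list(c("r hadoop r", "data mining"),
               c("r data", "hadoop"))

# map: turn one chunk of lines into (key, value) pairs, here (word, 1)
map_fn <- function(lines) {
  words <- unlist(strsplit(lines, "\\s+"))
  data.frame(key = words, value = 1L, stringsAsFactors = FALSE)
}

# reduce: combine all values that share a key, here by summing them
reduce_fn <- function(key, values) {
  data.frame(key = key, value = sum(values), stringsAsFactors = FALSE)
}

mapped  <- do.call(rbind, lapply(chunks, map_fn))    # map phase, chunk by chunk
grouped <- split(mapped$value, mapped$key)           # shuffle: group by key
result  <- do.call(rbind, Map(reduce_fn, names(grouped), grouped))  # reduce
rownames(result) <- NULL
result
```

Because each chunk is mapped independently and each key is reduced independently, both phases could run in parallel on different nodes, which is what Hadoop exploits.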
Outline 
Introduction to R 
R Packages and Functions for Data Mining 
Data Import and Export 
Online Resources 
15 / 25
Data Import and Export[11]
Read data from, and write data to:
- R native formats (incl. Rdata and RDS)
- CSV files
- Excel files
- ODBC databases
- SAS datasets
R Data Import/Export manual:
- http://cran.r-project.org/doc/manuals/R-data.pdf
[11] Chapter 2: Data Import and Export, in the book R and Data Mining: Examples and Case Studies. http://www.rdatamining.com/docs/RDataMining.pdf
16 / 25
Save and Load R Objects 
- save(): save R objects into a .Rdata file
- load(): read R objects from a .Rdata file
- rm(): remove objects from R
# create the output folder if it does not exist yet
dir.create("./data", showWarnings = FALSE)
a <- 1:10
save(a, file = "./data/dumData.Rdata")
rm(a)
a
## Error: object 'a' not found
load("./data/dumData.Rdata")
a
##  [1]  1  2  3  4  5  6  7  8  9 10
17 / 25
Save and Load R Objects - More Functions 
- save.image(): save the current workspace to a file. It saves everything!
- readRDS(): read a single R object from a .rds file
- saveRDS(): save a single R object to a file
- Advantage of readRDS() and saveRDS(): you can restore the data under a different object name.
- Advantage of load() and save(): you can save multiple R objects to one file.
18 / 25
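The two advantages above can be seen side by side in a short sketch. It writes to tempdir() rather than ./data so it can run anywhere; the file names and the object names b, x, y and restored are made up for illustration.

```r
a <- 1:10
rds_file <- file.path(tempdir(), "dumData.rds")

# saveRDS() stores one object without its name ...
saveRDS(a, file = rds_file)
b <- readRDS(rds_file)        # ... so it can come back under a new name
identical(a, b)               # TRUE

# save() can store several objects in one file, names included
x <- letters[1:3]
y <- pi
rdata_file <- file.path(tempdir(), "twoObjects.Rdata")
save(x, y, file = rdata_file)
rm(x, y)
restored <- load(rdata_file)  # load() recreates x and y and returns their names
restored                      # "x" "y"
```

Note that load() silently overwrites any existing objects called x or y, which is one reason saveRDS()/readRDS() is often preferred for single objects.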
Import from and Export to .CSV Files
- write.csv(): write a data frame to a .CSV file
- read.csv(): read a .CSV file into a data frame
# create a data frame
var1 <- 1:5
var2 <- (1:5) / 10
var3 <- c("R", "and", "Data Mining", "Examples", "Case Studies")
df1 <- data.frame(var1, var2, var3)
names(df1) <- c("VarInt", "VarReal", "VarChar")
# save to a csv file
write.csv(df1, "./data/dummmyData.csv", row.names = FALSE)
# read from a csv file
df2 <- read.csv("./data/dummmyData.csv")
print(df2)
##   VarInt VarReal      VarChar
## 1      1     0.1            R
## 2      2     0.2          and
## 3      3     0.3  Data Mining
## 4      4     0.4     Examples
## 5      5     0.5 Case Studies
19 / 25
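Why row.names = FALSE matters: a sketch (temporary file, same data as above) showing that with the default settings write.csv() also writes the row names, which read.csv() then reads back as an extra column named X. stringsAsFactors = FALSE is set explicitly so the round trip behaves the same on older R versions.

```r
df1 <- data.frame(VarInt = 1:5,
                  VarReal = (1:5) / 10,
                  VarChar = c("R", "and", "Data Mining", "Examples", "Case Studies"),
                  stringsAsFactors = FALSE)
csv_file <- tempfile(fileext = ".csv")

# default: row names are written as an unnamed first column ...
write.csv(df1, csv_file)
with_rownames <- read.csv(csv_file, stringsAsFactors = FALSE)
names(with_rownames)          # "X" "VarInt" "VarReal" "VarChar"

# ... so drop them when writing if the data has no meaningful row names
write.csv(df1, csv_file, row.names = FALSE)
df2 <- read.csv(csv_file, stringsAsFactors = FALSE)
all.equal(df1, df2)           # TRUE
```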
Import from and Export to Excel Files
Package xlsx: read, write and format Excel 2007 and Excel 97/2000/XP/2003 files
library(xlsx)
xlsx.file <- "./data/dummmyData.xlsx"
write.xlsx(df2, xlsx.file, sheetName = "sheet1", row.names = FALSE)
df3 <- read.xlsx(xlsx.file, sheetName = "sheet1")
df3
##   VarInt VarReal      VarChar
## 1      1     0.1            R
## 2      2     0.2          and
## 3      3     0.3  Data Mining
## 4      4     0.4     Examples
## 5      5     0.5 Case Studies
20 / 25
Read from Databases
- Package RODBC: provides connection to ODBC databases.
- Function odbcConnect(): sets up a connection to a database.
- sqlQuery(): sends an SQL query to the database.
- odbcClose(): closes the connection.
library(RODBC)
db <- odbcConnect(dsn = "servername", uid = "userid", pwd = "******")
sql <- "SELECT * FROM lib.table WHERE ..."
# or read query from file
sql <- readChar("myQuery.sql", nchars = 99999)
myData <- sqlQuery(db, sql, errors = TRUE)
odbcClose(db)
Functions sqlFetch(), sqlSave() and sqlUpdate(): read, write or update a table in an ODBC database
21 / 25
Import Data from SAS
Package foreign provides function read.ssd() for importing SAS datasets (.sas7bdat files) into R.
library(foreign) # for importing SAS data
# the path of SAS on your computer (read.ssd() runs SAS itself, so SAS must be installed)
sashome <- "C:/Program Files/SAS/SASFoundation/9.2"
filepath <- "./data"
# filename should be no more than 8 characters, without extension
fileName <- "dumData"
# read data from a SAS dataset
a <- read.ssd(file.path(filepath), fileName, sascmd = file.path(sashome, "sas.exe"))
Another way: use function read.xport() to read a file in SAS Transport (XPORT) format.
22 / 25
Outline
Introduction to R
R Packages and Functions for Data Mining
Data Import and Export
Online Resources
23 / 25
Online Resources
- RDataMining website: http://www.rdatamining.com
  - R Reference Card for Data Mining
  - R and Data Mining: Examples and Case Studies
- RDataMining Group on LinkedIn (7,000+ members): http://group.rdatamining.com
- RDataMining on Twitter (1,700+ followers): @RDataMining
- Free online courses: http://www.rdatamining.com/resources/courses
- Online documents: http://www.rdatamining.com/resources/onlinedocs
24 / 25
The End
Thanks!
Email: yanchang(at)rdatamining.com
25 / 25