Data Analytics with R and SQL Server
Stéphane Fréchette
Thursday March 19, 2015
Who am I?
My name is Stéphane Fréchette
SQL Server MVP | Consultant | Speaker | Data & BI Architect | Big Data
| NoSQL | Data Science. Drums, good food and fine wine.
I have a passion for architecting, designing and building solutions that
matter.
Twitter: @sfrechette
Blog: stephanefrechette.com
Email: stephanefrechette@ukubu.com
Topics
• What is R?
• Should I use R?
• Data Structures
• Graphics
• Data Manipulation in R
• Connecting to SQL Server
• Demos
• Resources
• Q&A
DISCLAIMER
This is neither a course nor a tutorial, but
an introduction, a walkthrough to
inspire you to further explore and
learn more about R and statistical computing
“Analysis of data is a process of inspecting, cleaning,
transforming, and modeling data with the goal of
discovering useful information, suggesting conclusions,
and supporting decision-making. Data analysis has
multiple facets and approaches, encompassing diverse
techniques under a variety of names, in different business,
science, and social science domains.”
- Wikipedia
What is R?
• A programming language and environment for statistical computing and graphics
• R has its origins in the S programming language, created in the 1970s
• Best used to manipulate moderately sized datasets, do statistical analysis and
produce data-centric documents and presentations
• Additional tools are distributed as packages, which any user can download to
customize the R environment
• Cross-platform: runs on Mac, Windows and Unix-based systems
Should I use R?
[Flowchart: Are you doing statistics? No / Yes]
Where “statistics” can mean machine learning, predictive analytics, data
science, anything that falls under a rather broad umbrella…
But if you have some data that makes sense to represent in a tabular-like
structure, and you want to do some cool analytical or statistical stuff with it, R is
definitely a good choice…
Downloading and Installing R
http://www.r-project.org/ http://www.rstudio.com/
The IDE (RStudio)
1. View Files and Data
2. See Workspace and History
3. See Files, Plots, Packages and Help
4. Console
Installing Packages
• To use a package in R, one must first install it using the install.packages
function
• It downloads the package from CRAN and installs it so that it is ready to use
Loading Packages
• To use a particular package in your current R session, one must load it into the
R environment using the library or require functions
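As a minimal sketch of the install-then-load pattern described above: the helper name ensure_package is hypothetical (it is not from the slides), and it installs a package only when it is missing before attaching it.

```r
# Hypothetical helper (not from the slides): install a package only if it is
# missing, then attach it to the current session.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from the configured CRAN mirror
  }
  library(pkg, character.only = TRUE)  # attach a package whose name is in a variable
  invisible(TRUE)
}

# "stats" ships with R, so nothing is downloaded in this call
ensure_package("stats")
```

The same call with a CRAN package name, for example ensure_package("dplyr"), would download it on first use.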
Common Data Structures in R
To make the best of the R language, one needs a strong understanding of the
basic data types and data structures and how to operate and use them.
R has a wide variety of data types including scalars, vectors (numerical,
character, logical), matrices, data frames, and lists…
To understand computations in R, two slogans are helpful:
• Everything that exists is an object
• Everything that happens is a function call
John Chambers
creator of the S programming language, and core member of the R programming language project.
Data Structures - Vectors
The simplest structure is the numeric vector, which is a single entity consisting of an ordered
collection of numbers.
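A small illustration of a numeric vector, with made-up temperature readings (the values and variable names are assumptions for the example):

```r
# Hypothetical temperature readings in degrees Celsius
temps <- c(21.5, 19.0, 23.2, 18.4)

length(temps)   # number of elements in the vector
temps[2]        # indexing starts at 1, so this is the second reading

# Arithmetic is vectorized: the formula is applied to every element
temps_f <- temps * 9 / 5 + 32

mean(temps)     # summary functions take the whole vector
```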
Data Structures - Matrices
Matrices are nothing more than 2-dimensional vectors. To define a matrix, use the function
matrix.
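A short sketch of the matrix function; the values are arbitrary. Note that R fills a matrix column by column unless byrow = TRUE is given.

```r
# Fill the values 1..6 into 2 rows and 3 columns (column by column)
m <- matrix(1:6, nrow = 2, ncol = 3)

dim(m)      # dimensions: rows, then columns
m[1, 2]     # element in row 1, column 2
t(m)        # transpose: 3 rows, 2 columns

# Same values, filled row by row instead
m_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)
```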
Data Structures - Data frames
Time series are often stored in data frames. A data frame resembles a matrix with named
columns, except that each column may hold a different type of data. This is convenient, because you
can refer to a column by name without knowing in which position it is.
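To illustrate referring to columns by name, here is a small data frame with invented sales figures (the column names and values are assumptions for the example):

```r
# Hypothetical monthly sales; each column has its own type
sales <- data.frame(
  month = c("Jan", "Feb", "Mar"),
  units = c(120, 95, 143),
  stringsAsFactors = FALSE
)

sales$units                            # access a column by name
strong <- sales[sales$units > 100, ]   # keep only rows with more than 100 units
str(sales)                             # compact overview of columns and types
```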
Data Structures - Lists
An R list is an object consisting of an ordered collection of objects known as its components.
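A brief sketch of a list whose components have different types and lengths; the component names are chosen for illustration.

```r
# Components can be of any type, and can be named
session <- list(
  title  = "Data Analytics with R and SQL Server",
  year   = 2015L,
  topics = c("R", "SQL Server")
)

session$title            # access a component by name
session[["topics"]][2]   # second element of the "topics" component
length(session)          # number of components, not of values
```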
Data Structures - Date and Time
Sys.time() # returns the current system date time
Data Structures - Date and Time
The two main (internal) formats for date-time are POSIXct and POSIXlt:
• POSIXct: a compact format (a count of seconds since 1970-01-01 UTC), typically used to store date-time columns in a data frame
• POSIXlt: a longer, list-based format from which sub-units of time (year, month, hour, ...) can be extracted
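The difference between the two formats can be sketched as follows, using the date of this session with an arbitrary time of day and the UTC time zone (both assumptions for the example):

```r
# POSIXct: a single number of seconds under the hood
ct <- as.POSIXct("2015-03-19 14:30:00", tz = "UTC")
class(ct)
as.numeric(ct)    # seconds since 1970-01-01 00:00:00 UTC

# POSIXlt: a list of components that can be extracted by name
lt <- as.POSIXlt(ct)
lt$hour           # hour of the day
lt$mon + 1        # months are stored as 0-11
lt$year + 1900    # years are stored as an offset from 1900
lt$wday           # day of the week, 0 = Sunday
```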
Data Structures - Others
Other useful and important special values
• NULL: Typically used for initializing variables. (x = NULL) creates a variable x of length zero.
The function is.null() returns TRUE or FALSE and tells whether a variable is NULL or not.
• NA: Used for denoting missing values. (x = NA) creates a variable x holding a missing value.
The function is.na() returns TRUE or FALSE for each element and tells whether it is NA or not.
• NaN: NaN stands for “Not a Number” and results from undefined operations such as 0/0. Some
operations, such as sqrt(-1), also print a warning message in the console. The function
is.nan() lets you check whether a value is NaN or not.
• Inf: Inf stands for “Infinity”. (x = 10/0 ; y = -3/0) sets the value of x to Inf and y to -Inf. The
function is.infinite() lets you check whether a value is infinite; is.finite() returns FALSE for
Inf, -Inf, NA and NaN.
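A quick sketch of how these special values behave; the variable names are arbitrary.

```r
x <- NULL
length(x)        # a NULL variable has length zero
is.null(x)

v <- c(1, NA, 3)
is.na(v)                  # tested element by element
mean(v)                   # NA propagates through calculations
mean(v, na.rm = TRUE)     # drop missing values before computing

nan_value <- 0 / 0        # undefined result, no warning for this case
is.nan(nan_value)

pos_inf <- 10 / 0
neg_inf <- -3 / 0
is.infinite(c(pos_inf, neg_inf))
is.finite(c(1, pos_inf, NA, nan_value))   # only ordinary numbers are finite
```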
Graphics
One of the main reasons data analysts and data
scientists turn to R is its strong graphics
capabilities.
Basic Graphs:
• These include density plots (histograms and kernel
density plots), dot plots, bar charts (simple,
stacked, grouped), line charts, pie charts (simple,
annotated, 3D), boxplots (simple, notched, violin
plots, bagplots) and scatter plots (simple, with fit
lines, scatterplot matrices, high density plots, and
3D plots).
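A few of these basic graphs can be sketched with base R alone. The data is simulated or taken from the built-in mtcars dataset; pdf(NULL) is used here only so the sketch runs without opening a window, and can be dropped in an interactive session.

```r
pdf(NULL)  # null graphics device: draw without creating a file or window

set.seed(42)
heights <- rnorm(200, mean = 170, sd = 8)  # simulated data, for illustration

# Histogram; hist() also returns the bin counts it computed
h <- hist(heights, main = "Simulated heights", xlab = "Height (cm)")

# Boxplot of the same values
boxplot(heights, horizontal = TRUE, main = "Simulated heights")

# Scatter plot with a fitted regression line
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = mtcars))

invisible(dev.off())  # close the device
```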
Graphics
Advanced Graphs:
• Graphical parameters let you change a
graph's symbols, fonts, colors, and lines. Axes and
text options let you customize a graph's axes and add
reference lines, text annotations and a legend.
Combining plots lets you organize
multiple plots into a single graph.
• The lattice package provides a comprehensive
system for visualizing multivariate data, including
the ability to create plots conditioned on one or
more variables. The ggplot2 package offers an
elegant system for generating univariate and
multivariate graphs based on a grammar of
graphics.
Data Manipulation in R
dplyr is an R package for fast and easy data manipulation.
Data manipulation often involves common tasks, such as selecting certain variables, filtering
on certain conditions, deriving new variables from existing variables, and so forth. If we
think of these tasks as “verbs”, we can define a grammar of sorts for data manipulation.
In dplyr the main verbs (or functions) are:
• filter: select a subset of the rows of a data frame
• arrange: reorder the rows of a data frame instead of filtering or selecting them
• select: select columns of a data frame
• mutate: add new columns to a data frame that are functions of existing columns
• summarize: reduce multiple values to a single summary value
• group_by: describe how to break a data frame into groups of rows
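The verbs above can be chained into one pipeline. This is a sketch that assumes the dplyr package is installed and uses the built-in mtcars dataset; the derived column wt_kg and the 20 mpg threshold are choices made for the example.

```r
library(dplyr)

result <- mtcars %>%
  filter(mpg > 20) %>%                 # keep rows matching a condition
  select(mpg, cyl, wt) %>%             # keep only these columns
  mutate(wt_kg = wt * 453.592) %>%     # wt is in 1000 lbs; derive weight in kg
  group_by(cyl) %>%                    # one group per cylinder count
  summarize(
    n          = n(),                  # cars in the group
    mean_mpg   = mean(mpg),
    mean_wt_kg = mean(wt_kg)
  ) %>%
  arrange(desc(mean_mpg))              # most economical group first

print(result)
```

Each verb takes a data frame and returns a data frame, which is what makes the chaining with %>% possible.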
Demo
[dplyr – manipulating data]
Connecting R and SQL Server
The RODBC package provides access to databases (including Microsoft Access
and Microsoft SQL Server) through an ODBC interface.
Function                                                      Description
odbcConnect(dsn, uid = "", pwd = "")                          Open a connection to an ODBC database
sqlFetch(channel, sqtable)                                    Read a table from an ODBC database into a data frame
sqlQuery(channel, query)                                      Submit a query to an ODBC database and return the results
sqlSave(channel, mydf, tablename = sqtable, append = FALSE)   Write or update (append = TRUE) a data frame to a table in the ODBC database
sqlDrop(channel, sqtable)                                     Remove a table from the ODBC database
close(channel)                                                Close the connection
RODBC Example
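The code from the original slide is not preserved here, so what follows is only a sketch of a typical RODBC round trip, assuming the RODBC package is installed and an ODBC data source is configured. The helper name query_sql_server, the DSN "SalesDW" and the table dbo.FactSales are hypothetical. The calls are wrapped in a function so that nothing connects to a database until it is invoked.

```r
# Hypothetical helper (not from the slides): open a connection, run one query,
# and always close the connection afterwards.
query_sql_server <- function(dsn, query, uid = "", pwd = "") {
  channel <- RODBC::odbcConnect(dsn, uid = uid, pwd = pwd)
  if (!inherits(channel, "RODBC")) {
    stop("could not open an ODBC connection to ", dsn)
  }
  on.exit(RODBC::odbcClose(channel), add = TRUE)  # runs even if the query fails

  # Returns a data frame on success
  RODBC::sqlQuery(channel, query, stringsAsFactors = FALSE)
}

# Example call (requires a configured DSN and a reachable SQL Server;
# "SalesDW" and dbo.FactSales are made-up names):
# top_sales <- query_sql_server("SalesDW", "SELECT TOP 10 * FROM dbo.FactSales")
# head(top_sales)
```

Pushing filtering and aggregation into the SQL query, and pulling only the result into R, keeps the data frame moderately sized.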
Other interface
The RJDBC package provides access to databases through a JDBC interface.
(requires JDBC driver from Microsoft)
Demo
[Let’s analyze - R and SQL Server]
Resources
• The R Project for Statistical Computing http://www.r-project.org/
• RStudio http://www.rstudio.com/
• Revolution Analytics http://www.revolutionanalytics.com/
• Shiny http://shiny.rstudio.com/
• {swirl} Learn R, in R http://swirlstats.com/
• R-bloggers http://www.r-bloggers.com/
• Online R resources for Beginners http://bit.ly/1x2q6Gl
• 60+ R resources to improve your data skills http://bit.ly/1BzW4ox
• Stack Overflow - R http://stackoverflow.com/tags/r
• Cerebral Mastication - R Resources http://bit.ly/17YhZj4
• Microsoft JDBC Drivers 4.1 and 4.0 for SQL Server http://bit.ly/1kEgJ7O
What Questions Do You Have?
Thank You
For attending this session