Learn ggplot2 As Kungfu Skill
Given by
Kai Xiao (Data Scientist)
Vivian Zhang (Co-founder & CTO)
Contact: vivian.zhang@supstat.com
• I: Point
• II: Bar
• III: Histogram
• IV: Line
• V: Tile
• VI: Map
Introduction
• ggplot2 is a plotting system for R
• based on "The Grammar of Graphics"
• which tries to take the good parts of base and lattice graphics and none of the bad parts
• It takes care of many of the fiddly details that make plotting a hassle
• It becomes easy to produce complex multi-layered graphics
Why we love ggplot2?
• control the plot as abstract layers and make creativity become reality;
• get used to structural thinking;
• get beautiful graphics while avoiding complicated details
1973 murder cases in USA
7 Basic Concepts
• Mapping
• Scale
• Geometric
• Statistics
• Coordinate
• Layer
• Facet
Mapping
Mapping controls the relations between variables, i.e. which variable is shown by which aesthetic (x, y, colour, size).
Scale
Scale presents the mapping on coordinate scales.
Scale and Mapping are closely related concepts.
Geometric
Geom means the graphical elements, such as points, lines and polygons.
Statistics
Stat enables us to calculate and do statistical analysis based on the data, such as adding a regression line.
Coordinate
Coord affects how we observe graphical elements. Transformation of coordinates is useful.
Layer
Component: data, mapping, geom, stat
Using layers allows users to build plots step by step. It becomes much easier to modify a plot.
Facet
Facet splits data into groups and draws each group separately. Usually, there is an order.
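The seven concepts can be read off a single plot specification. The sketch below is one illustrative arrangement using the mpg data that ships with ggplot2; the choice of variables, breaks and limits is an assumption made for illustration and is not taken from the slides.

```r
library(ggplot2)

# Layer: each "+" adds a component built from data, mapping, geom and stat
p <- ggplot(mpg, aes(x = displ, y = hwy)) +        # Mapping: variables -> aesthetics
  geom_point(aes(colour = factor(cyl))) +          # Geometric: points
  stat_smooth(method = "lm", formula = y ~ x) +    # Statistics: fitted regression line
  scale_x_continuous(breaks = 2:7) +               # Scale: how the mapping appears on the axis
  coord_cartesian(ylim = c(10, 45)) +              # Coordinate: the viewing window
  facet_wrap(~ drv)                                # Facet: one panel per drive type

print(p)
```

Removing or swapping any single line changes one concept while leaving the others untouched, which is the practical benefit of thinking in layers.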
Skill I: Point
Sample data -- mpg
• Fuel economy data from 1999 and 2008 for 38 popular models of car

Details
• displ: engine displacement, in litres
• cyl: number of cylinders
• trans: type of transmission
• drv: front-wheel, rear wheel drive, 4wd
• cty: city miles per gallon
• hwy: highway miles per gallon
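Before plotting, it can help to take a quick look at the data. The sketch below uses only base R functions on the mpg data bundled with ggplot2. Note that in recent ggplot2 releases the text columns are stored as character rather than factor, so str() output may differ from what a slide made with an older version shows.

```r
library(ggplot2)

# mpg ships with ggplot2; no download is needed
dim(mpg)                                   # number of rows and columns
table(mpg$year)                            # how many cars were recorded per model year
summary(mpg[, c("displ", "cty", "hwy")])   # ranges of the numeric variables used later
```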
> library(ggplot2)
> str(mpg)
'data.frame': 234 obs. of 11 variables:
 $ manufacturer: Factor w/ 15 levels "audi","chevrolet",..:
 $ model       : Factor w/ 38 levels "4runner 4wd",..:
 $ displ       : num 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
 $ year        : int 1999 1999 2008 2008 1999 1999 2008 1999
 $ cyl         : int 4 4 4 4 6 6 6 4 4 4 ...
 $ trans       : Factor w/ 10 levels "auto(av)","auto(l3)",..:
 $ drv         : Factor w/ 3 levels "4","f","r":
 $ cty         : int 18 21 20 21 16 18 18 18 16 20 ...
 $ hwy         : int 29 29 31 30 26 26 27 26 25 28 ...
 $ fl          : Factor w/ 5 levels "c","d","e","p",..:
 $ class       : Factor w/ 7 levels "2seater","compact",..:
aesthetics
p <- ggplot(data=mpg, mapping=aes(x=cty, y=hwy))
p + geom_point()
> summary(p)
data: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, fl, class [234x11]
mapping: x = cty, y = hwy
faceting: facet_null()
> summary(p + geom_point())
data: manufacturer, model, displ, year, cyl, trans, drv, cty, hwy, fl, class [234x11]
mapping: x = cty, y = hwy
faceting: facet_null()
-----------------------------------
geom_point: na.rm = FALSE
stat_identity:
position_identity: (width = NULL, height = NULL)
p + geom_point(color='red4', size=3)
# add one more mapping -- colour by year
p <- ggplot(mpg, aes(x=cty, y=hwy, colour=factor(year)))
p + geom_point()
# add one more stat (loess: local polynomial regression)
> p + geom_point() + stat_smooth()
p <- ggplot(data=mpg, mapping=aes(x=cty, y=hwy))
p + geom_point(aes(colour=factor(year))) +
  stat_smooth()
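Called with no arguments, stat_smooth() chooses the smoothing method automatically, and for a data set of this size (fewer than 1,000 rows) that choice is loess. The sketch below spells the method out and contrasts it with a straight-line fit; the span value and the linear comparison are illustrative assumptions, not part of the slides.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = cty, y = hwy))

# Explicit loess smoother; span controls how wiggly the curve is
p_loess <- p + geom_point() +
  stat_smooth(method = "loess", formula = y ~ x, span = 0.75)

# Straight-line fit without the confidence band, for comparison
p_lm <- p + geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(p_loess)
print(p_lm)
```

Naming the method explicitly keeps the plot stable if the data later grows past the size at which the automatic choice switches.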
Two equivalent ways to draw
p <- ggplot(mpg, aes(x=cty, y=hwy))
p + geom_point(aes(colour=factor(year))) +
  stat_smooth()

d <- ggplot() +
  geom_point(data=mpg, aes(x=cty, y=hwy, colour=factor(year))) +
  stat_smooth(data=mpg, aes(x=cty, y=hwy))
print(d)
Besides the "white paper" canvas, we find a geom canvas and a stat canvas.
> summary(d)
data: [0x0]
faceting: facet_null()
-----------------------------------
mapping: x = cty, y = hwy, colour = factor(year)
geom_point: na.rm = FALSE
stat_identity:
position_identity: (width = NULL, height = NULL)

mapping: x = cty, y = hwy
geom_smooth:
stat_smooth: method = auto, formula = y ~ x, se = TRUE, n = 80, fullrange = FALSE, level = 0.95, na.rm = FALSE
position_identity: (width = NULL, height = NULL)
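The same structure can be checked programmatically instead of reading the summary() printout. The sketch below rebuilds both variants and inspects the list of layers stored on each plot object; the helper expression that extracts the geom class is an illustrative assumption about how one might look inside a layer.

```r
library(ggplot2)

# Variant 1: data and mapping declared once in ggplot()
p <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point(aes(colour = factor(year))) +
  stat_smooth()

# Variant 2: empty canvas, each layer carries its own data and mapping
d <- ggplot() +
  geom_point(data = mpg, aes(x = cty, y = hwy, colour = factor(year))) +
  stat_smooth(data = mpg, aes(x = cty, y = hwy))

# Both plots hold two layers: one for points, one for the smoother
length(p$layers)
length(d$layers)

# Class name of the geom used by each layer of d
sapply(d$layers, function(l) class(l$geom)[1])
```

The second form is more verbose, but it makes it easy to give each layer a different data frame.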
# Using the scale functions, we can control the colours of a scale.
p + geom_point(aes(colour=factor(year))) +
  stat_smooth() +
  scale_color_manual(values=c('steelblue','red4'))
# We can map "displ" to the size of the points
p + geom_point(aes(colour=factor(year), size=displ)) +
  stat_smooth() +
  scale_color_manual(values=c('steelblue','red4'))
# We solve the problems of overlapping points and points being too small
p + geom_point(aes(colour=factor(year),size=displ), alpha=0.5,position = "jitter")+
stat_smooth()+
scale_color_manual(values =c('steelblue','red4'))+
scale_size_continuous(range = c(4, 10))
# We change the coordinate system.
p + geom_point(aes(colour=factor(year),size=displ), alpha=0.5,position = "jitter")+
stat_smooth()+
scale_color_manual(values =c('steelblue','red4'))+
scale_size_continuous(range = c(4, 10)) +
coord_flip()
p + geom_point(aes(colour=factor(year),size=displ),
alpha=0.5,position = "jitter")+
stat_smooth()+
scale_color_manual(values =c('steelblue','red4'))+
scale_size_continuous(range = c(4, 10)) +
coord_polar()
p + geom_point(aes(colour=factor(year),size=displ),
alpha=0.5,position = "jitter") +
stat_smooth()+
scale_color_manual(values =c('steelblue','red4'))+
scale_size_continuous(range = c(4, 10))+
coord_cartesian(xlim = c(15, 25), ylim=c(15,40))
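coord_cartesian() only zooms the view, so the smoother is still fitted to all the data. Setting limits on the scales with xlim() and ylim() instead discards out-of-range points before the statistics are computed, which can change the fitted curve. The sketch below contrasts the two on a simplified version of the plot; the simplification is an assumption made to keep the example short.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = cty, y = hwy)) +
  geom_point() +
  stat_smooth(method = "loess", formula = y ~ x)

# Zoom only: the smoother is fitted on every observation
zoomed <- base + coord_cartesian(xlim = c(15, 25), ylim = c(15, 40))

# Scale limits: points outside the range are dropped before fitting
# (ggplot2 reports the removed rows in a warning)
clipped <- base + xlim(15, 25) + ylim(15, 40)

print(zoomed)
print(clipped)
```

When the goal is simply to look closer at part of a plot, coord_cartesian() is usually the safer choice because it leaves the statistics untouched.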
# Using the facet functions, we now split the data and draw it by group
p + geom_point(aes(colour=class, size=displ),
               alpha=0.5, position="jitter") +
  stat_smooth() +
  scale_size_continuous(range=c(4,10)) +
  facet_wrap(~year, ncol=1)
# Add a plot title and specify all the labels you want to add
p <- ggplot(mpg, aes(x=cty, y=hwy))
p + geom_point(aes(colour=class, size=displ),
               alpha=0.5, position="jitter") + stat_smooth() +
  scale_size_continuous(range=c(4,10)) +
  facet_wrap(~year, ncol=1) +
  labs(title='model of car and mpg',
       y='driving distance per gallon on highway',
       x='driving distance per gallon on city road',
       size='displacement', colour='model')
# scatter plot for the diamonds dataset
p <- ggplot(diamonds, aes(carat, price))
p + geom_point()
# use transparency and small points
p + geom_point(size=0.1, alpha=0.1)
# use a binned chart to observe the intensity of points
p + stat_bin2d(bins=40)
# estimate the data density
p + stat_density2d(aes(fill=..level..), geom="polygon") +
  coord_cartesian(xlim=c(0,1.5), ylim=c(0,6000))
Skill II: Bar
Skill III: Histogram
Skill IV: Line
Skill V: Tile
Skill VI: Map
Resources
• http://learnr.wordpress.com
Redraw all the lattice graphs by ggplot2
Resources
All the examples are done by ggplot2.
Resources
• http://wiki.stdout.org/rcookbook/Graphs/
• http://r-blogger.com
• http://Stackoverflow.com
• http://xccds1977.blogspot.com
• http://r-ke.info/
• http://www.youtube.com/watch?v=vnVJJYi1mbw
Thank you! Come back for more!
Sign up at: www.meetup.com/nyc-open-data
Give feedback at: www.bit.ly/nycopen
