Graphing interpolation, regression,
and subsets of Census data in R
by Doug Loqa
Pull data into R, group data subsets, and
create a comparison between interpolation
and regression curves.
You may start with the detailed groups in the first table, but prefer to
summarize some condensed categories like those in the second table.

Measures of income dispersion    2017        2016
…..
10th percentile limit            14,219      13,901
…..
20th percentile limit            24,638      24,518
…..
40th percentile limit            47,110      46,581
…..
50th (median)                    61,372      60,309
…..
60th percentile limit            77,552      76,479
…..
80th percentile limit            126,855     123,621
…..
90th percentile limit            179,077     174,203
…..
95th percentile limit            237,034     230,095
…..

Measures of income dispersion    2017        2016
…..
50th percentile below            36834.75    36327.25
…..
Upper-Middle                     102,204     100,050
…..
Top 10 %                         208,056     202,149
…..
Download the file from the Census data, and save the tabular
data you want as a .csv. Then import it into R (the
categorical data will read horizontally, one category per row).
Use the read.csv() command and make sure header is set to TRUE
and stringsAsFactors is set to FALSE. Both argument names are case-sensitive.
Reading strings as factors can cause information to be “bucketed” into
factor levels and make analysis more difficult.
read.csv("Your data.csv", header = TRUE, stringsAsFactors = FALSE)
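As a minimal sketch of the import step, the block below writes a two-row CSV to a temporary file and reads it back. The rows reuse figures from the table above; the variable names csv_path and census are hypothetical and only serve the illustration.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
# The figures come from the percentile table above; the layout is an assumption.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Measure,2017,2016",
  "\"10th percentile limit\",\"14,219\",\"13,901\"",
  "\"20th percentile limit\",\"24,638\",\"24,518\""
), csv_path)

census <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# The income columns arrive as character strings such as "14,219".
# By default read.csv() also renames the year columns to X2017 and X2016.
str(census)
```

Because the values contain commas, R keeps them as text, which is why the conversion step that follows is needed.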
Convert the input to
numerical data for R
Combine the gsub() and as.numeric() functions:
gsub(",", "", datarow)
as.numeric(datavector)
Combine them as below:
as.numeric(gsub(…..)) Assign this to a variable.
Problems: The data will come in as
strings, so you won’t be able to run
any calculations initially. Also,
as.numeric() returns NA for values
that contain commas.
Solutions: Run the functions above
to fix this. The combination of
the two functions creates numeric
vectors R can compute with.
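The conversion can be sketched as follows, using the four 2017 values at or below the median from the table above. The names datarow, cleaned, values, and values_one_step are illustrative placeholders.

```r
# One row of values as they arrive from the CSV: character strings with commas.
datarow <- c("14,219", "24,638", "47,110", "61,372")

# Step 1: strip the commas. Step 2: convert the cleaned strings to numbers.
cleaned <- gsub(",", "", datarow)
values <- as.numeric(cleaned)

# The same thing in one step, assigned to a variable.
values_one_step <- as.numeric(gsub(",", "", datarow))

# Calculations now work on the numeric vector.
mean(values)
```

The mean of these four values is 36834.75, which matches the 2017 entry for "50th percentile below" in the condensed table.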
Create all numeric
vectors needed for
your first category
Problem: You still need to combine
vectors for your aggregated
categorical information.
Solution: You can summarize the
vectors of data by using the
apply() function. You need to know
in which direction the data will be
computed, so pay attention to the
second argument here.
Run the mean function inside the apply() function with your
saved vectors:
apply(rbind(vect1, vect2, vect3), 2, mean)
In the second argument, enter 1 to calculate across rows or 2 to
calculate down columns. In this case, since we need to calculate
down, a 2 was entered. Remember that each vector was entered
horizontally, so rbind() stacks them as rows.
Continue this process until you have all of your categorical rows
of data to analyze.
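To make the direction of the calculation concrete, here is a short sketch that rebuilds two of the condensed categories from the percentile table above. Each vector holds the 2017 and 2016 values of one percentile limit; the vector names are hypothetical.

```r
# Each vector is one row of the table: c(value for 2017, value for 2016).
p10 <- c(14219, 13901)
p20 <- c(24638, 24518)
p40 <- c(47110, 46581)
p50 <- c(61372, 60309)
p60 <- c(77552, 76479)
p80 <- c(126855, 123621)

# rbind() stacks the vectors as rows; MARGIN = 2 averages down each column,
# giving one mean per year.
below_median <- apply(rbind(p10, p20, p40, p50), 2, mean)
upper_middle <- apply(rbind(p60, p80), 2, mean)

below_median
upper_middle
```

The first result reproduces the "50th percentile below" row (36834.75 and 36327.25), and the second gives 102203.5 and 100050, which the table shows rounded as 102,204 and 100,050. For plain column means, colMeans(rbind(...)) returns the same values.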
Use the plot() function with specified
titles, colors, and the category.
Use col = "color", xlab, ylab, main, and type = "p". You can use
seq(1997,2017,1) for the x-values here and your categorical data for the
y-values. The x-values should represent the span of data you are measuring.
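A minimal sketch of the plotting call is shown below. The y-values are SYNTHETIC placeholder numbers generated by a formula, not Census figures, and the labels and color are arbitrary choices.

```r
years <- seq(1997, 2017, 1)

# SYNTHETIC placeholder series (not Census data): a trend plus a slow wave.
category <- 50000 + 400 * (years - 1997) + 3000 * sin((years - 1997) / 3)

# type = "p" draws points only, which leaves room for curves added later.
plot(years, category,
     type = "p", col = "blue",
     xlab = "Year", ylab = "Income (dollars)",
     main = "Example category")
```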
Use the approxfun() function to graph the
interpolation lines.
Use the same categorical information as for the plot() function, and first try the "linear" method. Setting
rule = 2 makes the function return the nearest end-point value outside the range of the data instead of NA.
Category holds your y-values. Note that if you have your data in columns, you will use Category[,1] instead.
f <- approxfun(seq(start,end,1), Category[1,], method = "linear", rule = 2)
Here start = 1997 and end = 2017. You then need to use the curve() function, for example
curve(f, add = TRUE, col = "purple"), and make sure you specify add = TRUE so the line is drawn on the
existing plot. I used purple as the color.
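The behaviour of the linear interpolant can be checked numerically before drawing it. This sketch uses a SYNTHETIC placeholder series (not Census data); years, category, and f are illustrative names.

```r
years <- seq(1997, 2017, 1)
# SYNTHETIC placeholder series (not Census data).
category <- 50000 + 400 * (years - 1997) + 3000 * sin((years - 1997) / 3)

# approxfun() returns a function that interpolates between the data points.
f <- approxfun(years, category, method = "linear", rule = 2)

f(2000.5)  # halfway between the 2000 and 2001 values
f(1990)    # outside the data: rule = 2 returns the first data value
f(2030)    # outside the data: rule = 2 returns the last data value

# Draw the points, then overlay the interpolation on the same plot.
plot(years, category, type = "p", xlab = "Year", ylab = "Value")
curve(f, add = TRUE, col = "purple")
```

With the default rule = 1, the two calls outside the data range would return NA instead.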
Use the constant method of the
approxfun() function to see trends.
Use the same categorical information as before and now use the "constant" method, which draws a step
function through the points. Again, rule = 2 extends the end-point values beyond the range of the data.
f <- approxfun(seq(start,end,1), Category[1,], method = "constant", rule = 2)
Start and end would again be your beginning and ending years. You then need to use the curve() function and
make sure you specify add = TRUE. I used lightgreen as the color.
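The step behaviour of the "constant" method can be seen in a few evaluations. As before, the series is a SYNTHETIC placeholder and the names are illustrative.

```r
years <- seq(1997, 2017, 1)
# SYNTHETIC placeholder series (not Census data).
category <- 50000 + 400 * (years - 1997) + 3000 * sin((years - 1997) / 3)

g <- approxfun(years, category, method = "constant", rule = 2)

# Between two data points the function holds the left-hand value,
# so anywhere inside (2000, 2001) it returns the value for 2000.
g(2000.5)
g(2000.9)

plot(years, category, type = "p", xlab = "Year", ylab = "Value")
# n controls how many points curve() evaluates; more points give sharper steps.
curve(g, add = TRUE, col = "lightgreen", n = 1001)
```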
Use the interpolation curves to determine
what degree of polynomial to use
Where it is obvious, you will see the “angled” interpolations switch from fitting above the linear to
below the linear, and vice versa. Count how many of those “shifts” there are; that count is a reasonable
first guess for your degree. So, counting 5 arrows in the graph means there are 5-1 = 4 shifts, and you can
guess that this requires a 4th degree polynomial.
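Use the lm() function to set up your
polynomial graphs
• Set up your regression, picking the degree you guessed from the previous page:
r <- lm(Category[1,] ~ poly(seq(start,end,1), 4))
With raw = T and x-values as large as years, the higher powers are nearly collinear and lm() can drop
them (NA coefficients). The default orthogonal polynomials avoid this.
• Now use the predict() function to get the fitted values of the regression at each year:
rc <- predict(r)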
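A sketch of the fitting step on a SYNTHETIC placeholder series follows. It uses poly() with its default orthogonal basis rather than raw powers, which is an assumption made here for numerical stability with year-sized x-values. The names fit, rc, fine_grid, and rc_fine are hypothetical.

```r
years <- seq(1997, 2017, 1)
# SYNTHETIC placeholder series (not Census data).
category <- 50000 + 400 * (years - 1997) + 3000 * sin((years - 1997) / 3)

# Fit a 4th degree polynomial in the year.
fit <- lm(category ~ poly(years, 4))

# predict() with no new data returns the fitted value at each observed year.
rc <- predict(fit)

# For a smoother curve, predict on a finer grid of x-values instead.
fine_grid <- data.frame(years = seq(1997, 2017, 0.25))
rc_fine <- predict(fit, newdata = fine_grid)

coef(fit)  # five coefficients: intercept plus degrees 1 to 4
```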
Use the lines() function to graph your
regression curve on top of this.
Use the same sequence of beginning and ending years for the x-values, and the fitted values from the
predict() function, generated in the last step, for the y-values.
lines(seq(start,end,1), rc, col = "color choice", lwd = 2)
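Putting the last few steps together, the overlay might look like the sketch below. The data are SYNTHETIC placeholders, the colors are arbitrary, and poly() is used with its default orthogonal basis as an assumption.

```r
years <- seq(1997, 2017, 1)
# SYNTHETIC placeholder series (not Census data).
category <- 50000 + 400 * (years - 1997) + 3000 * sin((years - 1997) / 3)

# Points first, then the linear interpolation, then the regression curve.
plot(years, category, type = "p", xlab = "Year", ylab = "Value",
     main = "Interpolation vs. regression")
f <- approxfun(years, category, method = "linear", rule = 2)
curve(f, add = TRUE, col = "purple")

fit <- lm(category ~ poly(years, 4))
rc <- predict(fit)

# lines() adds to the existing plot; x and y must have the same length.
lines(years, rc, col = "red", lwd = 2)
```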
Use the par() function to group your graphs
once you have a number of them complete.
par(mfrow = c(1,2)) will give you 2 graphs side by side. Make sure that your
titles are short enough not to collide when graphing.
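A small sketch of the side-by-side layout, again with SYNTHETIC placeholder series and hypothetical names. Saving the previous settings and restoring them afterwards is a common habit rather than something the slides require.

```r
years <- seq(1997, 2017, 1)
# Two SYNTHETIC placeholder series (not Census data).
category_a <- 50000 + 400 * (years - 1997) + 3000 * sin((years - 1997) / 3)
category_b <- 90000 + 900 * (years - 1997)

# par() returns the previous settings, so they can be restored later.
old_par <- par(mfrow = c(1, 2))  # 1 row, 2 columns of plots

plot(years, category_a, type = "p", xlab = "Year", ylab = "Value", main = "Category A")
plot(years, category_b, type = "p", xlab = "Year", ylab = "Value", main = "Category B")

par(old_par)  # back to one plot per page
```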
Add rows of graphs and you’re done!
You can make this process even easier by combining some of these steps into a
user-defined function if you know how to set up your variables. Also consider using the
layout() function instead of par(mfrow = ...) when you want more control over the arrangement.
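One way such a user-defined function could look is sketched below. The function name plot_category, its arguments, the colors, and the use of orthogonal poly() are all assumptions for illustration, and the data are SYNTHETIC placeholders.

```r
# Hypothetical helper: plot points, linear interpolation, and a polynomial fit.
plot_category <- function(years, values, degree = 4, main = "") {
  plot(years, values, type = "p", xlab = "Year", ylab = "Value", main = main)
  f_linear <- approxfun(years, values, method = "linear", rule = 2)
  curve(f_linear, add = TRUE, col = "purple")
  fit <- lm(values ~ poly(years, degree))
  fitted_values <- predict(fit)
  lines(years, fitted_values, col = "red", lwd = 2)
  invisible(fitted_values)  # return the fitted values without printing them
}

years <- seq(1997, 2017, 1)
# Two SYNTHETIC placeholder series (not Census data).
category_a <- 50000 + 400 * (years - 1997) + 3000 * sin((years - 1997) / 3)
category_b <- 90000 + 900 * (years - 1997) + 20 * (years - 2007)^2

# layout() takes a matrix of plot numbers; this one is 1 row by 2 columns.
layout(matrix(1:2, nrow = 1))
fit_a <- plot_category(years, category_a, degree = 4, main = "Category A")
fit_b <- plot_category(years, category_b, degree = 2, main = "Category B")
layout(1)  # reset to a single plot
```

Wrapping the steps this way keeps the colors and axis labels consistent across panels, and only the data, degree, and title change per call.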
