Aggregate data using custom functions using R
Last Updated: 12 Apr, 2024
In this article, we will explore various methods to aggregate data using custom functions by using the R Programming Language.
What is a custom function?
Custom functions are an essential part of R programming. They let users create reusable blocks of code tailored to their specific needs. A custom function encapsulates a series of operations under one name, which makes code more readable and easier to maintain.
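As a minimal sketch of what such a function looks like, the snippet below defines a small helper and calls it on a vector. The name `value_range` is made up for this illustration and does not appear elsewhere in the article.

```r
# Hypothetical helper: the spread of a numeric vector (largest minus smallest value)
value_range <- function(x) {
  max(x) - min(x)
}

# Calling the function on a small vector
spread <- value_range(c(10, 15, 20))
print(spread)
```

Once defined, the function can be reused anywhere a summary of a numeric vector is needed, including as the `FUN` argument of `aggregate()`.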
How to aggregate data using custom functions
The aggregate() function in R groups the rows of a data frame and applies a function to each group. Its FUN argument accepts any function, so a custom function can be used in place of a built-in one. The sections below cover aggregation by sum, mean, median, and standard deviation.
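To make the grouping step concrete, here is a small sketch that compares the formula interface of `aggregate()` with a manual split-and-apply. The data frame and the function name `total` are illustrative assumptions, not part of the examples that follow.

```r
# Illustrative data: two groups with two values each
df <- data.frame(
  group = c("a", "b", "a", "b"),
  value = c(1, 2, 3, 4)
)

# Hypothetical custom function: receives the values of one group at a time
total <- function(x) {
  sum(x)
}

# Formula interface: summarise `value` within each level of `group`
by_formula <- aggregate(value ~ group, data = df, FUN = total)
print(by_formula)

# Manual equivalent: split the values by group, then apply the function to each piece
by_hand <- sapply(split(df$value, df$group), total)
print(by_hand)
```

Both approaches call the custom function once per group. `aggregate()` returns the result as a data frame, while the manual version returns a named vector.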
Aggregating data by sum using the custom function
This method aggregates data by sum using a custom function. In the example below, we create a data frame of sales by date and compute the total sold per month with a custom function.
R
# creating data frame
df <- data.frame(
  date = as.Date(c("2024-01-01", "2024-01-15", "2024-02-10",
                   "2024-02-20", "2024-03-20", "2024-03-15")),
  sold = c(100, 150, 200, 250, 300, 350)
)
print("The original dataframe is")
print(df)
# Custom function that returns the sum of its input
result = function(x) {
  return(sum(x))
}
print("After calculating the sum is")
# Group the rows by year-month and apply the custom function to each group
sales_permonth <- aggregate(sold ~ format(date, "%Y-%m"),
                            data = df, FUN = result)
print(sales_permonth)
Output:
[1] "The original dataframe is"
date sold
1 2024-01-01 100
2 2024-01-15 150
3 2024-02-10 200
4 2024-02-20 250
5 2024-03-20 300
6 2024-03-15 350
[1] "After calculating the sum is"
format(date, "%Y-%m") sold
1 2024-01 250
2 2024-02 450
3 2024-03 650
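In the next example, we create a data frame of goods and prices and compute the total price per item. Here the built-in sum is passed directly as FUN, which gives the same result as wrapping it in a custom function.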
R
goods=c("a","b","c","d","b","c","a")
prices=c(100,200,300,400,500,600,700)
#creating data frame
df = data.frame(goods,prices)
print(df)
print("After calculating the sum is")
res = aggregate(prices ~ goods , data = df, FUN = sum)
print(res)
Output:
goods prices
1 a 100
2 b 200
3 c 300
4 d 400
5 b 500
6 c 600
7 a 700
[1] "After calculating the sum is"
goods prices
1 a 800
2 b 700
3 c 900
4 d 400
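A custom function does not need a name. The sketch below passes an anonymous function to `aggregate()` to sum only the prices above a threshold. The threshold of 150 is an arbitrary value chosen for illustration.

```r
goods <- c("a", "b", "c", "d", "b", "c", "a")
prices <- c(100, 200, 300, 400, 500, 600, 700)
df <- data.frame(goods, prices)

# Anonymous custom function: sum only the prices greater than 150
# (150 is an assumed threshold, used here only as an example)
res <- aggregate(prices ~ goods, data = df,
                 FUN = function(x) sum(x[x > 150]))
print(res)
```

For item `a`, the price of 100 falls below the threshold and is left out, so its total differs from the plain sum shown earlier.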
Aggregating data by mean using the custom function
This method aggregates data by mean using a custom function. In the example below, we create a data frame and compute the mean score for each name with a custom function.
R
names = c("a", "a", "b", "c", "c", "b")
scores = c(100, 95, 90, 80, 85, 70)
# creating data frame
df = data.frame(names, scores)
print("The original dataframe is")
print(df)
# custom function that returns the mean of its input
cal_mean = function(x) {
  return(mean(x))
}
print("After calculating the mean is")
result = aggregate(scores ~ names, data = df,
                   FUN = cal_mean)
print(result)
Output:
[1] "The original dataframe is"
names scores
1 a 100
2 a 95
3 b 90
4 c 80
5 c 85
6 b 70
[1] "After calculating the mean is"
names scores
1 a 97.5
2 b 80.0
3 c 82.5
In the next example, we create a data frame of teams and run rates and compute the mean run rate per team with a custom function.
R
team = c("csk", "rcb", "rcb", "srh", "srh", "csk", "csk")
run_rate = c(80, 85, 70, 85, 85, 86, 95)
# creating data frame
df = data.frame(team, run_rate)
print("The original dataframe is")
print(df)
# custom function that returns the mean of its input
cal_mean = function(x) {
  return(mean(x))
}
print("After calculating the mean is")
# Aggregating data by group
result <- aggregate(run_rate ~ team, data = df,
                    FUN = cal_mean)
print(result)
Output:
[1] "The original dataframe is"
team run_rate
1 csk 80
2 rcb 85
3 rcb 70
4 srh 85
5 srh 85
6 csk 86
7 csk 95
[1] "After calculating the mean is"
team run_rate
1 csk 87.0
2 rcb 77.5
3 srh 85.0
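A custom function can also take extra parameters. Arguments that `aggregate()` does not use itself are forwarded to `FUN`, so they can be supplied in the same call. The sketch below assumes a hypothetical function `mean_above` with a `cutoff` parameter; neither name comes from the examples above.

```r
team <- c("csk", "rcb", "rcb", "srh", "srh", "csk", "csk")
run_rate <- c(80, 85, 70, 85, 85, 86, 95)
df <- data.frame(team, run_rate)

# Hypothetical custom function: mean of the values at or above a cutoff
mean_above <- function(x, cutoff = 0) {
  mean(x[x >= cutoff])
}

# `cutoff = 80` is not an argument of aggregate(), so it is passed on to mean_above
result <- aggregate(run_rate ~ team, data = df,
                    FUN = mean_above, cutoff = 80)
print(result)
```

With the cutoff at 80, the value 70 for `rcb` is excluded before the mean is taken, so that team's result differs from the plain mean.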
Aggregating data by median using the custom function
This method aggregates data by median using a custom function. In the example below, we create a data frame and compute the median value for each category with a custom function.
R
# Sample data
prices <- data.frame(
  category = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  values = c(10, 15, 20, 23, 30, 25, 40, 55, 60)
)
print("The original dataframe is")
print(prices)
# custom function that returns the median of its input
cal_median = function(x) {
  return(median(x))
}
result = aggregate(values ~ category,
                   data = prices, FUN = cal_median)
print("After calculating the median is")
print(result)
Output:
[1] "The original dataframe is"
category values
1 A 10
2 A 15
3 A 20
4 B 23
5 B 30
6 B 25
7 C 40
8 C 55
9 C 60
[1] "After calculating the median is"
category values
1 A 15
2 B 25
3 C 55
In the next example, we create a data frame and compute the median of r_no for each name with a custom function.
R
name = c("a", "b", "c", "b", "a", "b")
r_no = c(350, 355, 355, 360, 365, 370)
# creating data frame
product_prices = data.frame(name, r_no)
print("The original dataframe is")
print(product_prices)
# custom function that returns the median of its input
calculate_median = function(x) {
  return(median(x))
}
res <- aggregate(r_no ~ name, data = product_prices,
                 FUN = calculate_median)
print(res)
Output:
[1] "The original dataframe is"
name r_no
1 a 350
2 b 355
3 c 355
4 b 360
5 a 365
6 b 370
name r_no
1 a 357.5
2 b 360.0
3 c 355.0
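One thing a custom function can do that a single built-in cannot is return several summaries at once. The sketch below uses a hypothetical function `median_and_count` that returns a named vector. In that case `aggregate()` stores the results as a matrix column, which can be flattened into ordinary columns with `do.call(data.frame, ...)`.

```r
prices <- data.frame(
  category = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  values = c(10, 15, 20, 23, 30, 25, 40, 55, 60)
)

# Hypothetical custom function returning two summaries per group
median_and_count <- function(x) {
  c(median = median(x), n = length(x))
}

res <- aggregate(values ~ category, data = prices, FUN = median_and_count)

# res$values is a matrix with one column per summary;
# flatten it so each summary becomes its own data frame column
flat <- do.call(data.frame, res)
print(flat)
```

The flattened data frame has the columns `category`, `values.median`, and `values.n`.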
Aggregating data by standard deviation using the custom function
This method aggregates data by standard deviation using a custom function. In the example below, we create a data frame and compute the standard deviation of number for each batch with a custom function.
R
batch = c("x", "y", "x", "y", "x", "x")
number = c(20, 35, 20, 34, 25, 40)
df <- data.frame(batch, number)
print(df)
# custom function that returns the standard deviation, ignoring NA values
cus_sd <- function(x) {
  return(sd(x, na.rm = TRUE))
}
res = aggregate(number ~ batch, data = df, FUN = cus_sd)
print(res)
Output:
batch number
1 x 20
2 y 35
3 x 20
4 y 34
5 x 25
6 x 40
batch number
1 x 9.4648472
2 y 0.7071068
In the next example, we create a data frame and compute the standard deviation of cgpa for each name with a custom function.
R
names = c("raju", "ravi", "rakesh", "raju", "rakesh", "ravi")
cgpa = c(7.5, 8.5, 7.0, 9.5, 8.8, 8.0)
df <- data.frame(names, cgpa)
print(df)
# custom function that returns the standard deviation, ignoring NA values
cus_sd <- function(x) {
  return(sd(x, na.rm = TRUE))
}
print("After calculating the standard deviation is")
res = aggregate(cgpa ~ names, data = df, FUN = cus_sd)
print(res)
Output:
names cgpa
1 raju 7.5
2 ravi 8.5
3 rakesh 7.0
4 raju 9.5
5 rakesh 8.8
6 ravi 8.0
[1] "After calculating the standard deviation is"
names cgpa
1 raju 1.4142136
2 rakesh 1.2727922
3 ravi 0.3535534
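The `cus_sd` functions above pass `na.rm = TRUE`, but the example data contains no missing values. The sketch below adds an `NA` to show how missing values reach a custom function. With the formula interface, `aggregate()` drops rows containing `NA` by default (`na.action = na.omit`) before `FUN` is called. Passing `na.action = na.pass` keeps those rows, so the custom function handles them itself. The helper `count_missing` is hypothetical and used only to make the difference visible.

```r
# Illustrative data with one missing value in batch "x"
df <- data.frame(
  batch = c("x", "y", "x", "y", "x"),
  number = c(20, 35, NA, 34, 25)
)

# Hypothetical helper: how many NA values does the function receive?
count_missing <- function(x) {
  sum(is.na(x))
}

# Default behaviour: rows with NA are removed first, so the function never sees them
default_res <- aggregate(number ~ batch, data = df, FUN = count_missing)

# With na.pass, the NA values are handed to the custom function
pass_res <- aggregate(number ~ batch, data = df, FUN = count_missing,
                      na.action = na.pass)

# A custom sd that removes NA values itself works in both cases
cus_sd <- function(x) {
  sd(x, na.rm = TRUE)
}
sd_res <- aggregate(number ~ batch, data = df, FUN = cus_sd,
                    na.action = na.pass)

print(default_res)
print(pass_res)
print(sd_res)
```

For a single summarised column the standard deviation is the same either way. The choice of `na.action` matters more when several columns are aggregated together, because the default removes a whole row if any of its values is missing.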
Conclusion
In conclusion, we learned how to aggregate data with custom functions in R. Because aggregate() accepts any function through its FUN argument, the same pattern works for sums, means, medians, standard deviations, and other summaries.