This repository was archived by the owner on Jun 2, 2026. It is now read-only.

Data table functions#1158

Open
kanishkan91 wants to merge 2 commits into
mainfrom
data-table-functions

@kanishkan91

Contributor

Adding 2 functions with documentation:

  1. fast_group_by: A faster alternative to the usual dplyr approach, built on data.table. It groups the data, applies a function, and ungroups, i.e. it performs a group_by, mutate, and ungroup in one step. It can be used within dplyr pipes. The speed advantage grows with the volume of the underlying data.

  2. data_table_bind: A faster alternative to bind_rows that takes advantage of data.table's data processing capabilities. It returns a tibble after binding all input datasets.
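
As a rough illustration of the pattern described above (not the PR's actual implementation), the sketch below compares the dplyr group_by/mutate/ungroup chain with a data.table grouped assignment, and bind_rows with rbindlist. The toy data and the column names `region` and `value` are assumptions made for the example.

```r
library(dplyr)
library(data.table)

# Toy data; the column names are assumptions for illustration only.
df <- tibble(region = c("a", "a", "b"), value = c(1, 2, 3))

# The dplyr pattern that fast_group_by is meant to replace.
via_dplyr <- df %>%
  group_by(region) %>%
  mutate(value = sum(value)) %>%
  ungroup()

# A data.table equivalent: grouped assignment by reference, then back to a tibble.
dt <- as.data.table(df)
dt[, value := sum(value), by = "region"]
via_dt <- as_tibble(dt)

# Row binding: bind_rows versus rbindlist (columns matched by name).
bound_dplyr <- bind_rows(df, df)
bound_dt <- as_tibble(rbindlist(list(df, df), use.names = TRUE))
```

One difference worth checking before swapping one for the other: with its default `fill = FALSE`, rbindlist expects all inputs to share the same columns, whereas bind_rows fills missing columns with NA.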

@kanishkan91 kanishkan91 requested a review from bpbond March 26, 2020 17:19
@kanishkan91 kanishkan91 self-assigned this Mar 26, 2020
@codecov

codecov Bot commented Mar 26, 2020

Codecov Report

Merging #1158 into master will decrease coverage by 0.59%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master    #1158      +/-   ##
==========================================
- Coverage   95.00%   94.40%   -0.60%     
==========================================
  Files          11       11              
  Lines        1421     1430       +9     
==========================================
  Hits         1350     1350              
- Misses         71       80       +9     
Impacted Files Coverage Δ
R/utils.R 94.82% <0.00%> (-3.53%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c8156b3...705e40d.

@bpbond bpbond left a comment

Member

Minor style changes only. Thanks @kanishkan91!

I wonder if we should look for opportunities to use this throughout the codebase--for example, in the current slowest chunks. Thoughts @pralitp ?

Comment thread R/utils.R
fast_group_by<- function(df,by,colname="value",func= "sum"){


#Convert relevant column to numeric

For consistency with the rest of the codebase, please add a space after each of these #s.

Comment thread R/utils.R
df<- df[, (colname) := (get(func)(get(colname))), by]

#Save back to tibble
df<- as_tibble(df)

I would just make line 537 the last one of the function: as_tibble(df)
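
A minimal sketch of how the function could look with that suggestion applied, so that `as_tibble()` is the final expression and its value is returned implicitly. This is not the code in the PR: the body is reconstructed from the fragments quoted in this thread, and the name `fast_group_by_sketch` as well as the use of `set()`, `match.fun()`, and `.SDcols` are assumptions.

```r
library(data.table)

# Hypothetical reconstruction for illustration; not the PR's implementation.
fast_group_by_sketch <- function(df, by, colname = "value", func = "sum") {
  # as.data.table() copies a data frame or tibble, so the caller's object is untouched
  dt <- as.data.table(df)
  group_cols <- by
  fn <- match.fun(func)

  # Convert relevant column to numeric
  set(dt, j = colname, value = as.numeric(dt[[colname]]))

  # Apply the function within each group, assigning by reference
  dt[, (colname) := lapply(.SD, fn), by = group_cols, .SDcols = colname]

  # Last expression of the function, so its value is what gets returned
  tibble::as_tibble(dt)
}
```

Because the result is an ordinary tibble, a call such as `fast_group_by_sketch(df, by = "region")` can sit in the middle of a dplyr pipe.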

Comment thread R/utils.R
df <- rbindlist(list_for_bind,use.names=TRUE)

#Return as tibble
df<-as_tibble(df)

Ditto
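
Applied to the binding function, the same suggestion might look like the sketch below. The name `data_table_bind_sketch` is hypothetical and the body is pieced together from the lines quoted in this thread, so treat it as an assumption rather than the PR's code.

```r
# Hypothetical reconstruction for illustration; not the PR's implementation.
data_table_bind_sketch <- function(...) {
  # Collect all inputs into a list
  list_for_bind <- list(...)

  # Bind into one table using rbindlist, matching columns by name
  dt <- data.table::rbindlist(list_for_bind, use.names = TRUE)

  # Last expression of the function: the tibble is returned implicitly
  tibble::as_tibble(dt)
}
```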

Comment thread R/utils.R
list_for_bind =list(...)

#bind into one dataframe using rbindlist
df <- rbindlist(list_for_bind,use.names=TRUE)

Suggested change
df <- rbindlist(list_for_bind,use.names=TRUE)
df <- rbindlist(list_for_bind, use.names = TRUE)

Comment thread R/utils.R
#' @importFrom dplyr %>%
#' @author kbn 24 Mar 2020
#' @export
fast_group_by<- function(df,by,colname="value",func= "sum"){

Suggested change
fast_group_by<- function(df,by,colname="value",func= "sum"){
fast_group_by <- function(df, by, colname = "value", func = "sum"){

Base automatically changed from master to main January 19, 2021 20:24