I have the following code, which produces different results with and without disk.frame.
a.df is a disk.frame with 2735110 rows.
The group_by pipeline:
result <- a.df %>%
  group_by(col1, col2, col3, col4) %>%
  summarize(tot4 = sum(col4), tot5 = sum(col5)) %>%
  chunk_ungroup()
After execution, the result has 2735110 rows, the same as the input.
But the same pipeline on a regular data frame (or at least on collect(a.df)) returns a different number of rows, 273511:
result <- collect(a.df) %>%
  group_by(col1, col2, col3, col4) %>%
  summarize(tot4 = sum(col4), tot5 = sum(col5)) %>%
  ungroup()
I cannot, and should not, collect a.df here because it will become very large in the future.
Any suggestions or advice on this?
Thanks in advance.