On data.table vs pandas, I can add that with clumsy hands you can turn data.table into pandas too. Code that claims to be optimal should be written with options(datatable.verbose = TRUE). You immediately learn a lot about yourself and about R.
data.table can show similar results, but pull far ahead on large ones.
!. Code to reproduce:
library(data.table)
library(microbenchmark)
set.seed(1)
n <- 10e5  # 1,000,000 rows
d <- data.table(
  g = sample(letters, n, replace = TRUE),        # grouping column, 26 groups
  f = sample(c(TRUE, FALSE), n, replace = TRUE)  # logical flag to count
)
# print data.table's internal steps and timings for every call
options(datatable.verbose = TRUE)
invisible(d[, .(v = sum(f)), by = .(g)])
invisible(d[, .(v = sum(!f)), by = .(g)])
> invisible(d[, .(v = sum(f)), by = .(g)])
Detected that j uses these columns: f
Finding groups using forderv ... forder.c received 1000000 rows and 1 columns
0.012s elapsed (0.013s cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
Getting back original order ... forder.c received a vector type 'integer' length 26
0.000s elapsed (0.000s cpu)
lapply optimization is on, j unchanged as 'list(sum(f))'
GForce optimized j to 'list(gsum(f))'
Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.000
gforce assign high and low took 0.014
This gsum took (narm=FALSE) ... gather took ... 0.003s
0.004s
gforce eval took 0.004
0.019s elapsed (0.030s cpu)
> invisible(d[, .(v = sum(!f)), by = .(g)])
Detected that j uses these columns: f
Finding groups using forderv ... forder.c received 1000000 rows and 1 columns
0.013s elapsed (0.017s cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.000s elapsed (0.000s cpu)
Getting back original order ... forder.c received a vector type 'integer' length 26
0.000s elapsed (0.000s cpu)
lapply optimization is on, j unchanged as 'list(sum(!f))'
GForce is on, left j unchanged
Old mean optimization is on, left j unchanged.
Making each group and running j (GForce FALSE) ...
collecting discontiguous groups took 0.004s for 26 groups
eval(j) took 0.005s for 26 calls
0.005s elapsed (0.009s cpu)
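The second log shows GForce leaving j unchanged once !f appears inside sum(). Below is a minimal sketch of one possible workaround, assuming an extra helper column is acceptable (the name nf is made up for illustration): negate once up front so that j is a plain sum() over a column again, the same shape as sum(f) in the first log. With only 26 groups the logs above show no penalty for the unoptimized path, so this probably matters mainly when there are many groups.

```r
library(data.table)

set.seed(1)
n <- 1e6
d <- data.table(
  g = sample(letters, n, replace = TRUE),
  f = sample(c(TRUE, FALSE), n, replace = TRUE)
)

# Baseline from the thread: `!f` inside sum() leaves j unoptimized.
slow <- d[, .(v = sum(!f)), by = .(g)]

# Assumption: a helper column is acceptable; `nf` is a hypothetical name.
d[, nf := !f]

# j is now a plain sum() over a column, like sum(f) in the first log.
# verbose = TRUE prints the optimization messages for this one call only.
fast <- d[, .(v = sum(nf)), by = .(g), verbose = TRUE]

# Both variants should produce the same per-group counts.
stopifnot(isTRUE(all.equal(slow, fast)))
```

Passing verbose = TRUE to a single call is an alternative to setting options(datatable.verbose = TRUE) globally when only one expression needs checking.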