Re: [R] Memory exceeding for split

From: jim holtman <jholtman_at_gmail.com>
Date: Tue 25 Jul 2006 - 06:19:58 EST

Here is an example that creates a list of row indices (instead of a list of dataframes) and then uses those indices to compute on some of the values.

# create some test data
n <- 1000
x <- data.frame(a=sample(letters[1:5], n, TRUE), b=sample(LETTERS[1:5], n, TRUE),
   c=sample(1:10, n, TRUE))
# using 'a' & 'b', create index values by group
x.split <- split(seq(nrow(x)), list(x$a, x$b))
# use the indices to compute summary; use the indices in the 'list'
# to access the data ('c')
lapply(x.split, function(z) summary(x$c[z]))

This will be faster, and will use less memory, since you keep only the original dataframe and subset just the data you want to analyze via the indices.
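
For a case closer to yours (several key columns, few rows per combination), something along these lines might work. This is only a sketch: the data frame 'dat', its column names and its sizes are made up for illustration and are not your real data. It splits the row numbers with drop = TRUE, so that key combinations that never occur are not kept, and then computes one value per group from the original dataframe.

```r
# Hypothetical stand-in for the poster's data: several key columns and
# few rows per key combination (names and sizes are assumptions).
set.seed(1)
n <- 2000
dat <- data.frame(
  Easting  = sample(1:50, n, TRUE),
  Northing = sample(1:50, n, TRUE),
  Depth    = sample(1:5,  n, TRUE),
  value    = rnorm(n)
)

# Split the row numbers, not the dataframe itself. drop = TRUE discards
# key combinations that never occur, so only observed groups are kept.
idx <- split(seq_len(nrow(dat)),
             list(dat$Easting, dat$Northing, dat$Depth),
             drop = TRUE)

# Each list element is an integer vector of row numbers for one group;
# use it to pull the values of interest from the original dataframe.
group.means <- vapply(idx, function(i) mean(dat$value[i]), numeric(1))

# Sanity check: every row lands in exactly one group.
stopifnot(sum(lengths(idx)) == nrow(dat))
```

The list 'idx' holds only integers, so it stays small compared with a list of dataframe pieces, and 'group.means' is a named vector with one entry per observed key combination.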

On 7/24/06, Eduardo Dutra de Armas <eduarmasrs@yahoo.com.br> wrote:
>
> Hi R-users
> I'm working with a data.frame of 40000 x 10, for which I need to apply the
> "split" function. The result is very long and cannot be stored in a variable
> due to memory exceeding. I've tried to send the result directly to a file
> through sink(filename) function, but the problem still occurs. Does anyone
> have an idea to solve this issue?
>
> > dim(dados)
> [1] 40000 10
> > sink("d:/points.dta")
> > split(data,
> list(data$Easting,data$Northing,data$Depth,data$Media,data$Type), drop=T)
> Error: cannot allocate vector of size 1334208 Kb
> In addition: Warning messages:
> 1: Reached total allocation of 125Mb: see help(memory.size)
> 2: Reached total allocation of 125Mb: see help(memory.size)
>
> Best Regards,
>
>
> __________________________________________________________
> Eng. Agr., M.Sc. Eduardo Dutra de Armas
> __________________________________________________________
> Centro de Energia Nuclear na Agricultura (CENA/USP)
> Laboratório de Ecotoxicologia
> Av.Centenário 303, C.P. 96, CEP 13400-970
> Piracicaba, SP, Brazil - Phone: (55-19)34294761 - Fax: (55-19)34294610
> (Areas of expertise: soil and water pollution, environmental dynamics of
> pesticides, biodegradation, environmental microbiology)
> __________________________________________________________
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?


Received on Tue Jul 25 07:35:47 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 25 Jul 2006 - 10:21:35 EST.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.