From: Charles C. Berry <cberry_at_tajo.ucsd.edu>

Date: Wed 17 May 2006 - 01:55:47 EST

If you are SURE you need the exact distribution of the t-statistic (and asymptotics à la Pitman, 1938, just will not do), then you can get it by hacking the

	dec.index <- dec.index + <something.else>

line in the for loop and making several passes through the data to accumulate the necessary pieces to construct the sufficient statistics.

E.g., for the sums for one group, <something.else> is 'y[pos]':

> y <- rnorm(24) # simulate 24 observations
> hi.index <- 24
> n.index <- 12
> breaks <- lapply(1:n.index,function(x) choose(-1+1:hi.index,x))
> index <- seq(0,length=choose(hi.index,n.index))
> dec.index <- rep(as.double(0),choose(hi.index,n.index))
> system.time(
+ for (i in n.index:1){
+   pos <- findInterval(index,breaks[[i]])
+   index <- index - breaks[[i]][pos]
+   dec.index <- dec.index + y[pos]
+ }
+ )
[1] 5.13 0.62 5.78   NA   NA
> # well-known asymptotics illustrated here:
> hist(dec.index - mean(y)*n.index )
> gc()
          used (Mb) gc trigger  (Mb) max used  (Mb)
Ncells  181730  4.9     407500  10.9   350000   9.4
Vcells 6854944 52.3   17713407 135.2 17695082 135.1
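As a rough sketch of how the accumulated sums might be turned into t-statistics: with the pooled data fixed, the total sum and total sum of squares are the same for every split, so for the equal-variance (pooled) t-statistic the group-1 sum alone is enough. The small sizes (8 observations, 4 per group), the seed, and the names sum1, ss.within and t.stat are assumptions made for illustration. An unequal-variance statistic would need a further pass accumulating y[pos]^2.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
y        <- rnorm(8)             # small pooled sample so every split is cheap
hi.index <- length(y)
n.index  <- 4                    # size of group 1
n.comb   <- choose(hi.index, n.index)

# same enumeration loop as above, accumulating the group-1 sum
breaks <- lapply(1:n.index, function(x) choose(-1 + 1:hi.index, x))
index  <- seq(0, length = n.comb)
sum1   <- numeric(n.comb)        # plays the role of dec.index
for (i in n.index:1) {
  pos   <- findInterval(index, breaks[[i]])
  index <- index - breaks[[i]][pos]
  sum1  <- sum1 + y[pos]
}

# pooled two-sample t-statistic for every split, from sum1 and fixed totals
n1 <- n.index
n2 <- hi.index - n.index
m1 <- sum1 / n1
m2 <- (sum(y) - sum1) / n2
ss.within <- sum(y^2) - n1 * m1^2 - n2 * m2^2
t.stat <- (m1 - m2) / sqrt(ss.within / (n1 + n2 - 2) * (1 / n1 + 1 / n2))
```

For a case this small the result can be cross-checked against combn() and t.test(..., var.equal = TRUE); at 24 choose 12 only the vectorised loop is practical.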

HTH,
Chuck

On Tue, 16 May 2006, Nameeta Lobo wrote:

> Hi all
>
> I am actually trying to create a permutation matrix whose number of columns
> equals the number of subjects I have. This matrix is actually supposed to have
> -1s and 1s in a unique combination, hence the expand.grid question before.
> The reason I need this is that I was going to multiply the conditions by
> this matrix and calculate a t-test or any other stat, and I needed unique
> simulations (these simulations were being generated for further calculations to
> find significant data). As the number of subjects grew, we decided to
> switch signs for just half the number of subjects so as to reduce the number of
> simulations and get more reliable output. Besides, e.g., 1 1 1 -1 and -1 -1 -1
> 1 would give me the same output, being mirror images, and so I could reduce the
> number of simulations even more.
>
> I had written all this in a for loop with the 1s and -1s and then I wasn't really
> sure whether it would be faster if I tried bitwise addition. I thought I could
> then just replace the 0s in the matrix by -1s. I knew that computer time as well
> as RAM required would be a major problem but was still unsure. I have 11
> subjects in one study and so I followed the 2^11 with the expand.grid and, with
> switching half the signs, just change it for 6 of the 11, but just wanted to know
> how I would manage with a higher number of subjects.
>
> Sorry about all this. I am new to R and I was just trying to speed up the
> process without killing the machine and just wanted to be aware of what was out
> there.
>
> thanks a million for all your extremely prompt responses. This is really
> appreciated.
>
> Nameeta
>
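A minimal sketch of the sign-flip matrix described above, for a case small enough to print. The four subjects, the made-up differences d, and the names signs, half and one.sample.t are assumptions for illustration only. Fixing the first subject's sign at +1 keeps exactly one member of each mirror pair, and since a mirror image only flips the sign of the t-statistic, |t| is unchanged.

```r
n.subj <- 4                          # small, so all 2^4 rows can be listed
d      <- c(0.8, -0.3, 1.1, 0.4)     # made-up paired differences (hypothetical)

# every +1/-1 assignment, one row per sign pattern
signs <- as.matrix(expand.grid(rep(list(c(-1, 1)), n.subj)))

# keep one member of each mirror pair by fixing the first subject's sign
half <- signs[signs[, 1] == 1, , drop = FALSE]

# one-sample t statistic of the sign-flipped differences
one.sample.t <- function(s, d) {
  x <- s * d
  mean(x) / (sd(x) / sqrt(length(x)))
}
t.half <- apply(half, 1, one.sample.t, d = d)
```

With 11 subjects the same construction gives 2^11 = 2048 rows in signs and 1024 in half.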
> Quoting Marc Schwartz <MSchwartz@mn.rr.com>:
>
>> On Tue, 2006-05-16 at 09:45 +0200, Uwe Ligges wrote:
>>> Nameeta Lobo wrote:
>>>
>>>> Hello all
>>>>
>>>> thank you very much for all your suggestions. I actually need binary
>>>> representations. I tried all the methods that Marc, Jim and Charles have
>>>> suggested and they ran fine (thanks a lot). I tried doing it then with 26
>>>> and 13 and that's when the computer gave way. I just got a message with all
>>>> three methods that a vector of .....Kb cannot be allocated. guess I will have
>>>> to change the environment to allow for huge vector size allocation. How do I
>>>> do that?
>>>
>>>
>>> You should have *at least* 512Mb in your machine for the solution given
>>> by Charles C. Berry with the numbers given above, better a machine with
>>> 1Gb.
>>>
>>> Uwe Ligges
>>
>> In addition to Uwe's comment, there are some practical issues that will
>> apply here shortly if Nameeta continues to increase the size of the
>> source vector:
>>
>> 1. R has a limitation of 2^31 - 1 elements in a vector. This is the same
>> for both 32 and 64 bit platforms. Thus, if Nameeta is planning to
>> continue to expand the upper limit of the range, you will hit this
>> fairly quickly. You would then need to consider some form of a
>> partitioning approach if you go beyond that limit.
>>
>> 2. The RAM requirements to simply apply Charles' solution will continue
>> to expand as the upper limit increases, so Uwe's figure is but one
>> number that solves the indicated example of 2^26, but will be
>> insufficient beyond that.
>>
>> 3. This still does not address Nameeta's now explicitly stated desire
>> for the binary character representations, which requires additional
>> memory beyond that required for the initial step of identifying the
>> numbers that meet the 'bit requirements' alone.
>>
>> From my prior post over the weekend, to store the character matrix of
>> binary representations for 2^25 with 9 bits, which contained 2,042,975
>> values, it required approximately 128 Mb for the final paste()'d
>> versions of the numbers.
>>
>> That is AFTER doing the initial conversion using digitsBase(), which
>> required 400 Mb to store the intermediate integer matrix result. One
>> could certainly do that in a partitioned or loop based approach to
>> conserve memory, but it still will hit practical limits in short order.
>>
>> Those figures too will expand dramatically as the upper limit increases.
>>
>> For example, going from 2^24 with 12 bits to 2^26 with 13 bits, results
>> in going from 2,704,156 values in the result to 10,400,600 in the
>> result. That's a 3.8 fold increase in the result vector size. It does
>> not take long to figure out how much memory will be required for these
>> operations as the upper range increases.
>>
>> Depending upon what Nameeta is planning to do with the final resultant
>> character vectors, one could consider a loop based print method/function
>> that takes the values in the initial 'dec.index' vector and simply
>> cat()'s them to some output. However, you would not be able to actually
>> store them as a single matrix given the memory requirements.
>>
>> Perhaps Nameeta can indicate what the primary problem is here, which
>> might in turn allow someone to offer an alternative approach that is
>> more resource sparing.
>>
>> HTH,
>>
>> Marc Schwartz
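The growth Marc describes in the quoted message can be tabulated with a few lines of arithmetic. This is only a sketch: it counts 8 bytes per element for a single double vector such as dec.index, ignores the working copies made inside the loop, and assumes the per-vector limit of 2^31 - 1 elements that applied to R at the time. The names n.comb, mb.double and fits are made up for illustration.

```r
n <- c(24, 26, 28, 30, 32, 34)       # upper limit (number of bits)
k <- n / 2                           # bits set, half of n as in the thread

n.comb    <- choose(n, k)            # length of the result vector
mb.double <- n.comb * 8 / 2^20       # Mb for ONE double vector of that length
fits      <- n.comb <= 2^31 - 1      # assumed per-vector element limit

growth <- data.frame(n, k, n.comb, mb.double = round(mb.double, 1), fits)
print(growth)
```

Each step of two in n multiplies the vector length by a little under four, so the memory for one such vector passes 1 Gb by n = 30 and the assumed element limit is exceeded at n = 34.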

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Wed May 17 02:01:36 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Wed 17 May 2006 - 04:10:09 EST.
