Re: [R] random section of samples based on group membership

From: Sebastian Luque <spluque_at_gmail.com>
Date: Tue 25 Jul 2006 - 05:25:54 EST

On Mon, 24 Jul 2006 11:18:10 -0400,
"Wade Wall" <wade.wall@gmail.com> wrote:

> Hi all, I have a matrix of 474 rows (samples) with 565 columns
> (variables). each of the 474 samples belong to one of 120 groups, with
> the groupings as a column in the above matrix. For example, the group
> column would be:

> 1 1 1 2 2 2 . . . 120 120

> I want to randomly select one from each group. Not all the groups have
> the same number of samples, some have 4, some 3 etc. Is there a
> function to do this, or would I need to write a looping statement to
> look at each successive group?

I use the following for that (some of it hacked from help("sample")):

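## Safe wrapper around sample(): sample(x) treats a single number x as 1:x,
## so vectors of length <= 1 are returned as they are.
".resample" <- function(x, size, ...) {
    if(length(x) <= 1) {
        if(!missing(size) && size == 0) x[FALSE] else x
    } else sample(x, size, ...)
}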

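## Pick 'size' random rows of 'x' within each level of 'by'.
"randpick" <- function(x, by, size = 1, ...) {
    nx <- seq(nrow(x))
    ## row indices sampled within each group
    ind <- unlist(tapply(nx, by, .resample, size, ...))
    x[nx %in% ind, ]
}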

So, for instance:

R> randpick(Indometh, Indometh$Subject, 3)

   Subject time conc
2        1 0.50 0.94
7        1 3.00 0.12
11       1 8.00 0.05
15       2 1.00 0.70
16       2 1.25 0.64
19       2 4.00 0.20
25       3 0.75 1.16
29       3 3.00 0.22
32       3 6.00 0.08
34       4 0.25 1.85
43       4 6.00 0.07
44       4 8.00 0.07
48       5 1.00 0.39
54       5 6.00 0.10
55       5 8.00 0.06
58       6 0.75 1.03
64       6 5.00 0.13
65       6 6.00 0.10

R> randpick(Indometh, Indometh$Subject, 2)

   Subject time conc
8        1 4.00 0.11
10       1 6.00 0.07
14       2 0.75 0.71
20       2 5.00 0.25
23       3 0.25 2.72
28       3 2.00 0.39
39       4 2.00 0.40
43       4 6.00 0.07
48       5 1.00 0.39
52       5 4.00 0.11
57       6 0.50 1.44
66       6 8.00 0.09
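
For the case in the original question (a matrix with the grouping stored in one of its columns, one row wanted per group), something along these lines should work. This is only a sketch: the small matrix 'm' and its column names are made up here as a stand-in for the real 474 x 565 data, and the two functions are repeated so the snippet runs on its own.

```r
## Same helpers as above, repeated so this snippet is self-contained.
.resample <- function(x, size, ...) {
    if (length(x) <= 1) {
        if (!missing(size) && size == 0) x[FALSE] else x
    } else sample(x, size, ...)
}

randpick <- function(x, by, size = 1, ...) {
    nx <- seq(nrow(x))
    ind <- unlist(tapply(nx, by, .resample, size, ...))
    x[nx %in% ind, ]
}

## Hypothetical stand-in for the real data: 12 samples, 3 variables, and a
## "group" column with unequal group sizes (4, 3, 1 and 4 samples).
set.seed(1)
m <- cbind(group = rep(1:4, times = c(4, 3, 1, 4)),
           matrix(rnorm(12 * 3), nrow = 12,
                  dimnames = list(NULL, c("v1", "v2", "v3"))))

## 'size' defaults to 1, so this keeps one random row per group.
picked <- randpick(m, m[, "group"])
picked
```

Note that a group with fewer rows than 'size' (but more than one) will make sample() stop with an error, since it samples without replacement by default. With size = 1 that cannot happen.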


The 'by' argument allows sampling within any combination of factors desired.
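
As a rough illustration of sampling within a combination of factors, 'by' can be given as a list of factors, which tapply() accepts as its INDEX argument. The sketch below uses the 'warpbreaks' data set that ships with R (my choice for the example, not part of the original question) and repeats the two functions so it runs on its own.

```r
## Same helpers as above, repeated so this snippet is self-contained.
.resample <- function(x, size, ...) {
    if (length(x) <= 1) {
        if (!missing(size) && size == 0) x[FALSE] else x
    } else sample(x, size, ...)
}

randpick <- function(x, by, size = 1, ...) {
    nx <- seq(nrow(x))
    ind <- unlist(tapply(nx, by, .resample, size, ...))
    x[nx %in% ind, ]
}

## warpbreaks has 2 wool types x 3 tension levels, 9 rows in each cell.
## Passing both factors as a list samples 2 rows within each of the 6 cells.
res <- randpick(warpbreaks, list(warpbreaks$wool, warpbreaks$tension), 2)

## Cross-tabulate the result to check the number of rows drawn per cell.
table(res$wool, res$tension)
```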

Cheers,

-- 
Seb

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue Jul 25 05:26:54 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 25 Jul 2006 - 06:23:28 EST.
