Re: [R] extracting a percentage of data by random

From: <Bill.Venables_at_csiro.au>
Date: Thu, 06 Mar 2008 11:20:16 +1000

You don't need any explicit loops at all. Here is a demo of one way to do it:

> set.seed(23) # on Windows
> dat <- data.frame(age = factor(sample(1:4, 200, replace = TRUE)), y = runif(200))
> head(dat) # ages are in random order
  age          y
1   3 0.64275524
2   1 0.56125314
3   2 0.82418228
4   3 0.97050933
5   4 0.02827508
6   2 0.72291636

> with(dat, table(age))  # how many in each age group
age
 1  2  3  4
37 55 44 64
> ind <- lapply(split(1:nrow(dat), dat$age),
+               function(x) sample(x, round(length(x)/10)))  # the trick
> ind
$`1`
[1] 135   2 188 133

$`2`
[1] 124  33 140 162  25  13

$`3`
[1] 115  79  27  44

$`4`
[1]  58 129  84 198  72 109

> sample_dat <- dat[sort(unlist(ind)), ]  # with indices, select data
> sample_dat
    age         y
2     1 0.5612531
13    2 0.7339141
25    2 0.9548750
27    3 0.7419931
33    2 0.6965722
44    3 0.5363812
58    4 0.5464051
72    4 0.2785669
79    3 0.6453164
84    4 0.1203811
109   4 0.9154706
115   3 0.2118767
124   2 0.3056171
129   4 0.7635097
133   1 0.6474702
135   1 0.2466226
140   2 0.6292326
162   2 0.5338671
188   1 0.9882631
198   4 0.1983350
>
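The same trick can be wrapped in a small reusable function. This is only a sketch: the name sample_by_group and its argument frac are hypothetical, not part of base R. It splits the row indices by the grouping variable, draws round(frac * n) rows from each group without replacement, and returns those rows in their original order. It indexes with sample.int() rather than calling sample(x, k) directly, because sample() treats a single number x as 1:x, which would go wrong for a group containing only one row.

```r
# Hypothetical helper (not part of base R): stratified random sample of
# a fraction 'frac' of the rows within each level of column 'group'.
sample_by_group <- function(data, group, frac = 0.1) {
  # Row indices of 'data', split by the levels of the grouping column
  idx <- split(seq_len(nrow(data)), data[[group]])
  # Draw round(frac * group size) indices from each group, without replacement.
  # Indexing x with sample.int() avoids sample()'s length-one pitfall.
  picked <- lapply(idx, function(x) x[sample.int(length(x), round(frac * length(x)))])
  # Keep the selected rows in their original order
  data[sort(unlist(picked, use.names = FALSE)), , drop = FALSE]
}

set.seed(23)
dat <- data.frame(age = factor(sample(1:4, 200, replace = TRUE)), y = runif(200))
sample_dat <- sample_by_group(dat, "age", frac = 0.1)
table(sample_dat$age)
```

With frac = 0.1 the group sizes are round(n/10), just as in the one-liner above. Which rows are drawn depends on the seed and on the sampling algorithm of the R version in use.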

Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA

Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary):  +61 7 3826 7304
Mobile:                         +61 4 8819 4402
Home Phone:                     +61 7 3286 7700
mailto:Bill.Venables_at_csiro.au
http://www.cmis.csiro.au/bill.venables/

-----Original Message-----
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of Chang Liu
Sent: Thursday, 6 March 2008 10:50 AM
To: r-help_at_r-project.org
Subject: [R] extracting a percentage of data by random

Hello Gurus:  

Suppose I have a data frame with a variable called "age", and I want to extract a random 10% of the observations from each "age" group. Do I have to loop once to split the data and then loop again to assign random numbers? Or is there a better way to do this?

Thanks!
Karen      





R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Thu 06 Mar 2008 - 01:23:07 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 06 Mar 2008 - 02:30:20 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.
