Re: [R] set.seed ( ) function

From: Joshua Wiley <jwiley.psych_at_gmail.com>
Date: Thu, 21 Apr 2011 20:47:15 -0700

On Thu, Apr 21, 2011 at 8:34 PM, Penny Bilton <pennybilton_at_xnet.co.nz> wrote:
> Hi Josh,
>
> Thanks for your reply.
>
> The problem I have is in trying to retain the proportions of 2 groups in my
> data while sampling into training and test sets. I find that different
> arguments to set.seed() give very different proportions of my 2 groups in
> the training and test sets.

Sure. Drawing rows at random does not guarantee that each group ends up in the training and test sets in the same proportion as in the full data, and because each seed gives a different draw, the proportions will vary from seed to seed. Perhaps you are looking for some sort of constrained (stratified) random sampling, such as sampling a fixed number or fraction of rows from group 1 and the same from group 2? If so, try calling sample() separately on each group (for help applying the same function to different groups, take a look at ?by or ?tapply, for example).
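For example, here is a minimal sketch of sampling within each group, using split() and lapply() instead of by() or tapply(). The data frame `dat`, its `group` column, the 70/30 group sizes and the 70% training fraction are all made up for illustration:

```r
# Hypothetical example data: 70 rows in group "A", 30 in group "B".
set.seed(123456789)
dat <- data.frame(
  id    = 1:100,
  group = rep(c("A", "B"), times = c(70, 30))
)

# Fraction of each group to put in the training set (an assumed value).
train_frac <- 0.7

# Split the row indices by group, then sample within each group separately,
# so every group contributes the same fraction of its rows to the training set.
idx_by_group <- split(seq_len(nrow(dat)), dat$group)
train_idx <- unlist(lapply(idx_by_group, function(idx) {
  idx[sample.int(length(idx), size = round(train_frac * length(idx)))]
}), use.names = FALSE)

train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Compare group proportions in the two sets with those in the full data.
prop.table(table(dat$group))
prop.table(table(train$group))
prop.table(table(test$group))
```

Indexing with sample.int() rather than calling sample() on the index vector directly avoids a known gotcha: sample(x) with a single number x samples from 1:x, which would bite if a group had only one row. Changing the seed changes which rows are picked, but not how many come from each group.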

Josh

PS: CC'd back to the list

>
>
> Penny.
>
>
>
> On 22/04/2011 3:27 p.m., Joshua Wiley wrote:
>>
>> Hi,
>>
>> On Thu, Apr 21, 2011 at 8:18 PM, Penny Bilton <pennybilton_at_xnet.co.nz> wrote:
>>>
>>> I am using set.seed() before the sample() function.
>>>
>>> How do the length of the argument to set.seed() and the order of its
>>> digits affect how the sampling is carried out?
>>
>> You can use set.seed() to specify a particular seed so that, although
>> the numbers are only pseudo-random, you can reproduce the same sequence.
>> For example:
>>
>> set.seed(10)
>> rnorm(10)  # ten pseudo-random standard normal draws
>> set.seed(10)
>> rnorm(10)  # same seed, so exactly the same ten values again
>>
>>> Specifically, I have used set.seed(123456789). Will this configuration
>>> give me a genuinely random sample?
>>
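>> You will never get truly random sampling from a computer algorithm,
>> but it is darn close and more than adequate in the majority of cases.
>> 123456789 is just a length-1 vector containing the number 123456789,
>> not 9 separate numbers, so the number of digits and their order have
>> no special meaning: each seed simply starts the generator from a
>> different, reproducible state.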
>>
>> Google will be able to give you a lot of information on pseudo-random
>> number algorithms as well as the concept of "seeds".  Also see
>> ?set.seed
>>
>> Cheers,
>>
>> Josh
>>
>>>
>>> Thank you in anticipation.
>>>
>>> Penny.
>>>
>>>
>>>
>>> ______________________________________________
>>> R-help_at_r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>
>>
>
>

-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

Received on Fri 22 Apr 2011 - 03:49:04 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 22 Apr 2011 - 08:40:32 GMT.

