Re: [R] options in 'rnorm' to set the lower bound of normal dist

From: Ted Harding <Ted.Harding_at_manchester.ac.uk>
Date: Thu, 27 Mar 2008 14:00:48 +0000 (GMT)


Brian Ripley's suggestions of a truncated normal or a log-normal distribution are, of course, ways of ensuring that your simulated values are positive.
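For reference, both suggestions can be sketched in a few lines of base R. This is a minimal sketch, not Brian Ripley's code. `rtnorm0()` and `rlnorm_ms()` are hypothetical helper names. The truncated normal is drawn by inverting the normal CDF over the part of the distribution above 0. The log-normal parameters are chosen (method of moments) so that its mean and SD equal the sample mean and SD. The data values are typed in from the listing in the quoted message.

```r
# dat$x as listed in the quoted message (typed in by hand)
x <- c(0.134, 0.170, 0.134, 0.170, 0.134, 0.170, 0.134, 0.170,
       0.170, 0.170, 0.170, 0.097, 1.032, 0.803, 1.032)

# Hypothetical helper: N(mean, sd) truncated below at 'lower',
# drawn by inverse-CDF sampling restricted to the mass above 'lower'
rtnorm0 <- function(n, mean, sd, lower = 0) {
  p_lo <- pnorm(lower, mean = mean, sd = sd)  # probability mass cut away
  u <- runif(n, min = p_lo, max = 1)          # uniform over the kept mass
  qnorm(u, mean = mean, sd = sd)              # map back to the data scale
}

# Hypothetical helper: log-normal whose mean and SD equal m and s
rlnorm_ms <- function(n, m, s) {
  sdlog <- sqrt(log(1 + (s / m)^2))
  meanlog <- log(m) - sdlog^2 / 2
  rlnorm(n, meanlog = meanlog, sdlog = sdlog)
}

set.seed(7)
tn <- rtnorm0(1000, mean(x), sd(x))    # truncated-normal draws
ln <- rlnorm_ms(1000, mean(x), sd(x))  # log-normal draws
summary(tn)
summary(ln)
```

Both give non-negative values. Note that truncation shifts the distribution: the truncated draws average noticeably above mean(x), because the mass below 0 is discarded rather than moved.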

However, I think your real problem (having looked at the numbers you quote) is that you should not be thinking of using a normal distribution, or anything similar, at all.

Your variables dat$x and dat$y have:

   mean(dat$x)
## [1] 0.3126667

   sd(dat$x)
## [1] 0.3372137

   mean(dat$y)
## [1] 0.4223137

   sd(dat$y)
## [1] 0.5120841

so in both cases the SD is about the same as the mean, and getting negative values from simulated normal distributions with these means and SDs is inevitable.
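How often a negative value turns up can be computed directly: for X ~ N(m, s), P(X < 0) is pnorm(0, m, s), which equals pnorm(-m/s). A short sketch, with the data typed in from the quoted listing:

```r
x <- c(0.134, 0.170, 0.134, 0.170, 0.134, 0.170, 0.134, 0.170,
       0.170, 0.170, 0.170, 0.097, 1.032, 0.803, 1.032)
y <- c(0.1911315, 0.1610306, 0.1911315, 0.1610306, 0.1911315,
       0.1610306, 0.1911315, 0.1610306, 0.1610306, 0.1610306,
       0.1610306, 0.2777778, 1.5510434, 1.0631001, 1.5510434)

# P(X < 0) under a normal with each series' overall mean and SD
p_neg <- c(x = pnorm(0, mean = mean(x), sd = sd(x)),
           y = pnorm(0, mean = mean(y), sd = sd(y)))
round(p_neg, 3)

# Expected number of negative values among the 20 draws of s1[[1]]
20 * p_neg
```

This gives roughly 0.18 for x and 0.20 for y, so about one draw in five or six is negative.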

But now look at the two series of 15 numbers in dat$x and dat$y: the first 12 are of the order of 0.15 and 0.20 respectively, while the final 3 in each case are of the order of 1.0 and 1.5 respectively. This is where your large SDs are coming from. Neither series is from a simple normal distribution.

   mean(dat$x[1:12])
## [1] 0.1519167

   sd(dat$x[1:12])
## [1] 0.02447432
and it is effectively impossible for the last 3 (1.032, 0.803, 1.032) to have come from the normal distribution that gave the first 12.

   mean(dat$y[1:12])
## [1] 0.1807932

   sd(dat$y[1:12])
## [1] 0.03380083
with a similar conclusion for the last three (1.551043, 1.063100, 1.551043).
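One way to quantify "impossible" is to measure how many SDs of the first 12 values each of the last three lies above their mean. A minimal sketch; `z_last3()` is a hypothetical helper name and the data are typed in from the quoted listing:

```r
x <- c(0.134, 0.170, 0.134, 0.170, 0.134, 0.170, 0.134, 0.170,
       0.170, 0.170, 0.170, 0.097, 1.032, 0.803, 1.032)
y <- c(0.1911315, 0.1610306, 0.1911315, 0.1610306, 0.1911315,
       0.1610306, 0.1911315, 0.1610306, 0.1610306, 0.1610306,
       0.1610306, 0.2777778, 1.5510434, 1.0631001, 1.5510434)

# Distance of values 13:15 from the mean of 1:12, in units of the SD of 1:12
z_last3 <- function(v) (v[13:15] - mean(v[1:12])) / sd(v[1:12])
round(z_last3(x), 1)
round(z_last3(y), 1)

# Upper-tail normal probability for the smallest of these distances
pnorm(min(z_last3(x), z_last3(y)), lower.tail = FALSE)
```

Every one of the six values lies more than 25 SDs above the mean of the first 12 in its series.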

Note that, for the first 12 in each case, the SD is less than 1/5 of the mean:

   mean(dat$x[1:12])/sd(dat$x[1:12])
## [1] 6.207186

   mean(dat$y[1:12])/sd(dat$y[1:12])
## [1] 5.348779

so, where the first 12 of x and y are concerned, if you sampled from normal distributions with the same means and SDs you would get a negative number with probability less than

   pnorm(-5)
## [1] 2.866516e-07
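The bound can be made exact: with mean-to-SD ratio r, the probability of a negative draw is pnorm(-r). A sketch for the first 12 values of each series (data typed in from the quoted listing):

```r
x12 <- c(0.134, 0.170, 0.134, 0.170, 0.134, 0.170,
         0.134, 0.170, 0.170, 0.170, 0.170, 0.097)
y12 <- c(0.1911315, 0.1610306, 0.1911315, 0.1610306, 0.1911315, 0.1610306,
         0.1911315, 0.1610306, 0.1610306, 0.1610306, 0.1610306, 0.2777778)

# Mean/SD ratio for the first 12 values of each series
r <- c(x = mean(x12) / sd(x12), y = mean(y12) / sd(y12))

# P(X < 0) for X ~ N(m, s) is pnorm(-m/s)
p_neg12 <- pnorm(-r)
signif(p_neg12, 3)

# Both ratios exceed 5, so both probabilities are below pnorm(-5)
p_neg12 < pnorm(-5)
```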

What you in fact have here is that the numbers are in two groups, each with a small SD relative to its mean:

                 dat$x          |      dat$y
     Rows     Mean     SD       |   Mean     SD
     ---------------------------+-----------------
     1:12     0.152    0.024    |   0.181    0.034
     13:15    0.956    0.132    |   1.39     0.282
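The two-group table can be reproduced by splitting each series on a grouping factor. A minimal sketch; `grp` and `group_stats()` are hypothetical names, the data are typed in from the quoted listing, and the extra Mean/SD row is there so the coefficient-of-variation point that follows can be checked.

```r
x <- c(0.134, 0.170, 0.134, 0.170, 0.134, 0.170, 0.134, 0.170,
       0.170, 0.170, 0.170, 0.097, 1.032, 0.803, 1.032)
y <- c(0.1911315, 0.1610306, 0.1911315, 0.1610306, 0.1911315,
       0.1610306, 0.1911315, 0.1610306, 0.1610306, 0.1610306,
       0.1610306, 0.2777778, 1.5510434, 1.0631001, 1.5510434)

# Rows 1:12 form one group and rows 13:15 the other
grp <- factor(rep(c("1:12", "13:15"), times = c(12, 3)),
              levels = c("1:12", "13:15"))

# Mean, SD and Mean/SD for each group (one column per group)
group_stats <- function(v) {
  sapply(split(v, grp), function(g)
    c(Mean = mean(g), SD = sd(g), "Mean/SD" = mean(g) / sd(g)))
}
round(group_stats(x), 3)
round(group_stats(y), 3)
```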

Note that for dat$x, Mean/SD is approximately 6 for each sub-series, and for dat$y it is approximately 5 for each sub-series, so you could be looking at results which display a nearly constant coefficient of variation. Now, this is indeed a property of the log-normal distribution (as well as of others), so that could be worth considering. However, you would still have to account for the apparent split into distinct groups noted above.
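The constant-CV property of the log-normal follows from its moments: if log X ~ N(mu, sigma^2), then E[X] = exp(mu + sigma^2/2) and SD(X) = E[X] * sqrt(exp(sigma^2) - 1), so SD/mean depends only on sigma. A small sketch; `lnorm_moments()` is a hypothetical helper, and sdlog = 0.16 is an assumed value chosen only so that Mean/SD comes out near 6.

```r
# Analytic mean, SD and Mean/SD of a log-normal(meanlog, sdlog)
lnorm_moments <- function(meanlog, sdlog) {
  m <- exp(meanlog + sdlog^2 / 2)    # mean of the log-normal
  s <- m * sqrt(exp(sdlog^2) - 1)    # its standard deviation
  c(mean = m, sd = s, "Mean/SD" = m / s)
}

sdlog <- 0.16                                # assumed common sdlog
round(lnorm_moments(log(0.15), sdlog), 4)    # a group centred near 0.15
round(lnorm_moments(log(1.0), sdlog), 4)     # a group centred near 1.0
```

The mean and SD differ between the two calls, but Mean/SD is identical, since it depends on sdlog alone.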

So you are really facing a modelling question: why did the numbers come out as they did, and what is a good way to represent that mechanism as a distribution?

With best wishes,
Ted.

On 27-Mar-08 12:27:55, Tom Cohen wrote:
>
> Dear list,
> I have a dataset containing values obtained from two different
> instruments (x and y). I want to generate 5 samples from a normal
> distribution for each instrument based on their means and standard
> deviations. The problem is that values from both instruments are
> non-negative, so if I use rnorm I get some negative values. Are there
> any options to set the lower bound of the normal distribution to 0, or
> can I simulate the samples in a different way to avoid the negative
> values?
>
>
> > dat
> id x y
> 75 101 0.134 0.1911315
> 79 102 0.170 0.1610306
> 76 103 0.134 0.1911315
> 84 104 0.170 0.1610306
> 74 105 0.134 0.1911315
> 80 106 0.170 0.1610306
> 77 107 0.134 0.1911315
> 81 108 0.170 0.1610306
> 82 109 0.170 0.1610306
> 78 111 0.170 0.1610306
> 83 112 0.170 0.1610306
> 85 113 0.097 0.2777778
> 2 201 1.032 1.5510434
> 1 202 0.803 1.0631001
> 5 203 1.032 1.5510434
>
> mu<-apply(dat[,-1],2,mean)
> sigma<-apply(dat[,-1],2,sd)
> len<-5
> n<-20
> s1<-vector("list",len)
> set.seed(7)
> for(i in 1:len){
> s1[[i]]<-cbind.data.frame(x=rnorm(n*i,mean=mu[1],sd=sigma[1]),
> y=rnorm(n*i,mean=mu[2],sd=sigma[2]))
> }
>
> Thanks for any help,
> Tom
>
>



E-Mail: (Ted Harding) <Ted.Harding_at_manchester.ac.uk> Fax-to-email: +44 (0)870 094 0861
Date: 27-Mar-08                                       Time: 14:00:44

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 27 Mar 2008 - 18:10:05 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.