From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>

Date: Thu, 27 Mar 2008 15:12:51 +0000 (GMT)

Date: Thu, 27 Mar 2008 15:12:51 +0000 (GMT)

On Thu, 27 Mar 2008, Ted.Harding_at_manchester.ac.uk wrote:

> Brian Ripley's suggestions of truncated normal and

*> log-normal are of course resources for ensuring that
**> you get positive simulated results.
*

I was answering the question asked!

Wanting to simulate from the fitted mean and sd does *not* mean you want to emulate the data -- indeed, it is often done to see if peculiar aspects of data have a noticeable effect on the rest of the analysis.

I've done this with analytical chemist colleagues several times, although I did point out it was better to use robust summaries.

> However, I think youre real problem (having looked at

*> the numbers you quote) is that you should not be thinking
**> of using a normal distribution, or anything similar, at all.
**>
**> Your variables dat$x and dat$y have:
**> mean(dat$x)
**> ## [1] 0.3126667
**> sd(dat$x)
**> ## [1] 0.3372137
**>
**> mean(dat$y)
**> ## [1] 0.4223137
**> sd(dat$y)
**> ## [1] 0.5120841
**>
**> so in both cases the SD is about the same as the mean, and
**> getting negative values from simulated normal distributions
**> with these means and SDs is inevitable.
**>
**> But now look at the two series of 15 numbers in dat$x and dat$y:
**> The first 12 are of the order of 0.15 and 0.20 respectively,
**> while the final 3 in each case are of the order of 1.0 and 1.5
**> respectively. And this is where your large SD is coming from.
**> Neither series is from a simple normal distribution.
**>
**> mean(dat$x[1:12])
**> ## [1] 0.1519167
**> sd(dat$x[1:12])
**> ## [1] 0.02447432
**> and it is impossible for the last 3 ( 1.032 0.803 1.032)
**> to have come from a normal distribution giving the first 12.
**>
**> mean(dat$y[1:12])
**> ## [1] 0.1807932
**> sd(dat$y[1:12])
**> ## [1] 0.03380083
**> with a similar conclusion; and likewise for the last three
**> 1.551043 1.063100 1.551043
**>
**> Note that, for the first 12 in each case, the SD less that
**> 1/5 of the mean:
**>
**> mean(dat$x[1:12])/sd(dat$x[1:12])
**> ## [1] 6.207186
**> mean(dat$y[1:12])/sd(dat$y[1:12])
**> ## [1] 5.348779
**>
**> so, where the first 12 of x and y are concerned, if you sampled
**> from normal distributions with the same means and SDs you would
**> get a negative number with probability less than
**>
**> pnorm(-5)
**> ## [1] 2.866516e-07
**>
**> What you in fact have here is that the numbers are in two groups,
**> each with a small SD relative to its mean:
**>
**> dat$x dat$y
**> Mean SD Mean SD
**> ------------------------+-------------------
**> 1:12 0.152 0.024 | 0.181 0.034
**> |
**> 13:15 0.956 0.132 | 1.39 0.282
**>
**> Note that for dat$x Mean/SD approx = 6 for each sub-series,
**> and for data$y Mean/SD approx = 5 for each subseries, so
**> you could be looking at results which display a nearly
**> constant coefficient of variation. Now, this is indeed a
**> property of the log-normal distribution (as well as of
**> others), so that could indeed be worth considering.
**> However, you still have to account for the apparent split
**> noted above into distinct groups.
**>
**> So you are really facing a modelling question: why did
**> the numbers come out as they did, and what is a good way
**> to represent that mechanism as a distribution?
**>
**> With best wishes,
**> Ted.
**>
**> On 27-Mar-08 12:27:55, Tom Cohen wrote:
**>>
**>> Dear list,
**>> I have a dataset containing values obtained from two different
**>> instruments (x and y).
**>> I want to generate 5 samples from normal distribution for each
**>> instrument based on
**>> their means and standard deviations. The problem is values from both
**>> instruments are
**>> non-negative, so if using rnorm I would get some negative values. Is
**>> there any options
**>> to determine the lower bound of normal distribution to be 0 or can I
**>> simulate the
**>> samples in different ways to avoid the negative values?
**>>
**>>
**>> > dat
**>> id x y
**>> 75 101 0.134 0.1911315
**>> 79 102 0.170 0.1610306
**>> 76 103 0.134 0.1911315
**>> 84 104 0.170 0.1610306
**>> 74 105 0.134 0.1911315
**>> 80 106 0.170 0.1610306
**>> 77 107 0.134 0.1911315
**>> 81 108 0.170 0.1610306
**>> 82 109 0.170 0.1610306
**>> 78 111 0.170 0.1610306
**>> 83 112 0.170 0.1610306
**>> 85 113 0.097 0.2777778
**>> 2 201 1.032 1.5510434
**>> 1 202 0.803 1.0631001
**>> 5 203 1.032 1.5510434
**>>
**>> mu<-apply(dat[,-1],2,mean)
**>> sigma<-apply(dat[,-1],2,sd)
**>> len<-5
**>> n<-20
**>> s1<-vector("list",len)
**>> set.seed(7)
**>> for(i in 1:len){
**>> s1[[i]]<-cbind.data.frame(x=rnorm(n*i,mean=mu[1],sd=sigma[1]),
**>> y=rnorm(n*i,mean=mu[2],sd=sigma[2]))
**>> }
**>>
**>> Thanks for any help,
**>> Tom
**>>
**>>
**>> ---------------------------------
**>> Sök efter kärleken!
**>>
**>> [[alternative HTML version deleted]]
**>>
**>
**> --------------------------------------------------------------------
**> E-Mail: (Ted Harding) <Ted.Harding_at_manchester.ac.uk>
**> Fax-to-email: +44 (0)870 094 0861
**> Date: 27-Mar-08 Time: 14:00:44
**> ------------------------------ XFMail ------------------------------
**>
**> ______________________________________________
**> R-help_at_r-project.org mailing list
**> https://stat.ethz.ch/mailman/listinfo/r-help
**> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
**> and provide commented, minimal, self-contained, reproducible code.
**>
*

-- Brian D. Ripley, ripley_at_stats.ox.ac.uk Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/ University of Oxford, Tel: +44 1865 272861 (self) 1 South Parks Road, +44 1865 272866 (PA) Oxford OX1 3TG, UK Fax: +44 1865 272595Received on Thu 27 Mar 2008 - 19:18:30 GMT______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Thu 27 Mar 2008 - 19:31:21 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*