Re: [R] set the lower bound of normal distribution to 0 ?

From: ONKELINX, Thierry <Thierry.ONKELINX_at_inbo.be>
Date: Tue, 01 Apr 2008 14:49:36 +0200

Dear Tom,

In my opinion you should first transform your data to the log scale and then calculate the mean and standard deviation of the log-transformed data, because mean(log(x)) is not equal to log(mean(x)).
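
A minimal sketch of this suggestion, assuming the x and y values from the 'dat' table quoted further down in this thread. The names mean_logx, sd_logx, mean_logy and sd_logy follow Tom's message; sx is a hypothetical name, and only base R is used.

```r
# x and y values copied from the 'dat' table quoted later in this thread
x <- c(0.134, 0.170, 0.134, 0.170, 0.134, 0.170, 0.134, 0.170,
       0.170, 0.170, 0.170, 0.097, 1.032, 0.803, 1.032)
y <- c(0.1911315, 0.1610306, 0.1911315, 0.1610306, 0.1911315, 0.1610306,
       0.1911315, 0.1610306, 0.1610306, 0.1610306, 0.1610306, 0.2777778,
       1.5510434, 1.0631001, 1.5510434)

# Estimate the lognormal parameters directly on the log scale
mean_logx <- mean(log(x)); sd_logx <- sd(log(x))
mean_logy <- mean(log(y)); sd_logy <- sd(log(y))

# The two quantities differ: mean(log(x)) is smaller than log(mean(x))
# whenever the values are not all equal (Jensen's inequality)
c(mean_of_logs = mean_logx, log_of_mean = log(mean(x)))

# Unbounded lognormal samples using the log-scale estimates
set.seed(7)
sx <- rlnorm(20, meanlog = mean_logx, sdlog = sd_logx)
```

These estimates can be passed as meanlog and sdlog to any lognormal sampler; rlnorm is used here only because it ships with base R.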

HTH, Thierry



ir. Thierry Onkelinx
Instituut voor natuur- en bosonderzoek / Research Institute for Nature and Forest
Cel biometrie, methodologie en kwaliteitszorg / Section biometrics, methodology and quality assurance
Gaverstraat 4
9500 Geraardsbergen
Belgium
tel. + 32 54/436 185
Thierry.Onkelinx_at_inbo.be
www.inbo.be

To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.
~ Sir Ronald Aylmer Fisher

The plural of anecdote is not data.
~ Roger Brinner

The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey

-----Original message-----
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On behalf of Tom Cohen
Sent: Tuesday 1 April 2008 14:17
To: r-help_at_stat.math.ethz.ch
Subject: [R] set the lower bound of normal distribution to 0 ?

Tom Cohen <tom.cohen78_at_yahoo.se> wrote:
Thanks Prof Brian for your suggestion. I should have known that for right-skewed data one should generate the samples from a lognormal.

My problem is that x and y are two instruments that were thought to measure the same thing, but the confidence interval of the difference between the two instruments is wide. It may be true that the two instruments measure differently, but the wide interval could also be due to the small number of observations. So the idea is that if I increase the sample size, by generating samples based on the means and standard deviations of x and y, I may get better precision for the difference between the two instruments.

I am using 'urlnorm', which allows sampling from a truncated distribution, since I want the samples to take values from 0 to max(x) and from 0 to max(y), respectively. I am unsure how to specify the means and standard deviations in 'urlnorm'. Based on the x- and y-values I have standard deviations sd_x=0.3372137, sd_y=0.5120841 and means mean_x=0.3126667, mean_y=0.4223137, which are not on the log scale as required by urlnorm.

To convert sd_x, sd_y and mean_x, mean_y to the log scale I did sd_logx=sqrt(log(1.3372137))=0.54, sd_logy=sqrt(log(1.5120841))=0.64, mean_logx=-(0.54^2)/2 and mean_logy=-(0.64^2)/2. Can anyone tell me whether these are correctly calculated? Are these the values to be specified in urlnorm? Do the lower and upper bounds have to be on the log scale as well, or on which scale?
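
For reference, the standard moment-matching relations for a lognormal with arithmetic mean m and standard deviation s are sdlog = sqrt(log(1 + s^2/m^2)) and meanlog = log(m) - sdlog^2/2. The sketch below applies them to the means and standard deviations quoted above; the helper name lnorm_params and the variables px, py, implied_mean and implied_sd are hypothetical and only serve as an illustration.

```r
# Hypothetical helper: convert an arithmetic mean and sd to lognormal
# meanlog/sdlog by matching the first two moments.
lnorm_params <- function(m, s) {
  sdlog <- sqrt(log(1 + (s / m)^2))
  meanlog <- log(m) - sdlog^2 / 2
  c(meanlog = meanlog, sdlog = sdlog)
}

px <- lnorm_params(0.3126667, 0.3372137)
py <- lnorm_params(0.4223137, 0.5120841)

# Check: the lognormal mean and sd implied by px reproduce the inputs
implied_mean <- unname(exp(px["meanlog"] + px["sdlog"]^2 / 2))
implied_sd <- unname(implied_mean * sqrt(exp(px["sdlog"]^2) - 1))
```

These parameters describe the untruncated lognormal; truncating the distribution afterwards changes its mean and standard deviation.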

set.seed(7)
for(i in 1:len){
  s1[[i]]<-cbind.data.frame(x=urlnorm(n*i,meanlog=mean_logx,sdlog=sd_logx, lb=0, ub=max(x)),
                            y=urlnorm(n*i,meanlog=mean_logy,sdlog=sd_logy, lb=0, ub=max(y)))
}
   

Thanks again for any suggestions.

Prof Brian Ripley <ripley_at_stats.ox.ac.uk> wrote:
On Thu, 27 Mar 2008, Tom Cohen wrote:

>
> Dear list,

> I have a dataset containing values obtained from two different
> instruments (x and y). I want to generate 5 samples from normal
> distribution for each instrument based on their means and standard
> deviations. The problem is values from both instruments are
> non-negative, so if using rnorm I would get some negative values. Is
> there any options to determine the lower bound of normal distribution to
> be 0 or can I simulate the samples in different ways to avoid the
> negative values?

Well, that would not be a normal distribution.

If you want a _truncated_ normal distribution it is very easy by inversion. E.g.

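trunc_rnorm <- function(n, mean = 0, sd = 1, lb = 0) {
  # probability mass of the normal distribution below the lower bound
  p_lb <- pnorm(lb, mean, sd)
  # draw uniforms on (p_lb, 1) and map them back through the normal quantile function
  qnorm(runif(n, p_lb, 1), mean, sd)
}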

but I suggest you may rather want samples from a lognormal.
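
A short usage sketch of the inversion approach, extended with an upper bound. The name trunc_rnorm2 and the argument ub are assumptions added for illustration and are not part of the function above; the mean, sd and bound are the values Tom reports for x.

```r
# Truncated normal by inversion, with an optional upper bound 'ub'
# (the upper bound is an assumption added for illustration)
trunc_rnorm2 <- function(n, mean = 0, sd = 1, lb = 0, ub = Inf) {
  p_lb <- pnorm(lb, mean, sd)  # probability mass below the lower bound
  p_ub <- pnorm(ub, mean, sd)  # probability mass below the upper bound
  # uniforms restricted to (p_lb, p_ub) map to normal values in (lb, ub)
  qnorm(runif(n, p_lb, p_ub), mean, sd)
}

set.seed(7)
s <- trunc_rnorm2(1000, mean = 0.3126667, sd = 0.3372137, lb = 0, ub = 1.032)
range(s)  # all values lie between 0 and 1.032
```

Note that the mean and sd arguments refer to the untruncated normal, so the sample mean of s will generally differ from the mean supplied.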

>
>
> > dat
> id x y
> 75 101 0.134 0.1911315
> 79 102 0.170 0.1610306
> 76 103 0.134 0.1911315
> 84 104 0.170 0.1610306
> 74 105 0.134 0.1911315
> 80 106 0.170 0.1610306
> 77 107 0.134 0.1911315
> 81 108 0.170 0.1610306
> 82 109 0.170 0.1610306
> 78 111 0.170 0.1610306
> 83 112 0.170 0.1610306
> 85 113 0.097 0.2777778
> 2 201 1.032 1.5510434
> 1 202 0.803 1.0631001
> 5 203 1.032 1.5510434
>
> mu<-apply(dat[,-1],2,mean)
> sigma<-apply(dat[,-1],2,sd)
> len<-5
> n<-20
> s1<-vector("list",len)
> set.seed(7)
> for(i in 1:len){
> s1[[i]]<-cbind.data.frame(x=rnorm(n*i,mean=mu[1],sd=sigma[1]),
> y=rnorm(n*i,mean=mu[2],sd=sigma[2]))
> }
>
> Thanks for any help,
> Tom
>

-- 
Brian D. Ripley, ripley_at_stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

    

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Tue 01 Apr 2008 - 12:50:58 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 01 Apr 2008 - 13:30:25 GMT.
