# Re: [R] normality test

From: Ted Harding <Ted.Harding_at_nessie.mcc.ac.uk>
Date: Sat 30 Apr 2005 - 01:05:40 EST

On 28-Apr-05 Pieter Provoost wrote:
> Thanks all for your comments and hints. I will try to
> keep them in mind.
> Since a number of people asked me what I'm trying to do:
> I want to apply Bayesian inference to a simple ecological
> model I wrote, and therefore I need to fit (uniform, normal
> or lognormal) distributions to sets of observed data
> (to derive mean and sd). You probably have noticed that I'm
> quite new to statistics, but I'm working on that...
>
> Pieter

And please continue to do so!

Let me try to be constructive. It is clearly established that the data you posted are far from Normally distributed. The simple qqnorm plot shows that immediately, and if you need it the shapiro.test() with "p-value = 8.499e-11" settles it!

Going a bit further, however: qqnorm(log(X)) (X being what I call your data series) suggests that the data depart systematically from a pure logNormal, at least at the 6 highest values of X. And again, shapiro.test(log(X)) gives

p-value = 0.00965

which is again a fairly strong indication.
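
For anyone who wants to repeat these two checks, here is a minimal sketch. The posted data are not reproduced in this message, so X below is a simulated stand-in (an assumption purely for illustration), and the p-values it gives will not match the ones quoted above.

```r
# Stand-in data (hypothetical, NOT the posted series): 32 lognormal values
set.seed(1)
X <- exp(rnorm(32, mean = -3, sd = 1))

# Raw scale: Q-Q plot against the Normal, then the Shapiro-Wilk test
qqnorm(X); qqline(X)
sw_raw <- shapiro.test(X)

# Log scale: the same two checks applied to log(X)
qqnorm(log(X)); qqline(log(X))
sw_log <- shapiro.test(log(X))

# Compare the two p-values side by side
c(raw = sw_raw$p.value, log = sw_log$p.value)
```

With the real data in X, only the first two lines need to be dropped.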

Now, going back to your statement above, that you wrote a "simple ecological model", I would like to know more about that before proceeding further.

The rather clear break in slope in qqnorm(log(X)) suggests to me the possibility that your data may represent a mixture of two distinct, possibly though not necessarily logNormal, distributions, one having a much longer upper tail than the other but being a relatively small proportion (say 1/3).

For example, with X denoting your data, compare

qqnorm(log(X))

with

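set.seed(52341)
Y1 <- exp(rnorm(22, -3.26, 0.69))   # main component: 22 values, narrow on the log scale
Y2 <- exp(rnorm(10, -1.75, 2.35))   # minor component: 10 values, much wider on the log scale
qqnorm(log(c(Y1, Y2)))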

They are not dissimilar (and I have not been trying very hard).

Another thing to look at is simply

hist(log(X), breaks=0.5*(-12:4))

This also shows some interesting features: the very high peak between -3.0 and -2.5 (and possibly an unduly high value between -3.5 and -3.0), together with a rather thin and widely spread upper tail above -2.0.
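
To read the bin counts off directly rather than from the picture, something like the following could be used. It is only a sketch: the simulated mixture from the earlier example stands in for log(X), and the breaks are computed from the data (my own choice here, not the fixed grid above) so that hist() cannot fail on a value outside the range.

```r
# Stand-in for log(X): the simulated mixture from above (not the real data)
set.seed(52341)
Y1 <- exp(rnorm(22, -3.26, 0.69))
Y2 <- exp(rnorm(10, -1.75, 2.35))
lx <- log(c(Y1, Y2))

# Half-unit breaks wide enough to cover every value
# (hist() stops with an error if a value falls outside the breaks)
brk <- seq(floor(2 * min(lx)) / 2, ceiling(2 * max(lx)) / 2, by = 0.5)

h <- hist(lx, breaks = brk, plot = FALSE)   # counts only, no plot
data.frame(lower = head(h$breaks, -1), upper = h$breaks[-1], count = h$counts)
```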

This could be quite consistent with the kind of mixture described above, or could be due to observer error/bias in measurement.

In any case, it is clear that there is more than a simple "(uniform, normal or lognormal)" distribution at play here.

In a real investigation, I would at this stage be concerned to develop a realistic model of how the data are generated.

You do not say what these data represent.

The above was mostly written before you posted your second email, explaining that

"The Bayesian methods I (will) use are implemented in the modelling environment I'm using (FEMME). I'm supervised by the person that developed the environment, and she asked me to fit a normal or lognormal distribution to the observed data. The parameters of that distribution will then be used for the Bayesian analysis. So I suppose my supervisor knows what very well what she's doing, even though I don't (well... not yet)."

It may be speculated whether your supervisor has herself seriously questioned the structure of these data, since what she is asking you to do seems to presume that the above is not relevant!

However, a mixture model would fit nicely into a Bayesian framework, since (from the above) I suspect a simulation or MCMC procedure will depend on the parameters to be estimated for the distribution. For the mixture (e.g. log(X) is a mixture of two normal distributions), you can estimate the two parameters for each normal distribution and the proportions p:(1-p) of each. Then, in sampling from the mixture, you first decide on component 1 with probability p or component 2 with probability q = (1-p), then sample from the corresponding lognormal distribution.
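
A rough sketch of both steps in base R follows. It is one possible way of doing it, not a recommendation of a particular method: the estimation uses a bare-bones EM iteration with a crude median-split start and a floor on the standard deviations, the function names fit_mix2 and rmix_lognormal are made up for this illustration, and Z is a simulated stand-in for log(X) built from the parameters used in the earlier example (with a larger sample so that the fit is stable).

```r
# Fit a two-component normal mixture to z (= log of the data) by EM.
# fit_mix2 is a hypothetical helper name, not a function from any package.
fit_mix2 <- function(z, iter = 200, sd_floor = 0.05) {
  # Crude starting values: split the sample at its median
  lo <- z[z <= median(z)]
  hi <- z[z > median(z)]
  p  <- 0.5
  mu <- c(mean(lo), mean(hi))
  s  <- c(sd(lo), sd(hi))
  for (i in seq_len(iter)) {
    # E-step: probability that each point belongs to component 1
    d1 <- p * dnorm(z, mu[1], s[1])
    d2 <- (1 - p) * dnorm(z, mu[2], s[2])
    w  <- d1 / (d1 + d2)
    # M-step: weighted proportion, means and standard deviations
    p  <- mean(w)
    mu <- c(sum(w * z) / sum(w), sum((1 - w) * z) / sum(1 - w))
    s  <- sqrt(c(sum(w * (z - mu[1])^2) / sum(w),
                 sum((1 - w) * (z - mu[2])^2) / sum(1 - w)))
    s  <- pmax(s, sd_floor)   # crude guard against a component collapsing
  }
  list(p = p, mu = mu, sd = s)
}

# Sample from the fitted mixture on the original (lognormal) scale:
# pick component 1 with probability p, else component 2, then draw.
rmix_lognormal <- function(n, fit) {
  comp <- ifelse(runif(n) < fit$p, 1, 2)
  exp(rnorm(n, fit$mu[comp], fit$sd[comp]))
}

# Stand-in for log(X) (simulated, NOT the real data)
set.seed(52341)
Z <- c(rnorm(220, -3.26, 0.69), rnorm(100, -1.75, 2.35))

fit  <- fit_mix2(Z)
Xsim <- rmix_lognormal(1000, fit)
fit
```

With only 32 real observations the estimates will be far less stable than in this simulated case, so the fitted values would need to be checked against the qqnorm and hist plots before being used.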

Best wishes,
Ted.

E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861
Date: 29-Apr-05                                       Time: 15:41:36
------------------------------ XFMail ------------------------------

______________________________________________
R-help@stat.math.ethz.ch mailing list