# RE: [R] Stats Question: Single data item versus Sample from Norm

From: Ted Harding <Ted.Harding_at_nessie.mcc.ac.uk>
Date: Tue 05 Apr 2005 - 20:35:01 EST

On 05-Apr-05 Ross Clement wrote:
> Hi. I have a question that I have asked in other stat forums
> but do not yet have an answer for. I would like to know if
> there is some way in R or otherwise of performing the following
> hypothesis test.
>
> I have a single data item x. The null hypothesis is that x
> was selected from a normal distribution N(mu,sigma). The
> alternate hypothesis is that x does not come from this
> distribution.
>
> However, I do not know the values of mu and sigma. I have a
> sample of size N from which I can estimate mu and sigma.
> So, say that I have N(m,s,N), and x. I would like to say with
> some certainty (e.g. 95%) that I can, or can't reject the
> hypothesis that x came from N(mu,sigma). I would also like a
> power test to say how large N should be given the degree of
> accuracy I need when accepting or rejecting individual x
> values.
>
> What is the name of the hypothesis test I need for this?
> Is it built into R, or are there packages I could use?

There is no name because there is no unique test.

The difficulty lies in your statement of alternative hypothesis: "that x does not come from this distribution."

This allows any distribution whatever to be a possible source of your single observation x. Therefore, whatever the value of x, you can reject the null hypothesis that it comes from any N(mu,sigma^2) that is remotely compatible with your N data, in favour of some distribution that happens to predict with near-certainty that you will get that particular observation x.

For instance, suppose you had m=1.1 and s=2.5, and suppose x=1.15, which is very close to m: the difference is much smaller than s. You are still entitled to reject H0, because your alternative allows you to postulate N(1.15,0.00000001) as the source of the observation x.
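To see how lopsided that comparison is, one can compare the density at x under the two candidate distributions. This is only a rough sketch using the numbers above. It assumes the second argument of N(1.15,0.00000001) is a variance (so the standard deviation is 1e-4), and it treats m and s as if they were the true mu and sigma, which is a simplification for illustration.

```r
# Illustrative numbers from the example above
m <- 1.1    # sample mean, used here as a stand-in for mu (assumption)
s <- 2.5    # sample SD, used here as a stand-in for sigma (assumption)
x <- 1.15   # the single observation

# Density of x under the null-like distribution N(m, s^2)
dens_null <- dnorm(x, mean = m, sd = s)

# Density of x under the tailor-made alternative N(1.15, 1e-8),
# reading 1e-8 as a variance, i.e. sd = 1e-4 (assumption)
dens_alt <- dnorm(x, mean = 1.15, sd = sqrt(1e-8))

# Likelihood ratio in favour of the alternative
lr <- dens_alt / dens_null
lr
```

The ratio comes out on the order of 10^4 in favour of the alternative, even though x sits almost exactly at m. Any x can be "explained" this way, which is why the unrestricted alternative gives no usable test.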

What you need to do is to make clear what feature of the value of x, in relation to any given Normal distribution, would constitute an indication that it was not sampled from that distribution.

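If (as I surmise) this is simply "distance from mu" [the true mean of the Normal distribution], so that you are basically testing whether x is an "outlier", then you could use the simple fact that, when x comes from the same Normal distribution as your N values and is independent of them, the quantity

((x - m)(N/(N+1))^0.5)/s

has a t distribution with (N-1) degrees of freedom. Here m and s are the mean and standard deviation of your sample of N, with s computed using the divisor (N-1).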
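In R this can be done by hand in a few lines. The following is a minimal sketch, not a built-in function: the name single_obs_t_test, its arguments, and the simulated sample y are all made up for illustration. It assumes a two-sided test, since "distance from mu" in either direction counts against H0.

```r
# Hypothetical helper (not part of base R): test a single new value x
# against a sample y, using the t statistic given above.
single_obs_t_test <- function(x, y, conf.level = 0.95) {
  stopifnot(length(x) == 1, length(y) >= 2)
  N <- length(y)
  m <- mean(y)
  s <- sd(y)                                  # sd() uses the N-1 divisor
  t_stat <- (x - m) * sqrt(N / (N + 1)) / s   # the statistic from the text
  p_value <- 2 * pt(-abs(t_stat), df = N - 1) # two-sided p-value
  list(statistic = t_stat,
       df        = N - 1,
       p.value   = p_value,
       reject    = p_value < 1 - conf.level)
}

# Usage with a simulated sample (illustrative values only)
set.seed(1)
y <- rnorm(20, mean = 1.1, sd = 2.5)
single_obs_t_test(1.15, y)   # x close to the sample mean
single_obs_t_test(15, y)     # x far out in the tail
```

Rejecting when the p-value falls below 0.05 is the same as asking whether x lies outside the usual 95% prediction interval m +/- qt(0.975, N-1) * s * sqrt(1 + 1/N) for one further observation.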

This, if you have to give it a name, would be a "t" test, since the t distribution is all it depends on.

Note, however, that this pre-supposes that the variance of the distribution from which x was sampled is the same as the variance of the distribution giving your N values, and also that both distributions are Normal, differing therefore only in their means. So this is a tight restriction of your original universal class of alternatives.
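A small simulation can make the role of the equal-variance assumption concrete. This is a rough sketch under assumed settings (N = 10, standard Normal data, 20000 replications, and a helper name reject_rate invented for the purpose). It estimates how often the two-sided 5% test rejects when x has the same mean as the sample, first with the same variance and then with a standard deviation three times larger.

```r
set.seed(42)
N     <- 10      # size of the reference sample (assumed for illustration)
n_sim <- 20000   # number of simulated data sets

# Hypothetical helper: estimated rejection rate of the two-sided 5% test
# when x ~ N(0, sd_x^2) and the sample y ~ N(0, 1)
reject_rate <- function(sd_x) {
  crit <- qt(0.975, df = N - 1)
  rejections <- replicate(n_sim, {
    y <- rnorm(N)
    x <- rnorm(1, mean = 0, sd = sd_x)
    t_stat <- (x - mean(y)) * sqrt(N / (N + 1)) / sd(y)
    abs(t_stat) > crit
  })
  mean(rejections)
}

rate_same_var <- reject_rate(1)   # assumptions hold: should be close to 0.05
rate_wider    <- reject_rate(3)   # same mean, larger variance: far above 0.05
c(same = rate_same_var, wider = rate_wider)
```

When the variances match, the rejection rate should sit near the nominal 5%. When x comes from a wider distribution with the same mean, the test rejects much more often, so a rejection cannot by itself be read as evidence of a shift in mean unless the equal-variance assumption is accepted.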

E-Mail: (Ted Harding) <Ted.Harding@nessie.mcc.ac.uk> Fax-to-email: +44 (0)870 094 0861