Re: [R] Comparing multiple distributions

From: jiho <>
Date: Thu, 31 May 2007 19:28:44 +0200

On 2007-May-31 , at 18:56 , Bert Gunter wrote:
> While Ravi's suggestion of the "compositions" package is certainly
> appropriate, I suspect that the complex and extensive statistical
> "homework" you would need to do to use it might be overwhelming (the
> geometry of compositions is a simplex, and this makes things hard).

Yes, I am reading the documentation now, which is well written but huge indeed...

> As a simple and perhaps useful alternative, use pairs() or splom() to
> plot your 5-D data, distinguishing the different treatments via color
> and/or symbol. In addition, it might be useful to do the same sort of
> plot on the first two principal components (?prcomp) of the first 4
> dimensions of your 5-component vectors (since the 5th is determined by
> the first 4). Because of the simplicial geometry, this PCA approach is
> not right, but it may nevertheless be revealing. The same plotting
> ideas are in the compositions package done properly (in the correct
> geometry), so if you are motivated to do so, you can do these things
> there. Even if you don't dig into the details, using the compositions
> package version of the plots may be relatively easy to do,
> interpretable, and revealing -- more so than my "simple but wrong"
> suggestions. You can decide.
> I would not trust inference using ad hoc approaches in the
> untransformed data. That's what the package is for. But plotting the
> data should always be at least the first thing you do anyway. I often
> find it to be sufficient, too.

Thank you for your suggestions on plotting; I will look into them. I was using histograms of mean proportions + SE until now because that seemed the most straightforward approach given my specific questions. If we come back to my original data (abandoning the statistical language for a while ;) ), I have proportions of fishes caught 1. near the surface, 2. a bit below, ... 5. near the bottom. The questions I want to ask are, for example: does the vertical distribution of species A and species B differ? So I can plot the mean proportion at each depth for both species and obtain a visual representation of the vertical distribution of each.
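A minimal sketch of such a plot in R, for illustration only. The data are simulated (normalised gamma draws standing in for real catches), and the names `sim_props`, `summarise_bins`, `spA` and `spB` are hypothetical, not taken from the actual analysis.

```r
set.seed(1)

# Hypothetical helper: simulate n observations of 5 proportions summing to 1
# (normalised gamma draws, i.e. a Dirichlet distribution with parameters alpha).
sim_props <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), shape = alpha), nrow = n, byrow = TRUE)
  g / rowSums(g)
}

depths <- paste0("depth", 1:5)           # 1 = near the surface, 5 = near the bottom
spA <- sim_props(20, c(5, 3, 2, 1, 1))   # assumed: accumulates near the surface
spB <- sim_props(20, c(1, 1, 2, 3, 5))   # assumed: accumulates near the bottom
colnames(spA) <- colnames(spB) <- depths

# Mean proportion and standard error per depth bin
summarise_bins <- function(x) {
  rbind(mean = colMeans(x), se = apply(x, 2, sd) / sqrt(nrow(x)))
}
sA <- summarise_bins(spA)
sB <- summarise_bins(spB)

# Side-by-side bars with error bars of +/- 1 SE
means <- rbind(A = sA["mean", ], B = sB["mean", ])
ses   <- rbind(A = sA["se", ],   B = sB["se", ])
mids  <- barplot(means, beside = TRUE, ylim = c(0, max(means + ses)),
                 ylab = "mean proportion", legend.text = TRUE)
arrows(mids, means - ses, mids, means + ses, angle = 90, code = 3, length = 0.03)
```

Note that the SE here is computed separately for each depth bin, so it ignores the constraint that the 5 proportions of one observation sum to 1.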
At this stage, differences between fishes that accumulate near the surface and those that accumulate near the bottom are quite obvious. If I add error bars, I can get an idea of the variability of those distributions. The issue arises when I want to *test* for a difference between the distributions of species A and B. If I use a basic KS test, I can only compare the mean proportions for species A (5 points) to the mean proportions of species B (5 points); this has low power and does not take into account the variability around those means. In addition, I may also want to know whether there is a difference among species A, B and C, and pairwise KS tests would increase the risk of alpha (type I) error. Am I explaining things correctly? Does this seem logical to you too? As for the PCA, I must admit I don't really understand what you mean.
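For reference, one possible reading of the quoted plotting suggestion, sketched in R on simulated data. The data, the helper `sim_props` and the object names are assumptions made for illustration. As the quoted message itself warns, PCA on untransformed proportions ignores the simplex geometry, so this is exploratory only.

```r
set.seed(2)

# Simulated stand-in for the real data: 5 proportions per observation,
# summing to 1, for two groups (normalised gamma draws).
sim_props <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), shape = alpha), nrow = n, byrow = TRUE)
  g / rowSums(g)
}
props <- rbind(sim_props(20, c(5, 3, 2, 1, 1)),
               sim_props(20, c(1, 1, 2, 3, 5)))
colnames(props) <- paste0("depth", 1:5)
species <- factor(rep(c("A", "B"), each = 20))

# Scatterplot matrix of the 5 proportions, one colour and symbol per species
pairs(props, col = as.integer(species), pch = as.integer(species))

# PCA on the first 4 proportions only: the 5th is 1 minus their sum
pca <- prcomp(props[, 1:4])

# Plot the observations on the first two principal components
plot(pca$x[, 1:2], col = as.integer(species), pch = as.integer(species),
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(species), col = 1:2, pch = 1:2)
```

If the two species separate clearly along PC1 or PC2, that is a visual hint that their vertical distributions differ; it is not a test.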

Thank you very much again.

> -----Original Message-----
> From:
> [] On Behalf Of jiho
> Subject: Re: [R] Comparing multiple distributions
> Nobody answered my first request. I am sorry if I did not explain my
> problem clearly. English is not my native language and statistical
> English is even more difficult. I'll try to summarize my issue in
> more appropriate statistical terms:
> Each of my observations is not a single number but a vector of 5
> proportions (which add up to 1 for each observation). I want to
> compare the "shape" of those vectors between two treatments (i.e. how
> the quantities are distributed between the 5 values in treatment A
> with respect to treatment B).
> I was pointed to Hotelling's T-squared. Does it seem appropriate? Are
> there other possibilities (I read many discussions about Hotelling
> vs. MANOVA but I could not see how any of those related to my
> particular case)?
> Thank you very much in advance for your insights. See below for my
> earlier, more detailed, e-mail.
> On 2007-May-21 , at 19:26 , jiho wrote:
>> I am studying the vertical distribution of plankton and want to
>> study how it varies with several factors (time of day,
>> species, water column structure etc.). So my data is special in
>> that, at each sampling site (each observation), I don't have *one*
>> number, I have *several* numbers (abundance of organisms in each
>> depth bin, I sample 5 depth bins) which describe a vertical
>> distribution.
>> Then, let's say I want to compare speciesA with speciesB, I would end
>> up trying to compare a group of several distributions with another
>> group of several distributions (where a "distribution" is a vector
>> of 5 numbers: an abundance for each depth bin). Does anyone know
>> how I could do this (with R obviously ;) )?
>> Currently I kind of get around the problem and:
>> - compute mean abundance per depth bin within each group and
>> compare the two mean distributions with a ks.test but this
>> obviously diminishes the power of the test (I only compare 5*2
>> "observations")
>> - restrict the information at each sampling site to the mean depth
>> weighted by the abundance of the species of interest. This way I
>> have one observation per station but I reduce the information to
>> the mean depths while the actual distribution is also important.
>> I know this is probably not directly R related but I have already
>> searched around for solutions and solicited my local statistics
>> expert... to no avail. So I hope that the statistics experts on this
>> list will help me.
>> Thank you very much in advance.




Received on Thu 31 May 2007 - 18:09:07 GMT
