From: jiho <jo.irisson_at_gmail.com>

Date: Thu, 31 May 2007 19:28:44 +0200

On 2007-May-31 , at 18:56 , Bert Gunter wrote:

> While Ravi's suggestion of the "compositions" package is certainly
> appropriate, I suspect that the complex and extensive statistical
> "homework" you would need to do to use it might be overwhelming (the
> geometry of compositions is a simplex, and this makes things hard).

Yes, I am reading the documentation now; it is well written but huge indeed...

> As a simple and perhaps useful alternative, use pairs() or splom() to
> plot your 5-D data, distinguishing the different treatments via color
> and/or symbol.
>
> In addition, it might be useful to do the same sort of plot on the
> first two principal components (?prcomp) of the first 4 dimensions of
> your 5-component vectors (since the 5th is determined by the first 4).
> Because of the simplicial geometry, this PCA approach is not right,
> but it may nevertheless be revealing. The same plotting ideas are in
> the compositions package done properly (in the correct geometry), so
> if you are motivated to do so, you can do these things there. Even if
> you don't dig into the details, using the compositions package version
> of the plots may be relatively easy to do, interpretable, and
> revealing -- more so than my "simple but wrong" suggestions. You can
> decide.
>
> I would not trust inference using ad hoc approaches in the
> untransformed data. That's what the package is for. But plotting the
> data should always be at least the first thing you do anyway. I often
> find it to be sufficient, too.
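For readers following along, the two "simple but wrong" plots suggested above could look roughly like this in R. This is only a sketch on simulated data: sim_comp(), props and treatment are names made up for the illustration, and the data are invented, not taken from this thread. Only pairs(), prcomp() and plot() are real base R functions.

```r
# Hypothetical data: 20 observations per treatment, each a vector of
# 5 proportions that sum to 1 (simulated, not the data from this thread).
set.seed(1)
sim_comp <- function(n, weights) {
  # Normalised gamma draws; column j uses shape weights[j]
  x <- matrix(rgamma(n * 5, shape = rep(weights, each = n)), nrow = n)
  x / rowSums(x)  # close each row so that it sums to 1
}
props <- rbind(sim_comp(20, c(5, 4, 3, 2, 1)),
               sim_comp(20, c(1, 2, 3, 4, 5)))
colnames(props) <- paste("depth", 1:5, sep = "")
treatment <- factor(rep(c("A", "B"), each = 20))

# Scatterplot matrix of the 5 parts, one colour and symbol per treatment
pairs(props, col = c("blue", "red")[treatment], pch = c(1, 17)[treatment])

# PCA on the first 4 parts only: the 5th is 1 minus the sum of the others,
# so it carries no extra information
pca <- prcomp(props[, 1:4])
plot(pca$x[, 1:2], xlab = "PC1", ylab = "PC2",
     col = c("blue", "red")[treatment], pch = c(1, 17)[treatment])
```

If the two treatments separate along PC1 or PC2, that is a hint (not a test) that their profiles differ. As noted in the quoted message, this ignores the simplex geometry.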

Thank you for your suggestions on plotting, I will look into them. I
was using histograms of mean proportions + SE until now because that
seemed the most straightforward approach given my specific questions.
If we come back to my original data (abandoning the statistical
language for a while ;) ) I have proportions of fishes caught 1. near
the surface, 2. a bit below, ..., 5. near the bottom. The questions I
want to ask are, for example: does the vertical distribution of
species A and species B differ? So I can plot the mean proportion at
each depth for both species and obtain a visual representation of the
vertical distribution of each.

At this stage, differences between fishes that accumulate near the
surface and those that accumulate near the bottom are quite obvious. If
I add error bars I can get an idea of the variability of those
distributions. The issue arises when I want to *test* for a difference
between the distributions of species A and B. If I use a basic KS test
I can only compare the mean proportions for species A (5 points) to the
mean proportions of species B (5 points), and this has low power and
does not take into account the variability around those means. In
addition, I may also want to know whether there is a difference among
species A, B and C, and pairwise KS tests would increase the alpha
error risk. Am I explaining things correctly? Does this seem logical to
you too?

As for the PCA, I must admit I don't really understand what you mean.
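One way to use every observation rather than only the 5 mean proportions is a permutation test on the whole vectors. Nobody in this thread proposed it, so it is offered here only as a sketch: the data are simulated, the names (sim_comp(), mean_dist(), spA, spB) are made up, and the test statistic (Euclidean distance between the two mean profiles, computed on the untransformed proportions) is an ad hoc choice that ignores the simplex geometry mentioned above.

```r
# Hypothetical data: 15 stations per species, 5 proportions per station
set.seed(3)
sim_comp <- function(n, weights) {
  x <- matrix(rgamma(n * 5, shape = rep(weights, each = n)), nrow = n)
  x / rowSums(x)
}
spA <- sim_comp(15, c(5, 4, 3, 2, 1))
spB <- sim_comp(15, c(1, 2, 3, 4, 5))

x <- rbind(spA, spB)                # one row per station
g <- rep(c("A", "B"), each = 15)    # species label of each row

# Test statistic: distance between the mean profiles of the two groups
mean_dist <- function(x, g) {
  d <- colMeans(x[g == "A", , drop = FALSE]) -
       colMeans(x[g == "B", , drop = FALSE])
  sqrt(sum(d^2))
}

observed <- mean_dist(x, g)

# Null distribution: shuffle the species labels among stations
n_perm <- 999
permuted <- replicate(n_perm, mean_dist(x, sample(g)))

# Proportion of shuffles at least as extreme as the observed statistic
p_value <- (1 + sum(permuted >= observed)) / (n_perm + 1)
p_value
```

Because the labels are shuffled over whole stations, the variability between stations enters the null distribution, which the KS test on the means does not capture. For more than two species, one test with a statistic computed over all groups would avoid the multiple pairwise comparisons.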

Thank you very much again.

> -----Original Message-----

> From: r-help-bounces_at_stat.math.ethz.ch
> [mailto:r-help-bounces_at_stat.math.ethz.ch] On Behalf Of jiho
> Subject: Re: [R] Comparing multiple distributions
>
> Nobody answered my first request. I am sorry if I did not explain my
> problem clearly. English is not my native language and statistical
> english is even more difficult. I'll try to summarize my issue in
> more appropriate statistical terms:
>
> Each of my observations is not a single number but a vector of 5
> proportions (which add up to 1 for each observation). I want to
> compare the "shape" of those vectors between two treatments (i.e. how
> the quantities are distributed between the 5 values in treatment A
> with respect to treatment B).
>
> I was pointed to Hotelling T-squared. Does it seem appropriate? Are
> there other possibilities (I read many discussions about hotelling
> vs. manova but I could not see how any of those related to my
> particular case)?
>
> Thank you very much in advance for your insights. See below for my
> earlier, more detailed, e-mail.
>
> On 2007-May-21 , at 19:26 , jiho wrote:
>> I am studying the vertical distribution of plankton and want to
>> study its variations relatively to several factors (time of day,
>> species, water column structure etc.). So my data is special in
>> that, at each sampling site (each observation), I don't have *one*
>> number, I have *several* numbers (abundance of organisms in each
>> depth bin, I sample 5 depth bins) which describe a vertical
>> distribution.
>>
>> Then let say I want to compare speciesA with speciesB, I would end
>> up trying to compare a group of several distributions with another
>> group of several distributions (where a "distribution" is a vector
>> of 5 numbers: an abundance for each depth bin). Does anyone know
>> how I could do this (with R obviously ;) )?
>>
>> Currently I kind of get around the problem and:
>> - compute mean abundance per depth bin within each group and
>> compare the two mean distributions with a ks.test but this
>> obviously diminishes the power of the test (I only compare 5*2
>> "observations")
>> - restrict the information at each sampling site to the mean depth
>> weighted by the abundance of the species of interest. This way I
>> have one observation per station but I reduce the information to
>> the mean depths while the actual repartition is important also.
>>
>> I know this is probably not directly R related but I have already
>> searched around for solutions and solicited my local statistics
>> expert... to no avail. So I hope that the stats' experts on this
>> list will help me.
>>
>> Thank you very much in advance.

JiHO

--
http://jo.irisson.free.fr/

--
This message has been checked by MailScanner for viruses and spam, and
nothing suspicious was found.
CRI UPVD http://www.univ-perp.fr

Received on Thu 31 May 2007 - 18:09:07 GMT

______________________________________________
R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Archive maintained by Robert King, hosted by the discipline of
statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Thu 31 May 2007 - 18:31:12 GMT.
