Re: [R] Abundance data ordination in R

From: Jari Oksanen <jari.oksanen_at_oulu.fi>
Date: Mon 02 Apr 2007 - 09:48:37 GMT

Milton Cezar Ribeiro <milton_ruser <at> yahoo.com.br> writes:

>
> Dear R-gurus
>
> I have a data.frame with abundance data for species and sites which looks like:
> mydf <- data.frame(
> sp1 = sample(0:10, 5, replace = TRUE),
> sp2 = sample(0:20, 5, replace = TRUE),
> sp3 = sample(0:4, 5, replace = TRUE),
> sp4 = sample(0:2, 5, replace = TRUE))
> rownames(mydf) <- paste("sites", 1:5, sep = "")
>
> I would like to run an ordination analysis of these data, and my worry is about
> the "zeros" (absence of species) in the matrix. As far as I have read (Gotelli,
> A Primer of Ecological Statistics, 2004), when I have abundance data I can't
> compute Euclidean distances, because the zeros mean absence of the species and
> not a zero count. Gotelli suggests using "principal coordinates analysis". I
> would like to hear what you think about this, and which are the best packages
> and functions to compute my distance matrices and do my ordination analysis.
> Can I treat zero as NA in my data.frame? Is there a good PDF book about
> multivariate analysis for abundance data available on the web?
>
>

Other people have already suggested what to do with these data and where to find PDF texts. I only comment on some points raised in the original question. Firstly, Euclidean distance is quite OK with zeros, or at least as good as any other normal dissimilarity index is with zeros. Euclidean distance on non-transformed data is poor for other reasons: it squares the differences and so emphasizes abundant species, and even when two sites have nothing in common, their Euclidean distance varies with total abundances. Using Principal Coordinates Analysis does not change this, since it can also be run with Euclidean distances, and then it gives the same ordination as PCA. However, there are many packages providing "better" dissimilarity indices, or transformations that make Euclidean distances more useful (such as the Hellinger transformation).
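
A minimal base-R sketch of the points above, on hypothetical counts (not data from this thread): sites A and B share no species, nor do C and D, but C and D are ten times as abundant. Raw Euclidean distance makes C and D look far more different than A and B, and even places A farther from C (same single species) than from B (nothing in common). Bray-Curtis, chosen here only as one example of a "better" index, gives 1 for both no-overlap pairs, and Euclidean distance after a Hellinger transformation (square root of relative abundances) behaves similarly. `bray_curtis()` is a hypothetical helper written for illustration; packages such as vegan offer `vegdist(x, method = "bray")` and `decostand(x, method = "hellinger")` for the same purposes.

```r
# Hypothetical abundances: A/B share no species, C/D share no species,
# and C/D are ten times more abundant than A/B.
comm <- rbind(
  A = c(sp1 = 1,  sp2 = 0),
  B = c(sp1 = 0,  sp2 = 1),
  C = c(sp1 = 10, sp2 = 0),
  D = c(sp1 = 0,  sp2 = 10)
)

# Hypothetical helper: Bray-Curtis dissimilarity for every pair of rows,
# sum(|x - y|) / sum(x + y).
bray_curtis <- function(x) {
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- sum(abs(x[i, ] - x[j, ])) / sum(x[i, ] + x[j, ])
    }
  }
  d
}

euc <- as.matrix(dist(comm))                        # Euclidean on raw counts
bc  <- bray_curtis(comm)                            # Bray-Curtis on raw counts
hel <- as.matrix(dist(sqrt(comm / rowSums(comm))))  # Hellinger distance

# Compare the two no-overlap pairs (A-B, C-D) with a same-species pair (A-C).
pick <- function(d) c(AB = d["A", "B"], CD = d["C", "D"], AC = d["A", "C"])
round(rbind(euclidean = pick(euc), bray_curtis = pick(bc), hellinger = pick(hel)), 3)
# Euclidean: A-B is sqrt(2), C-D is sqrt(200), A-C is 9.
# Bray-Curtis: A-B and C-D are both 1, A-C is 9/11.
# Hellinger: A-B and C-D are both sqrt(2), A-C is 0.
```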

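To illustrate the remark that principal coordinates analysis does not by itself cure the problem: classical PCoA (`cmdscale()` in base R) of Euclidean distances reproduces PCA scores up to the sign of each axis, so the ordination inherits whatever the distance emphasizes. The count matrix below is hypothetical. The sketch then runs the same PCoA on Hellinger-transformed data, one of the transformations mentioned above; this is one possible workflow, not a recommendation from the thread.

```r
# Hypothetical 5 sites x 4 species abundance matrix.
comm <- matrix(c(3, 12, 0, 1,
                 0,  5, 2, 0,
                 7,  0, 4, 2,
                 1, 18, 0, 0,
                 9,  2, 3, 1),
               nrow = 5, byrow = TRUE,
               dimnames = list(paste0("site", 1:5), paste0("sp", 1:4)))

# PCoA (classical multidimensional scaling) on raw Euclidean distances ...
pcoa_raw <- cmdscale(dist(comm), k = 2)
# ... and PCA on the same raw counts (prcomp centres the columns by default).
pca_raw <- prcomp(comm)$x[, 1:2]

# The two configurations agree up to the sign of each axis.
all.equal(abs(pcoa_raw), abs(pca_raw), check.attributes = FALSE)

# Hellinger transformation: square root of each site's relative abundances.
hel <- sqrt(comm / rowSums(comm))
# PCoA on Euclidean distances of the transformed data (the Hellinger distance).
pcoa_hel <- cmdscale(dist(hel), k = 2, eig = TRUE)
pcoa_hel$points   # site scores on the first two axes
pcoa_hel$eig      # eigenvalues, useful for judging how many axes matter
```
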
Another question is more abstract: indeed, you may regard most zeros as missing data. A species could probably occur at your sampling site, but it was too scarce to be observed. How to handle this in practice is the tricky issue. You cannot simply change zeros to NA, since then the dissimilarities (if they don't fail) will give a really special significance to these cells. Regarding them as zeros certainly makes more sense than removing *pairs* of data where a species is NA at one site and present at another. There are ways to handle zeros as something like missing values of various degrees(!), but my decency prohibits me from writing about these methods.
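
A small sketch of what recoding zeros as NA does to a standard dissimilarity, using base R `dist()` on hypothetical counts. `dist()` drops every column where either site is NA and scales the remaining sum up to the full number of columns, so species present at one site and absent at the other stop counting at all; when two sites have no species in common, nothing is left to compare and the distance is NA.

```r
# Hypothetical counts for three sites and four species.
comm <- rbind(
  site1 = c(sp1 = 5, sp2 = 0, sp3 = 3, sp4 = 0),
  site2 = c(sp1 = 4, sp2 = 6, sp3 = 0, sp4 = 0),
  site3 = c(sp1 = 0, sp2 = 0, sp3 = 0, sp4 = 7)
)

# Zeros kept as zeros: every species contributes to every distance.
d_zero <- as.matrix(dist(comm))

# Zeros recoded as NA: dist() skips columns with an NA in either row and
# rescales the sum of squares by (number of columns / columns used).
comm_na <- comm
comm_na[comm_na == 0] <- NA
d_na <- as.matrix(dist(comm_na))

d_zero["site1", "site2"]  # sqrt(46), about 6.78: sp2 and sp3 differences count
d_na["site1", "site2"]    # 2: only sp1 is used, (5 - 4)^2 scaled by 4/1
d_na["site1", "site3"]    # NA: no species is non-missing at both sites
```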

cheers, jari oksanen



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Mon Apr 02 19:59:50 2007

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Mon 02 Apr 2007 - 11:30:34 GMT.
