From: Gavin Simpson <gavin.simpson_at_ucl.ac.uk>

Date: Sun, 20 Mar 2011 19:43:47 +0000


On Fri, 2011-03-18 at 06:21 -0700, bra86 wrote:

> Hello, everybody,
>
> I hope somebody could help me with a dist() function.
> I have a data frame of size 2*4087 (col*row), where col corresponds to the
> treatment and rows are

So you have 4087 species? If so, you'd normally have the species in the columns and the samples/treatments in the rows.

> species, values are Hellinger distances, I should reconstruct a distance
> matrix

This doesn't make sense - distances would mean you have a square symmetric matrix but 2 * 4087 isn't square. Do you mean you have Hellinger **transformed** the data such that when you take the Euclidean distances of this transformed data you get the Hellinger distance rather than the Euclidean distance?

If yes - and you sort the rows/columns issue - R wants the samples in rows - then it is reasonably simple.

Here is a much simplified example with 5 species and 4 samples:

dat <- data.frame(runif(4, 1, 10), runif(4, 2, 10), runif(4, 4, 20),
                  runif(4, 1, 4), runif(4, 0, 5))
names(dat) <- paste("spp", LETTERS[1:5])
rownames(dat) <- paste("samp", 1:4)

So we have data that looks like this:

> dat
          spp A    spp B     spp C    spp D     spp E
samp 1 6.974237 7.933403  5.460453 3.975219 4.6818142
samp 2 1.049801 6.751013 14.143798 1.777532 4.0261914
samp 3 5.742314 2.243850 15.613524 3.476935 0.4144043
samp 4 5.985012 9.576440  8.722579 3.411262 1.8126338

Then I apply a Hellinger transformation:

require(vegan)

datH <- decostand(dat, method = "hellinger")
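To make the transformation concrete: the "hellinger" method of decostand() is, as far as I know, the square root of each value divided by its row (sample) total. Here is a minimal base-R sketch of that, assuming the rounded values of dat printed above (so any agreement with decostand() is only to about six decimal places):

```r
# Values copied from the printed `dat` above (rounded, so results are approximate)
dat <- matrix(c(6.974237, 7.933403,  5.460453, 3.975219, 4.6818142,
                1.049801, 6.751013, 14.143798, 1.777532, 4.0261914,
                5.742314, 2.243850, 15.613524, 3.476935, 0.4144043,
                5.985012, 9.576440,  8.722579, 3.411262, 1.8126338),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste("samp", 1:4), paste("spp", LETTERS[1:5])))

# Hellinger transformation by hand: square root of the row proportions.
# rowSums(dat) has one entry per row, so recycling divides each row by its total.
datH_manual <- sqrt(dat / rowSums(dat))

# The squared transformed values in each row sum to 1
rowSums(datH_manual^2)

round(datH_manual, 7)
```

The name datH_manual is only for illustration; in practice decostand() does this for you.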

So at this point we have something that I think you are telling us you have:

> datH
           spp A     spp B     spp C     spp D     spp E
samp 1 0.4901864 0.5228086 0.4337378 0.3700782 0.4016244
samp 2 0.1945069 0.4932488 0.7139447 0.2530989 0.3809156
samp 3 0.4570334 0.2856942 0.7536245 0.3556336 0.1227769
samp 4 0.4503635 0.5696823 0.5436922 0.3400073 0.2478481

We can use dist() on this data frame via:

dij <- dist(datH)

If we look at the object created, we see the **printed** representation of the dissimilarity matrix, which is a 4*4 matrix in this example:

> dij
          samp 1    samp 2    samp 3
samp 2 0.4253576
samp 3 0.4874570 0.4367179
samp 4 0.2010581 0.3543312 0.3750363
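As a check on what dist() is doing, one entry can be recomputed by hand. This is a small sketch using the first two rows of the printed datH (rounded values, so the result should match the samp 1 / samp 2 entry above only up to rounding); the object names are just for illustration:

```r
# First two rows of the printed `datH` (rounded values)
samp1 <- c(0.4901864, 0.5228086, 0.4337378, 0.3700782, 0.4016244)
samp2 <- c(0.1945069, 0.4932488, 0.7139447, 0.2530989, 0.3809156)

# Euclidean distance by hand: square root of the summed squared differences
d12_manual <- sqrt(sum((samp1 - samp2)^2))

# The same pair via dist(); each row passed in is treated as one sample
d12_dist <- dist(rbind(samp1, samp2))

d12_manual
as.numeric(d12_dist)
```

Because the rows were Hellinger transformed first, this Euclidean distance is the Hellinger distance between the two samples.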

Note that the diagonal and the upper triangle of the matrix are not printed, or even stored, because they are redundant (the diagonal is all zeros and the upper triangle mirrors the lower triangle).

dist() actually creates a vector of numbers that will fill the lower triangle of the dissimilarity matrix. This saves on storage space. If you want to add the diagonal and upper triangle, you can get them in one of two ways:

- dist(datH, diag = TRUE, upper = TRUE)
- as.matrix(dij)

However, only the second actually returns a matrix with 16 numbers; the former still only computes the 6 pair-wise distances, but shows the full matrix when **printed**.
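The difference between the two is easy to confirm. A minimal sketch, assuming a small made-up data set (the matrix m and its values are hypothetical, not the data from this thread):

```r
# Hypothetical small data set: 4 samples (rows), 3 variables (columns)
m <- matrix(c(1, 2, 3,
              2, 4, 1,
              0, 1, 5,
              3, 3, 3),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste("samp", 1:4), NULL))

d       <- dist(m)
d_print <- dist(m, diag = TRUE, upper = TRUE)  # only changes how it prints
full    <- as.matrix(d)                        # a genuine square matrix

length(d)         # number of stored distances, 4 * 3 / 2
length(d_print)   # still the same number of stored values
dim(full)         # rows and columns of the full matrix

# Once it is a matrix, entries can be looked up by sample name
full["samp 3", "samp 1"]
```

If you need to index or export individual distances, the as.matrix() form is usually the more convenient one.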

If you really have species in rows and samples in columns, then you can transpose your matrix, e.g.

datH.t <- t(datH)

and then compute the dissimilarity matrix as above.
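To show why the orientation matters, here is a minimal sketch assuming a made-up layout like the one described, with species in rows and two treatments in columns (the matrix x and its values are hypothetical):

```r
# Hypothetical layout: 5 species in rows, 2 treatments in columns
x <- matrix(c(0.49, 0.19,
              0.52, 0.49,
              0.43, 0.71,
              0.37, 0.25,
              0.40, 0.38),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste("spp", LETTERS[1:5]), c("trt 1", "trt 2")))

d_species <- dist(x)      # distances among the 5 species (the rows)
d_treat   <- dist(t(x))   # distances among the 2 treatments

length(d_species)   # choose(5, 2) species pairs
length(d_treat)     # a single treatment pair

# With 4087 rows, dist() would hold this many pairwise distances
choose(4087, 2)
```

With 4087 species in rows, dist() on the untransposed data would compute several million species-by-species distances, which may be what produced the long, truncated-looking printout.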

Does this help?

G

> with a dist() function. I know that "euclidean" method should be used.
>
> When I type:
> dist(dframe,"euclidean")
> it gives me a truncated table, where values are missing.
>
> I suppose that I have to define something for the values,
> but I have no idea what exactly, because I am not familiar with r at all.
>
> I would be very appreciated for every kind of suggestions or tips.

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Received on Sun 20 Mar 2011 - 19:48:56 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Mon 21 Mar 2011 - 11:50:23 GMT.
