# Re: [R] exploring dist()

From: Gavin Simpson <gavin.simpson_at_ucl.ac.uk>
Date: Sun, 20 Mar 2011 19:43:47 +0000

On Fri, 2011-03-18 at 06:21 -0700, bra86 wrote:
> Hello, everybody,
>
> I hope somebody could help me with a dist() function.
> I have a data frame of size 2*4087 (col*row), where col corresponds to the
> treatment and rows are

So you have 4087 species? If yes, you would normally have the species in the columns and the samples/treatments in the rows.

This doesn't make sense - distances would mean you have a square symmetric matrix but 2 * 4087 isn't square. Do you mean you have Hellinger **transformed** the data such that when you take the Euclidean distances of this transformed data you get the Hellinger distance rather than the Euclidean distance?

If yes, and once you sort out the rows/columns issue (R wants the samples in rows), then it is reasonably simple.

Here is a much simplified example with 5 species and 4 samples:

```
dat <- data.frame(runif(4, 1, 10), runif(4, 2, 10), runif(4, 4, 20),
                  runif(4, 1, 4), runif(4, 0, 5))
names(dat) <- paste("spp", LETTERS[1:5])
rownames(dat) <- paste("samp", 1:4)
```

So we have data that looks like this:

```
> dat
          spp A    spp B     spp C    spp D     spp E
samp 1 6.974237 7.933403  5.460453 3.975219 4.6818142
samp 2 1.049801 6.751013 14.143798 1.777532 4.0261914
samp 3 5.742314 2.243850 15.613524 3.476935 0.4144043
samp 4 5.985012 9.576440  8.722579 3.411262 1.8126338
```

Then I apply a Hellinger transformation:

require(vegan)
datH <- decostand(dat, method = "hellinger")

So at this point we have something that I think you are telling us you have:

```
> datH
           spp A     spp B     spp C     spp D     spp E
samp 1 0.4901864 0.5228086 0.4337378 0.3700782 0.4016244
samp 2 0.1945069 0.4932488 0.7139447 0.2530989 0.3809156
samp 3 0.4570334 0.2856942 0.7536245 0.3556336 0.1227769
samp 4 0.4503635 0.5696823 0.5436922 0.3400073 0.2478481
```
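In case it helps to see what the transformation is doing, here is a rough base R sketch of the same calculation. It assumes the Hellinger transformation is the square root of each value divided by its row (sample) total, which is how I understand decostand(method = "hellinger") to work; the object names `raw` and `hell` are mine, and the numbers are simply the rounded values of dat printed above.

```r
# Rounded values of `dat` as printed above, samples in rows
raw <- matrix(c(6.974237, 7.933403,  5.460453, 3.975219, 4.6818142,
                1.049801, 6.751013, 14.143798, 1.777532, 4.0261914,
                5.742314, 2.243850, 15.613524, 3.476935, 0.4144043,
                5.985012, 9.576440,  8.722579, 3.411262, 1.8126338),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste("samp", 1:4),
                              paste("spp", LETTERS[1:5])))

# Hellinger transformation by hand (assumed definition):
# divide each row by its total, then take the square root
hell <- sqrt(raw / rowSums(raw))

# Each transformed row has unit sum of squares
rowSums(hell^2)

# Should agree with datH above, up to rounding of the input
round(hell, 7)
```

Taking dist() of `hell` then gives Euclidean distances on the transformed data, i.e. the Hellinger distances.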

We can use dist() on this data frame via:

dij <- dist(datH)

If we look at the object created, we see the **printed** representation of the dissimilarity matrix, which is a 4*4 matrix in this example:

```
> dij
          samp 1    samp 2    samp 3
samp 2 0.4253576
samp 3 0.4874570 0.4367179
samp 4 0.2010581 0.3543312 0.3750363
```

Note that the diagonal and the upper triangle of the matrix are not printed, or stored even, because they are trivial (0 for all diagonals and the upper triangle is the same as the lower triangle).
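If you want to convince yourself what those numbers are, one entry can be checked by hand. This is only a sketch using the rounded samp 1 and samp 2 rows of datH printed earlier (the names `s1`, `s2` and `d12` are mine), so expect agreement to about six decimal places rather than exactly.

```r
# Rounded Hellinger-transformed rows for samp 1 and samp 2, as printed above
s1 <- c(0.4901864, 0.5228086, 0.4337378, 0.3700782, 0.4016244)
s2 <- c(0.1945069, 0.4932488, 0.7139447, 0.2530989, 0.3809156)

# Euclidean distance: square root of the sum of squared differences
d12 <- sqrt(sum((s1 - s2)^2))
d12   # compare with the samp 1 / samp 2 entry of dij

# The same calculation through dist(), with the two samples as rows
dist(rbind(s1, s2))
```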

dist() actually creates a vector of numbers that will fill the lower triangle of the dissimilarity matrix. This saves on storage space. If you want to add the diagonal and upper triangle, you can get them in one of two ways:

1. dist(datH, diag = TRUE, upper = TRUE)
2. as.matrix(dij)

However, only the second actually returns a matrix with 16 numbers. The first still only computes and stores the 6 pair-wise distances, but when **printed** it shows the full matrix.
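The difference between the two is easy to see by asking R how many numbers each object holds. A small sketch with made-up data (the object names `x`, `d`, `d_full` and `m` are just for illustration):

```r
set.seed(1)
# 4 samples (rows) by 5 variables (columns) of arbitrary numbers
x <- matrix(runif(20), nrow = 4,
            dimnames = list(paste("samp", 1:4), NULL))

d <- dist(x)
length(d)          # 6 pair-wise distances, i.e. 4 * 3 / 2
attr(d, "Size")    # 4, the number of samples

# diag/upper only change how the object is printed
d_full <- dist(x, diag = TRUE, upper = TRUE)
length(d_full)     # still 6

# as.matrix() really does build the full square matrix
m <- as.matrix(d)
dim(m)             # 4 4
length(m)          # 16
```

So if you need to index individual distances by sample name, as.matrix() is the one to use; for passing to hclust() and friends, the "dist" object is what is expected.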

If you really have species in rows and samples in columns, then you can transpose your matrix, e.g.

datH.t <- t(datH)

and then compute the dissimilarity matrix as above.
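One caveat, as a hedged sketch rather than a recipe: the Hellinger transformation works on sample totals, so if you still have the untransformed data with species in rows, it is probably safer to transpose first and transform afterwards. The example below uses made-up data and a by-hand transformation (square root of value over row total, which is my assumption of what decostand() does); all object names are hypothetical.

```r
set.seed(42)
# Made-up raw data laid out the "wrong" way: 5 species in rows, 4 samples in columns
spp_by_samp <- matrix(runif(20, 1, 10), nrow = 5,
                      dimnames = list(paste("spp", LETTERS[1:5]),
                                      paste("samp", 1:4)))

# Transpose so that samples are in rows, as dist() expects
samp_by_spp <- t(spp_by_samp)
dim(samp_by_spp)     # 4 5

# Transform by sample (row), then compute the distances between samples
hell_t <- sqrt(samp_by_spp / rowSums(samp_by_spp))
dij_t <- dist(hell_t)
attr(dij_t, "Size")  # 4, one row/column per sample
```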

Does this help?

G

> with a dist() function. I know that "euclidean" method should be used.
>
> When I type:
> dist(dframe,"euclidean")
> it gives me a truncated table, where values are missing.
>
> I suppose that I have to define something for the values,
> but I have no idea what exactly, because I am not familiar with r at all.
>
> I would be very appreciated for every kind of suggestions or tips.
>
>
> --
> View this message in context: http://r.789695.n4.nabble.com/exploring-dist-tp3387187p3387187.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> and provide commented, minimal, self-contained, reproducible code.

```
--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
```
