Re: [R] Warning: matrix by vector division

From: Marc Schwartz <marc_schwartz_at_comcast.net>
Date: Fri, 07 Mar 2008 16:52:09 -0600

Alexey Shipunov wrote:
> Dear list,
>
> I just made a very simple mistake, but it was hard to spot. And I
> think that I should warn other people, because it is probably so
> simple to make...
>
> === R code ===
>
> # Let us create a matrix:
> (a <- cbind(c(0,1,1), rep(1,3)))
>
> # [,1] [,2]
> # [1,] 0 1
> # [2,] 1 1
> # [3,] 1 1
>
> # That is a MISTAKE:
> a/colSums(a)
>
> # [,1] [,2]
> # [1,] 0.0000000 0.3333333
> # [2,] 0.3333333 0.5000000
> # [3,] 0.5000000 0.3333333
> # I just wonder if some R warning should be issued here?
>
> # That is what I actually needed (column-wise frequencies):
> t(t(a)/colSums(a))
>
> # [,1] [,2]
> # [1,] 0.0 0.3333333
> # [2,] 0.5 0.3333333
> # [3,] 0.5 0.3333333
>
> === end of R code ===
>
> With best wishes and regards,
> Alexey Shipunov

or more simply:

 > prop.table(a, 2)

      [,1] [,2]

[1,]  0.0 0.3333333
[2,]  0.5 0.3333333
[3,]  0.5 0.3333333

See ?prop.table

You need to recognize how matrices are stored in R. They are vectors with a 'dim' attribute. Thus 'a' is essentially:

 > as.vector(a)
[1] 0 1 1 1 1 1

If I take that vector, as say 'x':

x <- as.vector(a)

 > x
[1] 0 1 1 1 1 1

dim(x) <- c(3, 2)

 > x

      [,1] [,2]

[1,]    0    1
[2,]    1    1
[3,]    1    1


When you do the division, since colSums(a) contains two values, they are recycled as required to get the result. Thus colSums(a) effectively becomes:

 > rep(colSums(a), 3)
[1] 2 3 2 3 2 3

so that it is equal in length to 'a'.

The result is then:

 > as.vector(a) / rep(colSums(a), 3)
[1] 0.0000000 0.3333333 0.5000000 0.3333333 0.5000000 0.3333333

which is then returned as a matrix with the original dimensions:

 > matrix(as.vector(a) / rep(colSums(a), 3), 3, 2)

           [,1] [,2]

[1,] 0.0000000 0.3333333
[2,] 0.3333333 0.5000000
[3,] 0.5000000 0.3333333


Another option simply for the sake of example, is:

 > t(apply(a, 1, function(x) x / colSums(a)))

      [,1] [,2]

[1,]  0.0 0.3333333
[2,]  0.5 0.3333333
[3,]  0.5 0.3333333

prop.table() actually uses sweep(), so you might also want to look at that.

So the key to understanding the behavior is to understand how R objects are stored and how recycling rules work.

HTH, Marc Schwartz



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Fri 07 Mar 2008 - 22:55:29 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 07 Mar 2008 - 23:30:20 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive