From: Douglas Bates <bates_at_stat.wisc.edu>

Date: Thu 20 Apr 2006 - 18:16:16 GMT

R-devel@r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-devel Received on Fri Apr 21 04:18:34 2006

The documentation for gsummary describes the argument FUN as

FUN: an optional summary function or a list of summary functions to be applied to each variable in the frame. The function or functions are applied only to variables in 'object' that vary within the groups defined by 'groups'. Invariant variables are always summarized by group using the unique value that they assume within that group. If 'FUN' is a single function it will be applied to each non-invariant variable by group to produce the summary for that variable. If 'FUN' is a list of functions, the names in the list should designate classes of variables in the frame such as 'ordered', 'factor', or 'numeric'. The indicated function will be applied to any non-invariant variables of that class. The default functions to be used are 'mean' for numeric factors, and 'Mode' for both 'factor' and 'ordered'. The 'Mode' function, defined internally in 'gsummary', returns the modal or most popular value of the variable. It is different from the 'mode' function that returns the S-language mode of the variable.

so the behavior you noticed is documented.
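To illustrate the distinction the documentation draws between 'Mode' and 'mode', here is a minimal base R sketch. The helper name `modal_value` is hypothetical and is only a stand-in; it is not the code defined inside 'gsummary'.

```r
# Hypothetical stand-in for a "most popular value" helper.
# Ties are resolved in favour of the first level reaching the maximum count.
modal_value <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

x <- factor(c("a", "b", "b", "c"))

modal_value(x)  # "b": the most frequent value
mode(x)         # "numeric": the storage mode of a factor, not a frequency
```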

The "summary" in "gsummary" is used in the sense of a representative value, not in the more general sense of a numerical summary of any sort. If the values do not vary within a group then the common value within the group is, according to our definition, the representative value.
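If group totals are what is wanted regardless of whether the variable varies within groups, base R can compute them directly. The following is a sketch using the data from the report; the `gsummary` call at the end is included only for comparison and assumes the nlme package is installed.

```r
# Data from the report: V1 is 2 in every row, V2 is the grouping variable.
mytest <- as.data.frame(
  matrix(c(2, rep(2, 8), 1, 1, 2, 2, 2, 3, 3, 3, 3), ncol = 2)
)

# tapply applies sum within each level of V2, whether or not V1 varies.
sums <- tapply(mytest$V1, mytest$V2, sum)
print(sums)  # group sums: 4, 6, 8

# The same totals returned as a data frame.
sums_df <- aggregate(V1 ~ V2, data = mytest, FUN = sum)
print(sums_df)

# For comparison: V1 is invariant within every group here, so per the
# documentation gsummary reports the common value rather than applying FUN.
if (requireNamespace("nlme", quietly = TRUE)) {
  rep_vals <- nlme::gsummary(mytest, form = V1 ~ 1 | V2, FUN = sum)[, 1]
  print(rep_vals)
}
```

`tapply` returns a named vector indexed by group, while `aggregate` keeps the grouping variable as a column, which is usually more convenient for merging back into other data.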

On 4/19/06, bsaville@bios.unc.edu <bsaville@bios.unc.edu> wrote:

> Full_Name: Ben Saville
> Version: 2.1
> OS: Windows XP
> Submission from: (NULL) (152.2.94.145)
>
>
> I'm using the gsummary function to calculate a sum of V1 (column one) from my
> data 'mytest' by group (V2,or column 2). If V1 (the variable of interest) is
> all the same value (in this case all 2's), I do not get back the correct
> summation. If there is at least one difference in V1 (all 2's except for one
> 1), it gives me correct values. So either I am doing something wrong or there
> is a bug in the gsummary function.
>
> # Incorrect sums
> mytest <- as.data.frame(matrix(c(2,rep(2,8),1,1,2,2,2,3,3,3,3),ncol=2))
> mytest
> gsummary(mytest,form=V1~1|V2, FUN=sum)[,1]
>
> # Correct sums
> mytest <- as.data.frame(matrix(c(1,rep(2,8),1,1,2,2,2,3,3,3,3),ncol=2))
> mytest
> gsummary(mytest,form=V1~1|V2, FUN=sum)[,1]
