Re: [R] Calculating Summaries for each level of a Categorical variable

From: Christos Argyropoulos <argchris_at_hotmail.com>
Date: Sun, 27 Jun 2010 15:36:56 +0300

Hi Raoul,
I presume you need these summaries for a table of descriptive statistics in a thesis, report, or paper ("Table 1", as medical researchers informally call it). If so, pass method="reverse" to summary.formula. In the small example below, I create 4 groups of patients, record 2 characteristics per patient (age and gender), and use summary.formula to summarize those characteristics by group. Testing for differences in patient characteristics between groups is optional, but I include it for completeness. If this is the kind of thing you are looking for, I strongly advise you to spend some time fiddling with summary.formula and to read:

Harrell FE (2004): Statistical tables and plots using S and LaTeX (available from http://biostat.mc.vanderbilt.edu/twiki/pub/Main/StatReport/summary.pdf)

The 2-3 hours you will need to familiarize yourself with this package are well worth spending (especially if you are going to use LaTeX on the output). If you are a Windows user, copy and paste the output of the print function into Excel or OpenOffice, and use the Text to Columns facility of either program to format the output into a table that can be used in a manuscript.
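
As an alternative to copy-and-paste, the group summaries can also be written to a CSV file, which Excel or OpenOffice opens directly. What follows is only a sketch in base R and does not use Hmisc: the data frame `d`, the helper `q3`, and the file name `table1.csv` are made up for illustration.

```r
## Hypothetical example data: 20 patients in 4 groups (arbitrary seed)
set.seed(1)
d <- data.frame(
  grp = factor(sample(1:4, 20, replace = TRUE), levels = 1:4,
               labels = paste("Group", 1:4)),
  age = rlnorm(20, 4, 0.1)
)

## Illustrative helper: quartiles formatted as "Q1/median/Q3"
q3 <- function(x) paste(round(quantile(x, c(0.25, 0.5, 0.75))), collapse = "/")

## One row per group: group size and age quartiles
tab <- data.frame(
  group = levels(d$grp),
  n     = as.vector(table(d$grp)),
  age   = as.vector(tapply(d$age, d$grp, q3)),
  stringsAsFactors = FALSE
)

## Write a comma-separated file that a spreadsheet program can open
out <- file.path(tempdir(), "table1.csv")
write.csv(tab, out, row.names = FALSE)
```

The file lands in R's temporary directory here; point `out` at any other path to keep it.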

Christos

## R-code follows

library(Hmisc)
## One baseline factor (e.g. patient group)
grp<-round(runif(20,1,4))
## levels=1:4 keeps the labels valid even if a group receives no patients
grp<-factor(grp,levels=1:4,labels=paste("Group",1:4))

## Another factor (e.g. sex)

sex<-round(runif(20,1,2))
sex<-factor(sex,levels=1:2,labels=c("Male","Female"))

## A continuous variable (e.g. age)

age<-rlnorm(20,4,.1)

## A data frame

data<-data.frame(age=age,grp=grp,sex=sex)

## Table 1

sm<-summary(grp~sex+age,data=data,method="reverse",overall=TRUE,test=TRUE)
print(sm,digits=2,exclude1=FALSE)

Descriptive Statistics by grp

```
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
|          |Group 1           |Group 2           |Group 3           |Group 4           |Combined          |  Test                      |
|          |(N=3)             |(N=6)             |(N=8)             |(N=3)             |(N=20)            |Statistic                   |
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
|sex : Male|          67% ( 2)|          67% ( 4)|          25% ( 2)|          67% ( 2)|          50% (10)|Chi-square=3.3 d.f.=3 P=0.34|
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
|    Female|          33% ( 1)|          33% ( 2)|          75% ( 6)|          33% ( 1)|          50% (10)|                            |
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
|age       |          60/62/65|          51/55/60|          46/51/57|          46/48/52|          49/54/60|   F=2.9 d.f.=3,16 P=0.068  |
+----------+------------------+------------------+------------------+------------------+------------------+----------------------------+
```
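
If summary.formula feels opaque, the descriptive cells of the table above can be rebuilt step by step in base R. This is a minimal sketch, not what Hmisc does internally. It assumes that the a/b/c entries for age are the lower quartile, median, and upper quartile, and that the percentages for sex are computed within each group. The data are regenerated with an arbitrary seed, so the numbers will differ from the table shown, and the test statistics are not reproduced.

```r
set.seed(42)  # arbitrary seed, for reproducibility only
n   <- 20
grp <- factor(sample(1:4, n, replace = TRUE), levels = 1:4,
              labels = paste("Group", 1:4))
sex <- factor(sample(1:2, n, replace = TRUE), levels = 1:2,
              labels = c("Male", "Female"))
age <- rlnorm(n, 4, 0.1)

## Counts of sex within each group, and the same as column percentages
counts <- table(sex, grp)
pct    <- round(100 * prop.table(counts, margin = 2))

## Lower quartile, median, and upper quartile of age within each group
age_q <- sapply(split(age, grp), quantile, probs = c(0.25, 0.5, 0.75))

print(counts)
print(pct)
print(round(age_q))
```

Each column of `counts` and `pct` corresponds to one group column of the Hmisc table, and each column of `age_q` holds the three numbers that Hmisc prints as a/b/c.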

> Date: Sat, 26 Jun 2010 21:48:05 -0700
> From: raoul.t.dsouza_at_gmail.com
> To: r-help_at_r-project.org
> Subject: Re: [R] Calculating Summaries for each level of a Categorical variable
>
>
> Hi Christos,
>
> Thanks for this. I had a look at summary.formula in the Hmisc package and it
> is extremely complicated for me. Still trying to decipher how I could use
> it.
>
> Regards,
> Raoul