Re: [R] Package Hmisc, functions summary.formula() and latex(), options pdig, pctdig, eps and prmsd

From: Frank E Harrell Jr <f.harrell_at_vanderbilt.edu>
Date: Fri, 25 Jul 2008 07:25:05 -0500

David Hajage wrote:
> Hello R users,
>
> I have several problems with the functions summary.formula and latex in
> package Hmisc. Here an example :
>
> ####
> library(Hmisc)
> sex <- factor(sample(c("m","f", "?"), 50, rep=TRUE))
> age <- rnorm(50, 50, 5)
> treatment <- factor(sample(c("Drug","Placebo"), 50, rep=TRUE))
> symp <- c('Headache','Stomach Ache','Hangnail',
> 'Muscle Ache','Depressed')
> symptom1 <- sample(symp, 50,TRUE)
> symptom2 <- sample(symp, 50,TRUE)
> symptom3 <- sample(symp, 50,TRUE)
> Symptoms <- mChoice(symptom1, symptom2, symptom3, label='Primary Symptoms')
>
> f <- summary(treatment ~ age + sex + Symptoms, method="reverse", test=TRUE)
> print(f, digits = 5, pdig = 2, pctdig = 3, eps = 0.5, prmsd = F)
> print(f, digits = 5, pdig = 2, pctdig = 3, eps = 0.5, prmsd = T)
> latex(f, long = T, pdig = 2, pctdig = 3, eps = 0.5, prmsd = F, file = "")
> ###
>
> Here the problems :
> - The first print(f, ...) doesn't replace all p-value <0.5 by "P<0.5" :
>
> +----------------------------+-------------------------+-------------------------+-----------------------------+
> | |Drug
> |Placebo | Test |
> | |(N=31)
> |(N=19) |Statistic |
> +----------------------------+-------------------------+-------------------------+-----------------------------+
> |age | 45.926/48.750/54.019|
> 47.344/50.728/53.696| F=0.9 d.f.=1,48 P<0.5 |
> +----------------------------+-------------------------+-------------------------+-----------------------------+
> |sex : ? | 25.806% ( 8) | 26.316% (
> 5) | Chi-square=0.3 d.f.=2 P=0.86|
> +----------------------------+-------------------------+-------------------------+-----------------------------+
> | f | 38.710% (12) | 31.579% (
> 6) | |
> +----------------------------+-------------------------+-------------------------+-----------------------------+
> | m | 35.484% (11) | 42.105% (
> 8) | |
> +----------------------------+-------------------------+-------------------------+-----------------------------+
> |Primary Symptoms : Depressed| 41.935% (13) | 63.158%
> (12) | Chi-square=2.12 d.f.=1 P<0.5|
> +----------------------------+-------------------------+-------------------------+-----------------------------+
> | Hangnail | 48.387% (15) | 42.105% (
> 8) |Chi-square=0.19 d.f.=1 P=0.67|
> +----------------------------+-------------------------+-------------------------+-----------------------------+
> | Stomach Ache | 45.161% (14) | 68.421%
> (13) | Chi-square=2.57 d.f.=1 P<0.5|
> +----------------------------+-------------------------+-------------------------+-----------------------------+
> | Muscle Ache | 54.839% (17) | 26.316% (
> 5) | Chi-square=3.89 d.f.=1 P<0.5|
> +----------------------------+-------------------------+-------------------------+-----------------------------+
> | Headache | 51.613% (16) | 36.842% (
> 7) | Chi-square=1.03 d.f.=1 P<0.5|
> +----------------------------+-------------------------+-------------------------+-----------------------------+

I did not see an instance of P < 0.5 that was not replaced by "P<0.5". This is an unusual cutoff, and you are printing many more digits of precision than offered by the data.

>
> - The second print(f, ..., prmsd = T) has the same problem for p-value.
> There is also 4 decimals in the first line instead of 3 :

Try modifying the digits argument.

>
> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
> | |Drug
> |Placebo |

> Test |
> | |(N=31)
> |(N=19)
> |Statistic |
> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
> |age |45.9263/48.7501/54.0194 49.1817+/-
> 5.2751|47.3436/50.7276/53.6963 51.1313+/- 5.5868| F=0.9 d.f.=1,48
> P<0.5 |
> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
> |sex : ? | 25.806% ( 8)
> | 26.316% ( 5) | Chi-square=0.3 d.f.=2
> P=0.86 |
> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
> | f | 38.710% (12)
> | 31.579% ( 6)
> | |
> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
> | m | 35.484% (11)
> | 42.105% ( 8)
> | |
> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
> |Primary Symptoms : Depressed| 41.935% (13)
> | 63.158% (12) | Chi-square=2.12 d.f.=1
> P<0.5 |
> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
> | Hangnail | 48.387% (15)
> | 42.105% ( 8) | Chi-square=0.19 d.f.=1
> P=0.67|
> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
> | Stomach Ache | 45.161% (14)
> | 68.421% (13) | Chi-square=2.57 d.f.=1
> P<0.5 |
> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
> | Muscle Ache | 54.839% (17)
> | 26.316% ( 5) | Chi-square=3.89 d.f.=1
> P<0.5 |
> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
> | Headache | 51.613% (16)
> | 36.842% ( 7) | Chi-square=1.03 d.f.=1
> P<0.5 |
> +----------------------------+------------------------------------------+------------------------------------------+------------------------------------+
>
> - In the latex(f, ...), only the first-line p-value is replaced by "< 0.5",
> and the quantiles for age have too much decimal :

This is a bug. An override source is below, to use until the next release.

>
> % latex.default(cstats, title = title, caption = caption, rowlabel =
> rowlabel, col.just = col.just, numeric.dollar = FALSE, insert.bottom =
> legend, rowname = lab, dcolumn = dcolumn, extracolheads =
> extracolheads, extracolsize = Nsize, ...)
> %
> \begin{table}[!tbp]
> \caption{Descriptive Statistics by treatment\label{f}}
> \begin{center}
> \begin{tabular}{lccc}\hline\hline
> \multicolumn{1}{l}{}&
> \multicolumn{1}{c}{Drug}&
> \multicolumn{1}{c}{Placebo}&
> \multicolumn{1}{c}{Test Statistic}
> \\ &\multicolumn{1}{c}{{\scriptsize
> $N=31$}}&\multicolumn{1}{c}{{\scriptsize $N=19$}}&\\ \hline
> age&{\scriptsize 45.92627049~}{48.75014585 }{\scriptsize 54.01942542}
> &{\scriptsize 47.34358486~}{50.72757132 }{\scriptsize 53.69634575} &$
> F_{1,48}=0.9 ,~ P<0.5 ^{1} $\\
> sex&&&$ \chi^{2}_{2}=0.3 ,~ P=0.859 ^{2} $\\
> ~~~~?&25.806\%~{\scriptsize~(~8)}&26.316\%~{\scriptsize~(~5)}&\\
> ~~~~f&38.710\%~{\scriptsize~(12)}&31.579\%~{\scriptsize~(~6)}&\\
> ~~~~m&35.484\%~{\scriptsize~(11)}&42.105\%~{\scriptsize~(~8)}&\\
> Primary~Symptoms&&&\\
> ~~~~Depressed&41.935\%~{\scriptsize~(13)}&63.158\%~{\scriptsize~(12)}&$
> \chi^{2}_{1}=2.12 ,~ P=0.145 ^{2} $\\
> ~~~~Hangnail&48.387\%~{\scriptsize~(15)}&42.105\%~{\scriptsize~(~8)}&$
> \chi^{2}_{1}=0.19 ,~ P=0.665 ^{2} $\\
> ~~~~Stomach~Ache&45.161\%~{\scriptsize~(14)}&68.421\%~{\scriptsize~(13)}&$
> \chi^{2}_{1}=2.57 ,~ P=0.109 ^{2} $\\
> ~~~~Muscle~Ache&54.839\%~{\scriptsize~(17)}&26.316\%~{\scriptsize~(~5)}&$
> \chi^{2}_{1}=3.89 ,~ P=0.049 ^{2} $\\
> ~~~~Headache&51.613\%~{\scriptsize~(16)}&36.842\%~{\scriptsize~(~7)}&$
> \chi^{2}_{1}=1.03 ,~ P=0.309 ^{2} $\\
> \hline
> \end{tabular}
>
> \end{center}
>
> \noindent {\scriptsize $a$\ }{$b$\ }{\scriptsize $c$\ } represent the lower
> quartile $a$, the median $b$, and the upper quartile $c$\ for continuous
> variables.\\
> Numbers after percents are frequencies.\\
> \indent Tests used: $^{1}$Wilcoxon test; $^{2}$Pearson test
> \end{table}
>
> I would like to understand :
> - why does option 'pctdig' affect both percentages and quantiles in the
> 'print' command (pctdig : 'number of digits to the right of the decimal
> place for *printing percentages*. The default is zero, so percents will be
> rounded to the nearest percent');
> - how can I change the number of decimal for continuous variables in the
> 'latex' command ?

I hope to address these 2 questions soon.

Frank

> - why doesn't option 'eps' affect all p-value ?
>
> I've just discover these wonderfull functions. Thank you to Frank Harrell
> for this package.
>
> DH
>

latex.summary.formula.reverse <-

   function(object, title=first.word(deparse(substitute(object))),

            digits, prn = any(n!=N), pctdig=0,
            npct=c('numerator','both','denominator','none'),
            npct.size='scriptsize', Nsize='scriptsize',
            exclude1=TRUE,  vnames=c("labels","names"), prUnits=TRUE,
            middle.bold=FALSE, outer.size="scriptsize",
            caption, rowlabel="",
            insert.bottom=TRUE, dcolumn=FALSE, formatArgs=NULL, round=NULL,
            prtest=c('P','stat','df','name'), prmsd=FALSE, msdsize=NULL,
            long=dotchart, pdig=3, eps=.001, auxCol=NULL, 
dotchart=FALSE, ...)
{
   x      <- object
   npct   <- match.arg(npct)
   vnames <- match.arg(vnames)
   if(is.logical(prtest) && !prtest)
     prtest <- 'none'

   stats  <- x$stats
   nv     <- length(stats)
   cstats <- lab <- character(0)
   nn     <- integer(0)
   type   <- x$type
   n      <- x$n
   N      <- x$N
   nams   <- names(stats)
   labels <- x$labels
   Units  <- x$units
   nw     <- if(lg <- length(x$group.freq)) lg
             else 1  #23Nov98

   gnames <- names(x$group.freq)
   test <- x$testresults
   if(!length(test))
     prtest <- 'none'

   gt1.test <-

     if(all(prtest=='none'))
       FALSE
     else
       length(unique(sapply(test,function(a)a$testname))) > 1

   if(!missing(digits)) {   #.Options$digits <- digits 6Aug00
     oldopt <- options(digits=digits)
     on.exit(options(oldopt))

   }

   if(missing(caption))

     caption <- paste("Descriptive Statistics",
                      if(length(x$group.label))
                        paste(" by",x$group.label)
                      else
                        paste("  $(N=",x$N,")$",sep=""), sep="")

   bld <- if(middle.bold) '\\bf '
          else ''

   cstats <- NULL
   testUsed <- auxc <- character(0)

   for(i in 1:nv) {

     if(length(auxCol))
       auxc <- c(auxc, auxCol[[1]][i])

     nn <- c(nn, n[i])   ## 12aug02
     nam <- if(vnames=="names") nams[i]
            else labels[i]

     if(prUnits && nchar(Units[i]) > 0)
       nam <- paste(nam, '~\\hfill\\tiny{',translate(Units[i],'*',' 
'),'}',sep='')
     tr  <- if(length(test) && all(prtest!='none')) test[[nams[i]]]
            else NULL

     if(length(test) && all(prtest!='none'))
       testUsed <- unique(c(testUsed, tr$testname))

     if(type[i]==1 || type[i]==3) {
       cs <- formatCats(stats[[i]], nam, tr, type[i],
                        if(length(x$group.freq)) x$group.freq else x$n[i],
                        npct, pctdig, exclude1, long, prtest,
                        latex=TRUE, testUsed=testUsed,
                        npct.size=npct.size,
                        pdig=pdig, eps=eps,
                        footnoteTest=gt1.test, dotchart=dotchart)
       nn <- c(nn, rep(NA, nrow(cs)-1))
     } else cs <- formatCons(stats[[i]], nam, tr, x$group.freq, prmsd,
                             prtest=prtest, formatArgs=formatArgs, 
round=round,
                             latex=TRUE, testUsed=testUsed,
                             middle.bold=middle.bold,
                             outer.size=outer.size, msdsize=msdsize,
                             pdig=pdig, eps=eps, footnoteTest=gt1.test)

     cstats <- rbind(cstats, cs)
     if(length(auxc) && nrow(cstats) > 1)
       auxc <- c(auxc, rep(NA, nrow(cs)-1))
   }

   lab <- dimnames(cstats)[[1]]
   gl <- names(x$group.freq)
   ##gl <- if(length(gl)) paste(gl, " $(N=",x$group.freq,")$",sep="") else " "

   ## Thanks: Eran Bellin <ebellin_at_montefiore.org> 3Aug01    if(!length(gl))
     gl <- " "

   lab <- sedit(lab,c(" ","&"),c("~","\\&"))  #was format(lab) 21Jan99
   lab <- latexTranslate(lab, greek=.R.)
   gl  <- latexTranslate(gl, greek=.R.)

   ## if(any(gl != " ")) gl <- paste(gl, " $(N=",x$group.freq,")$",sep="") # 3Aug01

   ## Added any( ) 26Mar02 21jan03
   extracolheads <-

     if(any(gl != " "))
       c(if(prn)'', paste('$N=',x$group.freq,'$',sep=''))
     else NULL # 21jan03

   if(length(test) && !all(prtest=='none')) {
     gl <- c(gl,
             if(length(prtest)==1 && prtest!='stat')
               if(prtest=='P') 'P-value'
               else prtest
             else 'Test Statistic')

     if(length(extracolheads)) extracolheads <- c(extracolheads,'') # 
21jan03

   }

   dimnames(cstats) <- list(NULL,gl)
   ## was dimnames(cstats) <- list(lab, gl) 12aug02    cstats <- data.frame(cstats, check.names=FALSE)

   ## Added row.names=lab below 10jul02 - S+ was dropping dimnames[[1]]    ##attr(cstats,'row.names') <- lab 12aug02    col.just <- rep("c",length(gl))
   if(dcolumn && all(prtest!='none') &&

      gl[length(gl)] %in% c('P-value','Test Statistic'))
     col.just[length(col.just)] <- '.'

   if(prn) {
     cstats <- data.frame(N=nn, cstats, check.names=FALSE)
     col.just <- c("r",col.just)

   }

   if(!insert.bottom)
     legend <- NULL
   else {

     legend <- paste(if(any(type==2)) {
                       paste("\\noindent {\\",outer.size," $a$\\ 
}{",bld,"$b$\\ }{\\",
                             outer.size," $c$\\ } represent the lower 
quartile $a$, the median $b$, and the upper quartile $c$\\ for continuous variables.",
                             if(prmsd) '~~$x\\pm s$ represents 
$\\bar{X}\\pm 1$ SD.'
                             else '',
                             '\\\\', sep="")
                     } else NULL,
                     if(prn) '$N$\\ is the number of non--missing 
values.\\\\',
                     if(any(type==1) && npct=='numerator')
                       'Numbers after percents are frequencies.\\\\',
                     sep="\n")
     legend <- NULL
     if(any(type==2)) {
       legend <- paste("\\noindent {\\", outer.size, " $a$\\ }{", bld,
                       "$b$\\ }{\\", outer.size,
                       " $c$\\ } represent the lower quartile $a$, the 
median $b$, and the upper quartile $c$\\ for continuous variables.",
                       if(prmsd) '~~$x\\pm s$ represents $\\bar{X}\\pm 
1$ SD.'
                       else '',
                       '\\\\\n', sep="")
     }

     if(prn) {
       legend <- paste(legend,
                       '$N$\\ is the number of non--missing values.\\\\\n',
                       sep='')
     }

     if(any(type==1) && npct=='numerator') {
       legend <- paste(legend,
                       'Numbers after percents are frequencies.\\\\\n',
                       sep='')
     }

     if(length(testUsed))
       legend <-paste(legend,
                      if(length(testUsed)==1)'\\noindent Test used:'
                      else '\\indent Tests used:',
                      if(length(testUsed)==1) paste(testUsed,'test')
                      else
                        paste(paste('$^{',1:length(testUsed),'}$',testUsed,
                                    ' test',sep=''),collapse='; '))

     ## added rowname=lab 12aug02  added '\n\n' 4mar03 for ctable=T
   }

   if(length(auxc)) {

     if(length(auxc) != nrow(cstats))
       stop(paste('length of auxCol (',length(auxCol[[1]]),
                  ') is not equal to number or variables in table (',
                  nv,').', sep=''))
     auxcc <- format(auxc)
     auxcc[is.na(auxc)] <- ''
     cstats <- cbind(auxcc, cstats)
     nax <- names(auxCol)
     heads <- get2rowHeads(nax)
     names(cstats)[1] <- heads[[1]]
     if(length(col.just)) col.just <- c('r', col.just)
     if(length(extracolheads)) extracolheads <- c(heads[2], extracolheads)
   }
   resp <- latex.default(cstats, title=title, caption=caption, rowlabel=rowlabel,
                         col.just=col.just, numeric.dollar=FALSE,
                         insert.bottom=legend,  rowname=lab, 
dcolumn=dcolumn,
                         extracolheads=extracolheads, extracolsize=Nsize,
                         ...)

   if(dotchart)
     resp$style <- unique(c(resp$style, 'calc', 'epic', 'color'))

   resp
}

-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Fri 25 Jul 2008 - 12:28:29 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Fri 25 Jul 2008 - 13:32:29 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive