Re: [R] the format of the result

From: Marc Schwartz (via MN) <mschwartz_at_mn.rr.com>
Date: Sat 02 Jul 2005 - 03:42:20 EST

On Fri, 2005-07-01 at 19:40 +0800, ronggui wrote:
> I write a function to get the frequency and prop of a variable.
>
> freq<-function(x,digits=3)
> {naa<-is.na(x)
> nas<-sum(naa)
> if (any(naa))
> x<-x[!naa]
> n<-length(x)
> ta<-table(x)
> prop<-prop.table(ta)*100
> res<-rbind(ta,prop)
> rownames(res)<-c("Freq","Prop")
> cat("Missing value(s) are",nas,".\n")
> cat("Valid case(s) are",n,".\n")
> cat("Total case(s) are",(n+nas),".\n\n")
> print(res,digits=(digits+2))
> cat("\n")
> }
>
> > freq(sample(letters[1:3],48,T),2)
> Missing value(s) are 0 .
> Valid case(s) are 48 .
> Total case(s) are 48 .
>
>          a     b     c
> Freq 11.00 20.00 17.00
> Prop 22.92 41.67 35.42
>
> and i want the result to be like
>          a      b      c
> Freq 11.00  20.00  17.00
> Prop 22.92% 41.67% 35.42%
>
> how should i change my function to get what i want?

Here is a modification of the function that I think should work. Part of the output formatting has to account for three things that are not known in advance: your 'digits' argument, the lengths of the dimnames resulting from the table, and the number of characters in the frequency counts. Thus, a fair amount of the code establishes the 'width' argument, which is then passed to formatC() so that the output columns align properly.
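
To make the width calculation concrete, here is a minimal sketch on made-up data. The vector 'x' and the names label_width, count_width and floor_width are illustrative assumptions and are not part of the function itself; the three candidates simply mirror the three arguments passed to max().

```r
# Illustrative inputs (not from the original post)
x <- c("a", "b", "b", "c", "c", "c")
digits <- 2
ta <- table(x)

# Three candidate widths, mirroring the three arguments to max():
#  - longest label plus one
#  - longest count plus room for the decimal point and 'digits' decimals
#  - a floor of 5 + digits, which is wide enough for a value such as 100.00
label_width <- max(nchar(names(ta))) + 1
count_width <- max(nchar(as.character(ta))) + digits + 1
floor_width <- 5 + digits
width <- max(label_width, count_width, floor_width)

# formatC() right-justifies every element to the same width,
# so the pieces line up once they are pasted together
labels <- formatC(names(ta), format = "s", width = width)
counts <- formatC(as.numeric(ta), format = "f", digits = digits,
                  width = width)

cat("    ", paste(labels, collapse = "  "), "\n")
cat("Freq", paste(counts, collapse = "  "), "\n")
```

With short labels and small counts the floor wins, so every column ends up 5 + digits characters wide.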

Note that by default, table() will exclude NA values, so you do not need to subset 'x' before using table().
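
A quick check of that behaviour, as a sketch on made-up data (the vector 'x' here is illustrative only). The useNA argument shown is the one in current versions of R and is included just for contrast:

```r
x <- c("a", "b", NA, "a", NA)   # illustrative data, not from the original post

table(x)                        # NA is dropped: cells for "a" and "b" only
table(x, useNA = "ifany")       # NA gets its own cell when any are present

n       <- length(x)            # total cases, including NA
missing <- sum(is.na(x))        # number of NA values
valid   <- sum(table(x))        # cases actually counted by table()

cat("Total:", n, " Missing:", missing, " Valid:", valid, "\n")
```

So the valid count can be taken as n - missing without touching 'x' first.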

Also, note that I changed "Prop" to "Pct".

freq <- function(x, digits = 3)
{
  n <- length(x)
  missing <- sum(is.na(x))
  ta <- table(x)
  pct <- prop.table(ta) * 100

  width <- max(nchar(unlist(dimnames(ta))) + 1,
               nchar(ta) + digits + 1,
               5 + digits)
  
  Vals <- paste(formatC(unlist(dimnames(ta)), format = "s",
                        width = width),
                collapse = "  ")

  Freq <- paste(formatC(ta, format = "f", digits = digits,
                        width = width),
                collapse = "  ")

  Pct <- paste(formatC(pct, format = "f", digits = digits,
                       width = width),
               "%", sep = "", collapse = " ")

  cat("Missing value(s) are", missing, ".\n")
  cat("Valid case(s) are", n - missing, ".\n")
  cat("Total case(s) are", n, ".\n\n")
  cat("    ", Vals, "\n")
  cat("Freq", Freq, "\n")
  cat("Pct ", Pct, "\n")
  cat("\n")

}

Thus:

> freq(sample(letters[1:3], 48, TRUE), 2)
Missing value(s) are 0 .
Valid case(s) are 48 .
Total case(s) are 48 .

           a        b        c 
Freq   28.00     8.00    12.00 
Pct    58.33%   16.67%   25.00% 

> freq(sample(c(letters[1:3], NA), 1000, TRUE), 2)
Missing value(s) are 257 .
Valid case(s) are 743 .
Total case(s) are 1000 .

           a        b        c 
Freq  250.00   218.00   275.00 
Pct    33.65%   29.34%   37.01% 

> freq(iris$Species)

Missing value(s) are 0 .
Valid case(s) are 150 .
Total case(s) are 150 .

          setosa   versicolor    virginica 
Freq      50.000       50.000       50.000 
Pct       33.333%      33.333%      33.333% 


> freq(iris$Species, 0)

Missing value(s) are 0 .
Valid case(s) are 150 .
Total case(s) are 150 .

          setosa   versicolor    virginica 
Freq          50           50           50 
Pct           33%          33%          33% 
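
If you would rather keep your original rbind()/print() approach, a different technique is to build a character matrix and let print() do the alignment. This is only a sketch: freq_matrix is a hypothetical name, the data are made up, and it skips the missing/valid summary lines.

```r
# Hypothetical variant (freq_matrix is not from the original post):
# format the numbers as strings first, then bind them into a matrix
freq_matrix <- function(x, digits = 3) {
  ta  <- table(x)
  pct <- prop.table(ta) * 100
  res <- rbind(
    Freq = formatC(as.numeric(ta), format = "f", digits = digits),
    Pct  = paste0(formatC(as.numeric(pct), format = "f", digits = digits),
                  "%")
  )
  colnames(res) <- names(ta)
  res
}

x <- c("a", "b", "b", "c")      # illustrative data
res <- freq_matrix(x, 1)

# quote = FALSE drops the quotation marks, right = TRUE right-justifies
print(res, quote = FALSE, right = TRUE)
```

The trade-off is that the column widths are then chosen by print() rather than by an explicit 'width'.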


HTH, Marc Schwartz



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Sat Jul 02 03:45:29 2005
