Re: [R] by (tapply) and for loop differences

From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>
Date: Tue 05 Jul 2005 - 20:37:11 EST

"Bashir Saghir (Aztek Global)" <Saghir.Bashir@ucb-group.com> writes:

> I am getting a difference in results when running some analysis using by and
> tapply compared to using a for loop. I've tried searching the web but had no
> luck with the keywords I used.
>
> I've attached a simple example below to illustrate my problem. I get a
> difference in the mean of yvar, diff and the p-value using tapply & by
> compared to a for loop. I cannot see what I am doing wrong. Can anyone help?
>
> > # Simulate some data - I'll do 2 simulations...
> >
> > xvar = rnorm(40, 20, 5)
> > yvar = rnorm(40, 22, 2)
> > num = factor(rep(1:2, each=20))
> > sdat = data.frame(cbind(num, xvar, yvar))
> >
> > # Define a function to do a simple t test and return some values...
> >
> > kindtest = function(varx, vary){
> + res = t.test(varx, vary)
> + x.mn = res$estimate[1]
> + y.mn = res$estimate[2]
> + diff = y.mn-x.mn
> + pval = res$p.value
> + cat("Mean xvar =", x.mn, " Mean yvar =", y.mn)
> + cat(" diff =", diff, " p-value=", pval, "\n\n")
> + list(x.mn=x.mn, y.mn=y.mn, diff=diff, pval=pval)
> + }
>
> ## Results from by and tapply
>
> > attach(sdat)
> > bres = by(xvar, num, kindtest, yvar)
> Mean xvar = 19.8904 Mean yvar = 21.97729 diff = 2.086891 p-value= 0.06222805
> Mean xvar = 19.88329 Mean yvar = 21.97729 diff = 2.093996 p-value= 0.05245329
>
> > tres = tapply(xvar, num, kindtest, yvar)
> Mean xvar = 19.8904 Mean yvar = 21.97729 diff = 2.086891 p-value= 0.06222805
> Mean xvar = 19.88329 Mean yvar = 21.97729 diff = 2.093996 p-value= 0.05245329
>
> > detach(sdat,1)
>
> ## Results from for
>
> > for(i in 1:2) {
> + subdat= subset(sdat, num==i)
> + kindtest(subdat$xvar, subdat$yvar)
> + }
> Mean xvar = 19.8904 Mean yvar = 21.98615 diff = 2.095746 p-value= 0.07319223
> Mean xvar = 19.88329 Mean yvar = 21.96843 diff = 2.085141 p-value= 0.05850057
>

The fact that the by/tapply approach is giving you the same Mean yvar for both groups should be a dead giveaway....

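Stick print(varx) and print(vary) into kindtest, and you'll see the point. You are passing yvar *without* subsetting: by and tapply split only their first argument by num and hand any further arguments to the function unchanged, so each call compares the 20 x values of one group against all 40 y values (and since the t.test isn't paired, it can hardly be expected to complain that x and y differ in length...).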
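
A minimal sketch of that check, not part of the original session: show_lengths is a hypothetical stand-in for kindtest that only reports how many values arrive in each argument, and the seed is arbitrary.

```r
# Illustration only; set.seed() and show_lengths() are assumptions,
# not part of the original post.
set.seed(1)
xvar <- rnorm(40, 20, 5)
yvar <- rnorm(40, 22, 2)
num  <- factor(rep(1:2, each = 20))

# Report how many values the function receives in each argument.
show_lengths <- function(varx, vary) c(n.x = length(varx), n.y = length(vary))

# tapply() splits only its first argument by num; yvar travels through
# '...' and reaches the function whole.
lens <- tapply(xvar, num, show_lengths, yvar)
print(lens[[1]])
print(lens[[2]])
```

Each group should report 20 values for x but 40 for y, which is why the mean of yvar is identical in both by/tapply results.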

This is probably closer to the mark:

  by(sdat, num, with, kindtest(xvar, yvar))
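
For comparison, a sketch of the same idea written with split() and sapply(), under stated assumptions: kindtest2 is a hypothetical quiet variant of kindtest (same t test, no cat() output), the data frame is built directly rather than via cbind, and the seed is arbitrary.

```r
# Illustration only; kindtest2() and the seed are assumptions.
set.seed(1)
sdat <- data.frame(num  = factor(rep(1:2, each = 20)),
                   xvar = rnorm(40, 20, 5),
                   yvar = rnorm(40, 22, 2))

# Same unpaired t test as kindtest, returning a plain numeric vector.
kindtest2 <- function(varx, vary) {
  res <- t.test(varx, vary)
  c(x.mn = unname(res$estimate[1]),
    y.mn = unname(res$estimate[2]),
    pval = res$p.value)
}

# split() subsets every column by group, so x and y stay aligned.
pieces <- split(sdat, sdat$num)
sres <- sapply(pieces, function(d) kindtest2(d$xvar, d$yvar))
print(sres)
```

Because the whole data frame is subset before the function is called, the y.mn row should now differ between groups, matching what the for loop reports.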

-- 
   O__  ---- Peter Dalgaard             Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics     PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark          Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk)                  FAX: (+45) 35327907

Received on Tue Jul 05 21:08:28 2005