From: Peter Dalgaard <p.dalgaard_at_biostat.ku.dk>

Date: Tue 05 Jul 2005 - 20:37:11 EST

Date: Tue 05 Jul 2005 - 20:37:11 EST

"Bashir Saghir (Aztek Global)" <Saghir.Bashir@ucb-group.com> writes:

> I am getting a difference in results when running some analysis using by and

*> tapply compare to using a for loop. I've tried searching the web but had no
**> luck with the keywords I used.
**>
**> I've attached a simple example below to illustrates my problem. I get a
**> difference in the mean of yvar, diff and the p-value using tapply & by
**> compared to a for loop. I cannot see what I am doing wrong. Can anyone help?
**>
**> > # Simulate some data - I'll do 2 simulations...
**> >
**> > xvar = rnorm(40, 20, 5)
**> > yvar = rnorm(40, 22, 2)
**> > num = factor(rep(1:2, each=20))
**> > sdat = data.frame(cbind(num, xvar, yvar))
**> >
**> > # Define a function to do a simple t test and return some values...
**> >
**> > kindtest = function(varx, vary){
**> + res = t.test(varx, vary)
**> + x.mn = res$estimate[1]
**> + y.mn = res$estimate[2]
**> + diff = y.mn-x.mn
**> + pval = res$p.value
**> + cat("Mean xvar =", x.mn, " Mean yvar =", y.mn)
**> + cat(" diff =", diff, " p-value=", pval, "\n\n")
**> + list(x.mn=x.mn, y.mn=y.mn, diff=diff, pval=pval)
**> + }
**>
**> ## Results from by and tapply
**>
**> > attach(sdat)
**> > bres = by(xvar, num, kindtest, yvar)
**> Mean xvar = 19.8904 Mean yvar = 21.97729 diff = 2.086891 p-value=
**> 0.06222805
**> Mean xvar = 19.88329 Mean yvar = 21.97729 diff = 2.093996 p-value=
**> 0.05245329
**>
**> > tres = tapply(xvar, num, kindtest, yvar)
**> Mean xvar = 19.8904 Mean yvar = 21.97729 diff = 2.086891 p-value=
**> 0.06222805
**> Mean xvar = 19.88329 Mean yvar = 21.97729 diff = 2.093996 p-value=
**> 0.05245329
**>
**> > detach(sdat,1)
**>
**> ## Results from for
**>
**> > for(i in 1:2) {
**> + subdat= subset(sdat, num==i)
**> + kindtest(subdat$xvar, subdat$yvar)
**> + }
**> Mean xvar = 19.8904 Mean yvar = 21.98615 diff = 2.095746 p-value=
**> 0.07319223
**> Mean xvar = 19.88329 Mean yvar = 21.96843 diff = 2.085141 p-value=
**> 0.05850057
**>
*

The fact that the by/tapply approach is giving you the same Mean yvar for both groups should be a dead giveaway....

Stick print(varx) and print(vary) into kindtest, and you'll see the point. You are passing yvar *without* subsetting (and since the t.test isn't paired, it can hardly be expected to complain that x and y differ in length...).

This is probably closer to the mark:

by(sdat, num, with, kindtest(xvar, yvar))

-- O__ ---- Peter Dalgaard ุster Farimagsgade 5, Entr.B c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918 ~~~~~~~~~~ - (p.dalgaard@biostat.ku.dk) FAX: (+45) 35327907 ______________________________________________ R-help@stat.math.ethz.ch mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.htmlReceived on Tue Jul 05 21:08:28 2005

*
This archive was generated by hypermail 2.1.8
: Fri 03 Mar 2006 - 03:33:13 EST
*