Message-Id: <3.0.1.32.19980222121618.0069319c@mail.u-net.com>
Date: Sun, 22 Feb 1998 12:16:18 +0000
To: r-help@stat.math.ethz.ch
From: MM Peterson <magnus@balhaldie.u-net.com>
Subject: R-beta: t.test in R

RE t.test in R

A day or two ago I objected to the behaviour of the one-sample t.test in R. When the null hypothesis is rejected, it is easy to generate a "confidence interval" for the population mean that does not contain the sample mean.

It now appears that the same behaviour is latent in the code for the two-sample version of this test. The relevant lines from the t.test function are reproduced below. The lines I find objectionable are the two where a value is assigned to tstat, one in each branch.

    if (var.equal) {
        df <- nx + ny - 2
        v <- ((nx - 1) * vx + (ny - 1) * vy)/df
        stderr <- sqrt(v * (1/nx + 1/ny))
        tstat <- (mx - my - mu)/stderr
    } else {
        stderrx <- sqrt(vx/nx)
        stderry <- sqrt(vy/ny)
        stderr <- sqrt(stderrx^2 + stderry^2)
        df <- stderr^4/(stderrx^4/(nx - 1) + stderry^4/(ny - 1))
        tstat <- (mx - my - mu)/stderr
    }

I say this problem is LATENT in the code because it is very rare indeed to apply the two-sample t-test with a null value other than 0 for the difference between the means of the populations from which the samples came. Nevertheless, if such a case were analysed with a straightforward two-sided alternative, one would expect the confidence interval for the difference of the population means, given as part of the output, to be centred on the observed difference of the sample means. Instead, the same anomalous behaviour as in the one-sample case appears, as the following examples show.
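Before those examples, the arithmetic can be checked by hand. The sketch below uses a hypothetical helper, two.sample.ci (it is not an R function), that repeats the standard-error and degrees-of-freedom calculations from the lines quoted above and then builds the interval two ways: around tstat alone, which reproduces the interval printed in the examples (this is inferred from the printed numbers, not read from the t.test source), and around the observed difference mx - my.

```r
# Hypothetical helper, not part of R: recomputes the two-sample
# interval both ways from the same summaries used in t.test.
two.sample.ci <- function(x, y, mu = 0, var.equal = FALSE,
                          conf.level = 0.95) {
  nx <- length(x); ny <- length(y)
  mx <- mean(x);   my <- mean(y)
  vx <- var(x);    vy <- var(y)
  if (var.equal) {
    # pooled variance, as in the quoted code
    df <- nx + ny - 2
    v <- ((nx - 1) * vx + (ny - 1) * vy)/df
    stderr <- sqrt(v * (1/nx + 1/ny))
  } else {
    # Welch standard error and approximate degrees of freedom
    stderrx <- sqrt(vx/nx)
    stderry <- sqrt(vy/ny)
    stderr <- sqrt(stderrx^2 + stderry^2)
    df <- stderr^4/(stderrx^4/(nx - 1) + stderry^4/(ny - 1))
  }
  tstat <- (mx - my - mu)/stderr
  q <- qt(1 - (1 - conf.level)/2, df)   # two-sided critical value
  list(tstat = tstat, df = df,
       # built from tstat alone: centred on mx - my - mu
       shifted = (tstat + c(-q, q)) * stderr,
       # centred on the observed difference mx - my
       centred = (mx - my) + c(-q, q) * stderr)
}

x.sample <- c(4, 5, 6, 7, 8)
y.sample <- x.sample
r <- two.sample.ci(x.sample, y.sample, mu = 50, var.equal = TRUE)
round(r$shifted, 3)  # -52.306 -47.694
round(r$centred, 3)  #  -2.306   2.306
```

With these data both branches give stderr = 1 and df = 8, so the two intervals differ only by the shift of mu = 50; the centred interval does not depend on mu at all.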
> x.sample <- scan()
1: 4 5 6 7 8
6:
Read 5 items
> y.sample <- x.sample
> t.test(x.sample,y.sample,var.equal=TRUE,mu=50)

         Two Sample t-test

data:  x.sample and y.sample
t = -50, df = 8, p-value = 0
alternative hypothesis: true difference in means is not equal to 50
95 percent confidence interval:
 -52.306 -47.694
sample estimates:
mean of x mean of y
        6         6

> t.test(x.sample,y.sample,mu=50)

         Welch Two Sample t-test

data:  x.sample and y.sample
t = -50, df = 8, p-value = 0
alternative hypothesis: true difference in means is not equal to 50
95 percent confidence interval:
 -52.306 -47.694
sample estimates:
mean of x mean of y
        6         6

I guess the same simple remedy as for the one-sample test is available in these cases, unless and until changes are made in version 0.61.2.

Magnus Peterson

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)
To: r-help-request@stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._