From: Prof Brian Ripley <ripley_at_stats.ox.ac.uk>

Date: Tue, 19 Jun 2007 17:53:33 +0100 (BST)

tapply gives an array: you want to use as.vector() on its result.
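A minimal sketch of that suggestion, using the data from the question quoted below (entered inline here rather than read from 'example.dat'). The intermediate names mx and my, and the extra location column built with names(mx), are illustrative additions. Newer versions of R may accept the 1-d array in predict() without this error, but as.vector() still gives ordinary numeric columns either way.

```r
# Data from the question, entered inline instead of read.table('example.dat').
data.in <- data.frame(
  location = rep(c("A", "B", "C"), each = 3),
  x = c(17.2, 91.7, 93.6, 95.8, 54.9, 51.1, 53.9, 40.3, 38.5),
  y = c(28.46, 143.33, 148.05, 150.28, 89.49, 82.51, 88.46, 63.62, 64.46)
)

# tapply() returns a 1-d array with dimnames A, B, C.
mx <- tapply(data.in$x, data.in$location, mean)
my <- tapply(data.in$y, data.in$location, mean)
print(dim(mx))  # has a dim attribute, so this is an array, not a plain vector

# as.vector() drops the dim and dimnames attributes, leaving plain numeric
# vectors; names(mx) keeps the group labels in an ordinary column.
data.mn <- data.frame(location = names(mx),
                      xm = as.vector(mx),
                      ym = as.vector(my))

mod1 <- lm(ym ~ xm, data = data.mn)

# Predictions on the unaveraged data: newdata just needs a numeric 'xm'.
data.in$xm <- data.in$x
pred <- predict(mod1, newdata = data.in)
print(pred)
```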

On Tue, 19 Jun 2007, John Phillips wrote:

> Hi,
>
> I am using R to fit statistical models to data where the observations are
> means of the original data. R is used to calculate the means before fitting
> the model. My problem is: when R calculates the means using tapply, the
> class of the means differs from the class of the original data, which gives
> me trouble when I want to use the original data to calculate model
> predictions. Here is a simple example that demonstrates the problem:
>
> > data.in<-read.table('example.dat',header=TRUE)
> >
> > #Here are the data:
> > data.in
>   location    x      y
> 1        A 17.2  28.46
> 2        A 91.7 143.33
> 3        A 93.6 148.05
> 4        B 95.8 150.28
> 5        B 54.9  89.49
> 6        B 51.1  82.51
> 7        C 53.9  88.46
> 8        C 40.3  63.62
> 9        C 38.5  64.46
> >
> > attach(data.in)
> >
> > #Calculate means by variable "location":
> > data.mn<-data.frame(xm = tapply(x,location,mean), ym = tapply(y,location,mean))
> > detach(data.in)
> >
> > #Here are the means:
> > data.mn
>         xm       ym
> A 67.50000 106.6133
> B 67.26667 107.4267
> C 44.23333  72.1800
> >
> > #Fit the model:
> > mod1<-lm(ym ~ xm, data.mn)
> >
> > mod1
>
> Call:
> lm(formula = ym ~ xm, data = data.mn)
>
> Coefficients:
> (Intercept)           xm
>       5.633        1.505
>
> > #R will make "predictions" using the data.mn data frame:
> > predict(mod1,newdata = data.mn)
>         A         B         C
> 107.19260 106.84153  72.18587
> >
> > #But, even if new variables are created in the original data
> > #with names that match those names used in the regression:
> > data.in$xm<-data.in$x
> > data.in$ym<-data.in$y
> > data.in
>   location    x      y   xm     ym
> 1        A 17.2  28.46 17.2  28.46
> 2        A 91.7 143.33 91.7 143.33
> 3        A 93.6 148.05 93.6 148.05
> 4        B 95.8 150.28 95.8 150.28
> 5        B 54.9  89.49 54.9  89.49
> 6        B 51.1  82.51 51.1  82.51
> 7        C 53.9  88.46 53.9  88.46
> 8        C 40.3  63.62 40.3  63.62
> 9        C 38.5  64.46 38.5  64.46
> >
> > #R will not use data.in to make predictions:
> > predict(mod1,newdata = data.in)
> Error: variable 'xm' was fitted with class "other" but class "numeric" was supplied
> >
> > data.in$xm
> [1] 17.2 91.7 93.6 95.8 54.9 51.1 53.9 40.3 38.5
> > data.mn$xm
>        A        B        C
> 67.50000 67.26667 44.23333
> >
>
> Is there a way to make these variables have the same class? Or, is there
> something other than "tapply" that will work better for this?
>
> Thanks!
>
> ______________________________________________
> R-help_at_stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
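On the second question, one possible alternative to tapply() (not part of the reply above) is aggregate() on the data frame, which returns one row per location with plain numeric value columns. A sketch, again with the question's data entered inline; renaming the columns to xm and ym is an assumption made so the fitted model matches the question's formula.

```r
# Data from the question, entered inline instead of read.table('example.dat').
data.in <- data.frame(
  location = rep(c("A", "B", "C"), each = 3),
  x = c(17.2, 91.7, 93.6, 95.8, 54.9, 51.1, 53.9, 40.3, 38.5),
  y = c(28.46, 143.33, 148.05, 150.28, 89.49, 82.51, 88.46, 63.62, 64.46)
)

# Group means of x and y by location; the result is a data frame with
# columns location, x, y, and the value columns are plain numeric vectors.
data.mn <- aggregate(data.in[c("x", "y")],
                     by = list(location = data.in$location), FUN = mean)
names(data.mn) <- c("location", "xm", "ym")  # assumed names, to match ym ~ xm

mod1 <- lm(ym ~ xm, data = data.mn)

# The original rows can then be used as newdata once they carry an 'xm' column.
data.in$xm <- data.in$x
pred <- predict(mod1, newdata = data.in)
print(data.mn)
print(pred)
```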

--
Brian D. Ripley,                  ripley_at_stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Received on Tue 19 Jun 2007 - 17:11:34 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Tue 19 Jun 2007 - 17:32:09 GMT.
