From: Gavin Simpson <gavin.simpson_at_ucl.ac.uk>

Date: Thu, 24 Jun 2010 08:16:16 +0100

lin <- lm(y~x1+x2, data = dat)

pdat <- data.frame(x1 = rnorm(5, 2), x2 = rnorm(5)) predict(lin, pdat)

Date: Thu, 24 Jun 2010 08:16:16 +0100

On Wed, 2010-06-23 at 16:30 -0700, cc super wrote:

> What if the size of the newdata is different from the previous one

*> used to generate the regression model?
**>
**> Let's say
**>
**> pdat <- data.frame(x1 = rnorm(5, 2), x2 = rnorm(5))
**> predict(lin, pdat)
**>
**> It comes up with warning and the result is not correct.
*

No it doesn't (not on my system at least) and you give no indication what the warning/error is or why you think the result is incorrect.

set.seed(1234)

y <- rnorm(10, mean = 5)

dat <- data.frame(x1 = rnorm(10, mean = 2),

x2 = rnorm(10))

lin <- lm(y~x1+x2, data = dat)

pdat <- data.frame(x1 = rnorm(5, 2), x2 = rnorm(5)) predict(lin, pdat)

1 2 3 4 5 4.971021 6.482085 5.742130 4.652898 5.194980

The size of newdata shouldn't matter at all:

> pdat2 <- data.frame(x1 = rnorm(1, 2), x2 = rnorm(1))

> predict(lin, pdat2)

1

4.185984

> pdat3 <- data.frame(x1 = rnorm(1000, 2), x2 = rnorm(1000))

> head((p3 <- predict(lin, pdat3)))

1 2 3 4 5 6
4.076389 4.118338 4.054443 3.976399 4.055590 4.126018

> length(p3)

[1] 1000

So can you show the exact code you used plus the errors/warnings you report and why you think the results are incorrect?

For example, automating your by-hand computation a bit shows that predict() produces the correct result. Compare:

> ## The cbind(1, ....) bit is for the intercept...

> data.matrix(cbind(1, pdat)) %*% coef(lin)

[,1]

[1,] 4.178516 [2,] 4.101112 [3,] 4.081370 [4,] 4.146819 [5,] 4.1864054.178516 4.101112 4.081370 4.146819 4.186405

> predict(lin, pdat)

1 2 3 4 5

**HTH
**
G

*>
*

> Thanks!

*>
**>
**> 2010/6/23 Gavin Simpson <gavin.simpson_at_ucl.ac.uk>
**> On Tue, 2010-06-22 at 23:11 -0700, cc super wrote:
**> > Hi, everyone,
**> >
**> > Night. I have three questions about multiple linear
**> regression in R.
**> >
**> > Q1:
**> >
**> > y=rnorm(10,mean=5)
**> > x1=rnorm(10,mean=2)
**> > x2=rnorm(10)
**> > lin=lm(y~x1+x2)
**> > summary(lin)
**> >
**> > ## In the summary, 'Residual standard error: 1.017 on 7
**> degrees of freedom',
**> > 1.017 is the estimate of the constance variance?
**>
**>
**> Yes, it is sigma.
**>
**> Just a note, in order for the above code to yield the same
**> results as
**> you quote, you need a call to set.seed() to fix the pseudo
**> random number
**> generator.
**>
**> > Q2:
**> >
**> > beta0=lin$coefficients[1]
**> > beta1=lin$coefficients[2]
**> > beta2=lin$coefficients[3]
**> >
**> > y_hat=beta0+beta1*x1+beta2*x2
**> >
**> > ## Is there any built-in function in R to obtain y_hat
**> directly?
**>
**>
**> fitted(lin)
**>
**> Note that there are quite a few standard extractor functions
**> like fitted
**> available for modelling functions in R. coef() for example
**> should be
**> used to extract the coefficients, resid() will extract
**> residuals etc.
**>
**> > Q3:
**> >
**> > If I want to apply this regression result to another
**> dataset, that is, new
**> > x1 and x2. Is the built-in function in 2 still available?
**>
**>
**> It is called predict() (although if you called predict(lin)
**> above
**> instead of fitted(lin) it would have produced the same answer;
**> the
**> fitted values for the observations).
**>
**> One gotcha that catches people out is that in the new dataset,
**> the
**> variables (used in the model) must have the same names as the
**> data frame
**> used to fit it. So we could do:
**>
**> pdat <- data.frame(x1 = rnorm(10, 2), x2 = rnorm(10))
**> predict(lin, pdat)
**>
**> to get predictions at the new values of x1 an x2.
**>
**> > Thank you in advance!
**>
**> HTH
**>
**> G
**>
**> --
**> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~
**> %~%~%~%
**> Dr. Gavin Simpson [t] +44 (0)20 7679 0522
**> ECRC, UCL Geography, [f] +44 (0)20 7679 0565
**> Pearson Building, [e]
**> gavin.simpsonATNOSPAMucl.ac.uk
**> Gower Street, London [w]
**> http://www.ucl.ac.uk/~ucfagls/
**> UK. WC1E 6BT. [w]
**> http://www.freshwaters.org.uk
**> %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~
**> %~%~%~%
**>
**>
*

-- %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% Dr. Gavin Simpson [t] +44 (0)20 7679 0522 ECRC, UCL Geography, [f] +44 (0)20 7679 0565 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk Gower Street, London [w] http://www.ucl.ac.uk/~ucfagls/ UK. WC1E 6BT. [w] http://www.freshwaters.org.uk %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~% ______________________________________________ R-help_at_r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.Received on Thu 24 Jun 2010 - 07:20:01 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Thu 24 Jun 2010 - 07:30:34 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*