From: 李俊杰 <klijunjie_at_gmail.com>

Date: Tue, 22 May 2007 12:08:45 +0800

First, thank you for your attention.

If you are interested in the performance of your strategies, e.g. maximizing
adjusted R^2 while always including the intercept, you can run the code I put
in the attachment.

It shows that maximizing adjusted R^2 without always including the intercept
beats maximizing adjusted R^2 while always including the intercept.
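
The attachment is not reproduced here. As a minimal sketch (not the attached
code) of why comparing R^2 between models with and without an intercept needs
care: summary() in R reports R^2 against a centered total sum of squares when
the model has an intercept, and against the uncentered sum(y^2) when it does
not. This would also explain why the two values in example (3) of the quoted
message differ. The simulated data and all object names below are assumptions
made for illustration only.

```r
# Minimal sketch with simulated data; all object names are illustrative.
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n)

fit_int   <- lm(y ~ x)       # model with an intercept
fit_noint <- lm(y ~ x - 1)   # model without an intercept

# R^2 as reported by summary(): the total sum of squares is centered
# when an intercept is present and uncentered (sum(y^2)) when it is not.
r2_int   <- summary(fit_int)$r.squared
r2_noint <- summary(fit_noint)$r.squared

# Reproduce both values by hand as 1 - RSS / TSS.
rss_int   <- sum(residuals(fit_int)^2)
rss_noint <- sum(residuals(fit_noint)^2)
r2_int_hand   <- 1 - rss_int   / sum((y - mean(y))^2)
r2_noint_hand <- 1 - rss_noint / sum(y^2)

# Put the no-intercept model on the same (centered) scale as the
# intercept model; this value can be negative for a poor fit.
r2_noint_centered <- 1 - rss_noint / sum((y - mean(y))^2)

print(c(r2_int, r2_int_hand))
print(c(r2_noint, r2_noint_hand, r2_noint_centered))
```

On the common centered scale the no-intercept model can never score higher
than the intercept model on the same predictors, because its residual sum of
squares is at least as large. The R^2 values printed by summary() for the two
models are therefore not directly comparable.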

Junjie

2007/5/22, Paul Lynch <plynchnlm_at_gmail.com>:

> Junjie,

> First, a disclaimer: I am not a statistician, and have only taken
> one statistics class, but I just took it this Spring, so the concepts
> of linear regression are relatively fresh in my head and hopefully I
> will not be too inaccurate.
>
> According to my statistics textbook, when selecting variables for
> a model, the intercept term is always present. The "variables" under
> consideration do not include the constant "1" that multiplies the
> intercept term. I don't think it makes sense to compare models with
> and without an intercept term. (Also, I don't know what the point of
> using a model without an intercept term would be, but that is probably
> just my ignorance.)
>
> Similarly, the formula you were using for R**2 seems to only be
> useful in the context of a standard linear regression (i.e., one that
> includes an intercept term). As your example shows, it is easy to
> construct a "fit" (e.g. y = 10,000,000*x) so that SSR > SST if one is
> not deriving the fit from the regular linear regression process.
> --Paul
>
> On 5/19/07, 李俊杰 <klijunjie_at_gmail.com> wrote:
> > I know that "-1" removes the intercept term. But my question is why
> > the intercept term CANNOT be treated as a variable term, given that we
> > place a column consisting of 1s in the predictor matrix.
> >
> > If I insist on comparing a model with an intercept and one without an
> > intercept on adjusted R^2, I now think the strategy is to always use
> > another definition of R-square or adjusted R-square, in which
> > r-square=sum((y.hat)^2)/sum((y)^2).
> >
> > Am I on the right track?
> >
> > Thanks
> >
> > Li Junjie
> >
> >
> > 2007/5/19, Paul Lynch <plynchnlm_at_gmail.com>:
> > > In case you weren't aware, the meaning of the "-1" in y ~ x - 1 is to
> > > remove the intercept term that would otherwise be implied.
> > > --Paul
> > >
> > > On 5/17/07, 李俊杰 <klijunjie_at_gmail.com> wrote:
> > > > Hi, everybody,
> > > >
> > > > 3 questions about R-square:
> > > > ---------(1)----------- Does R2 always increase as variables are added?
> > > > ---------(2)----------- Can R2 be greater than 1?
> > > > ---------(3)----------- How is R2 in summary(lm(y~x-1))$r.squared
> > > > calculated? It is different from
> > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > >
> > > > I will illustrate these problems with the following code:
> > > > ---------(1)----------- R2 doesn't always increase as variables are added
> > > >
> > > > > x=matrix(rnorm(20),ncol=2)
> > > > > y=rnorm(10)
> > > > >
> > > > > lm=lm(y~1)
> > > > > y.hat=rep(1*lm$coefficients,length(y))
> > > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > > [1] 2.646815e-33
> > > > >
> > > > > lm=lm(y~x-1)
> > > > > y.hat=x%*%lm$coefficients
> > > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > > [1] 0.4443356
> > > > >
> > > > > ################ This is the biggest model, but its R2 is not the biggest, why?
> > > > > lm=lm(y~x)
> > > > > y.hat=cbind(rep(1,length(y)),x)%*%lm$coefficients
> > > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > > [1] 0.2704789
> > > >
> > > >
> > > > ---------(2)----------- R2 can be greater than 1
> > > >
> > > > > x=rnorm(10)
> > > > > y=runif(10)
> > > > > lm=lm(y~x-1)
> > > > > y.hat=x*lm$coefficients
> > > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > > [1] 3.513865
> > > >
> > > >
> > > > ---------(3)----------- How is R2 in summary(lm(y~x-1))$r.squared
> > > > calculated? It is different from
> > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > > > x=matrix(rnorm(20),ncol=2)
> > > > > xx=cbind(rep(1,10),x)
> > > > > y=x%*%c(1,2)+rnorm(10)
> > > > > ### r2 calculated by lm(y~x)
> > > > > lm=lm(y~x)
> > > > > summary(lm)$r.squared
> > > > [1] 0.9231062
> > > > > ### r2 calculated by lm(y~xx-1)
> > > > > lm=lm(y~xx-1)
> > > > > summary(lm)$r.squared
> > > > [1] 0.9365253
> > > > > ### r2 calculated by me
> > > > > y.hat=xx%*%lm$coefficients
> > > > > (r.square=sum((y.hat-mean(y))^2)/sum((y-mean(y))^2))
> > > > [1] 0.9231062
> > > >
> > > >
> > > > Thanks a lot for any clue :)
> > > >
> > > >
> > > > --
> > > > Junjie Li, klijunjie_at_gmail.com
> > > > Undergraduate in DEP of Tsinghua University,
> > > >
> > > > [[alternative HTML version deleted]]
> > > >
> > > > ______________________________________________
> > > > R-help_at_stat.math.ethz.ch mailing list
> > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > > and provide commented, minimal, self-contained, reproducible code.
> > > >
> > >
> > >
> > > --
> > > Paul Lynch
> > > Aquilent, Inc.
> > > National Library of Medicine (Contractor)
> > >
> >
> >
> > --
> > Junjie Li, klijunjie_at_gmail.com
> > Undergraduate in DEP of Tsinghua University,
>
>
> --
> Paul Lynch
> Aquilent, Inc.
> National Library of Medicine (Contractor)
>

--
Junjie Li, klijunjie_at_gmail.com
Undergraduate in DEP of Tsinghua University,

Received on Tue 22 May 2007 - 04:14:39 GMT

______________________________________________
R-help_at_stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Archive maintained by Robert King, hosted by the discipline of statistics at
the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Tue 22 May 2007 - 08:31:41 GMT.
