Re: [R] R vs. Excel (R-squared)

From: Cleber N. Borges <cborges_at_iqm.unicamp.br>
Date: Thu 26 Jan 2006 - 03:32:58 EST

I was quite interested in this thread (discussion), once that I am chemistry student and I work with Mixtures Designs that are models without intercept.

I thought quite attention the follow afirmation:

' Thus SST, the "corrected" total
sum of squares, should be used when you have a model with an intercept term but the uncorrected total sum of squares should be used when you do not have an intercept term. ' (Douglas Bates)

I have as reference a book called:

"Experiments with Mixtures: Designs, Models, and the Analysis of Mixture Data"
second edition

John A. Cornell
(Professor of Statistics in University Of Florida)

In this book, pg 42: item 2.7 - THE ANALYSIS OF VARIANCE TABLE, I have the model below:

y(x) = 11.7x1 + 9.4x2 + 16.4x3 + 19.0x1x2 + 11.4x1x3 - 9.6x2x3

with the follow ANOVA Table:

source of variation    D.F.    SS                    MS

Regression        p-1    SSR=\sum( y_{pred} - y_{mean} )^2    ssR/(p-1)

Residual        N-p    SSE=\sum( y_{exp} - y_{pred} )^2    ssE/(N-p)   

Total            N-1    SSE=\sum( y_{exp} - y_{mean} )^2


pred = predicted
exp = experimental

and in many others books.

I always see the ANOVA Table of Mixtures systems with SST, the "corrected" total
sum of squares ( N-1 degrees freedom ).

I would like to ask:

  1. What is approach ( point view ) more adequate ?
  2. Could someone indicate some reference about this subject ?

Thanks a lot.
Regards

Cleber N. Borges

############################ Dados
x1      x2      x3      y

1 0 0 11
1 0 0 12.4
0.5    0.5    0    15
0.5    0.5    0    14.8
0.5    0.5    0    16.1

0 1 0 8.8
0 1 0 10
0    0.5    0.5    10
0    0.5    0.5    9.7
0    0.5    0.5    11.8

0 0 1 16.8
0 0 1 16
0.5    0    0.5    17.7
0.5    0    0.5    16.4
0.5    0    0.5    16.6

############################## Model

d.lm <- lm( y ~ -1 + x1*x2*x3 - x1:x2:x3, data = Dados )

### Anova like in the book
d.aov <- aov( y ~ x1*x2*x3 - x1:x2:x3, data = Dados ) #### SSR (fitted Model) = 128.296

Douglas Bates wrote:

>On 1/24/06, Lance Westerhoff <lance@quantumbioinc.com> wrote:
>
>
>>Hi-
>>
>>On Jan 24, 2006, at 12:08 PM, Peter Dalgaard wrote:
>>
>>
>>
>>>Lance Westerhoff <lance@quantumbioinc.com> writes:
>>>
>>>
>>>
>>>>Hello All-
>>>>
>>>>I found an inconsistency between the R-squared reported in Excel vs.
>>>>that in R, and I am wondering which (if any) may be correct and if
>>>>this is a known issue. While it certainly wouldn't surprise me if
>>>>Excel is just flat out wrong, I just want to make sure since the R-
>>>>squared reported in R seems surprisingly high. Please let me know if
>>>>this is the wrong list. Thanks!
>>>>
>>>>
>>>Excel is flat out wrong. As the name implies, R-squared values cannot
>>>be less than zero (adjusted R-squared can, but I wouldn't think
>>>that is what Excel does).
>>>
>>>
>>I had thought the same thing, but then I came across the following
>>site which states: "Note that it is possible to get a negative R-
>>square for equations that do not contain a constant term. If R-square
>>is defined as the proportion of variance explained by the fit, and if
>>the fit is actually worse than just fitting a horizontal line, then R-
>>square is negative. In this case, R-square cannot be interpreted as
>>the square of a correlation." Since
>>
>>R^2 = 1 - (SSE/SST)
>>
>>I guess you can have SSE > SST which would result in a R^2 of less
>>then 1.0. However, it still seems very strange which made me wonder
>>what is going on in Excel needless to say!
>>
>>http://www.mathworks.com/access/helpdesk/help/toolbox/curvefit/
>>ch_fitt9.html
>>
>>
>
>This seems to be a case of using the wrong formula. R^2 should
>measure the amount of variation for which the given model accounts
>relative to the amount of variation for which the *appropriate* null
>model does not account. If you have a constant or intercept term in a
>linear model then the null model for comparison is one with the
>intercept only. If you have a linear model without an intercept term
>then the appropriate null model for comparison is the model that
>predicts all the responses as zero. Thus SST, the "corrected" total
>sum of squares, should be used when you have a model with an intercept
>term but the uncorrected total sum of squares should be used when you
>do not have an intercept term.
>
>It is disappointing to see the MathWorks propagating such an
>elementary misconception.
>
>______________________________________________
>R-help@stat.math.ethz.ch mailing list
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
>.
>
>
>



R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html Received on Thu Jan 26 03:53:52 2006

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:42:10 EST