From: Mike Marchywka <marchywka_at_hotmail.com>

Date: Tue, 22 Mar 2011 21:55:55 -0400

> Date: Tue, 22 Mar 2011 09:31:01 -0700
> From: crosspide_at_hotmail.com
> To: r-help_at_r-project.org
> Subject: [R] lm ~ v1 + log(v1) + ... improve adj Rsq ¿any sense?
>
> Dear all,
>
> I want to improve my adj - R sq. I 've chequed some established models and
> they introduce two times the same variable, one transformed, and the other
> not. It also improves my adj - R sq.
>
> But, isn't this bad for the collinearity? Do I interpret coefficients as
> usual?

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed 23 Mar 2011 - 02:09:02 GMT


I'm not sure how many replies you got or whether your question was
answered, but offhand let me see if I understand your concern. If your
data cover only a limited range of v1, one where a Taylor expansion of
log(v1) is adequately described by its linear term, then it can be hard
to tell a linear dependence from a log dependence, or to quantify a
mixture of the two. If you try to find the a and b in
y = a*f(x) + b*g(x) that minimize some error, you should be able to see
the issues on paper.

Presumably log is not linear over a larger range, and any error
function, like SSE, would have a "reasonably" peaked minimum for some
values of the two coefficients. You could do a sensitivity analysis to
check: find the second derivatives of your error function or just
perturb the coefficients a bit. If there is some direction in (a, b)
along which the error barely changes, then you have the case you are
worried about.

I'm not sure what you consider to be "usual", but when I'm doing
something like this I usually have some physical interpretation in
mind. Most uninformatively, you could interpret these coefficients as
those which minimize your error given the data you have :) What you do
from there depends on a lot of specifics. To tell if a given function
seems appropriate for the data, it is always good to look at a plot of
the residuals.

Note that the ability to find a unique set of coefficients that
minimizes a given error does not require the terms attached to those
coefficients to be unrelated to one another; it only requires that no
term be an exact linear combination of the others over your data.
Polynomial fits are a common example (log having a Taylor series just
constrains a lot of coefficient relationships LOL).
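The sensitivity check described above can be sketched numerically. What
follows is only an illustration in plain Python, not anything from the
original thread. It assumes a two-term model y = a*x + b*log(x) with no
intercept, noise-free data, and columns scaled to unit length; the
function names and the two ranges, [10, 11] and [1, 100], are made up
for the example. It perturbs (a, b) by the same small step in two
directions and compares how much the SSE moves.

```python
import math

def unit_columns(xs):
    """Return the x and log(x) columns, each scaled to unit length."""
    f = [float(x) for x in xs]
    g = [math.log(x) for x in xs]
    nf = math.sqrt(sum(v * v for v in f))
    ng = math.sqrt(sum(v * v for v in g))
    return [v / nf for v in f], [v / ng for v in g]

def sse(a, b, u, v, y):
    """Sum of squared errors for the model y ~ a*u + b*v."""
    return sum((yi - a * ui - b * vi) ** 2 for ui, vi, yi in zip(u, v, y))

def sensitivity(xs, eps=0.1):
    """Perturb (a, b) around an exact fit and report how much SSE moves.

    Returns (flat, steep, steep / flat), where "flat" trades a against b
    and "steep" moves both coefficients in the same direction.
    """
    u, v = unit_columns(xs)
    a0, b0 = 1.0, 1.0  # arbitrary coefficients, chosen only for illustration
    # Noise-free response, so SSE(a0, b0) is exactly zero at the optimum.
    y = [a0 * ui + b0 * vi for ui, vi in zip(u, v)]
    s = eps / math.sqrt(2.0)  # both perturbations have total length eps
    flat = sse(a0 + s, b0 - s, u, v, y)
    steep = sse(a0 + s, b0 + s, u, v, y)
    return flat, steep, steep / flat

# Hypothetical predictor ranges: one narrow, one wide.
narrow = [10.0 + i / 100.0 for i in range(101)]  # x in [10, 11]
wide = [1.0 + i for i in range(100)]             # x in [1, 100]

for name, xs in (("narrow", narrow), ("wide", wide)):
    flat, steep, ratio = sensitivity(xs)
    print(name, "flat:", flat, "steep:", steep, "ratio:", ratio)
```

A large steep/flat ratio means there is a direction in which a and b
can trade off against each other with almost no change in error, which
is the collinearity worry. Over the wider range the ratio should come
out much smaller, because log(x) no longer looks like a straight line
there.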

P-values and confidence intervals are another matter with post hoc
exploratory work but I'll let a statistician comment on that
as well as the meaning of the R output.

Usually the final decision on a putative model improvement comes
from your ability to infer something about the underlying system,
although you may just want a simple empirical approximation
and be more worried about meeting a given error with a limited
number of computations, etc.

Apparently you found on a retrospective literature search that
everyone else is using the log term.

Sometimes you see people ask questions like, "given that in 10 papers on
the subject 4 of them used the log term and these authors have historically
been right 50 percent of the time but the other 6 are right 40 percent of the
time, what are the chances that the log term should be included?" I will
also avoid commenting on this question except to say that it illustrates
one of a number of ways people do approach these problems, and that much
depends on what you consider relevant to your situation.

>
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  1.73140    7.22477   0.240  0.81086
> v1          -0.33886    0.20321  -1.668  0.09705 .
> log(v1)      2.63194    3.74556   0.703  0.48311
> v2          -0.01517    0.01089  -1.394  0.16507
> log(v3)     -0.45719    0.27656  -1.653  0.09995 .
> factor1     -1.81517    0.62155  -2.920  0.00392 **
> factor2     -1.87330    0.84375  -2.220  0.02759 *
>
> Analysis of Variance Table
>
> Response: height rise
>            Df Sum Sq Mean Sq F value    Pr(>F)
> v1          1  51.25  51.246 21.4128 6.842e-06 ***
> log(v1)     1  13.62  13.617  5.6897  0.018048 *
> v2          1   2.84   2.836  1.1850  0.277713
> log(v3)     1   3.02   3.024  1.2638  0.262357
> factor1     1  17.62  17.616  7.3608  0.007279 **
> factor2     1  11.80  11.797  4.9294  0.027586 *
> Residuals 190 454.71   2.393
>
> Thanks,
> user_at_host.com
>
> --
> View this message in context: http://r.789695.n4.nabble.com/lm-v1-log-v1-improve-adj-Rsq-any-sense-tp3396935p3396935.html
> Sent from the R help mailing list archive at Nabble.com.
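As a rough check on the quoted numbers, the R-squared and adjusted
R-squared in question can be recovered from the ANOVA table alone. This
is a sketch in plain Python under one assumption: that the table is a
sequential ANOVA for a model with an intercept, so the Sum Sq column
adds up to the total sum of squares about the mean. The function name
r_squared is made up for the example.

```python
# Sum Sq values copied from the quoted ANOVA table, in the order listed:
# v1, log(v1), v2, log(v3), factor1, factor2.
model_ss = [51.25, 13.62, 2.84, 3.02, 17.62, 11.80]
resid_ss = 454.71
resid_df = 190

def r_squared(model_ss, resid_ss, resid_df):
    """Return (R^2, adjusted R^2) from sequential sums of squares.

    Assumes every model term has Df = 1, as in the quoted table, and that
    the sums of squares add up to the total about the mean.
    """
    total_ss = sum(model_ss) + resid_ss
    total_df = len(model_ss) + resid_df  # n - 1 for a model with intercept
    r2 = 1.0 - resid_ss / total_ss
    adj_r2 = 1.0 - (resid_ss / resid_df) / (total_ss / total_df)
    return r2, adj_r2

r2, adj_r2 = r_squared(model_ss, resid_ss, resid_df)
print("R^2:", round(r2, 4), "adjusted R^2:", round(adj_r2, 4))
```

With the figures above this gives an R-squared of about 0.18 and an
adjusted R-squared of about 0.15, so the model leaves most of the
variation in the response unexplained with or without the log term.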


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.2.0, at Wed 23 Mar 2011 - 02:10:25 GMT.
