# Re: [R] different regression coeffs with different starting point

From: Achim Zeileis <Achim.Zeileis_at_uibk.ac.at>
Date: Mon, 14 Mar 2011 19:59:57 +0100 (CET)

On Mon, 14 Mar 2011, Jen wrote:

> Hi Bill,
> Thanks for your response and I'm sorry -- that was a misleading example of
> what I was trying to show. This one should illustrate the point:
>
> require(AER)
> data_in = c(0,6,12,18,24,30,36,42,48,54,60,66,72,78)
> data_in2 = data_in^2
> data_in3 = data_in^3
> data_out =
> c(139487.00,133333.00,62500.00,58823.00,56338.00,27549.00,4244.00,600.00,112.00,95.00,48.00,31.00,15.00,14.99)
> ldata_out = log(data_out)
>
> m1 <- lm(ldata_out ~ data_in + data_in2+data_in3)
> cubic <- tobit(ldata_out ~ data_in + data_in2 + data_in3, left=
> log(15-0.01),init = coef(m1))

The scaling of your regressors is extremely bad. data_in3 is in the hundreds of thousands. This computationally challenging, to put it mildly. I would recommend to either scale data_in by some constant (here 10 or 100 is enough) or to use an orthogonal coding of the polynomial. Both can be achieved easily using the poly() function:

• Your model would be: ldata_out ~ poly(data_in, 3, raw = TRUE)
• With scaled regressors: ldata_out ~ poly(data_in/100, 3, raw = TRUE)
• With orthogonal coding: ldata_out ~ poly(data_in/100, 3)

All models conceptually lead to the same fitted values and the same maximal likelihood. Numerically, however, the first model only achieves suboptimal values depending on the starting values. The latter two models seem to yield more sensible results and be close to the actual maximum.

Note that the coefficients change of course due to the different codings of the regressors. This needs to be taken into account if you want to compare the slopes.

In summary, I would probably use

cubicz <- tobit(ldata_out ~ poly(data_in/100, 3, raw = TRUE),

left = log(15-0.01))

which is very robust to the choice of starting values.

hth,
Z

> print(cubic)
> fitted(cubic)
>
> n= length(data_in)
> m2 <- lm(ldata_out[1:(n-1)] ~ data_in[1:(n-1)] +
> data_in2[1:(n-1)]+data_in3[1:(n-1)])
> cubic2 <- tobit(ldata_out ~ data_in + data_in2 + data_in3, left=
> log(15-0.01),init = coef(m2))
> print(cubic2)
> fitted(cubic2)
>
> The models cubic and cubic2 seem to show that the intercept value in the
> model doesn't change from its starting value:
>
>> m1
>
> Call:
> lm(formula = ldata_out ~ data_in + data_in2 + data_in3)
>
> Coefficients:
> (Intercept) data_in data_in2 data_in3
> 1.156e+01 1.004e-01 -7.755e-03 6.464e-05
>
>> cubic
>
> Call:
> tobit(formula = ldata_out ~ data_in + data_in2 + data_in3, left = log(15 -
> 0.01), init = coef(m1))
>
> Coefficients:
> (Intercept) data_in data_in2 data_in3
> 11.5559540 0.0289083 -0.0040615 0.0000229
>
> Scale: 3.759
>
>> m2
>
> Call:
> lm(formula = ldata_out[1:(n - 1)] ~ data_in[1:(n - 1)] + data_in2[1:(n -
> 1)] + data_in3[1:(n - 1)])
>
> Coefficients:
> (Intercept) data_in[1:(n - 1)] data_in2[1:(n - 1)] data_in3[1:(n
> - 1)]
> 1.150e+01 1.151e-01 -8.381e-03
> 7.122e-05
>
>> cubic2
>
> Call:
> tobit(formula = ldata_out ~ data_in + data_in2 + data_in3, left = log(15 -
> 0.01), init = coef(m2))
>
> Coefficients:
> (Intercept) data_in data_in2 data_in3
> 1.150e+01 3.391e-02 -4.187e-03 2.383e-05
>
> Scale: 3.759
>
> Am I missing something here??
>
> --
> View this message in context: http://r.789695.n4.nabble.com/different-regression-coeffs-with-different-starting-point-tp3353536p3354276.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help