# [R] R: Optimization

From: Simone Vincenzi <svincenz_at_nemo.unipr.it>
Date: Wed 06 Sep 2006 - 00:10:34 EST

Thanks for the help.
I thought about taking the logarithms of the expression but to cope with the problem of zero values both in yield and the values of the environmental variables another transformation is needed (I simply add one to the yield value and to the environmental variables values). I follow the suggestion kindly provided by Spencer Graves but I found the following problems, which depend probably on my understanding:
1) I don't understand why it is needed to test the no-constant model (which provides a better fitting of the model). Can I simply assume a no-constant model?
2) Here comes the big problem. I don't understand what it is done with the models fit.0 and fit.1. In both cases I have to leave out one variable from the linear model but I don't understand why in this way I would test the constraint that all the coefficients should sum to 1. And it is not possible to apply anova(fit0,fit.0) and anova(fit1,fit.1) because I have different response variables.

In conclusion, I don't actually know how to proceed and if in R this kind of analysis is possible.
Any help and any further explanation would be very appreciated. Below I report part of the data frame I'm using for the analysis. The environmental variables (Salinity, Hydrodynamism, Sediment, Oxygen, Chlorophyll and Bathymetry) values have been transformed using suitability function (and thus are bounded between 0 and 1) while Yield is in kg/m2.

Sedi Sal Bathy Chl Hydro Oxy Yield

```1  1.00 1.00 0.51   1 0.17 0.75 1.5
2  0.50 0.95 1.00   1 0.09 0.94 0.4
3  0.50 1.00 0.17   1 0.44 0.90 1.8
4  1.00 0.98 1.00   1 0.10 0.89 4.5
5  0.13 0.84 0.73   1 0.16 0.84 0.4
6  0.50 0.90 0.91   1 0.22 0.84 0.4
7  0.13 0.75 1.00   1 0.14 0.86 0.2
8  0.13 0.84 0.75   1 0.10 0.83 0.3
9  0.13 0.78 0.97   1 0.06 0.84 0.5
10 0.13 0.87 0.70   1 0.45 0.85 1.0
11 1.00 0.77 1.00   1 0.19 0.86 1.5
12 1.00 0.94 0.81   1 0.47 0.86 3.0
13 1.00 0.93 1.00   1 0.45 0.89 2.5
14 0.50 1.00 1.00   1 0.54 0.84 4.0
15 0.50 1.00 1.00   1 0.25 0.88 2.2
16 1.00 1.00 0.56   1 0.25 0.90 5.0
17 1.00 0.90 0.56   1 0.40 0.90 1.5
18 0.50 0.97 1.00   1 0.22 0.95 1.0
19 0.54 0.96 1.00   1 0.18 0.91 0.3
20 1.00 0.97 0.33   1 0.39 0.90 3.0

```

And here the results of the fitted models so far.

summary(fit0)

Call:
lm(formula = log(Yield + 1) ~ log(Sal + 1) + log(Bathy + 1) + log(Chl +

1. + log(Hydro + 1) + log(Oxy + 1) + log(Sedi + 1) - 1, data = data.df)

Residuals:

Min 1Q Median 3Q Max -1.05485 -0.23759 0.01331 0.18692 1.23803

Coefficients:

Estimate Std. Error t value Pr(>|t|)

```log(Sal + 1)   5.07193    1.00584   5.042 1.98e-06 ***
log(Bathy + 1)  0.05804    0.20684   0.281 0.779561
log(Chl + 1)  -8.53941    2.11720  -4.033 0.000106 ***
log(Hydro + 1)  1.73835    0.28815   6.033 2.56e-08 ***
log(Oxy + 1)   4.19951    1.98459   2.116 0.036750 *
log(Sedi + 1)  1.15953    0.14807   7.831 4.51e-12 ***
```
```---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3735 on 103 degrees of freedom
Multiple R-Squared: 0.9148,     Adjusted R-squared: 0.9098
F-statistic: 184.2 on 6 and 103 DF,  p-value: < 2.2e-16

> summary(fit1)

Call:
lm(formula = log(Yield + 1) ~ log(Sal + 1) + log(Bathy + 1) + log(Chl +
1) + log(Hydro + 1) + log(Oxy + 1) + log(Sedi + 1), data = (data.df))

Residuals:
Min        1Q    Median        3Q       Max
-1.057766 -0.227720  0.006146  0.192200  1.222543

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)   -4.49400    4.76603  -0.943   0.3479
log(Sal + 1)   5.13499    1.00861   5.091 1.63e-06 ***
log(Bathy + 1)  0.06685    0.20716   0.323   0.7476
log(Chl + 1)  -2.48909    6.75718  -0.368   0.7134
log(Hydro + 1)  1.70674    0.29025   5.880 5.22e-08 ***
log(Oxy + 1)   4.63758    2.03928   2.274   0.0251 *
log(Sedi + 1)  1.14005    0.14959   7.621 1.34e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3737 on 102 degrees of freedom
Multiple R-Squared: 0.7589,     Adjusted R-squared: 0.7447
F-statistic: 53.51 on 6 and 102 DF,  p-value: < 2.2e-16

> summary(fit.0)

Call:
lm(formula = log((Yield + 1)/(Sedi + 1)) ~ log((Sal + 1)/(Sedi +
1)) + log((Bathy + 1)/(Sedi + 1)) + log((Chl + 1)/(Sedi +
1)) + log((Hydro + 1)/(Sedi + 1)) + log((Oxy + 1)/(Sedi +
1)) - 1, data = subset(data.df))

Residuals:
Min      1Q  Median      3Q     Max
-1.5762 -0.2591  0.1065  0.5023  1.4533

Coefficients:
Estimate Std. Error t value Pr(>|t|)
log((Sal + 1)/(Sedi + 1))    3.0170     1.5304   1.971  0.05134 .
log((Bathy + 1)/(Sedi + 1))  -0.3982     0.3139  -1.268  0.20748
log((Chl + 1)/(Sedi + 1))    8.8137     2.3941   3.681  0.00037 ***
log((Hydro + 1)/(Sedi + 1))   0.3309     0.4066   0.814  0.41760
log((Oxy + 1)/(Sedi + 1))  -12.5242     2.1881  -5.724 1.01e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5766 on 104 degrees of freedom
Multiple R-Squared: 0.523,      Adjusted R-squared: 0.5001
F-statistic:  22.8 on 5 and 104 DF,  p-value: 2.179e-15

> summary(fit.1)

Call:
lm(formula = log((Yield + 1)/(Sedi + 1)) ~ log((Sal + 1)/(Sedi +
1)) + log((Bathy + 1)/(Sedi + 1)) + log((Chl + 1)/(Sedi +
1)) + log((Hydro + 1)/(Sedi + 1)) + log((Oxy + 1)/(Sedi +
1)), data = subset(data.df))

Residuals:
Min       1Q   Median       3Q      Max
-1.05552 -0.23441  0.01263  0.18355  1.24473

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)                  1.84916    0.15474  11.950  < 2e-16 ***
log((Sal + 1)/(Sedi + 1))    5.03863    1.00977   4.990 2.46e-06 ***
log((Bathy + 1)/(Sedi + 1))   0.05279    0.20767   0.254   0.7999
log((Chl + 1)/(Sedi + 1))  -10.96682    2.27270  -4.825 4.86e-06 ***
log((Hydro + 1)/(Sedi + 1))   1.74631    0.28980   6.026 2.64e-08 ***
log((Oxy + 1)/(Sedi + 1))    3.95938    1.98206   1.998   0.0484 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3751 on 103 degrees of freedom
Multiple R-Squared: 0.5606,     Adjusted R-squared: 0.5393
F-statistic: 26.28 on 5 and 103 DF,  p-value: < 2.2e-16

> anova(fit1,fit0)

Analysis of Variance Table

Res.Df     RSS  Df Sum of Sq      F Pr(>F)
1    102 14.2409
2    103 14.3650  -1   -0.1241 0.8891 0.3479

Thanks for the help

Simone Vincenzi

_________________________________________
Simone Vincenzi, PhD Student
Department of Environmental Sciences
University of Parma
Parco Area delle Scienze, 33/A, 43100 Parma, Italy
Phone: +39 0521 905696
Fax: +39 0521 906611
e.mail: svincenz@nemo.unipr.it

-----Messaggio originale-----
Da: Spencer Graves [mailto:spencer.graves@pdf.com]
Inviato: lunedì 4 settembre 2006 21.31
A: Simone Vincenzi
Cc: r-help@stat.math.ethz.ch
Oggetto: Re: [R] Optimization

Have you considered talking logarithms of the expression you
mentioned:

log(Yield) = a1*log(A)+b1*log(B)+c2*log(C)+...

where a1 = a/(a+b+...), etc.  This model has two constraints not present
in ordinary least squares:  First, the intercept is assumed to be zero.
Second, the coefficients in this log formulation must sum to 1.  If I
were you, I might use something like "lm" to test them both.

To explain how, I'll modify the notation, replacing A by X1, B by
X2, ..., up to Xkm1 (= X[k-1]) and Xk for k different environmental
variables.  Then I might try something like the following:

fit0 <- lm(log(Yield) ~ log(X1) + ... + log(Xk)-1 )
fit1 <- lm(log(Yield) ~ log(X1) + ... + log(Xk) )
fit.1 <- lm(log(Yield/Xk) ~ log(X1/Xk) + ... + log(Xkm1/Xk) )
fit.0 <- lm(log(Yield/Xk) ~ log(X1/Xk) + ... + log(Xkm1/Xk)-1 )

anova(fit1, fit0) would test the no-constant model, and if I
haven't made a mistake in this, anova(fit0, fit.0) and anova(fit1,
fit.1) would test the constraint that all the coefficients should sum to
1.

If you would like further help from this listserve, please provide
commented, minimal, self-contained, reproducible code to help potential
respondents understand your question and concerns (as suggested in the
posting guide "www.R-project.org/posting-guide.html").

Hope this helps.
Spencer Graves

Simone Vincenzi wrote:

> Dear R-list,

> I'm trying to estimate the relative importance of 6 environmental
variables

> in determining clam yield. To estimate clam yield a previous work used the

> function Yield = (A^a*B^b*C^c...)^1/(a+b+c+...) where A,B,C... are the
> values of the environmental variables and the weights a,b,c... have not
been

> calibrated on data but taken from literature. Now I'd like to estimate the

> weights a,b,c... by using a dataset with 110 observations of yield and
> values of the environmental variables. I'm wondering if it is feasible or
if

> the number of observation is too low, if some data transformation is

needed

> and which R function is the most appropriate to try to estimate the

weights.

> Any help would be greatly appreciated.

>
> Simone Vincenzi
>
> _________________________________________
> Simone Vincenzi, PhD Student
> Department of Environmental Sciences
> University of Parma
> Parco Area delle Scienze, 33/A, 43100 Parma, Italy
> Phone: +39 0521 905696
> Fax: +39 0521 906611
> e.mail: svincenz@nemo.unipr.it
>
>
>
> --
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
http://www.R-project.org/posting-guide.html

> and provide commented, minimal, self-contained, reproducible code.
>

--
No virus found in this incoming message.

--

______________________________________________
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help