From: Simone Vincenzi <svincenz_at_nemo.unipr.it>

Date: Wed 06 Sep 2006 - 00:10:34 EST

Thanks for the help.

I thought about taking logarithms of the expression, but to cope with zero
values in both the yield and the environmental variables another
transformation is needed (I simply add one to the yield and to each
environmental variable). I followed the suggestion kindly provided by
Spencer Graves, but I ran into the following problems, which probably stem
from my own lack of understanding:

1) I don't understand why the no-constant model needs to be tested (it
appears to give a better fit). Can I simply assume a no-constant model?

2) Here comes the big problem. I don't understand what is being done with
the models fit.0 and fit.1. In both cases I have to leave one variable out
of the linear model, but I don't understand why this would test the
constraint that all the coefficients sum to 1. Also, it is not possible to
apply anova(fit0, fit.0) and anova(fit1, fit.1), because the models have
different response variables.
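For reference, one way the sum-to-one constraint could be tested while keeping the same response in both models is sketched below. This is not the approach from the quoted reply but a swapped-in variant that uses offset() in lm: substituting b3 = 1 - b1 - b2 into y = b1*x1 + b2*x2 + b3*x3 gives y = x3 + b1*(x1 - x3) + b2*(x2 - x3), so x3 can enter with a fixed coefficient of 1. The data frame sim.df, the variables x1, x2, x3 and the weights 0.5, 0.3, 0.2 are simulated stand-ins chosen for illustration only, not the clam data.

```r
# Simulated stand-in data (hypothetical, NOT the clam dataset): the true
# weights 0.5, 0.3, 0.2 sum to 1 and there is no intercept.
set.seed(1)
n  <- 110
x1 <- log(runif(n, 1, 2))
x2 <- log(runif(n, 1, 2))
x3 <- log(runif(n, 1, 2))
y  <- 0.5 * x1 + 0.3 * x2 + 0.2 * x3 + rnorm(n, sd = 0.05)
sim.df <- data.frame(y, x1, x2, x3)

# Unconstrained fits, with and without an intercept.
fit1 <- lm(y ~ x1 + x2 + x3, data = sim.df)
fit0 <- lm(y ~ x1 + x2 + x3 - 1, data = sim.df)

# Constrained no-intercept fit: x3 enters as an offset, i.e. with its
# coefficient fixed at 1, and the other terms are differences from x3.
fit0.c <- lm(y ~ I(x1 - x3) + I(x2 - x3) + offset(x3) - 1, data = sim.df)

# All three models share the response y, so anova() can compare them.
print(anova(fit1, fit0))    # tests the intercept
print(anova(fit0.c, fit0))  # tests sum of coefficients = 1 (one restriction)

# Weight of x3 implied by the constraint.
b3 <- 1 - sum(coef(fit0.c))
print(b3)
```

Because the constrained model is nested in fit0 and both use y as the response, the second anova() call is a one-degree-of-freedom F test of the constraint. With the log(X + 1) variables from this thread the same pattern would presumably apply, with one environmental variable playing the role of x3.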

   Sedi  Sal Bathy Chl Hydro  Oxy Yield
1  1.00 1.00  0.51   1  0.17 0.75   1.5
2  0.50 0.95  1.00   1  0.09 0.94   0.4
3  0.50 1.00  0.17   1  0.44 0.90   1.8
4  1.00 0.98  1.00   1  0.10 0.89   4.5
5  0.13 0.84  0.73   1  0.16 0.84   0.4
6  0.50 0.90  0.91   1  0.22 0.84   0.4
7  0.13 0.75  1.00   1  0.14 0.86   0.2
8  0.13 0.84  0.75   1  0.10 0.83   0.3
9  0.13 0.78  0.97   1  0.06 0.84   0.5
10 0.13 0.87  0.70   1  0.45 0.85   1.0
11 1.00 0.77  1.00   1  0.19 0.86   1.5
12 1.00 0.94  0.81   1  0.47 0.86   3.0
13 1.00 0.93  1.00   1  0.45 0.89   2.5
14 0.50 1.00  1.00   1  0.54 0.84   4.0
15 0.50 1.00  1.00   1  0.25 0.88   2.2
16 1.00 1.00  0.56   1  0.25 0.90   5.0
17 1.00 0.90  0.56   1  0.40 0.90   1.5
18 0.50 0.97  1.00   1  0.22 0.95   1.0
19 0.54 0.96  1.00   1  0.18 0.91   0.3
20 1.00 0.97  0.33   1  0.39 0.90   3.0

And here the results of the fitted models so far.

> summary(fit0)

Call:
lm(formula = log(Yield + 1) ~ log(Sal + 1) + log(Bathy + 1) + log(Chl + 1) +
    log(Hydro + 1) + log(Oxy + 1) + log(Sedi + 1) - 1, data = data.df)

Residuals:
     Min       1Q   Median       3Q      Max
-1.05485 -0.23759  0.01331  0.18692  1.23803

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
log(Sal + 1)     5.07193    1.00584   5.042 1.98e-06 ***
log(Bathy + 1)   0.05804    0.20684   0.281 0.779561
log(Chl + 1)    -8.53941    2.11720  -4.033 0.000106 ***
log(Hydro + 1)   1.73835    0.28815   6.033 2.56e-08 ***
log(Oxy + 1)     4.19951    1.98459   2.116 0.036750 *
log(Sedi + 1)    1.15953    0.14807   7.831 4.51e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3735 on 103 degrees of freedom
Multiple R-Squared: 0.9148,     Adjusted R-squared: 0.9098
F-statistic: 184.2 on 6 and 103 DF,  p-value: < 2.2e-16

Received on Wed Sep 06 02:04:04 2006

> summary(fit1)

Call:
lm(formula = log(Yield + 1) ~ log(Sal + 1) + log(Bathy + 1) + log(Chl + 1) +
    log(Hydro + 1) + log(Oxy + 1) + log(Sedi + 1), data = (data.df))

Residuals:
      Min        1Q    Median        3Q       Max
-1.057766 -0.227720  0.006146  0.192200  1.222543

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    -4.49400    4.76603  -0.943   0.3479
log(Sal + 1)    5.13499    1.00861   5.091 1.63e-06 ***
log(Bathy + 1)  0.06685    0.20716   0.323   0.7476
log(Chl + 1)   -2.48909    6.75718  -0.368   0.7134
log(Hydro + 1)  1.70674    0.29025   5.880 5.22e-08 ***
log(Oxy + 1)    4.63758    2.03928   2.274   0.0251 *
log(Sedi + 1)   1.14005    0.14959   7.621 1.34e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3737 on 102 degrees of freedom
Multiple R-Squared: 0.7589,     Adjusted R-squared: 0.7447
F-statistic: 53.51 on 6 and 102 DF,  p-value: < 2.2e-16

> summary(fit.0)

Call:
lm(formula = log((Yield + 1)/(Sedi + 1)) ~ log((Sal + 1)/(Sedi + 1)) +
    log((Bathy + 1)/(Sedi + 1)) + log((Chl + 1)/(Sedi + 1)) +
    log((Hydro + 1)/(Sedi + 1)) + log((Oxy + 1)/(Sedi + 1)) - 1,
    data = subset(data.df))

Residuals:
    Min      1Q  Median      3Q     Max
-1.5762 -0.2591  0.1065  0.5023  1.4533

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
log((Sal + 1)/(Sedi + 1))      3.0170     1.5304   1.971  0.05134 .
log((Bathy + 1)/(Sedi + 1))   -0.3982     0.3139  -1.268  0.20748
log((Chl + 1)/(Sedi + 1))      8.8137     2.3941   3.681  0.00037 ***
log((Hydro + 1)/(Sedi + 1))    0.3309     0.4066   0.814  0.41760
log((Oxy + 1)/(Sedi + 1))    -12.5242     2.1881  -5.724 1.01e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.5766 on 104 degrees of freedom
Multiple R-Squared: 0.523,      Adjusted R-squared: 0.5001
F-statistic: 22.8 on 5 and 104 DF,  p-value: 2.179e-15

> summary(fit.1)

Call:
lm(formula = log((Yield + 1)/(Sedi + 1)) ~ log((Sal + 1)/(Sedi + 1)) +
    log((Bathy + 1)/(Sedi + 1)) + log((Chl + 1)/(Sedi + 1)) +
    log((Hydro + 1)/(Sedi + 1)) + log((Oxy + 1)/(Sedi + 1)),
    data = subset(data.df))

Residuals:
     Min       1Q   Median       3Q      Max
-1.05552 -0.23441  0.01263  0.18355  1.24473

Coefficients:
                             Estimate Std. Error t value Pr(>|t|)
(Intercept)                   1.84916    0.15474  11.950  < 2e-16 ***
log((Sal + 1)/(Sedi + 1))     5.03863    1.00977   4.990 2.46e-06 ***
log((Bathy + 1)/(Sedi + 1))   0.05279    0.20767   0.254   0.7999
log((Chl + 1)/(Sedi + 1))   -10.96682    2.27270  -4.825 4.86e-06 ***
log((Hydro + 1)/(Sedi + 1))   1.74631    0.28980   6.026 2.64e-08 ***
log((Oxy + 1)/(Sedi + 1))     3.95938    1.98206   1.998   0.0484 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3751 on 103 degrees of freedom
Multiple R-Squared: 0.5606,     Adjusted R-squared: 0.5393
F-statistic: 26.28 on 5 and 103 DF,  p-value: < 2.2e-16

> anova(fit1,fit0)

Analysis of Variance Table

  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1    102 14.2409
2    103 14.3650 -1   -0.1241 0.8891 0.3479

Thanks for the help

Simone Vincenzi

_________________________________________
Simone Vincenzi, PhD Student
Department of Environmental Sciences
University of Parma
Parco Area delle Scienze, 33/A, 43100 Parma, Italy
Phone: +39 0521 905696
Fax: +39 0521 906611
e.mail: svincenz@nemo.unipr.it

-----Original Message-----
From: Spencer Graves [mailto:spencer.graves@pdf.com]
Sent: Monday, 4 September 2006 21:31
To: Simone Vincenzi
Cc: r-help@stat.math.ethz.ch
Subject: Re: [R] Optimization

Have you considered taking logarithms of the expression you mentioned:

    log(Yield) = a1*log(A) + b1*log(B) + c1*log(C) + ...

where a1 = a/(a+b+...), etc. This model has two constraints not present in
ordinary least squares: First, the intercept is assumed to be zero. Second,
the coefficients in this log formulation must sum to 1.

If I were you, I might use something like "lm" to test them both. To explain
how, I'll modify the notation, replacing A by X1, B by X2, ..., up to Xkm1
(= X[k-1]) and Xk for k different environmental variables. Then I might try
something like the following:

    fit0  <- lm(log(Yield) ~ log(X1) + ... + log(Xk) - 1)
    fit1  <- lm(log(Yield) ~ log(X1) + ... + log(Xk))
    fit.1 <- lm(log(Yield/Xk) ~ log(X1/Xk) + ... + log(Xkm1/Xk))
    fit.0 <- lm(log(Yield/Xk) ~ log(X1/Xk) + ... + log(Xkm1/Xk) - 1)

anova(fit1, fit0) would test the no-constant model, and if I haven't made a
mistake in this, anova(fit0, fit.0) and anova(fit1, fit.1) would test the
constraint that all the coefficients should sum to 1.

If you would like further help from this listserve, please provide
commented, minimal, self-contained, reproducible code to help potential
respondents understand your question and concerns (as suggested in the
posting guide "www.R-project.org/posting-guide.html").

Hope this helps.
Spencer Graves

Simone Vincenzi wrote:

> Dear R-list,
>
> I'm trying to estimate the relative importance of 6 environmental variables
> in determining clam yield. To estimate clam yield a previous work used the
> function Yield = (A^a*B^b*C^c...)^(1/(a+b+c+...)) where A,B,C... are the
> values of the environmental variables and the weights a,b,c... have not been
> calibrated on data but taken from literature. Now I'd like to estimate the
> weights a,b,c... by using a dataset with 110 observations of yield and
> values of the environmental variables. I'm wondering if it is feasible or if
> the number of observations is too low, if some data transformation is needed
> and which R function is the most appropriate to try to estimate the weights.
>
> Any help would be greatly appreciated.

> Simone Vincenzi

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Wed 06 Sep 2006 - 15:25:05 EST.
