# Re: [R] Rsquared for anova

From: Mike Marchywka <marchywka_at_hotmail.com>
Date: Sun, 17 Apr 2011 10:02:24 -0400

(Did this message make it through the lists as rich text? Hotmail didn't seem to think it was plain text.)

Anyway, having come in in the middle of this, it isn't clear whether your issues are with R, with stats, or both. Usually the hard-core stats people punt pure stats questions to other places, but both can be addressed somewhat.
In any case, exploratory work is a good way to learn both, and I always like looking at new data. If you have one or a few dependent variables and many independent variables, it would probably help to visualize a surface with the response as a function of the input variables; then, perhaps combined with prior information or anecdotes, you may get some idea of which tests or analyses would make sense.

just some thoughts "for illustration only"

First, it helps to make sure everything went OK and to do some quick checks, for example:

```
str(df)

# confirm each coded variable takes only the values you expect
unique(df$nh1)
unique(df$nh2)
unique(df$nh3)
unique(df$randsize)
unique(df$aweights)
```
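Checking one column at a time gets tedious with many coded variables. A minimal sketch of a one-shot check, run on a made-up stand-in for `df` (only the column names come from this thread; the values are invented): count the distinct values in every column and confirm the expected columns exist, which also catches the kind of misspelled name that `df$...` silently turns into `NULL`.

```r
# Made-up stand-in for df; only the column names come from the thread
set.seed(1)
n <- 40
df <- data.frame(
  nh1      = sample(0:1, n, replace = TRUE),
  nh2      = sample(0:1, n, replace = TRUE),
  nh3      = sample(0:1, n, replace = TRUE),
  randsize = sample(0:1, n, replace = TRUE),
  aweights = sample(0:1, n, replace = TRUE),
  tos      = runif(n)
)

# Distinct values per column in one call: coded 0/1 flags should
# show at most 2, a continuous response many more
n_levels <- sapply(df, function(v) length(unique(v)))
print(n_levels)

# A misspelled name gives NULL rather than an error, so also check
# that every expected column is actually present
wanted <- c("nh1", "nh2", "nh3", "randsize", "aweights")
missing_cols <- setdiff(wanted, names(df))
print(missing_cols)          # character(0) when nothing is missing
print(is.null(df$aweoghts))  # the typo returns NULL without complaint
```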

Personally, lots of binary variables confuse me, so I munge them all together into one code, expecting to spot issues later in the plots. With this data you can create a composite variable like this (I have not checked any of this for accuracy, and typos or other problems may render the results useless):

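```
# composite code built from the flags; note that randsize gets the same
# weight (2) as nh2, so if randsize is 0/1 like the others, some
# combinations will end up sharing a code
x=df$nh1+2*df$nh2+4*df$nh3+2*df$randsize+32*df$aweights
df2<-cbind(df,x)
str(df2)
```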
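Whether such a composite separates every combination depends on the weights. A minimal sketch, assuming all five flags are 0/1 (not confirmed for this data): enumerate every combination with `expand.grid()` and count the distinct codes. With the weights used above, nh2 and randsize share weight 2, so some combinations collide; distinct powers of two (1, 2, 4, 8, 16 is one choice, picked here purely for illustration) give each combination its own code, like the bits of a binary number.

```r
# All 2^5 = 32 combinations of five hypothetical 0/1 flags
flags <- expand.grid(nh1 = 0:1, nh2 = 0:1, nh3 = 0:1,
                     randsize = 0:1, aweights = 0:1)

# Composite code as a weighted sum of the flags
encode <- function(d, w) {
  d$nh1 * w[1] + d$nh2 * w[2] + d$nh3 * w[3] +
    d$randsize * w[4] + d$aweights * w[5]
}

w_post <- c(1, 2, 4, 2, 32)   # weights from the post
w_bits <- c(1, 2, 4, 8, 16)   # distinct powers of two (illustrative)

n_post <- length(unique(encode(flags, w_post)))
n_bits <- length(unique(encode(flags, w_bits)))
cat("distinct codes:", n_post, "vs", n_bits, "of", nrow(flags), "\n")
# distinct codes: 20 vs 32 of 32

# With power-of-two weights each flag can be read back from the code
code <- encode(flags, w_bits)
stopifnot(all((code %/% 8) %% 2 == flags$randsize))
```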

I'm not sure whether "time" was an input or an output, but you could check for any obvious trend or periodicity over time in your new made-up variable:

```
plot(df2$time,df2$x)
```

Apparently x is a num rather than an int. It can be converted for illustration, though this is probably of no consequence:

```
xi=as.integer(x)
str(xi)
```
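Converting is harmless here, but one caveat is worth knowing: `as.integer()` truncates toward zero instead of rounding, so a double that sits a hair below a whole number loses one. A small sketch with a deliberately inexact value (the composite above is a sum of whole numbers, so it is stored exactly and is not affected):

```r
# Mathematically 1, but stored as 0.9999999999999998
v <- (1 - 0.9) * 10

as.integer(v)          # 0: the fractional part is simply dropped
as.integer(round(v))   # 1: round first, then convert
```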

Then you can assign colors based on this variable:

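```
min(xi)          # must be >= 0 so that xi+1 is a valid index
max(xi)          # rainbow(56) only covers codes 0..55
pal=rainbow(56)  # 'pal' rather than 'c', to avoid shadowing base::c()
cx=pal[xi+1]
str(cx)
```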
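A fixed palette such as `rainbow(56)` assumes the codes stay between 0 and 55. A sketch on hypothetical codes (one deliberately out of range) showing what happens otherwise, and sizing the palette from the data instead:

```r
xi <- c(0L, 3L, 7L, 35L, 60L)   # hypothetical codes; 60 is out of range

# Indexing past the end of the palette yields NA, and points drawn
# with col = NA simply do not appear on the plot
fixed <- rainbow(56)[xi + 1]
print(fixed)

# Size the palette from the data so every code gets a color
stopifnot(min(xi) >= 0)          # xi + 1 must be a valid index
pal <- rainbow(max(xi) + 1)
cx  <- pal[xi + 1]
print(cx)
```

With real data, `cx` would then be passed as in the post, e.g. `plot(df2$tos, df2$tws, col=cx)`.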

and make color-coded scatter plots. If you got lucky and guessed right, you may see some patterns that you want to test:

```
plot(df2$tos,df2$tws,col=cx)
```

In this case, I get a cool red-yellow-green line along the bottom (a very compelling linear-fit question) and scattered magenta (pink-red? LOL) and blue points everywhere, with a cluster near the origin and nothing in the top-right quadrant. Also note a few blue lines above the red-yellow-green line, but much shorter.

And in fact (presumably you already knew this, since it looks like it was designed in), if you plot just the red and green points, the linear fit looks perfect:

```
good=which(df2$x<20)
plot(df2$tos[good],df2$tws[good],col=cx[good])
```

Now if you compare the fit of the "good" points with the fit of all points, it isn't clear that anything like this would emerge from just looking at the summaries of a linear fit:

```
# "good" points only
td=df2$tos[good]
ti=df2$tws[good]
lm(td~ti)
# all points
lm(df2$tos~ df2$tws)
# full summaries (coefficients, R-squared) for both fits
summary(lm(td~ti))
summary(lm(df2$tos~ df2$tws))
```
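To see how summaries can hide this kind of structure, here is a sketch on invented data (not the poster's): one group lies almost exactly on a line and the other is scattered noise, loosely mimicking the plot described above. The `subset` argument of `lm()` fits the same formula to the "good" rows without copying columns, and `summary()$r.squared` extracts R-squared for comparison.

```r
# Invented data: 30 points near a line plus 30 scattered points
set.seed(42)
n <- 30
line_pts <- data.frame(tws = runif(n, 0, 10), grp = "line")
line_pts$tos <- 2 * line_pts$tws + 1 + rnorm(n, sd = 0.1)
noise_pts <- data.frame(tws = runif(n, 0, 10), grp = "noise")
noise_pts$tos <- runif(n, 0, 40)
df2 <- rbind(line_pts, noise_pts)

good <- which(df2$grp == "line")

# Same model on the subset and on all rows
fit_good <- lm(tos ~ tws, data = df2, subset = good)
fit_all  <- lm(tos ~ tws, data = df2)

r2 <- c(good = summary(fit_good)$r.squared,
        all  = summary(fit_all)$r.squared)
print(round(r2, 3))
print(rbind(good = coef(fit_good), all = coef(fit_all)))
```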

Of course, "tests" need to be decided ahead of time, or else it is easy to go shopping for the answer you want. Anything post hoc needs to be very thorough, and you should at least try to rationalize test results you don't happen to like (assuming you are trying to understand the system from which the data were measured rather than justify some particular outcome).

Date: Sun, 17 Apr 2011 11:34:14 +0200
From: dorien.herremans_at_ua.ac.be
To: dieter.menne_at_menne-biomed.de
CC: r-help_at_r-project.org
Subject: Re: [R] Rsquared for anova

So far, running it with all the combinations did not take too long, and there seem to be some interaction effects between the parameters. However, 2x2 combinations might suffice.
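On the formula itself: the "unexpected ','" error quoted below comes from the extra parenthesis in `lm((tos~...`, which puts `, data=expdata` inside the inner parentheses. Also, with ten variables, `nh1*nh2*...*length` expands to 2^10 - 1 = 1023 terms. If two-way interactions suffice, `(a + b + ...)^2` in an R formula gives all main effects plus all pairwise interactions. A sketch on a made-up `expdata` (column names from the thread, values invented), compared against a main-effects model:

```r
# Made-up stand-in for expdata; only the column names come from the thread
set.seed(7)
n <- 200
expdata <- data.frame(
  nh1 = rbinom(n, 1, 0.5), nh2 = rbinom(n, 1, 0.5), nh3 = rbinom(n, 1, 0.5),
  randsize = rbinom(n, 1, 0.5), aweights = rbinom(n, 1, 0.5),
  tt1 = runif(n), tt2 = runif(n), tt3 = runif(n),
  iters = runif(n), length = runif(n)
)
expdata$tos <- rnorm(n)

# Full factorial expansion, counted symbolically without fitting
n_full <- length(attr(terms(
  tos ~ nh1*nh2*nh3*randsize*aweights*tt1*tt2*tt3*iters*length),
  "term.labels"))

# Main effects plus all two-way interactions; one pair of parentheses
fit2 <- lm(tos ~ (nh1 + nh2 + nh3 + randsize + aweights +
                  tt1 + tt2 + tt3 + iters + length)^2,
           data = expdata)
n_terms <- length(attr(terms(fit2), "term.labels"))
cat("full:", n_full, " pairwise:", n_terms, "\n")

# Main effects only, then an F test for the added interactions
fit1 <- lm(tos ~ nh1 + nh2 + nh3 + randsize + aweights +
             tt1 + tt2 + tt3 + iters + length, data = expdata)
print(anova(fit1, fit2))
print(summary(fit2)$r.squared)
```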

Thanks for any help, or a pointer to some good documentation,

Dorien

On 16 April 2011 10:13, Dieter Menne <dieter.menne_at_menne-biomed.de> wrote:

>
> dorien wrote:
> >
> >> fit <- lm((tos~nh1*nh2*nh3*randsize*aweights*tt1*tt2*tt3*iters*length,
> > data=expdata))
> > Error: unexpected ',' in "fit <-
> > lm((tos~nh1*nh2*nh3*randsize*aweights*tt1*tt2*tt3*iters*length,"
> >
> >
>
> Peter's point is the important one: too many interactions, and even with +
> instead of * you might be running into problems.
>
> But anyway: if you don't let us access
>
>
> /home/dorien/UA/meta-music/optimuse/optimuse1-build-desktop/results/results_processedCP
>
> you cannot expect a better answer which will depend on the structure of the
> data set.
>
> Dieter
>
>
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Rsquared-for-anova-tp3452399p3453719.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Dorien Herremans

*Department of Environment, Technology and Technology Management*
Faculty of Applied Economics
University of Antwerp

B.513
Prinsstraat 13
2000 Antwerp
Belgium
+32 3 265 41 25
