Re: [R] Rsquared for anova

From: Mike Marchywka <>
Date: Sun, 17 Apr 2011 10:02:24 -0400

(Did this message make it through to the list as rich text? Hotmail didn't seem to think it was plain text.)

Anyway, having come in in the middle of this, it isn't clear to me whether your issues are with R, with stats, or both. Usually the hard core stats people punt the stats questions to other places, but both can be addressed somewhat.
In any case, exploratory work is a good way to learn both, and I always like looking at new data. If you have one or a few dependent variables and many independent variables, it would probably help to visualize a surface with the response as a function of the input variables. Then, maybe with the input of prior information or anecdotes, you will have some idea of which tests or analyses would make sense.

just some thoughts "for illustration only"


First, it helps to make sure everything loaded OK and to do some quick checks.
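As a rough sketch of what such quick checks could look like: the attached data set is not available here, so the data frame below is a made-up stand-in that only borrows the column names used in this thread. The values, the seed, and the number of rows are all assumptions.

```r
# Synthetic stand-in for the attached data set (NOT the real data);
# column names follow the thread, values are made up.
set.seed(1)
n <- 40
df <- data.frame(
  nh1      = rbinom(n, 1, 0.5),
  nh2      = rbinom(n, 1, 0.5),
  nh3      = rbinom(n, 1, 0.5),
  randsize = rbinom(n, 1, 0.5),
  aweights = rbinom(n, 1, 0.5),
  tws      = runif(n, 0, 100)
)
df$tos <- 0.5 * df$tws + rnorm(n)

str(df)               # column types and the first few values
summary(df)           # ranges and quartiles, to spot odd values
colSums(is.na(df))    # missing values per column

n_rows    <- nrow(df)
n_missing <- sum(is.na(df))
```

With real data, `df` would come from something like `read.table()` or `read.csv()` instead, and the same three calls show whether the columns were read with the types you expected.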



Now, personally, lots of binary variables confuse me, so I munge them all together, since I expect I can identify any issues later in the following plots. With this data you can create a composite variable like this (I have not checked any of this for accuracy, and typos and other problems may render the results useless):

x <- df$nh1 + 2*df$nh2 + 4*df$nh3 + 2*df$randsize + 32*df$aweights
df2 <- cbind(df, x)
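One thing to be aware of with the weights as posted: nh2 and randsize both get the weight 2, so if all five inputs are 0/1 flags, some different combinations collapse onto the same value of x. A small sketch, on a made-up grid of all flag combinations rather than the real data, comparing the posted weights with a hypothetical alternative that uses distinct powers of two:

```r
# All 32 combinations of five binary flags (made-up grid, not the real data)
flags <- expand.grid(nh1 = 0:1, nh2 = 0:1, nh3 = 0:1,
                     randsize = 0:1, aweights = 0:1)

# Weights as posted: nh2 and randsize share the weight 2
x_posted <- with(flags, nh1 + 2 * nh2 + 4 * nh3 + 2 * randsize + 32 * aweights)

# Hypothetical alternative: weights 1, 2, 4, 8, 16, one code per combination
x_unique <- as.vector(as.matrix(flags) %*% 2^(0:4))

n_posted <- length(unique(x_posted))   # distinct codes with posted weights
n_unique <- length(unique(x_unique))   # distinct codes with powers of two
c(posted = n_posted, unique = n_unique)
```

On this grid the posted weights give 20 distinct codes and the powers of two give 32. Whether that matters depends on whether you want one colour per combination or just a rough grouping; the alternative weights are an assumption and are not what the plots described in this message were made with.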

Not sure if "time" was an input or an output, but you could see whether there is any obvious trend or periodicity of time against your new made-up variable.
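A minimal sketch of such a look, assuming a data frame `df2` with columns `x` and `time`. The data here are invented, with a trend built in on purpose, so the numbers say nothing about the real data set.

```r
# Made-up data: a composite code x and a "time" column with a built-in trend
set.seed(2)
n <- 60
df2 <- data.frame(x = sample(0:39, n, replace = TRUE))
df2$time <- 10 + 0.3 * df2$x + rnorm(n)

# Avoid writing Rplots.pdf when run as a batch script
if (!interactive()) pdf(NULL)

# Eyeball time against the composite variable
plot(df2$x, df2$time, xlab = "composite x", ylab = "time")

# A crude single-number summary of any linear trend
trend <- cor(df2$x, df2$time)
trend
```

A correlation only picks up a linear trend; periodicity or step changes are usually easier to see in the plot itself.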


Apparently x is a num rather than an int; it can be changed for illustration, but that is probably of no consequence.
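The reason is that bare literals such as `2` are doubles in R, so the arithmetic promotes the result even when the columns are integers. A small sketch on made-up flags:

```r
# Made-up integer flags (not the real data)
df <- data.frame(nh1 = c(0L, 1L, 1L), nh2 = c(1L, 0L, 1L))

x <- df$nh1 + 2 * df$nh2        # 2 is a double, so x is double
class(x)

xi <- as.integer(x)             # convert after the fact
class(xi)

x_lit <- df$nh1 + 2L * df$nh2   # integer literals keep the result integer
class(x_lit)
```

For plotting and for indexing into a colour vector the two behave the same, which is why the difference is of little consequence here.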


Then you can add color based on this variable and make color-coded scatter plots. Now, if you got lucky and guessed right, you may see some patterns that you want to test.
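One possible way to build such a colour vector, here called `cx` to match the name used in the plotting command further down. The use of `rainbow()` and the data are assumptions for illustration, not necessarily how the colours described in this message were produced.

```r
# Made-up data with a composite code x and two measured columns
set.seed(3)
n <- 50
df2 <- data.frame(x   = sample(c(0, 1, 3, 5, 32, 35), n, replace = TRUE),
                  tws = runif(n, 0, 100))
df2$tos <- 0.5 * df2$tws + rnorm(n)

# One colour per distinct value of x, then one colour per observation
lev <- sort(unique(df2$x))
pal <- rainbow(length(lev))
cx  <- pal[match(df2$x, lev)]

# Avoid writing Rplots.pdf when run as a batch script
if (!interactive()) pdf(NULL)

plot(df2$tos, df2$tws, col = cx, pch = 19)
legend("topleft", legend = lev, col = pal, pch = 19, title = "x")
```

Mapping through `match()` rather than indexing the palette with x directly keeps this working when x has gaps or contains 0.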


In this case, I get a cool red-yellow-green line along the bottom (a very compelling linear fit question) and scattered magenta (pink red? LOL) and blue points everywhere, with a cluster near the origin and nothing in the top right quadrant. Also note a few blue lines above the red-yellow-green line, but much shorter.

And in fact, presumably you already knew this, as it looks like it was designed in: if you plot just the red and green points, the fit looks perfect for linear,

> good=which(df2$x<20)
> plot(df2$tos[good],df2$tws[good],col=cx[good])

Now, if you compare the fit of the "good" points with the fit of all points, it isn't clear that anything like this would emerge from just looking at the summaries of a linear fit:

lm(df2$tos~ df2$tws)
summary(lm(df2$tos~ df2$tws))
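To put the two fits side by side, something along these lines could be used. The data are invented so that the "good" subset is nearly linear and the rest is scatter, roughly mimicking the picture described above; the actual R-squared values for the real data will differ.

```r
# Made-up data: points with x < 20 lie close to a line, the rest are scatter
set.seed(4)
n <- 100
df2 <- data.frame(x   = sample(c(0:9, 32:41), n, replace = TRUE),
                  tws = runif(n, 0, 100))
good <- which(df2$x < 20)
df2$tos <- runif(n, 0, 100)
df2$tos[good] <- 2 * df2$tws[good] + rnorm(length(good))

# Same model, once on everything and once on the "good" rows only
fit_all  <- lm(tos ~ tws, data = df2)
fit_good <- lm(tos ~ tws, data = df2, subset = good)

r2_all  <- summary(fit_all)$r.squared
r2_good <- summary(fit_good)$r.squared
c(all = r2_all, good = r2_good)
```

The summary of the overall fit gives no hint that a subset is almost perfectly linear, which is the point of looking at the colour-coded plot first.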

Now of course "tests" need to be considered ahead of time, or else it is easy to go shopping for the answer you want. Anything post hoc needs to be very complete, and you should at least try to rationalize test results you don't happen to like (assuming you are trying to understand the system from which the data were measured rather than justify some particular outcome).

Date: Sun, 17 Apr 2011 11:34:14 +0200
Subject: Re: [R] Rsquared for anova

Thanks for your remarks. I've been reading about R for the last two days, but I don't really get when I should use lm or aov.  

I have attached the dataset, feel free to take a look at it.  

So far, running it with all the combinations did not take too long, and there seem to be some effects between the parameters. However, 2x2 combinations might suffice.

Thanks for any help, or a pointer to some good documentation,  


On 16 April 2011 10:13, Dieter Menne <> wrote:  

> dorien wrote:
> >
> >> fit <- lm((tos~nh1*nh2*nh3*randsize*aweights*tt1*tt2*tt3*iters*length,
> > data=expdata))
> > Error: unexpected ',' in "fit <-
> > lm((tos~nh1*nh2*nh3*randsize*aweights*tt1*tt2*tt3*iters*length,"
> >
> >
> Peter's point is the important one: too many interactions, and even with +
> instead of * you might be running into problems.
> But anyway: if you don't let us access
> /home/dorien/UA/meta-music/optimuse/optimuse1-build-desktop/results/results_processedCP
> you cannot expect a better answer which will depend on the structure of the
> data set.
> Dieter
> ______________________________________________
> mailing list
> PLEASE do read the posting guide
> and provide commented, minimal, self-contained, reproducible code.

Dorien Herremans
*Department of Environment, Technology and Technology Management*
Faculty of Applied Economics
University of Antwerp
Prinsstraat 13
2000 Antwerp
+32 3 265 41 25

Received on Sun 17 Apr 2011 - 14:05:41 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 17 Apr 2011 - 15:10:31 GMT.
