[R] Root mean square on binned GAM results

From: David Jarvis <thangalin_at_gmail.com>
Date: Fri, 18 Jun 2010 16:54:48 -0700


Standard correlations (Pearson's, Spearman's, Kendall's Tau) do not accurately reflect how closely the model (GAM) fits the data. I was told that the accuracy of the correlation can be improved using a root mean square deviation (RMSD) calculation on binned data.

For example, let 'o' be the real, observed data and 'm' be the model data. I believe I can calculate the root mean squared deviation as:

sqrt( mean( ( o - m ) ^ 2 ) )
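A minimal sketch of the unbinned calculation, using made-up values for 'o' and 'm' (the numbers are purely illustrative). The point to watch is operator placement: each deviation has to be squared before averaging, otherwise positive and negative errors cancel.

```r
# Hypothetical example data: 'o' is observed, 'm' is model output.
o <- c(2.0, 4.0, 6.0, 8.0)
m <- c(1.0, 5.0, 5.0, 9.0)

# Square each deviation, average the squares, then take the root.
rmsd <- sqrt(mean((o - m)^2))

# For contrast: this squares the *mean* deviation instead, so the
# +1 and -1 errors above cancel out before the square is applied.
cancelled <- sqrt(mean(o - m)^2)

print(rmsd)
print(cancelled)
```

With these particular vectors the two expressions give different results, which is why the parentheses around ( o - m ) ^ 2 matter.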

However, this does not bin the data into mean sets. What I would like to do is:

oangry <- c( mean(o[1:5]), mean(o[6:10]), ... )
mangry <- c( mean(m[1:5]), mean(m[6:10]), ... )

sqrt( mean( ( oangry - mangry ) ^ 2 ) )

That calculation I would like to simplify into (or similar to):

sqrt( mean( ( bin( o, 5 ) - bin( m, 5 ) ) ^ 2 ) )
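One possible way to write such a helper is sketched here. The name bin() and its 'size' argument are hypothetical (base R has no function by that name); the sketch groups consecutive elements by index with seq_along() and averages each group with tapply(). A trailing partial bin is averaged over whatever elements it contains, which is an assumption about the desired behaviour.

```r
# Hypothetical helper: mean of each consecutive run of 'size' elements.
# If length(x) is not a multiple of 'size', the last bin is shorter.
bin <- function(x, size) {
  # Element i belongs to group ceiling(i / size): 1,1,1,1,1,2,2,...
  groups <- ceiling(seq_along(x) / size)
  # tapply() returns a named 1-d array; as.vector() drops the attributes.
  as.vector(tapply(x, groups, mean))
}

# Illustrative data only: the model is off by exactly 1 everywhere.
o <- 1:10
m <- o + 1

binned_rmsd <- sqrt(mean((bin(o, 5) - bin(m, 5))^2))
print(bin(1:12, 5))
print(binned_rmsd)
```

Grouping by index rather than by value is a deliberate choice here: it keeps 'o' and 'm' aligned element for element, which cut() applied to the data values themselves would not.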

I have read the help for ?cut, ?table, ?hist, and ?split, but am stumped as to which one to use in this case, if any.

How do you calculate c( mean(o[1:5]), mean(o[6:10]), ... ) for an arbitrary length vector using an appropriate number of bins (fixed at 5, or perhaps calculated using Sturges' formula)?
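As a rough sketch of the Sturges variant: grDevices::nclass.Sturges() returns ceiling(log2(n) + 1) for a vector of length n, and cut() applied to the element indices (not the values) can split them into that many groups. The helper name bin_by_count() and the simulated data are assumptions made for illustration only.

```r
# Hypothetical helper: split x into k index-based bins of roughly equal
# size and return the mean of each bin.
bin_by_count <- function(x, k) {
  # labels = FALSE makes cut() return integer bin codes 1..k.
  groups <- cut(seq_along(x), breaks = k, labels = FALSE)
  as.vector(tapply(x, groups, mean))
}

# Simulated stand-ins for the observed and modelled series.
set.seed(1)
o <- rnorm(100)
m <- o + rnorm(100, sd = 0.1)

# Sturges' formula: ceiling(log2(length(o)) + 1).
k <- grDevices::nclass.Sturges(o)

binned_rmsd <- sqrt(mean((bin_by_count(o, k) - bin_by_count(m, k))^2))
print(k)
print(binned_rmsd)
```

Because cut() divides the index range into equal-width intervals, the bins may differ in size by one element when the length is not a multiple of k.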

I have also posted a more detailed version of this question on StackOverflow:


Many thanks.



R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

Received on Fri 18 Jun 2010 - 23:57:22 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sun 20 Jun 2010 - 01:00:33 GMT.
