From: David Winsemius <dwinsemius_at_comcast.net>

Date: Fri, 18 Jun 2010 23:08:42 -0400


On Jun 18, 2010, at 10:38 PM, David Jarvis wrote:

> Hi, David.

>
> accurately reflect how closely the model (GAM) fits the data. I was told
>
> This was my presumption; I could be mistaken.
>
> that the accuracy of the correlation can be improved using a root mean
> square deviation (RMSD) calculation on binned data.
>
> By whom? ... and with what theoretical basis?
>
> I talked with Christian Schunn. He mentioned that using RMSD would
> produce a better result for goodness-of-fit (if that term is not
> synonymous with correlation, I apologise -- I'm still rather new to
> this level of statistics):
>
> http://www.lrdc.pitt.edu/schunn/gof/index.html
>
> It was regarding a chart similar to:
>
> http://i.imgur.com/X0gxV.png
>
> In the chart, the calculations for Pearson's, Spearman's, and
> Kendall's Tau provide, in my opinion, an incorrect indicator as to
> the strength of GAM's fit to the data. I could be wrong here, too.
>
> His suggestion was to bin the means (in groups of 5 or so) to
> reduce the noise.
>
> I doubt that your strategy offers any statistical advantage, but if
> you want to play around with it then consider:
>
> binned.x <- round( (x + 2.5)/5)
>
> > d <- c(1,3,5,4,3,6,3,1,5,7,8,9,4,3,2,7,3,6,8,9,5,3,1,4,5,8,9,3,3,2,5,7,8,8,
> +        5,4,3,2,6,4,3,1,4,5,6,8,9,0,7,7,5,4,3,3,2,1,3,4,5,6,7,9,0,2,4,3,3)
> > binned.d <- round( (d + 2.5)/5)
> > print(binned.d)
> [1] 1 1 2 1 1 2 1 1 2 2 2 2 1 1 1 2 1 2 2 2 2 1 1 1 2 2 2 1 1 1 2 2 2 2 2 1 1 1
> [39] 2 1 1 1 1 2 2 2 2 0 2 2 2 1 1 1 1 1 1 1 2 2 2 2 0 1 1 1 1
>
> That doesn't make sense to me.

Then I blame your powers of exposition. Without some sort of explicit example the parsing of English is very prone to error. If you want to pick out elements of x in some pre-specified order in groups of five then consider:

> x <- 1:100
> rep(1:20, each=5)

[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5

[24] 5 5 6 6 6 6 6 7 7 7 7 7 8 8 8 8 8 9 9 9 9 9 10

[47] 10 10 10 10 11 11 11 11 11 12 12 12 12 12 13 13 13 13 13 14 14 14 14

[70] 14 15 15 15 15 15 16 16 16 16 16 17 17 17 17 17 18 18 18 18 18 19 19

[93] 19 19 19 20 20 20 20 20

> tapply(x, rep(1:20, each=5), mean)

 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20   # this row is just indices
 3  8 13 18 23 28 33 38 43 48 53 58 63 68 73 78 83 88 93 98   # this row is the means
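The same grouping can be built without hard-coding 20 groups. What follows is only a sketch: bin_means is a hypothetical helper name, not something from this thread, and it assumes that a shorter final group is acceptable when the length is not a multiple of 5.

```r
# Hypothetical helper (not from the thread): mean of each consecutive block
# of `size` elements. If length(v) is not a multiple of `size`, the last
# block is simply shorter; nothing is dropped.
bin_means <- function(v, size = 5) {
  grp <- ceiling(seq_along(v) / size)   # 1 1 1 1 1 2 2 2 2 2 3 ...
  tapply(v, grp, mean)
}

x <- 1:100
bin_means(x)        # same means as tapply(x, rep(1:20, each=5), mean)
bin_means(1:103)    # 21 groups; the last one holds only 101, 102, 103
```

Building the index with ceiling(seq_along(v) / size) avoids having to trim the vector to a multiple of the bin size first.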

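If you wanted them in random groups of roughly 5, then you could assign the group labels at random, e.g. sample(1:(n/5), n, replace=TRUE) with n <- length(x), and hand those labels to tapply in place of rep(1:20, each=5).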
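A sketch of random grouping, under the assumption that "random groups" means each element gets a randomly chosen group label. The names grp_rough and grp_exact are made up for illustration, and set.seed is there only so the run is repeatable.

```r
set.seed(1)                 # only so that the sketch is repeatable
x <- 1:100
n <- length(x)

# Roughly 5 per group: every element draws one of n/5 labels independently,
# so group sizes vary around 5.
grp_rough <- sample(seq_len(n / 5), n, replace = TRUE)

# Exactly 5 per group: shuffle a balanced vector of labels instead.
grp_exact <- sample(rep(seq_len(n / 5), each = 5))

table(grp_exact)            # every label appears 5 times
tapply(x, grp_exact, mean)  # mean of each random group
```

Note that sample() requires its prob argument to have one weight per element being sampled, so drawing labels, as above, is probably the simpler route.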

> My impression was that I should try to put every 5 values in a bin,
> average that bin, then calculate the RMSD between the observed
> values and the values from GAM. In other words (o is observed and m
> is model):

Do you intend that m[n] would be the predicted value from a model? How are you forming the groups of 5? Are they ordered? If so, ordered by observed or by predicted? (In R a "model" is a complex list structure, but may in some cases have a simple predicted value for each case.) Again, a specific example might work wonders.
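To make the ordering question concrete, here is a small sketch with simulated data. The vectors o and m are invented for illustration and are not the poster's data; with a fitted model one would presumably get m from something like fitted() or predict().

```r
set.seed(42)                          # repeatable made-up data
m <- seq(0, 10, length.out = 50)      # stand-in for model predictions
o <- m + rnorm(50)                    # stand-in for noisy observations

grp <- ceiling(seq_along(m) / 5)      # 10 consecutive groups of 5

by_pred <- order(m)                   # sort cases by predicted value
by_obs  <- order(o)                   # sort cases by observed value

# Binned means of the observations under each ordering
means_by_pred <- tapply(o[by_pred], grp, mean)
means_by_obs  <- tapply(o[by_obs],  grp, mean)

cbind(means_by_pred, means_by_obs)    # the two sets of bins generally differ
```

The bins, and therefore anything computed from them, depend on which ordering is used, which is why the question needs an answer before the binned comparison means much.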

-- David.

Received on Sat 19 Jun 2010 - 03:11:01 GMT

> bins <- 5
>
> while( length(o) %% bins != 0 ) {
>   o <- o[-length(o)]
> }
> omean <- apply( matrix(o, bins), 2, mean )
>
> while( length(m) %% bins != 0 ) {
>   m <- m[-length(m)]
> }
> mmean <- apply( matrix(m, bins), 2, mean )
>
> sqrt( mean( omean - mmean ) ^ 2 )
>
> But that feels sloppy, error prone, and fragile.
>
> Joris mentioned that I could try using tapply with
> cut(d,round(length(d)/5)). I couldn't figure out how to get the
> means back from the factors.
>
> Dave
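In the quoted expression the exponent is applied to the mean of the differences, not to each difference, so positive and negative deviations can cancel. Below is a sketch of the usual root mean square deviation on binned means, with a few assumptions: binned_rmsd is a hypothetical helper name, o and m are numeric vectors of equal length in matching order, and a short final bin is kept and not trimmed away. The last lines show one way to get means back from a cut() factor.

```r
# Hypothetical helper: RMSD between binned means of observed (o) and model (m).
binned_rmsd <- function(o, m, bins = 5) {
  stopifnot(length(o) == length(m))
  grp   <- ceiling(seq_along(o) / bins)  # consecutive groups of `bins` values
  omean <- tapply(o, grp, mean)
  mmean <- tapply(m, grp, mean)
  sqrt(mean((omean - mmean)^2))          # square each difference, then average
}

o <- 1:10
m <- c(rep(0, 5), rep(10, 5))
binned_rmsd(o, m)                        # bin means 3, 8 vs 0, 10: sqrt((9 + 4)/2)

# Means per interval when binning by value with cut():
d <- c(1, 3, 5, 4, 3, 6, 3, 1, 5, 7)
f <- cut(d, round(length(d) / 5))        # factor with 2 value intervals
tapply(d, f, mean)                       # one mean per factor level
```

Note that cut() groups by value and not by position, so it answers a different question from taking every 5 consecutive elements.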

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

