From: David Winsemius <dwinsemius_at_comcast.net>

Date: Wed, 06 Apr 2011 17:20:52 -0400

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed 06 Apr 2011 - 21:53:53 GMT

Date: Wed, 06 Apr 2011 17:20:52 -0400

On Apr 6, 2011, at 9:46 AM, Fabrice Tourre wrote:

> Dear Henrique Dallazuanna,

*>
**> Thank you very much for your suggestion.
**>
**> It is obvious that your method is better than me.
**>
**> Is it possible to use cut, table,by etc? Whether there is some
**> aggregate function in R can do this?
**>
**> Thanks.
**>
**> On Wed, Apr 6, 2011 at 2:16 PM, Henrique Dallazuanna
**> <wwwhsd_at_gmail.com> wrote:
**>> Try this:
**>>
**>> fil <- sapply(ran, '<', e1 = dat[,1]) & sapply(ran[2:(length(ran) +
**>> 1)], '>=', e1 = dat[,1])
**>> mm <- apply(fil, 2, function(idx)mean(dat[idx, 2]))
**>>
**>> On Wed, Apr 6, 2011 at 5:48 AM, Fabrice Tourre <fabrice.ciup_at_gmail.com
**>> > wrote:
**>>> Dear list,
**>>>
**>>> I have a dataframe with two column as fellow.
**>>>
**>>>> head(dat)
**>>> V1 V2
**>>> 0.15624 0.94567
**>>> 0.26039 0.66442
**>>> 0.16629 0.97822
**>>> 0.23474 0.72079
**>>> 0.11037 0.83760
**>>> 0.14969 0.91312
**>>>
**>>> I want to get the column V2 mean value based on the bin of column of
**>>> V1. I write the code as fellow. It works, but I think this is not
**>>> the
**>>> elegant way. Any suggestions?
**>>>
**>>> dat<-read.table("dat.txt",head=F)
**>>> ran<-seq(0,0.5,0.05)
**>>> mm<-NULL
**>>> for (i in c(1:(length(ran)-1)))
**>>> {
**>>> fil<- dat[,1] > ran[i] & dat[,1]<=ran[i+1]
**>>> m<-mean(dat[fil,2])
**>>> mm<-c(mm,m)
**>>> }
**>>> mm
**>>>
**>>> Here is the first 20 lines of my data.
**>>>
**>>>> dput(head(dat,20))
**>>> structure(list(V1 = c(0.15624, 0.26039, 0.16629, 0.23474, 0.11037,
**>>> 0.14969, 0.16166, 0.09785, 0.36417, 0.08005, 0.29597, 0.14856,
**>>> 0.17307, 0.36718, 0.11621, 0.23281, 0.10415, 0.1025, 0.04238,
**>>> 0.13525), V2 = c(0.94567, 0.66442, 0.97822, 0.72079, 0.8376,
**>>> 0.91312, 0.88463, 0.82432, 0.55582, 0.9429, 0.78956, 0.93424,
**>>> 0.87692, 0.83996, 0.74552, 0.9779, 0.9958, 0.9783, 0.92523, 0.99022
**>>> )), .Names = c("V1", "V2"), row.names = c(NA, 20L), class =
**>>> "data.frame")
**>>>
**>>> ______________________________________________
*

Here is how I would have done it with findInterval and tapply which is very similar to using a `cut` and `table` approach:

> dat$grp <- findInterval(dat$V1, seq(0,0.5,0.05) ) > tapply(dat$V2, dat$grp, mean)

1 2 3 4 5 6 8 0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900 0.6978900 #####---------------

You do not get exactly the same form of the result as with Henrique's
method. His yields:

> mm

[1] 0.9252300 0.8836100 0.9135429 0.9213600 0.8493450

0.7269900 NaN [8] 0.6978900 NaN NaN NaN ####----------------

The cut approach would yield this, which is more informatively labeled. (I'm wasn't completely sure the second to last word in the prior sentence was a real word, but several dictionaries seem to think so.):

> dat$grp2 <- cut(dat$V1 , breaks=ran)

> tapply(dat$V2, dat$grp2, mean)

(0,0.05] (0.05,0.1] (0.1,0.15] (0.15,0.2] (0.2,0.25] (0.25,0.3]
0.9252300 0.8836100 0.9135429 0.9213600 0.8493450 0.7269900
(0.3,0.35] (0.35,0.4] (0.4,0.45] (0.45,0.5]

NA 0.6978900 NA NA

*>
*

David Winsemius, MD

West Hartford, CT

R-help_at_r-project.org mailing list

https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Received on Wed 06 Apr 2011 - 21:53:53 GMT

Archive maintained by Robert King, hosted by
the discipline of
statistics at the
University of Newcastle,
Australia.

Archive generated by hypermail 2.2.0, at Wed 06 Apr 2011 - 23:00:28 GMT.

*
Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help.
Please read the posting
guide before posting to the list.
*