Re: [R] Group averages

From: jim holtman <jholtman_at_gmail.com>
Date: Tue 13 Jun 2006 - 08:25:01 EST

Not exactly sure what you mean, but here is something that might be close. I used only a subset of your data to see if this is what you want. For each row, this computes the mean of hsgpa over all the other rows in the subset (NA values are dropped):
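The transcript indexes with `x[['2005.e']]` but does not show how `x` was built. One plausible reconstruction, and it is only an assumption, is a list of row numbers produced by split() over the two grouping variables: split() joins the factor levels with ".", which yields names such as "2005.e". The small data frame below is made up and stands in for the poster's `data`.

```r
# Hypothetical toy data standing in for the poster's 'data' frame
data <- data.frame(case  = c(101, 102, 103, 104, 105, 106),
                   hsgpa = c(3.4, 4.0, NA, 3.1, 3.8, 2.9),
                   yr    = c(2005, 2005, 2005, 2004, 2004, 2005),
                   conf  = c("e", "e", "e", "e", "a", "e"),
                   stringsAsFactors = FALSE)

# Assumed construction of 'x': the row numbers belonging to each yr/conf
# combination; split() names each element "<yr>.<conf>"
x <- split(seq(nrow(data)), list(data$yr, data$conf))

y <- data[x[['2005.e']], ]    # rows 1, 2, 3 and 6
# Leave-one-out mean within the subset, as in the transcript
y$mean <- sapply(seq(nrow(y)), function(i) mean(y$hsgpa[-i], na.rm = TRUE))
```

Row 3 has a missing hsgpa, so leaving it out removes nothing and its value is simply the mean of the three non-missing values.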

> data[x[['2005.e']],] # subset of your data for yr=2005, conf='e'
     case    hsgpa   yr conf
73   3442 3.406104 2005    e
216  3017 4.071830 2005    e
284  3626 3.418870 2005    e
797  2184 3.459729 2005    e
881  3030 3.147831 2005    e
1030 9600 4.140025 2005    e
1071 1972 3.423202 2005    e
1100 8293 3.880199 2005    e
1219 5162 3.470179 2005    e
1276 5905 3.533801 2005    e
1312 3785 3.521670 2005    e
1363 8880 2.975047 2005    e
1426  123 3.070349 2005    e
1427  947       NA 2005    e
1475 3592 3.955794 2005    e
1635  366 3.172360 2005    e
1708 5257 3.612822 2005    e
1736 6256       NA 2005    e
1831 2112 3.719371 2005    e
1943 6528 3.322816 2005    e
1997  553       NA 2005    e
2208 2849 3.657016 2005    e
2240 6543       NA 2005    e
2360 9360       NA 2005    e
2611 4354 3.123671 2005    e
2659 1444 4.080455 2005    e
2704 9502       NA 2005    e
2714 8594 3.657861 2005    e
2732 4453 2.251620 2005    e
2778  875 3.913294 2005    e
2802 4022 3.970620 2005    e
2884 4473 3.650706 2005    e
2945  181 3.777851 2005    e
3059 6755 3.809683 2005    e
3327 8153       NA 2005    e
3380 3737 3.676996 2005    e
3404 4419 2.306697 2005    e
3577 3577 4.196025 2005    e
3608  457 4.150389 2005    e
3857 8642 3.220720 2005    e
3967  482 2.147233 2005    e
4122 4363       NA 2005    e
4185  651 4.087515 2005    e
4226  544 4.153056 2005    e
4362 1496 3.835143 2005    e
4475 1614 3.978524 2005    e
4680 6883 3.633342 2005    e
4739 5212       NA 2005    e
4843 3515 3.020855 2005    e
4867 2580 3.814048 2005    e
4887 7937 3.797753 2005    e
> y <- data[x[['2005.e']],]
> str(y)
`data.frame': 51 obs. of 4 variables:
 $ case : num  3442 3017 3626 2184 3030 ...
 $ hsgpa: num  3.41 4.07 3.42 3.46 3.15 ...
 $ yr   : num  2005 2005 2005 2005 2005 ...
 $ conf : chr  "e" "e" "e" "e" ...
> # compute the mean of all except the given row
> sapply(seq(nrow(y)), function(x) mean(y$hsgpa[-x],na.rm=TRUE))
 [1] 3.556268 3.540030 3.555956 3.554960 3.562567 3.538367 3.555851 3.544704 3.554705 3.553153
[11] 3.553449 3.566781 3.564457 3.552692 3.542861 3.561969 3.551226 3.552692 3.548627 3.558299
[21] 3.552692 3.550148 3.552692 3.552692 3.563156 3.539820 3.552692 3.550127 3.584426 3.543897
[31] 3.542499 3.550302 3.547201 3.546424 3.552692 3.549660 3.583082 3.537001 3.538114 3.560789
[41] 3.586972 3.552692 3.539648 3.538049 3.545803 3.542306 3.550725 3.552692 3.565664 3.546318
[51] 3.546715
> y$mean <- sapply(seq(nrow(y)), function(x) mean(y$hsgpa[-x],na.rm=TRUE))
> y
     case    hsgpa   yr conf     mean
73   3442 3.406104 2005    e 3.556268
216  3017 4.071830 2005    e 3.540030
284  3626 3.418870 2005    e 3.555956
797  2184 3.459729 2005    e 3.554960
881  3030 3.147831 2005    e 3.562567
1030 9600 4.140025 2005    e 3.538367
1071 1972 3.423202 2005    e 3.555851
1100 8293 3.880199 2005    e 3.544704
1219 5162 3.470179 2005    e 3.554705
1276 5905 3.533801 2005    e 3.553153
1312 3785 3.521670 2005    e 3.553449
1363 8880 2.975047 2005    e 3.566781
1426  123 3.070349 2005    e 3.564457
1427  947       NA 2005    e 3.552692
1475 3592 3.955794 2005    e 3.542861
1635  366 3.172360 2005    e 3.561969
1708 5257 3.612822 2005    e 3.551226
1736 6256       NA 2005    e 3.552692
1831 2112 3.719371 2005    e 3.548627
1943 6528 3.322816 2005    e 3.558299
1997  553       NA 2005    e 3.552692
2208 2849 3.657016 2005    e 3.550148
2240 6543       NA 2005    e 3.552692
2360 9360       NA 2005    e 3.552692
2611 4354 3.123671 2005    e 3.563156
2659 1444 4.080455 2005    e 3.539820
2704 9502       NA 2005    e 3.552692
2714 8594 3.657861 2005    e 3.550127
2732 4453 2.251620 2005    e 3.584426
2778  875 3.913294 2005    e 3.543897
2802 4022 3.970620 2005    e 3.542499
2884 4473 3.650706 2005    e 3.550302
2945  181 3.777851 2005    e 3.547201
3059 6755 3.809683 2005    e 3.546424
3327 8153       NA 2005    e 3.552692
3380 3737 3.676996 2005    e 3.549660
3404 4419 2.306697 2005    e 3.583082
3577 3577 4.196025 2005    e 3.537001
3608  457 4.150389 2005    e 3.538114
3857 8642 3.220720 2005    e 3.560789
3967  482 2.147233 2005    e 3.586972
4122 4363       NA 2005    e 3.552692
4185  651 4.087515 2005    e 3.539648
4226  544 4.153056 2005    e 3.538049
4362 1496 3.835143 2005    e 3.545803
4475 1614 3.978524 2005    e 3.542306
4680 6883 3.633342 2005    e 3.550725
4739 5212       NA 2005    e 3.552692
4843 3515 3.020855 2005    e 3.565664
4867 2580 3.814048 2005    e 3.546318
4887 7937 3.797753 2005    e 3.546715
>
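The transcript handles a single yr/conf group and only the mean, while the question asks for the mean and the variance within every group. One way to extend it, sketched here on a small made-up data frame rather than the poster's data, is base R's ave(), which applies a function within each combination of grouping variables and returns the results aligned with the original rows. The helper name `leave_one_out` is hypothetical.

```r
# Hypothetical toy data; in practice this would be the poster's 'data'
data <- data.frame(hsgpa = c(3.0, 4.0, NA, 2.0, 3.5, 2.5, 3.9),
                   yr    = c(2005, 2005, 2005, 2005, 2004, 2004, NA),
                   conf  = c("e", "e", "e", "e", "a", "a", NA),
                   stringsAsFactors = FALSE)

# Apply FUN to the group with element i left out, for every i.
# vapply() (rather than sapply()) returns numeric(0) for an empty group,
# which ave() creates for yr/conf combinations that never occur.
leave_one_out <- function(v, FUN) {
  vapply(seq_along(v), function(i) FUN(v[-i], na.rm = TRUE), numeric(1))
}

# ave() splits hsgpa by the yr/conf combination, applies FUN within each
# group and returns the results in the original row order
data$loo.mean <- ave(data$hsgpa, data$yr, data$conf,
                     FUN = function(v) leave_one_out(v, mean))
data$loo.var  <- ave(data$hsgpa, data$yr, data$conf,
                     FUN = function(v) leave_one_out(v, var))

# Rows with NA in yr or conf belong to no group; ave() leaves their
# hsgpa value in place, so set them to NA explicitly
no_group <- is.na(data$yr) | is.na(data$conf)
data$loo.mean[no_group] <- NA
data$loo.var[no_group]  <- NA
```

A group with only two non-missing members gets NA for the leave-one-out variance, since var() of a single value is NA. The same two ave() calls should apply unchanged to the full data set.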

On 6/12/06, David Kling <klingd@reed.edu> wrote:
>
> Hello:
>
> I hope none of you will mind helping a newbie. I'm a student research
> assistant working with a large data set in which observations are
> categorized according to two factors. I'm trying to calculate the group
> mean and variance of a variable (called 'hsgpa' in the example data
> presented below) for each observation, excluding that observation. For
> example, if there are 20 observations with the same value of the two
> factors, for each of the 20 I'd like to generate the mean and variance
> of the 'hsgpa' values of the other 19 group members. This must be done
> for every observation in the data set.
>
> I've searched the R mail archives, read the manuals, and read
> documentation for tapply() and by() as well as summaryBy() in the 'doBy'
> package and with() from 'Hmisc'. It may be that since I'm new to
> writing functions and R is the first language I've ever worked with I'm
> less able to come up with a solution than some other new R users. None
> of the functions I have tried have been successful, and it doesn't seem
> worth it to reproduce and explain my best effort. I hope someone has
> some ideas! Looking at what an experienced user would try should help
> me with my present task as well as future problems.
>
> Below I've included some lines that will generate a sample data set
> similar to the one I'm working with:
>
> #
> #Example data:
> #
> case <- sample(seq(1,10000,1),5000,replace=FALSE)
> hsgpa <- rbeta(5000,7,1.5)*4.25
> yr <- sample(seq(1993,2005,1),5000,replace=TRUE)
> conf <- sample(letters[1:5],5000,replace=TRUE)
> data <- data.frame(case=case,hsgpa=hsgpa,yr=yr,conf=conf)
> data$conf <- as.character(data$conf)
> s1 <- sample(seq(1,5000,1),500,replace=FALSE)
> k <- data$hsgpa
> k[row.names(data) %in% s1] <- NA
> data$hsgpa <- k
> s2 <- sample(seq(1,5000,1),100,replace=FALSE)
> k <- data$yr
> k[row.names(data) %in% s2] <- NA
> data$yr <- k
> k <- data$conf
> k[row.names(data) %in% s2] <- NA
> data$conf <- k
> remove(case,hsgpa,yr,conf,s1,s2,k)
> #
>
> ______________________________________________
> R-help@stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
>
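Calling mean() once per left-out row, as the sapply() call in the transcript does, repeats work proportional to the group size for every row. For large groups, a sketch of an alternative (not from this thread) uses group totals. If a group has m non-missing values with sum S and sum of squares Q, then for a non-missing value x_i the other m - 1 values have mean mean_i = (S - x_i)/(m - 1) and sample variance (Q - x_i^2 - (m - 1) * mean_i^2)/(m - 2); for a row whose hsgpa is NA nothing is removed, so the group's ordinary mean and variance apply. The sum-of-squares form can lose precision when values are large relative to their spread, which should not matter for GPAs. The function name `loo_closed_form` is hypothetical.

```r
# Leave-one-out mean and sample variance for one group, computed from the
# group's totals instead of re-averaging for every row (hypothetical helper).
# v: the group's hsgpa values, possibly containing NA.
loo_closed_form <- function(v) {
  ok <- !is.na(v)
  m  <- sum(ok)              # number of non-missing values
  S  <- sum(v[ok])           # group sum
  Q  <- sum(v[ok]^2)         # group sum of squares
  # Totals with row i removed; nothing is removed when v[i] is NA
  n_i <- m - ok
  S_i <- S - ifelse(ok, v, 0)
  Q_i <- Q - ifelse(ok, v^2, 0)
  mean_i <- S_i / n_i
  var_i  <- (Q_i - n_i * mean_i^2) / (n_i - 1)
  # Undefined cases, matching mean()/var() on too few values
  mean_i[n_i < 1] <- NaN
  var_i[n_i < 2]  <- NA
  list(mean = mean_i, var = var_i)
}

# Check against the brute-force leave-one-out calls on a small vector
v <- c(3.2, NA, 2.8, 3.9, 3.5, NA, 4.0)
res <- loo_closed_form(v)
brute_mean <- vapply(seq_along(v), function(i) mean(v[-i], na.rm = TRUE), numeric(1))
brute_var  <- vapply(seq_along(v), function(i) var(v[-i], na.rm = TRUE), numeric(1))
all.equal(res$mean, brute_mean)   # TRUE
all.equal(res$var, brute_var)     # TRUE
```

To fill whole columns, the helper can be wrapped in ave() with, for example, `FUN = function(v) loo_closed_form(v)$mean`; note that ave() leaves rows with an NA grouping value unchanged, so those rows need to be set to NA separately.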

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390 (Cell)
+1 513 247 0281 (Home)

What is the problem you are trying to solve?

Received on Tue Jun 13 08:28:29 2006

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.1.8, at Tue 13 Jun 2006 - 10:11:04 EST.
