[R] medians on data frame with duplicated rows

From: Adrian Johnson <oriolebaltimore_at_gmail.com>
Date: Wed, 02 Apr 2008 11:50:23 -0500


Dear list:

I have a data frame with each student's name, the class he attended, and his marks in the subjects of that class.
Students took a second exam if they performed badly in their first attempt. For each student, I want to keep only the attempt in which he obtained the highest median mark.

There are 6 classes. As a sample case, I will consider only classes A, B and C.

In class A, Raj took the first exam and did not score well. He retook the exam and scored well. His median mark the second time was higher, so I will consider
only the marks he scored the second time. Also in class A, Kiran took the exam only once, so I have no choice but to consider his marks from that single attempt.

Matt took class B and performed poorly. In his second attempt he scored well, so I will consider his second attempt because its median is higher.

Since only one student took class C, and took the exam only once, I will consider Ram's test scores for that class.
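The rule above compares the median of each attempt's five marks. A minimal sketch in R for Raj's two attempts, with the marks copied from the printout of rtest below; the names `raj`, `first`, `second` and `med` are introduced here for illustration only:

```r
# Raj's two attempts in class A (marks copied from the rtest printout)
raj <- rbind(first  = c(10, 20, 12, 24, 20),
             second = c(20, 21, 25, 25, 25))
med <- apply(raj, 1, median)   # median mark of each attempt: 20 and 25
print(med)
keep <- names(which.max(med))  # the attempt with the highest median
print(keep)
```

`which.max()` returns the position of the first maximum, so if two attempts had the same median, the earlier one would be kept.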

> rtest

   Name Class Trigno Algebra Calculus Sci.Comp CS
1   Raj     A     10      20       12       24 20
2   Raj     A     20      21       25       25 25
3 Kiran     A     20      24       24       23 24
4  Matt     B     12      10       13        9  9
5  Matt     B     20      24       23       22 24
6   Ram     C     25      25       25       25 25
7  Rhea     D     21      12       12       21 23
8  Jack     E     12      15       16       18 19
9 Smita     F     13      18       19       19 20
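In the spirit of the posting guide's request for self-contained code, the printed data can be rebuilt with `data.frame()`. The values are copied from the printout above; `stringsAsFactors = FALSE` is an assumption added here so that Name and Class stay character vectors (in the R versions current in 2008 they became factors by default).

```r
# rtest rebuilt from the printout; one row per exam attempt
rtest <- data.frame(
  Name     = c("Raj", "Raj", "Kiran", "Matt", "Matt", "Ram", "Rhea", "Jack", "Smita"),
  Class    = c("A", "A", "A", "B", "B", "C", "D", "E", "F"),
  Trigno   = c(10, 20, 20, 12, 20, 25, 21, 12, 13),
  Algebra  = c(20, 21, 24, 10, 24, 25, 12, 15, 18),
  Calculus = c(12, 25, 24, 13, 23, 25, 12, 16, 19),
  Sci.Comp = c(24, 25, 23,  9, 22, 25, 21, 18, 19),
  CS       = c(20, 25, 24,  9, 24, 25, 23, 19, 20),
  stringsAsFactors = FALSE
)
str(rtest)
```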


I want to write a small loop that gives me back a data frame containing, for each class, the attempt with the highest median for every student.

> tclass <- c('A','B','C')
> tclass
[1] "A" "B" "C"

> for(i in 1:length(tclass)){
+ k = rtest[rtest$Class==tclass[i],]
+ if(nrow(k)>1){
+ x <- k[,c(-1,-2)]
+ rownames(x) <- k[,1]
+ k2 = apply(x,1,median)
+ fin_dat <- c(rtest$Name==max(names(k2)),)}
+ else{
+ fin_dat <- c(k)
+ }
+ }

Error in `row.names<-.data.frame`(`*tmp*`, value = c(4L, 4L, 2L)) :   duplicate 'row.names' are not allowed
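The error is raised by `rownames(x) <- k[,1]`: the row names of a data frame must be unique, and Raj has two rows in class A. (The values 4, 4, 2 in the message appear to be the factor codes of Raj, Raj and Kiran, since Name was read in as a factor.) A minimal reproduction, using a small hypothetical data frame `x`; `make.unique()` is shown only as one way to obtain labels a data frame accepts:

```r
# Three attempts, two of them by Raj
x <- data.frame(Trigno = c(10, 20, 20), Algebra = c(20, 21, 24))
msg <- tryCatch({
  rownames(x) <- c("Raj", "Raj", "Kiran")   # duplicated label: error
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# make.unique() appends a suffix to repeated labels, which is accepted
rownames(x) <- make.unique(c("Raj", "Raj", "Kiran"))
print(rownames(x))   # "Raj" "Raj.1" "Kiran"
```

Even with unique labels, `max(names(k2))` compares the names alphabetically rather than the medians, so selecting by row position is usually simpler than selecting by name.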

My code, with comments:

for(i in 1:length(tclass)){
  k = rtest[rtest$Class==tclass[i],]   # select all students who took that class
  if(nrow(k)>1){                       # if there are several rows, then...
    x <- k[,c(-1,-2)]                  # keep only the scores, dropping the Name and Class columns
    rownames(x) <- k[,1]               # name the rows by student so I can pick the row with the
                                       # highest median; student names are duplicated, so this fails
    k2 = apply(x,1,median)             # obtain the medians
    fin_dat <- c(rtest$Name==max(names(k2)),)}   # get that row of scores from the original data frame into fin_dat
  else{                                # if a student took his exam only once
    fin_dat <- c(k)                    # write his scores to fin_dat
  }
}
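One way to keep this loop structure while avoiding row names altogether is to work with row positions: compute the median of every attempt in the class, then, for each student, keep the attempt with the largest median via `which.max()`. This is a sketch, assuming the marks are the five columns listed in `marks` (a name introduced here) and that Name and Class are character columns; ties go to the first attempt, because `which.max()` returns the first maximum.

```r
rtest <- data.frame(
  Name     = c("Raj", "Raj", "Kiran", "Matt", "Matt", "Ram", "Rhea", "Jack", "Smita"),
  Class    = c("A", "A", "A", "B", "B", "C", "D", "E", "F"),
  Trigno   = c(10, 20, 20, 12, 20, 25, 21, 12, 13),
  Algebra  = c(20, 21, 24, 10, 24, 25, 12, 15, 18),
  Calculus = c(12, 25, 24, 13, 23, 25, 12, 16, 19),
  Sci.Comp = c(24, 25, 23,  9, 22, 25, 21, 18, 19),
  CS       = c(20, 25, 24,  9, 24, 25, 23, 19, 20),
  stringsAsFactors = FALSE
)
tclass <- c("A", "B", "C")
marks  <- c("Trigno", "Algebra", "Calculus", "Sci.Comp", "CS")

fin_dat <- NULL
for (cl in tclass) {
  k   <- rtest[rtest$Class == cl, ]        # all attempts in this class
  med <- apply(k[, marks], 1, median)      # median of each attempt
  for (nm in unique(k$Name)) {
    rows <- which(k$Name == nm)            # positions of this student's attempts
    best <- rows[which.max(med[rows])]     # position of the highest-median attempt
    fin_dat <- rbind(fin_dat, k[best, ])   # append that row
  }
}
print(fin_dat)
```

Growing `fin_dat` with `rbind()` is fine for a handful of rows; for larger data it is usually better to collect the rows in a list and bind them once with `do.call(rbind, ...)`.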

Could anyone please help me with this? Because Raj appears twice, assigning the student names as row names fails with this error. If there is a better way of doing this, I would appreciate your help.
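A loop-free variant, offered as a sketch rather than a definitive answer: `ave()` computes the maximum median within each Class/Name group and returns it aligned with the original rows, so comparing it with each row's own median flags the attempts to keep. It assumes the same column layout as the rtest printout, with character Name and Class columns; `marks`, `med`, `best` and `keep` are names introduced here. Unlike `which.max()`, this keeps every attempt tied for the highest median.

```r
rtest <- data.frame(
  Name     = c("Raj", "Raj", "Kiran", "Matt", "Matt", "Ram", "Rhea", "Jack", "Smita"),
  Class    = c("A", "A", "A", "B", "B", "C", "D", "E", "F"),
  Trigno   = c(10, 20, 20, 12, 20, 25, 21, 12, 13),
  Algebra  = c(20, 21, 24, 10, 24, 25, 12, 15, 18),
  Calculus = c(12, 25, 24, 13, 23, 25, 12, 16, 19),
  Sci.Comp = c(24, 25, 23,  9, 22, 25, 21, 18, 19),
  CS       = c(20, 25, 24,  9, 24, 25, 23, 19, 20),
  stringsAsFactors = FALSE
)
marks <- c("Trigno", "Algebra", "Calculus", "Sci.Comp", "CS")

med  <- apply(rtest[, marks], 1, median)                  # median of every attempt
best <- ave(med, rtest$Class, rtest$Name, FUN = max)      # best median per student and class
keep <- med == best & rtest$Class %in% c("A", "B", "C")   # best attempts in the chosen classes
fin_dat <- rtest[keep, ]
print(fin_dat)
```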

Thanks
Ad.




R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Wed 02 Apr 2008 - 16:53:00 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Wed 02 Apr 2008 - 17:30:26 GMT.

