Re: [R] Looping through values in a data frame that are >zero

From: David Winsemius <dwinsemius_at_comcast.net>
Date: Sat, 21 May 2011 10:01:32 -0400

On May 21, 2011, at 9:12 AM, Dimitri Liakhovitski wrote:

> Hello!
>
> I've tried for a while - but can't figure it out. I have data frame x:
>
> y=c("a","b","c","d","e")
> z=c("m","n","o","p","r")
> a=c(0,0,1,0,0)
> b=c(2,0,0,0,0)
> c=c(0,0,0,4,0)
> x<-data.frame(y,z,a,b,c,stringsAsFactors=F)
> str(x)
> Some of the values in columns a,b, and c are >0:
>
> I need to write a loop through all the cells in columns a,b,c that are > 0
> (only through them).
> For each of those cells, I need to know:
> 1. Name of the column it is in

apply(x[,3:5], 1, function(z) if(any(z >0) ){

                                   names(x)[2+which(z >0)]
                               } else {
                                   "none" })
[1] "b" "none" "a" "c" "none"
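
The call above returns a plain character vector only because no row of x has more than one positive value; otherwise the results would differ in length and apply() would hand back a list. A minimal sketch of one way around that, collapsing the names into a single string per row. The helper pos_cols and the modified copy x2 are illustrative names, not part of the original question:

```r
x <- data.frame(y = c("a", "b", "c", "d", "e"),
                z = c("m", "n", "o", "p", "r"),
                a = c(0, 0, 1, 0, 0),
                b = c(2, 0, 0, 0, 0),
                c = c(0, 0, 0, 4, 0),
                stringsAsFactors = FALSE)

# Hypothetical variant of x with two positive cells in row 1
x2 <- x
x2[1, "c"] <- 5

# Collapse all matching column names into one string per row
pos_cols <- function(r) {
  hit <- names(r)[r > 0]
  if (length(hit) > 0) paste(hit, collapse = ",") else "none"
}

res <- apply(x2[, 3:5], 1, pos_cols)
res
```

With x2 as built here, the first element is "b,c" and the rest match the result for x.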

> 2. The entry of column y that is in the same row

  apply(x, 1, function(z) if(any(z[3:5] >0) ){ z[1] } else { "none" })
[1] "a" "none" "c" "d" "none"

There might be pitfalls of which I am unaware, since apply() coerces x to a character matrix and z therefore arrives as a character vector. Generally the character comparisons with ">" will be "as expected" when the values were originally numeric:

 > ("-3" > 0)
[1] FALSE
 > ("0.1" > 0)
[1] TRUE
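
One case where the string comparison may not behave as expected is a column that holds fractions: as.matrix() formats the whole column to a common width, so a zero can turn into "0.0", which sorts after "0". A small sketch of that case, followed by a test on the numeric columns only. The data frame d and the names string_test, numeric_test, hit and safe are made up for illustration:

```r
x <- data.frame(y = c("a", "b", "c", "d", "e"),
                z = c("m", "n", "o", "p", "r"),
                a = c(0, 0, 1, 0, 0),
                b = c(2, 0, 0, 0, 0),
                c = c(0, 0, 0, 4, 0),
                stringsAsFactors = FALSE)

# Hypothetical data frame with a fractional column
d <- data.frame(y = c("a", "b"), v = c(0, 0.5), stringsAsFactors = FALSE)

# as.matrix() (which apply() calls) formats v as "0.0" and "0.5"
m <- as.matrix(d)
m[, "v"]

# String comparison: "0.0" sorts after "0", so a true zero looks positive
string_test <- m[, "v"] > 0

# Comparing the numeric column directly avoids the coercion
numeric_test <- d$v > 0

# The same idea applied to x: test only the numeric columns, then pick y
hit  <- unname(rowSums(x[, c("a", "b", "c")] > 0) > 0)
safe <- ifelse(hit, x$y, "none")
safe
```

For the x in the question both approaches agree; the difference only shows up with values like those in d.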

> 3. The entry of column z that is in the same row

  apply(x, 1, function(z) if(any(z[3:5] >0) ){ z[2] } else { "none" })
[1] "m" "none" "o" "p" "none"
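
The three answers can also be gathered into one data frame with a row per positive cell, which is what the question asks for further down. A sketch using which() with arr.ind = TRUE; the names num_cols, m, pos and hits are mine, and this is only one possible layout:

```r
x <- data.frame(y = c("a", "b", "c", "d", "e"),
                z = c("m", "n", "o", "p", "r"),
                a = c(0, 0, 1, 0, 0),
                b = c(2, 0, 0, 0, 0),
                c = c(0, 0, 0, 4, 0),
                stringsAsFactors = FALSE)

num_cols <- c("a", "b", "c")
m <- as.matrix(x[, num_cols])   # numeric matrix, no character coercion

# Two-column matrix of (row, col) positions of every positive cell
pos <- which(m > 0, arr.ind = TRUE)

# One row per positive cell: column name, y entry, z entry, and the value
hits <- data.frame(column = num_cols[pos[, "col"]],
                   y      = x$y[pos[, "row"]],
                   z      = x$z[pos[, "row"]],
                   value  = m[pos],
                   stringsAsFactors = FALSE)
hits
```

Rows of hits come out in column order (a, then b, then c), because which() walks the matrix column by column; a loop over seq_len(nrow(hits)) then visits only the positive cells.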

If you want to use NA instead of "none" I don't foresee any problems.
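
For the larger goal described in the question (spreading each positive value of x over the matching rows of large.df), a loop over the positive cells might look like the sketch below. One assumption needs flagging: expand.grid(y, z) as quoted gives 25 rows, while the 75 sampled values and the reference to rows 37:39 suggest three rows per (y, z) pair, so the sketch builds large.df with a throwaway rep column to get 75 rows. The names rep, before, num_cols, m, pos and sel are illustrative.

```r
y <- c("a", "b", "c", "d", "e")
z <- c("m", "n", "o", "p", "r")
x <- data.frame(y, z,
                a = c(0, 0, 1, 0, 0),
                b = c(2, 0, 0, 0, 0),
                c = c(0, 0, 0, 4, 0),
                stringsAsFactors = FALSE)

# ASSUMPTION: three replicate rows per (y, z) pair, 75 rows in total
large.df <- expand.grid(y = y, z = z, rep = 1:3, stringsAsFactors = FALSE)
set.seed(123); large.df$a <- sample(0:5, 75, replace = TRUE)
set.seed(234); large.df$b <- sample(0:5, 75, replace = TRUE)
set.seed(345); large.df$c <- sample(0:5, 75, replace = TRUE)
large.df <- large.df[order(large.df$y, large.df$z), c("y", "z", "a", "b", "c")]
row.names(large.df) <- NULL
before <- large.df   # copy kept only to inspect the change afterwards

# Positions of the positive cells in the numeric part of x
num_cols <- c("a", "b", "c")
m   <- as.matrix(x[, num_cols])
pos <- which(m > 0, arr.ind = TRUE)

# For each positive cell, add value / (number of matching rows) to large.df
for (i in seq_len(nrow(pos))) {
  r   <- pos[i, "row"]
  col <- num_cols[pos[i, "col"]]
  sel <- large.df$y == x$y[r] & large.df$z == x$z[r]
  large.df[sel, col] <- large.df[sel, col] + m[r, col] / sum(sel)
}

# Rows for y == "c", z == "o": column a should have gone up by 1/3 each
large.df[large.df$y == "c" & large.df$z == "o", "a"] -
  before[before$y == "c" & before$z == "o", "a"]
```

Dividing by sum(sel) rather than a fixed 3 keeps the loop valid if the number of matching rows varies between pairs.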

-- 
David



> It'd be good to save this info in a data frame somehow - so that I
> could loop through rows of this data frame.
>
>
> To explain what I need it for eventually: I have a different data
> frame "large.df" that has the same columns (variables) - but with many
> more entries than "x". Something like:
> large.df<-expand.grid(y,z)
> names(large.df)<-c("y","z")
> set.seed(123)
> large.df$a<-sample(0:5,75,replace=T)
> set.seed(234)
> large.df$b<-sample(0:5,75,replace=T)
> set.seed(345)
> large.df$c<-sample(0:5,75,replace=T)
> large.df$y<-as.character(large.df$y)
> large.df$z<-as.character(large.df$z)
> large.df<-large.df[order(large.df$y,large.df$z),]
> row.names(large.df)<-1:nrow(large.df)
> (large.df);str(large.df)
>
> 1. Find the first cell in x that is > 0 (in this case, it's x[3,"a"]).
> 2. Find all the corresponding cells in the large.df - in this case,
> it's:
> large.df[large.df$y %in% "c" & large.df$z %in% "o","a"]
> and those 3 values can be found in rows 37:39 of large.df, in column
> "a".
> 3. Take those 3 values and add to them the corresponding value in x
> (in this case = 1) divided by their length (in this case = 3).
> 4. Do the same for the other cells in x that are >0.
>
> The final result will be (sorry for lengthy code):
>
> large.df[large.df$y %in% "c" & large.df$z %in%
> "o","a"]<-large.df[large.df$y %in% "c" & large.df$z %in%
> "o","a"]+x[3,"a"]/3
> large.df[large.df$y %in% "a" & large.df$z %in%
> "m","b"]<-large.df[large.df$y %in% "a" & large.df$z %in%
> "m","b"]+x[1,"b"]/3
> large.df[large.df$y %in% "d" & large.df$z %in%
> "p","c"]<-large.df[large.df$y %in% "d" & large.df$z %in%
> "p","c"]+x[4,"c"]/3
> (large.df)
>
> (It just happens that at the end I divide by 3 - it could be anything
> that is length(large.df[large.df$y %in% "c" & large.df$z %in%
> "o","a"]), etc.)
>
>
> Thanks a lot for your suggestions!
>
>
> --
> Dimitri Liakhovitski
> Ninah Consulting
> www.ninah.com
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT
Received on Sat 21 May 2011 - 14:03:13 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.