Re: [R] Looping through values in a data frame that are >zero

From: Bert Gunter <gunter.berton_at_gene.com>
Date: Sat, 21 May 2011 07:40:12 -0700

Dimitri:

  1. I did not read your whole missive. I prefer mystery novels. ;-)
  2. I suggest you banish Excel language ("cells") from your vocabulary and think in R's terms of whole objects that one indexes into.
  3. If I understand correctly, you can't combine results into a data frame, because they would in general be of different lengths (whole object thinking).
  4. Again, if I understand correctly, this seems to be just a matter of indexing, for which:

lapply(x[, c("a","b","c")], function(zz) x[zz > 0, c("y","z")])

should do it.
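
As a minimal sketch, here is that call run on the example x from the quoted message below, so the shape of the result is visible. The name hits is only illustrative.

```r
# Example data frame from the original question
x <- data.frame(
  y = c("a", "b", "c", "d", "e"),
  z = c("m", "n", "o", "p", "r"),
  a = c(0, 0, 1, 0, 0),
  b = c(2, 0, 0, 0, 0),
  c = c(0, 0, 0, 4, 0),
  stringsAsFactors = FALSE
)

# For each of the columns a, b, c: keep the rows where that column is > 0
# and return the y and z entries of those rows
hits <- lapply(x[, c("a", "b", "c")], function(zz) x[zz > 0, c("y", "z")])
str(hits)
```

The result is a list with one element per column (a, b, c), each a small data frame holding the y and z entries of the rows where that column is positive.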

HTH

On Sat, May 21, 2011 at 6:12 AM, Dimitri Liakhovitski <dimitri.liakhovitski_at_gmail.com> wrote:
> Hello!
>
> I've tried for a while - but can't figure it out. I have data frame x:
>
> y=c("a","b","c","d","e")
> z=c("m","n","o","p","r")
> a=c(0,0,1,0,0)
> b=c(2,0,0,0,0)
> c=c(0,0,0,4,0)
> x<-data.frame(y,z,a,b,c,stringsAsFactors=F)
> str(x)
> Some of the values in columns a, b, and c are >0.
>
> I need to write a loop through all the cells in columns a, b, c that are >0
> (only through them).
> For each of those cells, I need to know:
> 1. Name of the column it is in
> 2. The entry of column y that is in the same row
> 3. The entry of column z that is in the same row
> It'd be good to save this info in a data frame somehow - so that I
> could loop through rows of this data frame.
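
One possible way to hold this in a single data frame, sketched under the assumption that a long layout (one row per positive cell) is acceptable: which(..., arr.ind = TRUE) returns the row and column position of every positive cell. The names cols, m, idx and hits are illustrative and not from the thread.

```r
# Example data frame from the question
x <- data.frame(
  y = c("a", "b", "c", "d", "e"),
  z = c("m", "n", "o", "p", "r"),
  a = c(0, 0, 1, 0, 0),
  b = c(2, 0, 0, 0, 0),
  c = c(0, 0, 0, 4, 0),
  stringsAsFactors = FALSE
)

cols <- c("a", "b", "c")
m <- as.matrix(x[, cols])

# Row/column positions of all cells that are > 0
idx <- which(m > 0, arr.ind = TRUE)

# One row per positive cell: column name, matching y and z, and the value
hits <- data.frame(
  column = cols[idx[, "col"]],
  y      = x$y[idx[, "row"]],
  z      = x$z[idx[, "row"]],
  value  = m[idx],
  stringsAsFactors = FALSE
)
hits
```

Each row of hits can then be visited with for (i in seq_len(nrow(hits))).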
>
>
> To explain what I need it for eventually: I have a different data
> frame "large.df" that has the same columns (variables) - but with many
> more entries than "x". Something like:
> large.df<-expand.grid(y,z)
> names(large.df)<-c("y","z")
> set.seed(123)
> large.df$a<-sample(0:5,75,replace=T)
> set.seed(234)
> large.df$b<-sample(0:5,75,replace=T)
> set.seed(345)
> large.df$c<-sample(0:5,75,replace=T)
> large.df$y<-as.character(large.df$y)
> large.df$z<-as.character(large.df$z)
> large.df<-large.df[order(large.df$y,large.df$z),]
> row.names(large.df)<-1:nrow(large.df)
> (large.df);str(large.df)
>
> 1. Find the first cell in x that is > 0 (in this case, it's x[3,"a"]).
> 2. Find all the corresponding cells in the large.df - in this case, it's:
> large.df[large.df$y %in% "c" & large.df$z %in% "o","a"]
> and those 3 values can be found in rows 37:39 of large.df, in column "a".
> 3. Take those 3 values and add to them the corresponding value in x
> (in this case = 1) divided by their length (in this case = 3).
> 4. Do the same for the other cells in x that are >0.
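
The four steps above can be sketched as a short loop. One assumption is labeled loudly here: as quoted, expand.grid(y,z) produces only 25 rows, while the 75 sampled values and the reference to rows 37:39 imply three rows per (y, z) pair. The sketch therefore adds a hypothetical helper column rep to get three rows per pair. The names rep, before, col and rows are illustrative.

```r
y <- c("a", "b", "c", "d", "e")
z <- c("m", "n", "o", "p", "r")
x <- data.frame(y, z,
                a = c(0, 0, 1, 0, 0),
                b = c(2, 0, 0, 0, 0),
                c = c(0, 0, 0, 4, 0),
                stringsAsFactors = FALSE)

# ASSUMPTION: three rows per (y, z) pair, giving 75 rows in total
large.df <- expand.grid(y = y, z = z, rep = 1:3, stringsAsFactors = FALSE)
large.df$rep <- NULL
set.seed(123); large.df$a <- sample(0:5, 75, replace = TRUE)
set.seed(234); large.df$b <- sample(0:5, 75, replace = TRUE)
set.seed(345); large.df$c <- sample(0:5, 75, replace = TRUE)
large.df <- large.df[order(large.df$y, large.df$z), ]
row.names(large.df) <- 1:nrow(large.df)

before <- large.df  # copy kept only to compare afterwards

for (col in c("a", "b", "c")) {
  # Visit only the rows of x where this column is > 0
  for (i in which(x[[col]] > 0)) {
    # Rows of large.df with the same y and z as row i of x
    rows <- large.df$y == x$y[i] & large.df$z == x$z[i]
    # Add the value from x, divided by the number of matching rows
    large.df[rows, col] <- large.df[rows, col] + x[i, col] / sum(rows)
  }
}
```

Dividing by sum(rows) rather than a hard-coded 3 keeps the loop valid when the number of matching rows differs between (y, z) pairs.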
>
> The final result will be (sorry for lengthy code):
>
> large.df[large.df$y %in% "c" & large.df$z %in%
> "o","a"]<-large.df[large.df$y %in% "c" & large.df$z %in%
> "o","a"]+x[3,"a"]/3
> large.df[large.df$y %in% "a" & large.df$z %in%
> "m","b"]<-large.df[large.df$y %in% "a" & large.df$z %in%
> "m","b"]+x[1,"b"]/3
> large.df[large.df$y %in% "d" & large.df$z %in%
> "p","c"]<-large.df[large.df$y %in% "d" & large.df$z %in%
> "p","c"]+x[4,"c"]/3
> (large.df)
>
> (It just happens that at the end I divide by 3. It could be anything
> that is length(large.df[large.df$y %in% "c" & large.df$z %in%
> "o","a"]), etc.)
>
>
> Thanks a lot for your suggestions!
>
>
> --
> Dimitri Liakhovitski
> Ninah Consulting
> www.ninah.com
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
"Men by nature long to get on to the ultimate truths, and will often
be impatient with elementary studies or fight shy of them. If it were
possible to reach the ultimate truths without the elementary studies
usually prefixed to them, these would not be preparatory studies but
superfluous diversions."

-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://devo.gene.com/groups/devo/depts/ncb/home.shtml

Received on Sat 21 May 2011 - 14:46:04 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Sat 21 May 2011 - 14:50:08 GMT.