From: Denis Chabot <chabotd_at_globetrotter.net>

Date: Wed 13 Sep 2006 - 17:06:38 GMT

R-help@stat.math.ethz.ch mailing list

https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Received on Thu Sep 14 03:15:32 2006


Thank you Gabor,

I'll need to explore the reshape package a bit to see what it offers compared with the base "reshape" function, but I'm glad you made me aware of it.

And your solution for replacing NAs only in the columns I choose is exactly what I needed.

Many thanks,

Denis
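Gabor's fix, quoted below, picks the columns to clean by position (`ic <- 1:8`). A minimal base-R sketch of the same one-step replacement that picks the prey columns by name instead, assuming they carry the `mass.<prey code>` names that `reshape(..., direction = "wide")` produces by default. The small `result` table is hypothetical toy data, not output from the thread.

```r
# Hypothetical wide table: identifier/covariate columns plus one
# mass column per prey code, as reshape(direction = "wide") would name them
result <- data.frame(
  tagno    = c(12, 57, 98),
  depth    = c(100, NA, 250),   # this NA must stay NA
  mass.187 = c(1.5, NA, 3.2),
  mass.438 = c(NA, 2.0, NA)
)

# Select the prey columns by name pattern rather than by position
prey.cols <- grep("^mass\\.", names(result))

# Replace NAs with 0 in those columns only, in a single assignment
result[prey.cols][is.na(result[prey.cols])] <- 0
result
```

Because `depth` does not match the pattern, its NA is left untouched, which was the concern raised in the original question.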

On 06-09-13 at 00:55, Gabor Grothendieck wrote:

> I missed your second question, which was how to set the NAs to zero
> for some of the columns. Suppose we want to replace the NAs
> in columns ic and, for the sake of example, suppose ic specifies
> columns 1 to 8:
>
> library(reshape)
> # long format: the 6 id columns plus a variable/value pair for mass
> testm <- melt(test, id = 1:6)
> # wide format: one row per stomach, one column per prey, masses summed
> out <- cast(testm, nbpc + trip + set + tagno + depth ~ prey, sum)
>
> # fix up NAs, only in the columns listed in ic
> ic <- 1:8
> out2 <- out[,ic]
> out2[is.na(out2)] <- 0
> out[,ic] <- out2
>
> On 9/13/06, Gabor Grothendieck <ggrothendieck@gmail.com> wrote:
>> If I understand this correctly, we want to sum the mass over each
>> combination of the first 6 variables and display the result with
>> the 6th, prey, along the top and the others along the side.
>>
>> library(reshape)
>> testm <- melt(test, id = 1:6)
>> cast(testm, nbpc + trip + set + tagno + depth ~ prey, sum)
>>
>> Now fix up the NAs.
>>
>> On 9/12/06, Denis Chabot <chabotd@globetrotter.net> wrote:
>> > Hi,
>> >
>> > I'm trying to move to R the last few data-handling routines I was
>> > performing in SAS.
>> >
>> > I'm working on stomach content data. In the simplified example I
>> > provide below, there are variables describing the origin of each prey
>> > item (nbpc is a ship number, each ship may have been used on
>> > different trips, each trip has stations, and individual fish (tagno)
>> > can be caught at each station).
>> >
>> > For each stomach, the number of lines corresponds to the number of
>> > prey items. One variable identifies the prey type, and others (here
>> > only one, mass) provide information on prey abundance, size or
>> > digestion level.
>> >
>> > Finally, there can be accompanying variables that are not used but
>> > that I need to keep for later analyses (e.g. depth in the example
>> > below).
>> >
>> > At some point I need to transform such a dataset into another format
>> > where each stomach occupies a single line, and there are columns for
>> > each prey item.
>> >
>> > The "reshape" function works really well; my program is in fact
>> > simpler than the SAS equivalent (not shown, don't want to bore you,
>> > but available on request), except that I need zeros instead of NAs
>> > when prey types are absent from a stomach, a problem for which I
>> > only have a shaky solution at the moment:
>> >
>> > 1) creation of a dummy dataset:
>> > #######
>> > nbpc <- rep(c(20,34), c(110,90))
>> > trip <- c(rep(1:3, c(40, 40, 30)), rep(1:2, c(60,30)))
>> > set <- c(rep(1:4, c(10, 8, 7, 15)), rep(c(10,12), c(25,15)),
>> >          rep(1:3, rep(10,3)),
>> >          rep(10:12, c(20, 10, 30)), rep(7:8, rep(15,2)))
>> > depth <- c(rep(c(100, 150, 200, 250), c(10, 8, 7, 15)),
>> >            rep(c(100,120), c(25,15)), rep(c(75, 50, 200), rep(10,3)),
>> >            rep(c(200, 150, 50), c(20, 10, 30)),
>> >            rep(c(100, 250), rep(15,2)))
>> > tagno <- rep(round(runif(42,1,200)),
>> >              c(7,3, 4,4, 2,2,3, 5,5,5, 4,6,4,3,5,3, 7,8, 4,6, 5,5, 7,3,
>> >                6,6,4,4, 4,6, 3,3,4,5,5,6,4, 5,5,5, 8,7))
>> > prey.codes <- c(187, 438, 792, 811)
>> > prey <- sample(prey.codes, 200, replace=TRUE)
>> > mass <- runif(200, 0, 10)
>> >
>> > test <- data.frame(nbpc, trip, set, depth, tagno, prey, mass)
>> > ########
>> >
>> > Because there are often multiple occurrences of the same prey in a
>> > single stomach, I need to sum them for each stomach before using
>> > "reshape". Here I use summaryBy because my understanding of the
>> > many variants of "apply" is not very good:
>> >
>> > ########
>> > library(doBy)  # summaryBy() is provided by the doBy package
>> > test2 <- summaryBy(mass~nbpc+trip+set+tagno+prey, data=test,
>> >                    FUN=sum, keep.names=TRUE, id=~depth)
>> >
>> > # this messes up the sorting order, so I fix it
>> > k <- order(test2$nbpc, test2$trip, test2$set, test2$tagno)
>> > test3 <- test2[k,]
>> > result <- reshape(test3, v.names="mass",
>> >                   idvar=c("nbpc", "trip", "set", "tagno"),
>> >                   timevar="prey", direction="wide")
>> > #########
>> >
>> > I'm quite happy with this, although you may know of better ways of
>> > doing it.
>> > But my problem is with prey types that are absent from a stomach. In
>> > later analyses, I need them to have zero abundance instead of NA.
>> > My shaky solution is:
>> > #########
>> > empties <- is.na(result)
>> > result[empties] <- 0
>> > #########
>> >
>> > which did the job in this example, but it won't always. For instance,
>> > there could have been NAs for "depth", which I do not want to become
>> > zero.
>> >
>> > Is there a way to transform NAs into zeros for multiple columns of a
>> > data frame in one step, while ignoring some columns?
>> >
>> > Or maybe there is another way to achieve this that would have put
>> > zeros where I need them (i.e. something other than "reshape")?
>> >
>> > Thanking you in advance,
>> >
>> > Denis Chabot
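The original question also asks whether another route would put the zeros in directly. One base-R possibility, which is a swapped-in technique and not what was used in this thread, is `xtabs()`: it sums `mass` over repeated stomach/prey pairs and stores 0 for combinations that never occur, so no NA cleanup is needed. The sketch below runs on small hypothetical toy data; the `stomach` key column is a hypothetical helper, and it assumes `depth` is constant within a stomach.

```r
# Hypothetical toy data in the same long layout as 'test' in the thread:
# one row per prey item; a stomach can hold several items of one prey
test <- data.frame(
  nbpc  = c(20, 20, 20, 20, 34),
  trip  = c(1, 1, 1, 1, 2),
  set   = c(1, 1, 1, 2, 7),
  tagno = c(15, 15, 15, 88, 40),
  depth = c(100, 100, 100, 150, NA),   # an NA that must survive
  prey  = c(187, 187, 438, 811, 187),
  mass  = c(1.0, 2.5, 0.5, 4.0, 3.0)
)

id.vars <- c("nbpc", "trip", "set", "tagno")

# Hypothetical helper column: one character key per stomach
test$stomach <- do.call(paste, c(test[id.vars], sep = "_"))

# xtabs() sums 'mass' within each stomach/prey cell and stores 0
# (not NA) for prey types that never occur in a stomach
tab <- xtabs(mass ~ stomach + prey, data = test)
wide <- as.data.frame.matrix(tab)
names(wide) <- paste0("mass.", colnames(tab))   # e.g. "mass.187"

# One row of identifiers and accompanying variables per stomach
# (assumes depth does not vary within a stomach)
ids <- unique(test[c(id.vars, "depth", "stomach")])

# Align the wide table to 'ids' by stomach key and bind the columns
result <- cbind(ids[c(id.vars, "depth")], wide[ids$stomach, , drop = FALSE])
rownames(result) <- NULL
result
```

Rows come out in the order stomachs first appear in the long data, so a sorted input gives a sorted result; a prey code that never occurs anywhere in the data gets no column at all.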


Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.

Archive generated by hypermail 2.1.8, at Wed 13 Sep 2006 - 17:30:05 GMT.
