Re: [R] subsetting data frame using by() or tapply() or other

From: Brian S Cade <brian_cade_at_usgs.gov>
Date: Fri 14 Oct 2005 - 08:54:02 EST


My thanks to Marc Schwartz, who provided the solution: put unlist() around the tapply() statement. Looks like it works.

Brian

Brian S. Cade

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818

email: brian_cade@usgs.gov
tel: 970 226-9326

"Marc Schwartz (via MN)" <mschwartz@mn.rr.com> 10/13/2005 03:04 PM
Reply-To: mschwartz@mn.rr.com
To: Brian S Cade <brian_cade@usgs.gov>
cc: r-help@stat.math.ethz.ch
Subject: Re: [R] subsetting data frame using by() or tapply() or other

On Thu, 2005-10-13 at 14:28 -0600, Brian S Cade wrote:
> Ok, so I see that the problem I'm having creating a new variable (LAG1DBC)
> in the example data transformation below is that tapply() is creating a
> list that is not dimensionally consistent with the data frame (data). So
> how do I go from the list output of tapply() to a dimensionally
> consistent vector that can create the new variable in my original data
> frame? I've been trying to use a statement like
>
>   data$LAG1DBC <- tapply(data$DBC, data$LOCID,
>                          function(x) c(NA, x[-length(x)]))
>
> which creates a list with far fewer elements than the number of rows in
> data. And I've tried things like putting as.data.frame.array() or
> as.data.frame.list() in front of tapply() and still have the same
> problem. I know this can't be that unusual a data manipulation, and
> someone must have done similar things before.
>
> I want to go from something like this:
>
> LOCID POPULATION YEAR DBC
> 1 algb-1 A 1992 0.70451575
> 2 algb-1 A 1993 0.59506851
> 3 algb-1 A 1997 0.84837544
> 4 algb-1 A 1998 0.50283182
> 5 algb-1 A 2000 0.91242707
> 6 algb-2 A 1992 0.09747155
> 7 algb-2 A 1993 0.84772253
> 8 algb-2 A 1997 0.43974081
> 9 algb-2 A 1998 0.83108544
> 10 algb-2 A 2000 0.22291192
> 11 algb-3 A 1992 0.44234175
> 12 algb-3 A 1993 0.54089534
> 5680 taylr-73 B 2001 0.43918082
> 5681 taylr-73 B 2002 0.34694427
> 5682 taylr-73 B 2003 3.35619190
> 5683 taylr-73 B 2004 0.71575815
> 5684 taylr-73 B 2005 0.42038506
> 5685 taylr-74 B 1992 3.88410354
> 5686 taylr-74 B 1993 3.32472557
> 5687 taylr-74 B 1994 3.29861501
> 5688 taylr-74 B 1996 0.48153827
> 5689 taylr-74 B 1997 3.63570636
> 5690 taylr-74 B 1998 1.94630194
>
> to something like this:
>
> LOCID POPULATION YEAR DBC LAG1DBC
> 1 algb-1 A 1992 0.70451575 NA
> 2 algb-1 A 1993 0.59506851 0.70451575
> 3 algb-1 A 1997 0.84837544 0.59506851
> 4 algb-1 A 1998 0.50283182 0.84837544
> 5 algb-1 A 2000 0.91242707 0.50283182
> 6 algb-2 A 1992 0.09747155 NA
> 7 algb-2 A 1993 0.84772253 0.09747155
> 8 algb-2 A 1997 0.43974081 0.84772253
> 9 algb-2 A 1998 0.83108544 0.43974081
> 10 algb-2 A 2000 0.22291192 0.83108544
> 11 algb-3 A 1992 0.44234175 NA
> 12 algb-3 A 1993 0.54089534 0.44234175
> 5680 taylr-73 B 2001 0.43918082 NA
> 5681 taylr-73 B 2002 0.34694427 0.43918082
> 5682 taylr-73 B 2003 3.35619190 0.34694427
> 5683 taylr-73 B 2004 0.71575815 3.35619190
> 5684 taylr-73 B 2005 0.42038506 0.71575815
> 5685 taylr-74 B 1992 3.88410354 NA
> 5686 taylr-74 B 1993 3.32472557 3.88410354
> 5687 taylr-74 B 1994 3.29861501 3.32472557
> 5688 taylr-74 B 1996 0.48153827 3.29861501
> 5689 taylr-74 B 1997 3.63570636 0.48153827
> 5690 taylr-74 B 1998 1.94630194 3.63570636
>
> Brian
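The mismatch described in the question can be reproduced on a small toy data frame. The values below are made up for illustration and are not Brian's data. When the function returns a vector rather than a single value, `tapply()` returns a list with one element per level of `LOCID`, so its length is the number of groups, not `nrow(data)`:

```r
# Hypothetical toy data: 3 locations with 3, 2 and 2 yearly rows
toy <- data.frame(
  LOCID = c("a", "a", "a", "b", "b", "c", "c"),
  YEAR  = c(1992, 1993, 1997, 1992, 1993, 1992, 1993),
  DBC   = c(0.7, 0.6, 0.8, 0.1, 0.8, 0.4, 0.5)
)

# One-step lag within a group: drop the last value, prepend NA
lag1 <- function(x) c(NA, x[-length(x)])

res <- tapply(toy$DBC, toy$LOCID, lag1)
is.list(res)   # TRUE: each group returns a vector, so nothing is simplified
length(res)    # 3, one element per LOCID level, while nrow(toy) is 7
```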

Brian,

Use unlist():

> data$LAG1DBC <- unlist(tapply(data$DBC, data$LOCID,
+                               function(x) c(NA, x[-length(x)])))

> data
        LOCID POPULATION YEAR        DBC    LAG1DBC
1      algb-1          A 1992 0.70451575         NA
2      algb-1          A 1993 0.59506851 0.70451575
3      algb-1          A 1997 0.84837544 0.59506851
4      algb-1          A 1998 0.50283182 0.84837544
5      algb-1          A 2000 0.91242707 0.50283182
6      algb-2          A 1992 0.09747155         NA
7      algb-2          A 1993 0.84772253 0.09747155
8      algb-2          A 1997 0.43974081 0.84772253
9      algb-2          A 1998 0.83108544 0.43974081
10     algb-2          A 2000 0.22291192 0.83108544
11     algb-3          A 1992 0.44234175         NA
12     algb-3          A 1993 0.54089534 0.44234175
5680 taylr-73          B 2001 0.43918082         NA
5681 taylr-73          B 2002 0.34694427 0.43918082
5682 taylr-73          B 2003 3.35619190 0.34694427
5683 taylr-73          B 2004 0.71575815 3.35619190
5684 taylr-73          B 2005 0.42038506 0.71575815
5685 taylr-74          B 1992 3.88410354         NA
5686 taylr-74          B 1993 3.32472557 3.88410354
5687 taylr-74          B 1994 3.29861501 3.32472557
5688 taylr-74          B 1996 0.48153827 3.29861501
5689 taylr-74          B 1997 3.63570636 0.48153827
5690 taylr-74          B 1998 1.94630194 3.63570636
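unlist() concatenates the per-group vectors in the order of the sorted LOCID levels. The result therefore lines up with the rows of data only when data is already sorted by LOCID and, within each location, by YEAR, as the listing above appears to be. The following minimal sketch uses hypothetical toy data that is deliberately out of order and sorts it first, making that assumption explicit:

```r
# Hypothetical toy data, deliberately NOT in LOCID/YEAR order
toy <- data.frame(
  LOCID = c("b", "a", "b", "a", "a"),
  YEAR  = c(1993, 1997, 1992, 1992, 1993),
  DBC   = c(0.8, 0.8, 0.1, 0.7, 0.6)
)

lag1 <- function(x) c(NA, x[-length(x)])

# Sort by location, then year, so unlist()'s group order matches row order
toy <- toy[order(toy$LOCID, toy$YEAR), ]

# unname() drops the names ("a1", "a2", ...) that unlist() builds
toy$LAG1DBC <- unname(unlist(tapply(toy$DBC, toy$LOCID, lag1)))
toy
```

Without the order() step, the same assignment would silently attach lagged values to the wrong rows, because the vector lengths still match.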

HTH, Marc Schwartz
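An alternative not used in the thread, offered here as a suggestion rather than as Marc's method, is base R's ave(). It applies a function within each group and writes the results back into the original row positions, so its output always has the same length and order as the input and no sort by LOCID is needed. Rows should still be in YEAR order within each location, because the lag is taken over rows rather than calendar years (for example, 1997 is lagged against 1993 in the listing above). A sketch on hypothetical toy data with interleaved locations:

```r
# Hypothetical toy data: rows for locations "a" and "b" are interleaved
toy <- data.frame(
  LOCID = c("a", "b", "a", "b", "a"),
  YEAR  = c(1992, 1992, 1993, 1993, 1997),
  DBC   = c(0.7, 0.1, 0.6, 0.8, 0.8)
)

# ave() splits DBC by LOCID, applies FUN to each piece, and places the
# results back at the rows each piece came from
toy$LAG1DBC <- ave(toy$DBC, toy$LOCID,
                   FUN = function(x) c(NA, x[-length(x)]))
toy
```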




R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Fri Oct 14 09:12:14 2005

This archive was generated by hypermail 2.1.8 : Sun 23 Oct 2005 - 18:51:50 EST