[R] subsetting data frame using by() or tapply() or other

From: Brian S Cade <brian_cade_at_usgs.gov>
Date: Fri 14 Oct 2005 - 06:28:58 EST


OK, so I see that the problem I'm having creating a new variable (LAG1DBC) in the example data transformation below is that tapply() returns a list that is not dimensionally consistent with the data frame (data). So how do I go from the list output of tapply() to a vector of the right length that I can assign as a new variable in my original data frame? I've been trying something like

data$LAG1DBC <- tapply(data$DBC, data$LOCID, function(x) c(NA, x[-length(x)]))

which returns a list with one element per level of LOCID, far fewer elements than nrow(data). I've also tried wrapping tapply() in as.data.frame.array() or as.data.frame.list() and still have the same problem. I know this can't be that unusual a data manipulation, and someone must have done something similar before.
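A minimal sketch of why the assignment fails and one way to map the tapply() result back onto the rows. It assumes, as in the example below, that rows within each LOCID are already in YEAR order; the small data frame is rebuilt from the first twelve rows shown below. Base R's unsplit() reverses split(), writing each group's values back into that group's original row positions.

```r
# First twelve rows of the example data shown below
data <- data.frame(
  LOCID = rep(c("algb-1", "algb-2", "algb-3"), c(5, 5, 2)),
  POPULATION = "A",
  YEAR = c(1992, 1993, 1997, 1998, 2000,
           1992, 1993, 1997, 1998, 2000,
           1992, 1993),
  DBC = c(0.70451575, 0.59506851, 0.84837544, 0.50283182, 0.91242707,
          0.09747155, 0.84772253, 0.43974081, 0.83108544, 0.22291192,
          0.44234175, 0.54089534)
)

# tapply() returns one list element per LOCID, not one value per row
lagged <- tapply(data$DBC, data$LOCID, function(x) c(NA, x[-length(x)]))
length(lagged)   # 3 groups, while nrow(data) is 12

# unsplit() puts each group's lagged values back into that group's rows
data$LAG1DBC <- unsplit(lagged, data$LOCID)
```

Because split() keeps the original row order within each group, this works even if the LOCID groups are not contiguous, as long as each group is in YEAR order.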

I want to go from something like this:

        LOCID  POPULATION YEAR        DBC
1      algb-1           A 1992 0.70451575
2      algb-1           A 1993 0.59506851
3      algb-1           A 1997 0.84837544
4      algb-1           A 1998 0.50283182
5      algb-1           A 2000 0.91242707
6      algb-2           A 1992 0.09747155
7      algb-2           A 1993 0.84772253
8      algb-2           A 1997 0.43974081
9      algb-2           A 1998 0.83108544
10     algb-2           A 2000 0.22291192
11     algb-3           A 1992 0.44234175
12     algb-3           A 1993 0.54089534
5680 taylr-73           B 2001 0.43918082
5681 taylr-73           B 2002 0.34694427
5682 taylr-73           B 2003 3.35619190
5683 taylr-73           B 2004 0.71575815
5684 taylr-73           B 2005 0.42038506
5685 taylr-74           B 1992 3.88410354
5686 taylr-74           B 1993 3.32472557
5687 taylr-74           B 1994 3.29861501
5688 taylr-74           B 1996 0.48153827
5689 taylr-74           B 1997 3.63570636
5690 taylr-74           B 1998 1.94630194

to something like this:

        LOCID  POPULATION YEAR        DBC    LAG1DBC
1      algb-1           A 1992 0.70451575         NA
2      algb-1           A 1993 0.59506851 0.70451575
3      algb-1           A 1997 0.84837544 0.59506851
4      algb-1           A 1998 0.50283182 0.84837544
5      algb-1           A 2000 0.91242707 0.50283182
6      algb-2           A 1992 0.09747155         NA
7      algb-2           A 1993 0.84772253 0.09747155
8      algb-2           A 1997 0.43974081 0.84772253
9      algb-2           A 1998 0.83108544 0.43974081
10     algb-2           A 2000 0.22291192 0.83108544
11     algb-3           A 1992 0.44234175         NA
12     algb-3           A 1993 0.54089534 0.44234175
5680 taylr-73           B 2001 0.43918082         NA
5681 taylr-73           B 2002 0.34694427 0.43918082
5682 taylr-73           B 2003 3.35619190 0.34694427
5683 taylr-73           B 2004 0.71575815 3.35619190
5684 taylr-73           B 2005 0.42038506 0.71575815
5685 taylr-74           B 1992 3.88410354         NA
5686 taylr-74           B 1993 3.32472557 3.88410354
5687 taylr-74           B 1994 3.29861501 3.32472557
5688 taylr-74           B 1996 0.48153827 3.29861501
5689 taylr-74           B 1997 3.63570636 0.48153827
5690 taylr-74           B 1998 1.94630194 3.63570636
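A minimal sketch of a more direct route, again assuming rows can be put in YEAR order within each LOCID: base R's ave() applies a function within groups and returns a vector the same length as its input, with results in the original row positions, so it can be assigned straight to a new column. The data frame is rebuilt from rows 1-12 above so the block runs on its own.

```r
# Rows 1-12 of the example above
data <- data.frame(
  LOCID = rep(c("algb-1", "algb-2", "algb-3"), c(5, 5, 2)),
  POPULATION = "A",
  YEAR = c(1992, 1993, 1997, 1998, 2000,
           1992, 1993, 1997, 1998, 2000,
           1992, 1993),
  DBC = c(0.70451575, 0.59506851, 0.84837544, 0.50283182, 0.91242707,
          0.09747155, 0.84772253, 0.43974081, 0.83108544, 0.22291192,
          0.44234175, 0.54089534)
)

# Make sure rows are in YEAR order within each LOCID before lagging
data <- data[order(data$LOCID, data$YEAR), ]

# ave() returns one value per row, so no reshaping of a list is needed;
# the first row of each LOCID gets NA
data$LAG1DBC <- ave(data$DBC, data$LOCID,
                    FUN = function(x) c(NA, x[-length(x)]))
```

This lags by the previous observation within LOCID rather than the previous calendar year, which matches the desired output above (e.g. 1997 takes its lag from 1993).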

Brian  

Brian S. Cade

U. S. Geological Survey
Fort Collins Science Center
2150 Centre Ave., Bldg. C
Fort Collins, CO 80526-8818

email: brian_cade@usgs.gov
tel: 970 226-9326




R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Received on Fri Oct 14 06:34:28 2005

This archive was generated by hypermail 2.1.8 : Fri 03 Mar 2006 - 03:40:44 EST