Re: [R] allocating factor levels

From: Eric Lecoutre <ericlecoutre_at_gmail.com>
Date: Tue, 08 Mar 2011 09:59:16 +0100

Here is a version that should work for any number of values of Start.action. The only requirement is that your data frame is sorted correctly, i.e. that the subgroups are well defined (the rows of each one are consecutive).
It is quite a bit longer, but I used it as an exercise in trying a "think generic" approach. I guess there are a lot of better ways...

Kind regards,

Eric

x <- data.frame(Start.action = c(rep('Start.setting', 3),
                                 rep('Start.hauling', 4),
                                 rep('Start.setting', 4),
                                 rep('Start.hauling', 6),
                                 rep('Start.setting', 4),
                                 rep('Start.hauling', 4)))
# TRUE where a row has the same value as the next row
switch <- (as.character(x$Start.action) == c(as.character(x$Start.action[-1]), ""))
# shift by one and negate: TRUE on the first row of each group
switch <- !c(FALSE, switch)[1:length(switch)]
cbind(x, switch)                               # inspect the flags
spos <- which(switch)                          # find position of first element of each group
ind <- cbind(spos, c(spos[-1] - 1, nrow(x)))   # build indices start-end of groups
e <- lapply(as.data.frame(t(ind)), FUN = function(a) seq(a[1], a[2]))  # build whole indices (fill gaps using seq)
for (i in 1:length(e)) {
  e[[i]] <- data.frame(ind = e[[i]], gr = names(e)[[i]])
}
e <- do.call("rbind", e)                       # prepare a column with unique group names
gr <- as.character(e[, "gr"])
x <- cbind(x, gr1 = factor(gr, levels = unique(gr)))  # add this column to df, as a factor in order of appearance
gr2pos <- table(x$Start.action, x$gr1)         # associate with high-level group names (Start.action)
a <- apply(gr2pos, 2, FUN = function(vec) which(vec != 0))  # use associations
x$gr <- x$gr1                                  # copy, then rename the levels of the copy
levels(x$gr) <- make.unique(rownames(gr2pos)[a])  # assign new names
print(x)
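
For comparison, here is a rough sketch of a shorter route to the same kind of labels. It applies rle() to the values themselves rather than to a logical test, so it is not tied to two values. The function name label_runs, the data frame y, and the choice of appending a counter directly to each value are only assumptions for illustration. It relies on the same requirement that the rows of each group are consecutive.

```r
# Hypothetical helper (the name is illustrative): label each run of identical
# consecutive values with the value plus a per-value run counter.
label_runs <- function(v) {
  r <- rle(as.character(v))             # runs of identical consecutive values
  # k[j] = how many runs with the same value have been seen up to run j
  k <- ave(seq_along(r$values), r$values, FUN = seq_along)
  rep(paste0(r$values, k), r$lengths)   # expand the run labels back to rows
}

# Small made-up example: 3 settings, 4 hauls, 2 settings
y <- data.frame(Start.action = rep(c('Start.setting', 'Start.hauling', 'Start.setting'),
                                   times = c(3, 4, 2)))
y$action <- label_runs(y$Start.action)
print(y)
```

The labels here keep the full value (Start.setting1, Start.hauling1, Start.setting2); shortening them to set/haul would need an extra lookup step.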

On 08/03/2011, Dennis Murphy <djmuser_at_gmail.com> wrote:
> Hi:
>
> Here's one way to piece it together. All we need is the first variable, so
> I'll manufacture a vector of Start.action's and go from there.
>
> w <- data.frame(Start.action = c(rep('Start.setting', 3),
>                                  rep('Start.hauling', 4),
>                                  rep('Start.setting', 4),
>                                  rep('Start.hauling', 6),
>                                  rep('Start.setting', 4),
>                                  rep('Start.hauling', 4)))
> wr <- rle(w$Start.action == 'Start.setting')
> > wr
> Run Length Encoding
>   lengths: int [1:6] 3 4 4 6 4 4
>   values : logi [1:6] TRUE FALSE TRUE FALSE TRUE FALSE
>
> w$cycle <- rep(cumsum(wr$values), wr$lengths)
> w$act <- ifelse(w$Start.action == 'Start.setting', 'set', 'haul')
> w$action <- with(w, paste(act, cycle, sep = ''))
> w$cycle <- w$act <- NULL
> > w
>     Start.action action
> 1  Start.setting   set1
> 2  Start.setting   set1
> 3  Start.setting   set1
> 4  Start.hauling  haul1
> 5  Start.hauling  haul1
> <snip>
> 20 Start.setting   set3
> 21 Start.setting   set3
> 22 Start.hauling  haul3
> 23 Start.hauling  haul3
> 24 Start.hauling  haul3
> 25 Start.hauling  haul3
>
> The rle() function is the key to this. Its argument here is a logical
> vector that is TRUE for Start.setting and FALSE for Start.hauling, so each
> run is one block of settings or hauls. The cumsum() function on the $values
> component of the result from rle() gives the cycle number of each run, and
> we replicate those numbers according to the vector of $lengths given by
> rle(). Once that is done, we just use a vectorized ifelse() function to
> yield 'set' or 'haul' in a new variable and then piece that together with
> the numeric vector...and we're done. Run the code one line at a time to
> understand what each instruction is doing.
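
To make the cumsum() step concrete, here is a small sketch on a shortened vector. The vector v and the names cycle_per_run and cycle are made up for illustration. Note the assumption that the data start with a Start.setting run: if they started with hauling, the first haul run would be numbered 0.

```r
# Shortened, made-up input: 3 settings, 4 hauls, 4 settings
v <- c(rep("Start.setting", 3), rep("Start.hauling", 4), rep("Start.setting", 4))

r <- rle(v == "Start.setting")
r$values                              # TRUE FALSE TRUE (one entry per run)
r$lengths                             # 3 4 4

# Each TRUE (a new setting run) bumps the counter; the haul run that
# follows inherits the same number.
cycle_per_run <- cumsum(r$values)     # 1 1 2

# Expand from one value per run to one value per row
cycle <- rep(cycle_per_run, r$lengths)
print(cycle)
```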
>
> HTH,
> Dennis
>
> On Mon, Mar 7, 2011 at 7:13 PM, Darcy Webber <darcy.webber_at_gmail.com> wrote:
>
>> Dear R users,
>>
>> I am working on allocating the rows within a dataframe into some
>> factor levels. Consider the following dataframe:
>>
>>     Start.action          Start.time
>> 1  Start.setting 2010-12-30 17:58:00
>> 2  Start.setting 2010-12-30 18:40:00
>> 3  Start.setting 2010-12-31 22:39:00
>> 4  Start.setting 2010-12-31 23:24:00
>> 5  Start.setting 2011-01-01 00:30:00
>> 6  Start.setting 2011-01-01 01:10:00
>> 7  Start.hauling 2011-01-01 07:07:00
>> 8  Start.hauling 2011-01-01 14:25:00
>> 9  Start.hauling 2011-01-01 21:28:00
>> 10 Start.hauling 2011-01-02 03:38:00
>> 11 Start.hauling 2011-01-02 09:28:00
>> 12 Start.hauling 2011-01-02 14:22:00
>> 13 Start.setting 2011-01-02 20:51:00
>> 14 Start.setting 2011-01-02 21:33:00
>> 15 Start.setting 2011-01-02 22:47:00
>> 16 Start.setting 2011-01-02 23:27:00
>> 17 Start.setting 2011-01-03 00:35:00
>> 18 Start.setting 2011-01-03 01:16:00
>> 19 Start.hauling 2011-01-03 04:31:00
>> 20 Start.hauling 2011-01-03 08:57:00
>>
>> I am trying to assign a factor level like the one below (named
>> "action") according to the sequence of setting and hauling occurring in
>> the "Start.action" column. In fact, it wouldn't even need to be a
>> factor or character, it could simply be numbered (i.e., the set/haul
>> prefix is useless as I could simply split it afterwards).
>>
>>     Start.action          Start.time action
>> 1  Start.setting 2010-12-30 17:58:00   set1
>> 2  Start.setting 2010-12-30 18:40:00   set1
>> 3  Start.setting 2010-12-31 22:39:00   set1
>> 4  Start.setting 2010-12-31 23:24:00   set1
>> 5  Start.setting 2011-01-01 00:30:00   set1
>> 6  Start.setting 2011-01-01 01:10:00   set1
>> 7  Start.hauling 2011-01-01 07:07:00  haul1
>> 8  Start.hauling 2011-01-01 14:25:00  haul1
>> 9  Start.hauling 2011-01-01 21:28:00  haul1
>> 10 Start.hauling 2011-01-02 03:38:00  haul1
>> 11 Start.hauling 2011-01-02 09:28:00  haul1
>> 12 Start.hauling 2011-01-02 14:22:00  haul1
>> 13 Start.setting 2011-01-02 20:51:00   set2
>> 14 Start.setting 2011-01-02 21:33:00   set2
>> 15 Start.setting 2011-01-02 22:47:00   set2
>> 16 Start.setting 2011-01-02 23:27:00   set2
>> 17 Start.setting 2011-01-03 00:35:00   set2
>> 18 Start.setting 2011-01-03 01:16:00   set2
>> 19 Start.hauling 2011-01-03 04:31:00  haul2
>> 20 Start.hauling 2011-01-03 08:57:00  haul2
>>
>> It seems like such a simple question, yet I just can't think of how to
>> implement this. Any hints or ideas on how I might achieve this would
>> be much appreciated.
>>
>> Regards,
>> Darcy
>>
>> ______________________________________________
>> R-help_at_r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>

-- 
Eric Lecoutre
Consultant - Business & Decision
Business Intelligence & Customer Intelligence

Received on Tue 08 Mar 2011 - 09:01:58 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Tue 08 Mar 2011 - 09:40:19 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.
