Re: [R] Data manipulation question

From: Peter Jepsen <PJ_at_dce.au.dk>
Date: Thu, 06 Nov 2008 13:11:40 +0100

Thank you for your prompt assistance, cruz and Bart.

Bart set me on the right track, and I modified his proposal to this:

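f <- function(data){
	# for each stop, the row whose start equals it (a transfer); NA if none
	m <- match(data$stop,data$start)
	# first admission not followed by a transfer, else the last row
	n <- min(length(m),which(is.na(m)))
	data$stop[n]
}
by(data,data$id,f)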

It also handles some special cases outside my small example dataset.
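
For anyone reading this in the archive, here is a self-contained sketch of the same idea that also returns the id/start/stop table asked for in the original question. The names adm, first_discharge, pieces and result are made up for this illustration, and patient "e" (two transfers in a row, then a readmission) is a hypothetical addition to the example data, not necessarily one of the special cases meant above. It assumes rows are sorted by start within each patient.

```r
# Example data from the original question, built with data.frame() so that
# start and stop stay numeric. Patient "e" is a hypothetical addition:
# two transfers in a row (0-4, 4-9, 9-15), then a later readmission (20-25).
adm <- data.frame(
  id    = c(rep("a", 4), rep("b", 2), rep("c", 5), "d", rep("e", 4)),
  start = c(0, 6, 17, 20,  0, 1,  0, 5, 10, 11, 50,  0,  0, 4, 9, 20),
  stop  = c(6, 12, 20, 30, 1, 10, 3, 10, 11, 30, 55, 6,  4, 9, 15, 25)
)

# Stop time of the first row whose stop is not the start of another row
# for the same patient (assumes rows are sorted by start within patient).
first_discharge <- function(d) {
  m <- match(d$stop, d$start)
  n <- min(length(m), which(is.na(m)))
  d$stop[n]
}

# One row per patient: start of the first admission, first true discharge.
pieces <- split(adm, adm$id)
result <- data.frame(
  id    = names(pieces),
  start = sapply(pieces, function(d) d$start[1]),
  stop  = sapply(pieces, first_discharge),
  row.names = NULL
)
print(result)
```

For the four original patients this gives stop times 12, 10, 3 and 6, matching the table in the question; the hypothetical patient "e" gets 15.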

Thank you again!
Peter.

-----Original Message-----
From: r-help-bounces_at_r-project.org [mailto:r-help-bounces_at_r-project.org] On Behalf Of bartjoosen
Sent: 6 November 2008 11:31
To: r-help_at_r-project.org
Subject: Re: [R] Data manipulation question

How about:

id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1))

start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0)) 
stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6)) 
data <- data.frame(id,start,stop)

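f <- function(data){
	# index of the row after the one whose stop equals each start (NA if no match)
	m <- match(data$start,data$stop) + 1
	# special cases: a single admission, or a second row that is not a transfer
	if (length(m)==1 && is.na(m)) m <- 1
	if (length(m) > 1 && is.na(m[2])) m <- 1
	data$stop[min(m,na.rm=TRUE)]
}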

by(data,data$id,f)

The if statements in the function handle some special cases; in all other cases the first line will do the trick. I would like to add that naming an object data is somewhat bad practice, as it masks the built-in data function of R. I also changed the way you built the data.frame, as your method would convert everything to factors.
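
A small sketch of the two side points, using a shortened version of the example vectors. The names via_cbind and direct are made up for this illustration. Note that the exact column type produced by the cbind route depends on the R version (factors before R 4.0.0, character strings since), but either way the numbers are no longer numeric.

```r
id    <- c(rep("a", 4), rep("b", 2))
start <- c(0, 6, 17, 20, 0, 1)
stop  <- c(6, 12, 20, 30, 1, 10)

# cbind() on mixed vectors builds a character matrix, so the numbers become text
# before as.data.frame() ever sees them.
via_cbind <- as.data.frame(cbind(id, start, stop))

# data.frame() keeps each column's own type.
direct <- data.frame(id, start, stop)

str(via_cbind)
str(direct)

is.numeric(via_cbind$start)  # FALSE
is.numeric(direct$start)     # TRUE

# Naming an object "data": the name now refers to the data frame, while the
# built-in function is still found when a function is looked up.
data <- direct
is.data.frame(data)                # TRUE
exists("data", mode = "function")  # TRUE
```

Because match() and the arithmetic on start and stop need real numbers, the data.frame(id, start, stop) form is the safer starting point.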

Good luck

Bart

Peter Jepsen wrote:
>
> Dear R-listers,
>
> I am a relatively inexperienced R-user currently migrating from Stata. I
> am deeply frustrated by this data manipulation question: I know how I
> could do it in Stata, but I cannot make it work in R.
>
> I have a data frame of hospitalization data where each row represents an
> admission. I need to know when patients were first discharged, but the
> problem is that patients were sometimes transferred between hospital
> departments. In my data a transfer looks like a new admission, except
> that it has a 'start' date equal to the previous admission's 'stop'
> date.
>
> Here is an example:
>
> id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1))
> start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0))
> stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6))
> data <- as.data.frame(cbind(id,start,stop))
> data
> # id start stop
> # 1 a 0 6
> # 2 a 6 12
> # 3 a 17 20
> # 4 a 20 30
> # 5 b 0 1
> # 6 b 1 10
> # 7 c 0 3
> # 8 c 5 10
> # 9 c 10 11
> # 10 c 11 30
> # 11 c 50 55
> # 12 d 0 6
>
> So, what I want to end up with is this:
>
> id start stop
> a 0 12 # This patient was transferred at time 6 and discharged at time 12. The admission starting at time 17 is therefore irrelevant.
> b 0 10
> c 0 3
> d 0 6
>
> I have tried tons of variations over lapply, sapply, split, for etc.,
> all to no avail.
>
> Thank you in advance for any assistance.
>
> Best regards,
> Peter Jepsen, MD.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

-- 
View this message in context:
http://www.nabble.com/Data-manipulation-question-tp20356835p20358624.html
Sent from the R help mailing list archive at Nabble.com.

Received on Thu 06 Nov 2008 - 12:15:00 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 06 Nov 2008 - 14:00:22 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.
