# Re: [R] Data manipulation question

From: Peter Jepsen <PJ_at_dce.au.dk>
Date: Thu, 06 Nov 2008 13:11:40 +0100

Thank you for your prompt assistance, cruz and Bart.

Bart set me on the right track, and I modified his proposal to this:

```
f <- function(data){
	m <- match(data$stop,data$start)
	n <- min(length(m),which(is.na(m)))
	data$stop[n]
}
by(data,data$id,f)
```

It also handles some special cases outside my small example dataset.
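To make the logic easier to follow, here is a minimal sketch of the same idea that also returns the id/start/stop table asked for in the original question. The names `admissions` and `first_discharge` are made up for illustration, and the sketch assumes the rows of each patient are already in chronological order. For patient "a", `match(c(6,12,20,30), c(0,6,17,20))` gives `2 NA 4 NA`, so the first `NA` is at row 2 and the first discharge is `stop[2]`, i.e. 12.

```r
# Example data from the original question; start and stop are kept numeric.
admissions <- data.frame(
  id    = c(rep("a", 4), rep("b", 2), rep("c", 5), rep("d", 1)),
  start = c(0, 6, 17, 20, 0, 1, 0, 5, 10, 11, 50, 0),
  stop  = c(6, 12, 20, 30, 1, 10, 3, 10, 11, 30, 55, 6)
)

# Hypothetical helper: same matching logic as f, but returns one row per patient.
first_discharge <- function(d) {
  # For each row: index of a row whose start equals this row's stop, else NA.
  m <- match(d$stop, d$start)
  # First row that is not followed by a transfer (falls back to the last row).
  n <- min(length(m), which(is.na(m)))
  data.frame(id = d$id[1], start = d$start[1], stop = d$stop[n])
}

# Apply per patient and bind the one-row results together.
result <- do.call(rbind, lapply(split(admissions, admissions$id), first_discharge))
rownames(result) <- NULL
result
```

On the example data this should give stop values 12, 10, 3 and 6 for patients a to d.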

Thank you again!
Peter.

How about:

```
id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1))
start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0))
stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6))
data <- data.frame(id,start,stop)

f <- function(data){
	m <- match(data$start,data$stop) + 1
	if (length(m)==1 && is.na(m)) m <- 1
	if (length(m) > 1 && is.na(m[2])) m <- 1
	data$stop[min(m,na.rm=TRUE)]
}

by(data,data$id,f)
```

The if statements in the function handle some special cases; in all other cases the first line will do the trick. I would like to add that using data as a variable name is somewhat bad practice, as it masks R's built-in data function. I also changed the way you built the data.frame, as your method would convert everything to factors.
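The point about the data.frame construction can be checked with a small sketch (the names `via_cbind` and `direct` are only for illustration). `cbind()` on a character vector and numeric vectors first builds a character matrix, so the numeric type of start and stop is lost before `as.data.frame()` ever sees it. In R versions of that time the columns then become factors; since R 4.0.0 they stay character by default. Either way they are no longer numeric.

```r
id    <- c("a", "a", "b")
start <- c(0, 6, 0)
stop  <- c(6, 12, 1)

# cbind() coerces everything to a character matrix before the conversion.
via_cbind <- as.data.frame(cbind(id, start, stop))

# data.frame() keeps the type of each column.
direct <- data.frame(id, start, stop)

sapply(via_cbind, is.numeric)  # FALSE for all three columns
sapply(direct, is.numeric)     # FALSE for id, TRUE for start and stop
```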

Good luck

Bart

Peter Jepsen wrote:
>
> Dear R-listers,
>
> I am a relatively inexperienced R-user currently migrating from Stata.
> I am deeply frustrated by this data manipulation question: I know how I
> could do it in Stata, but I cannot make it work in R.
>
> I have a data frame of hospitalization data where each row represents
> an admission. I need to know when patients were first discharged, but the
> problem is that patients were sometimes transferred between hospital
> departments. In my data a transfer looks like a new admission, except
> that it has a 'start' date equal to the previous admission's 'stop'
> date.
>
> Here is an example:
>
> id <- c(rep("a",4),rep("b",2), rep("c",5), rep("d",1))
> start <- c(c(0,6,17,20),c(0,1),c(0,5,10,11,50),c(0))
> stop <- c(c(6,12,20,30),c(1,10),c(3,10,11,30,55),c(6))
> data <- as.data.frame(cbind(id,start,stop))
> data
> #    id start stop
> # 1   a     0    6
> # 2   a     6   12
> # 3   a    17   20
> # 4   a    20   30
> # 5   b     0    1
> # 6   b     1   10
> # 7   c     0    3
> # 8   c     5   10
> # 9   c    10   11
> # 10  c    11   30
> # 11  c    50   55
> # 12  d     0    6
>
> So, what I want to end up with is this:
>
> id start stop
> a  0     12   # This patient was transferred at time 6 and discharged at
>               # time 12. The admission starting at time 17 is therefore irrelevant.
> b  0     10
> c  0     3
> d  0     6
>
> I have tried tons of variations over lapply, sapply, split, for etc.,
> all to no avail.
>
> Thank you in advance for any assistance.
>
> Best regards,
> Peter Jepsen, MD.
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

--
View this message in context: http://www.nabble.com/Data-manipulation-question-tp20356835p20358624.html
Sent from the R help mailing list archive at Nabble.com.
Received on Thu 06 Nov 2008 - 12:15:00 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 06 Nov 2008 - 14:00:22 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.