Re: [R] Can R replicate this data manipulation in SAS?

From: Ista Zahn <izahn_at_psych.rochester.edu>
Date: Wed, 20 Apr 2011 17:22:59 -0400

I think this is kind of like asking "will your Land Rover make it up my driveway?", but I'll assume the question was asked in all seriousness.

Here is one solution:

## **** Read in test data;

dat <- read.table(textConnection("id    drug      start       stop
1004    NRTI     07/24/95    01/05/99
1004    NRTI     11/20/95 12/10/95
1004    NRTI     01/10/96    01/05/99
1004    PI       05/09/96    11/16/97
1004    NRTI     06/01/96    02/01/97
1004    NRTI     07/01/96    03/01/97
9999    PI       01/02/03    NA

9999 NNRTI 04/05/06 07/08/09"), header=TRUE) closeAllConnections()

dat$start <- as.Date(dat$start, format = "%m/%d/%y") dat$stop <- as.Date(dat$stop, format = "%m/%d/%y")

## **** Reshape data into series with 1 date rather than separate starts and ## stops;

library(reshape)

m.dat <- melt(dat, id = c("id", "drug"))
m.dat <- m.dat[order(m.dat$id, m.dat$value),]
m.dat$variable <- ifelse(m.dat$variable == "start", 1, -1)
names(m.dat) <- c("id", "drug", "value", "date") m.dat

## **** Get regimen information plus start and stop dates;

n.dat <- cast(m.dat, id + date ~ drug, fun.aggregate=sum, margins="grand_col") for (i in names(n.dat)[-c(1:2)]) {

     n.dat[i] <- cumsum(n.dat[i])
   }
n.dat <- ddply(n.dat, .(id), transform,

      regimen = 1:length(id))
n.dat

ssd.dat <- ddply(n.dat, .(id), summarize,

                id = id[-1],
                regimen = regimen[-length(regimen)],
                 start_date = date[-length(date)],
                stop_date = date[-1])

ssd.dat

## **** Merge data to create regimens dataset; all.dat <- merge(n.dat[-2], ssd.dat)
all.dat <- all.dat[order(all.dat$id, all.dat$regimen), c("id", "start_date", "stop_date", "regimen", "NRTI", "NNRTI", "PI", "X.all.")]
all.dat

Best,
Ista

On Wed, Apr 20, 2011 at 2:59 PM, Ted Harding <ted.harding_at_wlandres.net> wrote:

> [*** PLEASE NOTE: I am sending this message on behalf of

>  Paul Miller:
>  Paul Miller <pjmiller_57_at_yahoo.com>
>  (to whom this message has also been copied). He has been
>  trying to send it, but it has never got through. Please
>  do  not reply to me, but either to the list and/or to Paul
>  at that address ***]
> ==========================================================
> Hello Everyone,
>
> I'm learning R and am trying to get a better sense of what it will and
> will not
> do. I'm hearing in some places that R may not be able to accomplish all
> of the
> data manipulation tasks that SAS can. In others, I'm hearing that R can do
> pretty much any data manipulation that SAS can but the way in which it
> does so
> is likely to be quite different.
>
> Below is some SAS syntax that that codes Highly Active Antiretroviral
> Therapy
> (HAART) regimens in HIV patients by retaining the values of variables.
> Interspersed between the bits of code are printouts of data sets that are
> created in the process of coding. I'm hoping this will come through
> clearly and
> that people will be able to see exactly what is being done. Basically,
> the code
> keeps track of how many drugs people are on and what types of drugs they
> are
> taking during specific periods of time and decides whether that
> constitutes
> HAART or not.
>
> To me, this is a pretty tricky data manipulation in SAS. Is there any way
> to
> get the equivalent result in R?
>
> Thanks,
>
> Paul
>
>
> **** SAS syntax for coding HAART in HIV patients;
> **** Read in test data;
>
> data haart;
> input id drug_class $ start_date :mmddyy. stop_date :mmddyy.;
> format start_date stop_date mmddyy8.;
> cards;
> 1004 NRTI  07/24/95 01/05/99
> 1004 NRTI  11/20/95 12/10/95
> 1004 NRTI  01/10/96 01/05/99
> 1004 PI    05/09/96 11/16/97
> 1004 NRTI  06/01/96 02/01/97
> 1004 NRTI  07/01/96 03/01/97
> 9999 PI    01/02/03 .
> 9999 NNRTI 04/05/06 07/08/09
> ;
> run;
>
> proc print data=haart;
> run;
>

>               drug_      start_       stop_
> Obs     id     class        date        date > 1     1004    NRTI     07/24/95    01/05/99 > 2     1004    NRTI     11/20/95 12/10/95 > 3     1004    NRTI     01/10/96    01/05/99 > 4     1004    PI       05/09/96    11/16/97 > 5     1004    NRTI     06/01/96    02/01/97 > 6     1004    NRTI     07/01/96    03/01/97 > 7     9999    PI       01/02/03           . > 8     9999    NNRTI    04/05/06    07/08/09 > > **** Reshape data into series with 1 date rather than separate starts and > stops; > > data changes (drop=start_date stop_date where=(not missing(date))); > set haart; > date = start_date; > change =  1; > output; > date =  stop_date; > change = -1; > output; > format date mmddyy10.; > run; > > proc sort data=changes; > by id date; > run; > > proc print data=changes; > run; >

>               drug_
> Obs     id     class          date    change

>  1    1004    NRTI     07/24/1995       1
>  2    1004    NRTI     11/20/1995       1
>  3    1004    NRTI     12/10/1995      -1
>  4    1004    NRTI     01/10/1996       1
>  5    1004    PI       05/09/1996       1
>  6    1004    NRTI     06/01/1996       1
>  7    1004    NRTI     07/01/1996       1
>  8    1004    NRTI     02/01/1997      -1
>  9    1004    NRTI     03/01/1997      -1
> 10    1004    PI       11/16/1997      -1 > 11    1004    NRTI     01/05/1999      -1 > 12    1004    NRTI     01/05/1999      -1 > 13    9999    PI       01/02/2003       1 > 14    9999    NNRTI    04/05/2006       1 > 15    9999    NNRTI    07/08/2009      -1 > > **** Get regimen information plus start and stop dates; > > data cumulative(drop=drug_class change stop_date)

>     stop_dates(keep=id regimen stop_date);
> set changes;
> by id date;
>
> if first.id then do;

>  regimen = 0;
>  NRTI = 0;
>  NNRTI = 0;
>  PI = 0;
> end;
>
> if drug_class = 'NNRTI' then NNRTI + change;
> else if drug_class = 'NRTI' then NRTI + change;
> else if drug_class = 'PI  ' then PI + change;
>
> if last.date then do;

>  stop_date = date - 1;
> if regimen then output stop_dates;

>   regimen + 1;
>  alldrugs = NNRTI + NRTI + PI;
>  HAART = (NRTI >= 3 AND NNRTI=0 AND PI=0) OR
>    (NRTI >= 2 AND (NNRTI >= 1 OR PI >= 1)) OR
>    (NRTI = 1 AND NNRTI >= 1 AND PI >= 1);
> output cumulative;
> end;
>
> format stop_date mmddyy10.;
> run;
>
> proc print data=cumulative;
> run;
> Obs     id           date    regimen    NRTI    NNRTI    PI    alldrugs

>  HAART
>  1    1004    07/24/1995        1        1       0       0        1
>   0
>  2    1004    11/20/1995        2        2       0       0        2
>   0
>  3    1004    12/10/1995        3        1       0       0        1
>   0
>  4    1004    01/10/1996        4        2       0       0        2
>   0
>  5    1004    05/09/1996        5        2       0       1        3
>   1
>  6    1004    06/01/1996        6        3       0       1        4
>   1
>  7    1004    07/01/1996        7        4       0       1        5
>   1
>  8    1004    02/01/1997        8        3       0       1        4
>   1
>  9    1004    03/01/1997        9        2       0       1        3
>   1
> 10    1004    11/16/1997       10        2       0       0        2

>  0
> 11    1004    01/05/1999       11        0       0       0        0

>  0
> 12    9999    01/02/2003        1        0       0       1        1

>  0
> 13    9999    04/05/2006        2        0       1       1        2

>  0
> 14    9999    07/08/2009        3        0       0       1        1

>  0
>
> proc print data=stop_dates;
> run;
>
> Obs     id     regimen     stop_date

>  1    1004        1      11/19/1995
>  2    1004        2      12/09/1995
>  3    1004        3      01/09/1996
>  4    1004        4      05/08/1996
>  5    1004        5      05/31/1996
>  6    1004        6      06/30/1996
>  7    1004        7      01/31/1997
>  8    1004        8      02/28/1997
>  9    1004        9      11/15/1997
> 10    1004       10      01/04/1999 > 11    9999        1      04/04/2006 > 12    9999        2      07/07/2009 > > **** Merge data to create regimens dataset; > > data regimens; > retain id start_date stop_date; > merge cumulative(rename=(date=start_date)) stop_dates; > by id regimen; > if alldrugs; > run; > > proc print data=regimens; > run; > > Obs     id     start_date     stop_date    regimen    NRTI    NNRTI    PI > > alldrugs    HAART

>  1    1004    07/24/1995    11/19/1995        1        1       0       0
>

>  1         0
>  2    1004    11/20/1995    12/09/1995        2        2       0       0
>

>  2         0
>  3    1004    12/10/1995    01/09/1996        3        1       0       0
>

>  1         0
>  4    1004    01/10/1996    05/08/1996        4        2       0       0
>

>  2         0
>  5    1004    05/09/1996    05/31/1996        5        2       0       1
>

>  3         1
>  6    1004    06/01/1996    06/30/1996        6        3       0       1
>

>  4         1
>  7    1004    07/01/1996    01/31/1997        7        4       0       1
>

>  5         1
>  8    1004    02/01/1997    02/28/1997        8        3       0       1
>

>  4         1
>  9    1004    03/01/1997    11/15/1997        9        2       0       1
>

>  3         1
> 10    1004    11/16/1997    01/04/1999       10        2       0       0
>
> 2         0
> 11    9999    01/02/2003    04/04/2006        1        0       0       1
>
> 1         0
> 12    9999    04/05/2006    07/07/2009        2        0       1       1
>
> 2         0
> 13    9999    07/08/2009             .        3        0       0       1
>
> 1         0
>
> ==========================================================
>
> Paul Miller
> Paul Miller <pjmiller_57_at_yahoo.com>
>
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <ted.harding_at_wlandres.net>
> Fax-to-email: +44 (0)870 094 0861
> Date: 20-Apr-11                                       Time: 19:59:21
> ------------------------------ XFMail ------------------------------
>
> ______________________________________________
> R-help_at_r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org

______________________________________________
R-help_at_r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Received on Thu 21 Apr 2011 - 00:47:25 GMT

Archive maintained by Robert King, hosted by the discipline of statistics at the University of Newcastle, Australia.
Archive generated by hypermail 2.2.0, at Thu 21 Apr 2011 - 01:40:33 GMT.

Mailing list information is available at https://stat.ethz.ch/mailman/listinfo/r-help. Please read the posting guide before posting to the list.

list of date sections of archive