[R] Can R replicate this data manipulation in SAS?

From: Ted Harding <ted.harding_at_wlandres.net>
Date: Wed, 20 Apr 2011 19:59:28 +0100


[*** PLEASE NOTE: I am sending this message on behalf of Paul Miller:
 Paul Miller <pjmiller_57_at_yahoo.com>
 (to whom this message has also been copied). He has been trying to
 send it, but it has never got through. Please do not reply to me,
 but to the list and/or to Paul at that address ***]



Hello Everyone,

I'm learning R and am trying to get a better sense of what it will and will not
do. I'm hearing in some places that R may not be able to accomplish all of the
data manipulation tasks that SAS can. In others, I'm hearing that R can do
pretty much any data manipulation that SAS can, but that the way in which it
does so is likely to be quite different.

Below is some SAS syntax that codes Highly Active Antiretroviral Therapy
(HAART) regimens in HIV patients by retaining the values of variables.
Interspersed between the bits of code are printouts of the data sets that are
created in the process of coding. I'm hoping this will come through clearly and
that people will be able to see exactly what is being done. Basically, the code
keeps track of how many drugs people are on and what types of drugs they are
taking during specific periods of time and decides whether that constitutes
HAART or not.

To me, this is a pretty tricky data manipulation in SAS. Is there any way to
get the equivalent result in R?

Thanks,

Paul

data haart;
input id drug_class $ start_date :mmddyy. stop_date :mmddyy.;
format start_date stop_date mmddyy8.;
cards;

1004 NRTI  07/24/95 01/05/99
1004 NRTI  11/20/95 12/10/95
1004 NRTI  01/10/96 01/05/99
1004 PI    05/09/96 11/16/97
1004 NRTI  06/01/96 02/01/97
1004 NRTI  07/01/96 03/01/97
9999 PI    01/02/03 .
9999 NNRTI 04/05/06 07/08/09

;
run;

proc print data=haart;
run;

               drug_      start_       stop_
Obs     id     class        date        date
1     1004    NRTI     07/24/95    01/05/99
2     1004    NRTI     11/20/95    12/10/95
3     1004    NRTI     01/10/96    01/05/99
4     1004    PI       05/09/96    11/16/97
5     1004    NRTI     06/01/96    02/01/97
6     1004    NRTI     07/01/96    03/01/97
7     9999    PI       01/02/03           .
8     9999    NNRTI    04/05/06    07/08/09
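
For readers who do not use SAS, the listing above can be read into plain records along the following lines. This is only a sketch in Python using the standard library (it is not the poster's code, and it is not R). The names RAW, parse_date and read_haart are made up for illustration, and the two-digit years are assumed to mean 1995 through 2009, as the four-digit printouts later in the message indicate.

```python
from datetime import datetime

# The same eight records that follow the "cards;" statement.
RAW = """\
1004 NRTI  07/24/95 01/05/99
1004 NRTI  11/20/95 12/10/95
1004 NRTI  01/10/96 01/05/99
1004 PI    05/09/96 11/16/97
1004 NRTI  06/01/96 02/01/97
1004 NRTI  07/01/96 03/01/97
9999 PI    01/02/03 .
9999 NNRTI 04/05/06 07/08/09
"""


def parse_date(text):
    """Return a date for an mm/dd/yy string, or None for the SAS missing value '.'."""
    if text == ".":
        return None
    return datetime.strptime(text, "%m/%d/%y").date()


def read_haart(raw):
    """Split each non-blank line on whitespace into one record (a dict)."""
    rows = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        pid, drug_class, start, stop = line.split()
        rows.append({
            "id": int(pid),
            "drug_class": drug_class,
            "start_date": parse_date(start),
            "stop_date": parse_date(stop),
        })
    return rows


haart = read_haart(RAW)
```

The open-ended course for patient 9999 comes through with stop_date set to None, which plays the role of the SAS missing value in the later steps.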

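data changes (drop=start_date stop_date where=(not missing(date)));
set haart;
/* one +1 row on the date a drug is started */
date = start_date;
change = 1;
output;
/* one -1 row on the date it is stopped; an open-ended course has a
   missing date here and is dropped by the where= option above */
date = stop_date;
change = -1;
output;
format date mmddyy10.;
run;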

proc sort data=changes;
by id date;
run;

proc print data=changes;
run;

               drug_
Obs     id     class          date    change
  1    1004    NRTI     07/24/1995       1
  2    1004    NRTI     11/20/1995       1
  3    1004    NRTI     12/10/1995      -1
  4    1004    NRTI     01/10/1996       1
  5    1004    PI       05/09/1996       1
  6    1004    NRTI     06/01/1996       1
  7    1004    NRTI     07/01/1996       1
  8    1004    NRTI     02/01/1997      -1
  9    1004    NRTI     03/01/1997      -1
 10    1004    PI       11/16/1997      -1
 11    1004    NRTI     01/05/1999      -1
 12    1004    NRTI     01/05/1999      -1
 13    9999    PI       01/02/2003       1
 14    9999    NNRTI    04/05/2006       1
 15    9999    NNRTI    07/08/2009      -1
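
The step above turns every course of treatment into two events: +1 on the start date and -1 on the stop date, sorted by patient and date. A rough Python sketch of the same idea follows. The tuple layout and the name to_changes are assumptions made for illustration, and the sketch relies on Python's sorted() being stable so that events tied on id and date keep their input order.

```python
from datetime import date

# (id, drug_class, start_date, stop_date); None marks a missing stop date.
haart = [
    (1004, "NRTI", date(1995, 7, 24), date(1999, 1, 5)),
    (1004, "NRTI", date(1995, 11, 20), date(1995, 12, 10)),
    (1004, "NRTI", date(1996, 1, 10), date(1999, 1, 5)),
    (1004, "PI", date(1996, 5, 9), date(1997, 11, 16)),
    (1004, "NRTI", date(1996, 6, 1), date(1997, 2, 1)),
    (1004, "NRTI", date(1996, 7, 1), date(1997, 3, 1)),
    (9999, "PI", date(2003, 1, 2), None),
    (9999, "NNRTI", date(2006, 4, 5), date(2009, 7, 8)),
]


def to_changes(rows):
    """Emit (id, drug_class, date, change) events, sorted by id and date."""
    changes = []
    for pid, drug_class, start, stop in rows:
        for when, change in ((start, 1), (stop, -1)):
            if when is not None:  # same effect as where=(not missing(date))
                changes.append((pid, drug_class, when, change))
    # Sort on (id, date) only; ties keep their original relative order.
    return sorted(changes, key=lambda c: (c[0], c[2]))


changes = to_changes(haart)
```

Eight courses give sixteen candidate events, and the one missing stop date leaves fifteen, which agrees with the printout.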

data cumulative(drop=drug_class change stop_date)
     stop_dates(keep=id regimen stop_date);
set changes;
by id date;

if first.id then do;
  regimen = 0;
  NRTI = 0;
  NNRTI = 0;
  PI = 0;
end;

/* sum statements keep a running count of the drugs in each class */
if drug_class = 'NNRTI' then NNRTI + change;
else if drug_class = 'NRTI' then NRTI + change;
else if drug_class = 'PI ' then PI + change;

/* once every change on a given date has been applied, close the
   previous regimen and start a new one */
if last.date then do;
  stop_date = date - 1;
  if regimen then output stop_dates;

  regimen + 1;
  alldrugs = NNRTI + NRTI + PI;
  HAART = (NRTI >= 3 AND NNRTI = 0 AND PI = 0) OR
          (NRTI >= 2 AND (NNRTI >= 1 OR PI >= 1)) OR
          (NRTI = 1 AND NNRTI >= 1 AND PI >= 1);
  output cumulative;
end;

format stop_date mmddyy10.;
run;

proc print data=cumulative;
run;

Obs    id        date  regimen  NRTI  NNRTI  PI  alldrugs  HAART
  1  1004  07/24/1995        1     1      0   0         1      0
  2  1004  11/20/1995        2     2      0   0         2      0
  3  1004  12/10/1995        3     1      0   0         1      0
  4  1004  01/10/1996        4     2      0   0         2      0
  5  1004  05/09/1996        5     2      0   1         3      1
  6  1004  06/01/1996        6     3      0   1         4      1
  7  1004  07/01/1996        7     4      0   1         5      1
  8  1004  02/01/1997        8     3      0   1         4      1
  9  1004  03/01/1997        9     2      0   1         3      1
 10  1004  11/16/1997       10     2      0   0         2      0
 11  1004  01/05/1999       11     0      0   0         0      0
 12  9999  01/02/2003        1     0      0   1         1      0
 13  9999  04/05/2006        2     0      1   1         2      0
 14  9999  07/08/2009        3     0      0   1         1      0
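
The running counts and the HAART flag in this listing can be reproduced outside SAS with grouped iteration. The following is a sketch in Python, not the poster's method: is_haart and cumulative are hypothetical names, dates are written as ISO strings purely to keep the example short, and the input is assumed to be already sorted by id and date, as the changes data set is.

```python
from itertools import groupby

# (id, drug_class, date, change), copied from the "changes" printout.
changes = [
    (1004, "NRTI", "1995-07-24", 1),
    (1004, "NRTI", "1995-11-20", 1),
    (1004, "NRTI", "1995-12-10", -1),
    (1004, "NRTI", "1996-01-10", 1),
    (1004, "PI", "1996-05-09", 1),
    (1004, "NRTI", "1996-06-01", 1),
    (1004, "NRTI", "1996-07-01", 1),
    (1004, "NRTI", "1997-02-01", -1),
    (1004, "NRTI", "1997-03-01", -1),
    (1004, "PI", "1997-11-16", -1),
    (1004, "NRTI", "1999-01-05", -1),
    (1004, "NRTI", "1999-01-05", -1),
    (9999, "PI", "2003-01-02", 1),
    (9999, "NNRTI", "2006-04-05", 1),
    (9999, "NNRTI", "2009-07-08", -1),
]


def is_haart(nrti, nnrti, pi):
    """The HAART rule from the SAS step, term for term."""
    return ((nrti >= 3 and nnrti == 0 and pi == 0)
            or (nrti >= 2 and (nnrti >= 1 or pi >= 1))
            or (nrti == 1 and nnrti >= 1 and pi >= 1))


def cumulative(changes):
    """One row per (id, date) holding the drug counts after that day's changes."""
    rows = []
    for pid, person in groupby(changes, key=lambda c: c[0]):
        counts = {"NRTI": 0, "NNRTI": 0, "PI": 0}  # reset for each patient
        regimen = 0
        for day, events in groupby(person, key=lambda c: c[2]):
            for _, drug_class, _, change in events:
                if drug_class in counts:  # other classes are ignored
                    counts[drug_class] += change
            regimen += 1  # a new regimen begins on each change date
            rows.append({
                "id": pid, "date": day, "regimen": regimen,
                "NRTI": counts["NRTI"], "NNRTI": counts["NNRTI"],
                "PI": counts["PI"], "alldrugs": sum(counts.values()),
                "HAART": int(is_haart(counts["NRTI"], counts["NNRTI"],
                                      counts["PI"])),
            })
    return rows


regimen_rows = cumulative(changes)
```

Grouping on the date inside each patient stands in for the last.date test: the two stops on 01/05/1999 are applied together and produce a single row.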

proc print data=stop_dates;
run;

Obs     id     regimen     stop_date
  1    1004        1      11/19/1995
  2    1004        2      12/09/1995
  3    1004        3      01/09/1996
  4    1004        4      05/08/1996
  5    1004        5      05/31/1996
  6    1004        6      06/30/1996
  7    1004        7      01/31/1997
  8    1004        8      02/28/1997
  9    1004        9      11/15/1997
 10    1004       10      01/04/1999
 11    9999        1      04/04/2006
 12    9999        2      07/07/2009
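
Each stop date in this listing is simply the day before the same patient's next change date, so a patient's last regimen gets no row. A small Python sketch of that rule follows; the dictionary layout and the name stop_dates are assumptions for illustration only.

```python
from datetime import date, timedelta

# Distinct change dates per patient, in order, from the "cumulative" printout.
change_dates = {
    1004: ["1995-07-24", "1995-11-20", "1995-12-10", "1996-01-10",
           "1996-05-09", "1996-06-01", "1996-07-01", "1997-02-01",
           "1997-03-01", "1997-11-16", "1999-01-05"],
    9999: ["2003-01-02", "2006-04-05", "2009-07-08"],
}


def stop_dates(change_dates):
    """Return (id, regimen, stop_date), where stop_date is one day before
    the start of the next regimen for the same patient."""
    out = []
    for pid, days in change_dates.items():
        parsed = [date.fromisoformat(d) for d in days]
        # parsed[1] starts regimen 2, so it closes regimen 1, and so on.
        for regimen, next_start in enumerate(parsed[1:], start=1):
            out.append((pid, regimen, next_start - timedelta(days=1)))
    return out


stops = stop_dates(change_dates)
```

Eleven change dates for patient 1004 give ten stop dates and three for patient 9999 give two, twelve rows in all.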

data regimens;
retain id start_date stop_date;   /* puts these three columns first */
merge cumulative(rename=(date=start_date)) stop_dates;
by id regimen;
if alldrugs;                      /* keep only periods with at least one drug */
run;

proc print data=regimens;
run;

Obs    id  start_date   stop_date  regimen  NRTI  NNRTI  PI  alldrugs  HAART
  1  1004  07/24/1995  11/19/1995        1     1      0   0         1      0
  2  1004  11/20/1995  12/09/1995        2     2      0   0         2      0
  3  1004  12/10/1995  01/09/1996        3     1      0   0         1      0
  4  1004  01/10/1996  05/08/1996        4     2      0   0         2      0
  5  1004  05/09/1996  05/31/1996        5     2      0   1         3      1
  6  1004  06/01/1996  06/30/1996        6     3      0   1         4      1
  7  1004  07/01/1996  01/31/1997        7     4      0   1         5      1
  8  1004  02/01/1997  02/28/1997        8     3      0   1         4      1
  9  1004  03/01/1997  11/15/1997        9     2      0   1         3      1
 10  1004  11/16/1997  01/04/1999       10     2      0   0         2      0
 11  9999  01/02/2003  04/04/2006        1     0      0   1         1      0
 12  9999  04/05/2006  07/07/2009        2     0      1   1         2      0
 13  9999  07/08/2009           .        3     0      0   1         1      0
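
The final step joins each regimen to its stop date by id and regimen number, then keeps only the periods in which at least one drug is being taken. The sketch below illustrates this in Python on a subset of the rows above. The name build_regimens is hypothetical, and a dictionary lookup stands in for the SAS match-merge, which gives the same result here because every key in stop_dates also appears in cumulative.

```python
from datetime import date

# A subset of the "cumulative" rows (only the columns needed here).
cumulative = [
    {"id": 1004, "regimen": 10, "start_date": date(1997, 11, 16), "alldrugs": 2},
    {"id": 1004, "regimen": 11, "start_date": date(1999, 1, 5), "alldrugs": 0},
    {"id": 9999, "regimen": 1, "start_date": date(2003, 1, 2), "alldrugs": 1},
    {"id": 9999, "regimen": 2, "start_date": date(2006, 4, 5), "alldrugs": 2},
    {"id": 9999, "regimen": 3, "start_date": date(2009, 7, 8), "alldrugs": 1},
]

# The matching "stop_dates" rows, keyed by (id, regimen).
stop_dates = {
    (1004, 10): date(1999, 1, 4),
    (9999, 1): date(2006, 4, 4),
    (9999, 2): date(2009, 7, 7),
}


def build_regimens(cumulative, stop_dates):
    """Attach stop dates and drop periods in which no drug is taken."""
    regimens = []
    for row in cumulative:
        if not row["alldrugs"]:  # same effect as the subsetting "if alldrugs;"
            continue
        merged = dict(row)
        # No entry means the regimen is still open: the stop date stays missing.
        merged["stop_date"] = stop_dates.get((row["id"], row["regimen"]))
        regimens.append(merged)
    return regimens


regimens = build_regimens(cumulative, stop_dates)
```

Regimen 11 for patient 1004 is dropped because no drugs remain, and regimen 3 for patient 9999 is kept with a missing stop date, as in the last line of the printout.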

==========================================================

Paul Miller
Paul Miller <pjmiller_57_at_yahoo.com>



E-Mail: (Ted Harding) <ted.harding_at_wlandres.net>
Fax-to-email: +44 (0)870 094 0861
Date: 20-Apr-11                                       Time: 19:59:21
------------------------------ XFMail ------------------------------

Received on Wed 20 Apr 2011 - 19:07:27 GMT

