I am trying to find the first date of the latest regimen in a group of treatments. I'm having a hard time wrapping my head around this. I have a feeling PROC SQL with a HAVING statement might be my best bet, but I'm not able to think of how to do this.
Suppose I have several subjects who have received various regimens of therapy where each regimen can contain multiple therapies. I want to find the latest date for each subject. Then, I need to find the earliest date in that corresponding regimen.
So, in the data below, Subject 101-001 has a latest date of 04JUL2014, which is Regimen #2. The earliest date in Regimen #2 is 10JUN2014. That would be the record I want to keep. Likewise, Subject 101-002 has a latest date of 21DEC2016, which is in Regimen #2. The earliest date in Regimen #2 is 15NOV2016. That would be the record I would want to keep.
I can't make any assumptions like regimens being in chronological order. It could very well happen that Regimen #2 happens a couple years before Regimen #1. Regimens are the order they are entered more so than when they occurred.
Subject Regimen Date
101-001 | 1 | 12-Dec-2013 |
101-001 | 1 | 21-Dec-2013 |
101-001 | 1 | 5-Jan-2014 |
101-001 | 2 | 10-Jun-2014 |
101-001 | 2 | 19-Jun-2014 |
101-001 | 2 | 4-Jul-2014 |
101-002 | 1 | 27-Sep-2016 |
101-002 | 1 | 1-Oct-2016 |
101-002 | 2 | 15-Nov-2016 |
101-002 | 2 | 21-Dec-2016 |
101-002 | 3 | 20-Nov-2016 |
101-002 | 3 | 1-Dec-2016 |
data have;
input Subject $ Regimen Date :date9.;
format date date9.;
cards;
101-001 1 12-Dec-2013
101-001 1 21-Dec-2013
101-001 1 5-Jan-2014
101-001 2 10-Jun-2014
101-001 2 19-Jun-2014
101-001 2 4-Jul-2014
101-002 1 27-Sep-2016
101-002 1 1-Oct-2016
101-002 2 15-Nov-2016
101-002 2 21-Dec-2016
101-002 3 20-Nov-2016
101-002 3 1-Dec-2016
;
proc sql;
create table want as
select a.*
from have a, (select subject,regimen as _regimen from have group by subject having date=max(date)) b
where a.subject=b.subject
group by a.subject,regimen,_regimen
having date=min(date) and _regimen=regimen;
quit;
data have;
input Subject $ Regimen Date :date9.;
format date date9.;
cards;
101-001 1 12-Dec-2013
101-001 1 21-Dec-2013
101-001 1 5-Jan-2014
101-001 2 10-Jun-2014
101-001 2 19-Jun-2014
101-001 2 4-Jul-2014
101-002 1 27-Sep-2016
101-002 1 1-Oct-2016
101-002 2 15-Nov-2016
101-002 2 21-Dec-2016
101-002 3 20-Nov-2016
101-002 3 1-Dec-2016
;
proc sql;
create table want as
select a.*
from have a, (select subject,regimen as _regimen from have group by subject having date=max(date)) b
where a.subject=b.subject
group by a.subject,regimen,_regimen
having date=min(date) and _regimen=regimen;
quit;
If you would rather not use SQL:
proc sort data=have; by subject regimen date; run;
proc means data=have noprint;
by subject;
output out=temp(drop=_: ) maxid(date(regimen))= / autoname;
run;
data want;
merge have temp(in=ok); by subject regimen;
if ok and first.regimen then output;
run;
If you only want the date (and not other variables from the first observation in the regimen), it can be done in a single datastep like this:
data want;
do until(last.subject);
set have;
by subject regimen;
if first.regimen then
first_date=date;
end;
date=first_date;
drop first_date;
run;
data have;
input Subject $ Regimen Date :date9.;
format date date9.;
cards;
101-001 1 12-Dec-2013
101-001 1 21-Dec-2013
101-001 1 5-Jan-2014
101-001 2 10-Jun-2014
101-001 2 19-Jun-2014
101-001 2 4-Jul-2014
101-002 1 27-Sep-2016
101-002 1 1-Oct-2016
101-002 2 15-Nov-2016
101-002 2 21-Dec-2016
101-002 3 20-Nov-2016
101-002 3 1-Dec-2016
;
data temp;
set have;
by Subject Regimen notsorted;
group+first.Regimen;
run;
proc sort data=temp;
by Subject group descending date;
run;
data want;
set temp;
by Subject;
if last.Subject;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.