Hello,
I'm a novice SAS user (SAS 9.4) and I'm working with a healthcare dataset where I need to count the number of observations that were discharged and admitted on the same day at different places of service. Example, if ID 1 was seen at the ED on 01/01/2020 and was discharged on 01/02/2020 and Inpatient admission on 01/02/2020 then the new variable would count that as 1 service but if ID 1 was seen at the ED on 01/10/2020 and discharged on 01/10/2020 and Inpatient was admitted on 01/31/2020 then the new variable wouldn't count that. How would I create a variable that would count that?
data SC.test;
input id $ admitdate:mmddyy10. dischargedate:mmddyy10. place $2. ;
format ADMITdate MMDDYY10. DISCHARGEdate MMDDYY10.;
datalines;
1 09/10/2020 09/10/2020 ED
1 09/10/2020 09/11/2020 IN
2 10/10/2020 10/12/2020 IN
3 04/03/2020 04/03/2020 ED
3 05/03/2020 05/03/2020 ED
3 05/29/2020 06/03/2020 IN
4 10/09/2020 10/10/2020 ED
4 10/10/2020 10/12/2020 IN
4 11/03/2020 11/03/2020 ED
4 12/29/2020 12/31/2020 IN
;
run;
My expected output would be something like this:
ID | ADMITDATE | DISCHARGEDATE | PLACE | NEWVAR |
1 | 9/10/2020 | 9/10/2020 | ED | 1 |
1 | 9/10/2020 | 9/11/2020 | IN | 1 |
2 | 10/10/2020 | 10/12/2020 | IN | . |
3 | 4/3/2020 | 4/3/2020 | ED | . |
3 | 5/3/2020 | 5/3/2020 | ED | . |
3 | 5/29/2020 | 6/3/2020 | IN | . |
4 | 10/9/2020 | 10/10/2020 | ED | 1 |
4 | 10/10/2020 | 10/12/2020 | IN | 1 |
4 | 11/3/2020 | 11/3/2020 | ED | . |
4 | 12/29/2020 | 12/31/2020 | IN | . |
I would really appreciate any insight. Thank you!
Amanda
You want to set a dummy variable to one for all records for which a discharge from one place occured on the same date as a admission to another place. You want to set the dummy to 1 for both records.
If so, then:
data test;
input id $ admitdate:mmddyy10. dischargedate:mmddyy10. place $2. ;
format ADMITdate MMDDYY10. DISCHARGEdate MMDDYY10.;
datalines;
1 09/10/2020 09/10/2020 ED
1 09/10/2020 09/11/2020 IN
2 10/10/2020 10/12/2020 IN
3 04/03/2020 04/03/2020 ED
3 05/03/2020 05/03/2020 ED
3 05/29/2020 06/03/2020 IN
4 10/09/2020 10/10/2020 ED
4 10/10/2020 10/12/2020 IN
4 11/03/2020 11/03/2020 ED
4 12/29/2020 12/31/2020 IN
run;
data want (drop=nxt_:);
set test (keep=id admitdate dischargedate);
by id admitdate dischargedate;
merge test
test (firstobs=2 keep=admitdate rename=(admitdate=nxt_admdate));
if (last.id=0 and nxt_admdate=dischargedate) or (first.id=0 and admitdate=lag(dischargedate)) then dummy=1;
else dummy=0;
run;
This code assumes the data are sorted by ID/ADMITDATE ID/ADMITDATE/DISCHARGEDATE The code logic really only needs a BY statement of
BY ID;
But I also included ADMITDATE ADMITDATE and DISCHARGEDATE, so if the data set is not properly sorted, the program will stop with an error message.
ERROR: BY variables are not properly sorted on data set WORK.TEST.
and you'll know you need to sort the data.
When you say "count" you have to be pretty explicit about why None of these get a count:
2 | 10/10/2020 | 10/12/2020 | IN | . |
3 | 4/3/2020 | 4/3/2020 | ED | . |
3 | 5/3/2020 | 5/3/2020 | ED | . |
3 | 5/29/2020 | 6/3/2020 | IN |
. |
4 | 11/3/2020 | 11/3/2020 | ED | . |
4 | 12/29/2020 | 12/31/2020 | IN | . |
but this does:
1 | 9/10/2020 | 9/10/2020 | ED | 1 |
I am not seeing any Rule that says id 2 and 3 or part of id=4 should not have anything counted.
My observation of just the Want example would be that the DATES really are not playing an actual role in the result at all.
You want to set a dummy variable to one for all records for which a discharge from one place occured on the same date as a admission to another place. You want to set the dummy to 1 for both records.
If so, then:
data test;
input id $ admitdate:mmddyy10. dischargedate:mmddyy10. place $2. ;
format ADMITdate MMDDYY10. DISCHARGEdate MMDDYY10.;
datalines;
1 09/10/2020 09/10/2020 ED
1 09/10/2020 09/11/2020 IN
2 10/10/2020 10/12/2020 IN
3 04/03/2020 04/03/2020 ED
3 05/03/2020 05/03/2020 ED
3 05/29/2020 06/03/2020 IN
4 10/09/2020 10/10/2020 ED
4 10/10/2020 10/12/2020 IN
4 11/03/2020 11/03/2020 ED
4 12/29/2020 12/31/2020 IN
run;
data want (drop=nxt_:);
set test (keep=id admitdate dischargedate);
by id admitdate dischargedate;
merge test
test (firstobs=2 keep=admitdate rename=(admitdate=nxt_admdate));
if (last.id=0 and nxt_admdate=dischargedate) or (first.id=0 and admitdate=lag(dischargedate)) then dummy=1;
else dummy=0;
run;
This code assumes the data are sorted by ID/ADMITDATE ID/ADMITDATE/DISCHARGEDATE The code logic really only needs a BY statement of
BY ID;
But I also included ADMITDATE ADMITDATE and DISCHARGEDATE, so if the data set is not properly sorted, the program will stop with an error message.
ERROR: BY variables are not properly sorted on data set WORK.TEST.
and you'll know you need to sort the data.
The real key to this solution is in the MERGE statement, which does a self-merge of TEST, simultaneously taking data from both observation i and observation i+1 (where TEST has the FIRSTOBS=2 parameter). This allows you to look-ahead and compare data in the current obs to the next obs. Of course, that means you have to rename variables because both obs start out with the same variable names in a self-merge.
The SET statement is there merely to use with the BY statement, thereby allowing use of FIRST.ID and LAST.ID in the IF statement. You could actually drop the SET and BY statements, through use of a slightly "busier" if statement, as in
data want2 (drop=nxt_:);
merge test
test (firstobs=2 keep=id admitdate rename=(id=nxt_id admitdate=nxt_admdate));
if (id=nxt_id and nxt_admdate=dischargedate) or (id=lag(id) and admitdate=lag(dischargedate)) then dummy=1;
else dummy=0;
RUN;
Of course, this latter code version doesn't protect you against data that might not be sorted BY ID/ADMITDATE/DISCHARGEDATE.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.