Hey everybody! I am having trouble coming up with the language for a certain duplicate identification logic on multiple admissions. I have a hospital discharge data set that contains a unique id, event information, and an admit date and discharge date. We are considering any observations for the same individual where that individual has been readmitted in less than 48 hours to be a duplicate event/record. So I think that what I need is to code language that will tell sas that when it finds matching unique IDs the program needs to calculate the difference between the discharge date of the original observation and the admit date of the second observation. Then if the difference between those two dates is less than 48hrs delete the second observation. Any suggestions that you might have would be greatly appreciated! Thank you.
Cheers,
Vanessa
Why would obs 5 be deleted? The following deletes all of the obs you mentioned except for obs 5:
data Mock;
informat medrec_no $9.;
informat admit_dt discharge_dt mmddyy10.;
format admit_dt discharge_dt mmddyy10.;
informat first_nm last_nm $15.;
input obs medrec_no first_nm last_nm sex event_code admit_dt discharge_dt;
cards;
1 000000001 John Smith 0 9925 04/15/2011 04/16/2011
2 000000001 John Smith 0 9925 04/16/2011 04/20/2011
3 000000001 John Smith 0 9924 04/23/2011 04/24/2011
4 000000002 Jane Wilson 1 9920 05/12/2011 05/13/2011
5 000000002 Jane Wilson 1 9920 05/16/2011 05/16/2011
6 000000003 Chris Griffin 0 9923 06/04/2011 06/04/2011
7 000000003 Chris Griffin 0 9923 06/05/2011 06/08/2011
8 000000003 Chris Griffin 0 9923 06/08/2011 06/09/2011
9 000000002 Jane Wilson 1 9920 08/02/2011 08/03/2011
;
proc sort data=mock;
by medrec_no admit_dt;
run;
data want;
set mock;
by medrec_no;
retain last_discharge_dt;
if first.medrec_no then do;
output;
last_discharge_dt=discharge_dt;
end;
else do;
if admit_dt gt last_discharge_dt+2 then do;
last_discharge_dt=discharge_dt;
output;
end;
else last_discharge_dt=discharge_dt;
end;
run;
Are they admission and discharge dates or datetimes? What about a case where, in three days, a person was discharged on day1, readmitted and discharged on day2, and then readmitted on day3? Some example data would help although the task seems easy enough to accomplish.
My admit and discharge dates are currently dates, I have another data set (emergency department data) that I need to do the same duplicate deletion on where the they are datetimes but I was planning on creating a new variable with datepart so that the coding is the same for both data sets. In your three day example both day 2 and day 3 would be considered the same event as the event that occurred on day 1.
Mock data:
obs medrec_no first_nm last_nm sex event_code admit_dt discharge_dt
1 000000001 John Smith 0 9925 04/15/2011 04/16/2011
2 000000001 John Smith 0 9925 04/16/2011 04/20/2011
3 000000001 John Smith 0 9924 04/23/2011 04/24/2011
4 000000002 Jane Wilson 1 9920 05/12/2011 05/13/2011
5 000000002 Jane Wilson 1 9920 05/16/2011 05/16/2011
6 000000003 Chris Griffin 0 9923 06/04/2011 06/04/2011
7 000000003 Chris Griffin 0 9923 06/05/2011 06/08/2011
8 000000003 Chris Griffin 0 9923 06/08/2011 06/09/2011
9 000000002 Jane Wilson 1 9920 08/02/2011 08/03/2011
......
So in the above example I would want to delete obs 2, 5, 7, & 8.
Basically the duration of hospitalization is unimportant for my analysis. I am trying to eventually look at the count and rate of the event allowing some readmission (readmission after 2 days) to be considered as separate events.
Thanks!
Why would obs 5 be deleted? The following deletes all of the obs you mentioned except for obs 5:
data Mock;
informat medrec_no $9.;
informat admit_dt discharge_dt mmddyy10.;
format admit_dt discharge_dt mmddyy10.;
informat first_nm last_nm $15.;
input obs medrec_no first_nm last_nm sex event_code admit_dt discharge_dt;
cards;
1 000000001 John Smith 0 9925 04/15/2011 04/16/2011
2 000000001 John Smith 0 9925 04/16/2011 04/20/2011
3 000000001 John Smith 0 9924 04/23/2011 04/24/2011
4 000000002 Jane Wilson 1 9920 05/12/2011 05/13/2011
5 000000002 Jane Wilson 1 9920 05/16/2011 05/16/2011
6 000000003 Chris Griffin 0 9923 06/04/2011 06/04/2011
7 000000003 Chris Griffin 0 9923 06/05/2011 06/08/2011
8 000000003 Chris Griffin 0 9923 06/08/2011 06/09/2011
9 000000002 Jane Wilson 1 9920 08/02/2011 08/03/2011
;
proc sort data=mock;
by medrec_no admit_dt;
run;
data want;
set mock;
by medrec_no;
retain last_discharge_dt;
if first.medrec_no then do;
output;
last_discharge_dt=discharge_dt;
end;
else do;
if admit_dt gt last_discharge_dt+2 then do;
last_discharge_dt=discharge_dt;
output;
end;
else last_discharge_dt=discharge_dt;
end;
run;
Arthur and PG you are both correct. By my rules obs 5 would not be removed. Sorry for the typo! Also, thank you so much for the speedy responses. I will try these out when I get into the office tomorrow!
-Vanessa
I agree with Art, according to your rules, obs=5 should remain. The following code merges hospital bouts when they are separated by a discharge shorter than 3 days :
data Mock;
informat medrec_no $9.;
informat admit_dt discharge_dt mmddyy10.;
format admit_dt discharge_dt mmddyy10.;
informat first_nm last_nm $15.;
input obs medrec_no first_nm last_nm sex event_code admit_dt discharge_dt;
datalines;
1 000000001 John Smith 0 9925 04/15/2011 04/16/2011
2 000000001 John Smith 0 9925 04/16/2011 04/20/2011
3 000000001 John Smith 0 9924 04/23/2011 04/24/2011
4 000000002 Jane Wilson 1 9920 05/12/2011 05/13/2011
5 000000002 Jane Wilson 1 9920 05/16/2011 05/16/2011
6 000000003 Chris Griffin 0 9923 06/04/2011 06/04/2011
7 000000003 Chris Griffin 0 9923 06/05/2011 06/08/2011
8 000000003 Chris Griffin 0 9923 06/06/2011 06/06/2011
9 000000003 Chris Griffin 0 9923 06/09/2011 06/09/2011
10 000000002 Jane Wilson 1 9920 08/02/2011 08/03/2011
;
proc sort data=mock;
by medrec_no admit_dt;
run;
data tmp(drop=prev_dt) / view=tmp;
do until(last.medrec_no);
set mock; by medrec_no;
if intnx("DAY",admit_dt,-2) > prev_dt then seq + 1;
output;
prev_dt = max(prev_dt, discharge_dt);
end;
run;
proc sql;
create table want as
select min(obs) as obs,
medrec_no, first_nm, last_nm, sex,
min(event_code) as event_code,
min(admit_dt) as admit_dt format=mmddyy10.,
max(discharge_dt) as discharge_dt format=mmddyy10.,
count(*) as nb_bouts
from tmp
group by medrec_no, first_nm, last_nm, sex, seq
order by obs;
drop view tmp;
quit;
PG
Message was edited by: PG - Added max for robustness.
Am I missing something here? Why lag() is so ignored?
data Mock;
informat medrec_no $9.;
informat admit_dt discharge_dt mmddyy10.;
format admit_dt discharge_dt mmddyy10.;
informat first_nm last_nm $15.;
input obs medrec_no first_nm last_nm sex event_code admit_dt discharge_dt;
cards;
1 000000001 John Smith 0 9925 04/15/2011 04/16/2011
2 000000001 John Smith 0 9925 04/16/2011 04/20/2011
3 000000001 John Smith 0 9924 04/23/2011 04/24/2011
4 000000002 Jane Wilson 1 9920 05/12/2011 05/13/2011
5 000000002 Jane Wilson 1 9920 05/16/2011 05/16/2011
6 000000003 Chris Griffin 0 9923 06/04/2011 06/04/2011
7 000000003 Chris Griffin 0 9923 06/05/2011 06/08/2011
8 000000003 Chris Griffin 0 9923 06/08/2011 06/09/2011
9 000000002 Jane Wilson 1 9920 08/02/2011 08/03/2011
;
proc sort data=mock;by medrec_no admit_dt;run;
data want;
set mock;
by medrec_no;
dif=ifn(first.medrec_no,.,admit_dt-lag(discharge_dt));
if .<dif<3 then delete;
drop dif;
run;
proc print;run;
Haikuo
data Mock; informat medrec_no $9.; informat admit_dt discharge_dt mmddyy10.; format admit_dt discharge_dt mmddyy10.; informat first_nm last_nm $15.; input obs medrec_no first_nm last_nm sex event_code admit_dt discharge_dt; cards; 1 000000001 John Smith 0 9925 04/15/2011 04/16/2011 2 000000001 John Smith 0 9925 04/16/2011 04/20/2011 3 000000001 John Smith 0 9924 04/23/2011 04/24/2011 4 000000002 Jane Wilson 1 9920 05/12/2011 05/13/2011 5 000000002 Jane Wilson 1 9920 05/16/2011 05/16/2011 6 000000003 Chris Griffin 0 9923 06/04/2011 06/04/2011 7 000000003 Chris Griffin 0 9923 06/05/2011 06/08/2011 8 000000003 Chris Griffin 0 9923 06/08/2011 06/09/2011 9 000000002 Jane Wilson 1 9920 08/02/2011 08/03/2011 ; run; proc sort data=mock;by medrec_no admit_dt;run; data want; set mock; if medrec_no eq lag(medrec_no) and (admit_dt - lag(discharge_dt)) lt 3 then delete; run;
Ksharp
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.