BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nolaness
Calcite | Level 5

Hey everybody! I am having trouble coming up with the language for a certain duplicate identification logic on multiple admissions.  I have a hospital discharge data set that contains a unique id, event information, and an admit date and discharge date.  We are considering any observations for the same individual where that individual has been readmitted in less than 48 hours to be a duplicate event/record.  So I think that what I need is to code language that will tell sas that when it finds matching unique IDs the program needs to calculate the difference between the discharge date of the original observation and the admit date of the second observation. Then if the difference between those two dates is less than 48hrs delete the second observation.  Any suggestions that you might have would be greatly appreciated! Thank you.

Cheers,

Vanessa

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Why would obs 5 be deleted?  The following deletes all of the obs you mentioned except for obs 5:

data Mock;

  informat medrec_no $9.;

  informat admit_dt discharge_dt mmddyy10.;

  format admit_dt discharge_dt mmddyy10.;

  informat first_nm last_nm $15.;

  input obs medrec_no     first_nm     last_nm      sex     event_code     admit_dt          discharge_dt;

  cards;

1 000000001     John          Smith             0     9925              04/15/2011          04/16/2011

2 000000001     John          Smith             0     9925              04/16/2011          04/20/2011

3 000000001     John          Smith             0     9924              04/23/2011          04/24/2011

4 000000002     Jane          Wilson           1     9920              05/12/2011          05/13/2011

5 000000002     Jane          Wilson           1     9920              05/16/2011          05/16/2011

6 000000003     Chris         Griffin             0     9923              06/04/2011          06/04/2011

7 000000003     Chris         Griffin             0     9923              06/05/2011          06/08/2011

8 000000003     Chris         Griffin             0     9923              06/08/2011          06/09/2011

9 000000002     Jane          Wilson           1     9920              08/02/2011          08/03/2011

;

proc sort data=mock;

  by medrec_no admit_dt;

run;

data want;

  set mock;

  by medrec_no;

  retain last_discharge_dt;

  if first.medrec_no then do;

    output;

    last_discharge_dt=discharge_dt;

  end;

  else do;

    if admit_dt gt last_discharge_dt+2 then do;

      last_discharge_dt=discharge_dt;

      output;

    end;

    else last_discharge_dt=discharge_dt;

  end;

run;

View solution in original post

7 REPLIES 7
art297
Opal | Level 21

Are they admission and discharge dates or datetimes?  What about a case where, in three days, a person was discharged on day1, readmitted and discharged on day2, and then readmitted on day3?  Some example data would help although the task seems easy enough to accomplish.

nolaness
Calcite | Level 5

My admit and discharge dates are currently dates, I have another data set (emergency department data) that I need to do the same duplicate deletion on where the they are datetimes but I was planning on creating a new variable with datepart so that the coding is the same for both data sets.  In your three day example both day 2 and day 3 would be considered the same event as the event that occurred on day 1.

Mock data:

obs     medrec_no     first_nm     last_nm      sex     event_code     admit_dt          discharge_dt

1       000000001     John          Smith             0     9925              04/15/2011          04/16/2011

2       000000001     John          Smith             0     9925              04/16/2011          04/20/2011

3       000000001     John          Smith             0     9924              04/23/2011          04/24/2011

4       000000002     Jane          Wilson           1     9920              05/12/2011          05/13/2011

5       000000002     Jane          Wilson           1     9920              05/16/2011          05/16/2011

6       000000003     Chris         Griffin             0     9923              06/04/2011          06/04/2011

7       000000003     Chris         Griffin             0     9923              06/05/2011          06/08/2011

8       000000003     Chris         Griffin             0     9923              06/08/2011          06/09/2011

9       000000002     Jane          Wilson           1     9920              08/02/2011          08/03/2011

......

So in the above example I would want to delete obs 2, 5, 7, & 8.

Basically the duration of hospitalization is unimportant for my analysis.  I am trying to eventually look at the count and rate of the event allowing some readmission (readmission after 2 days) to be considered as separate events.

Thanks!

art297
Opal | Level 21

Why would obs 5 be deleted?  The following deletes all of the obs you mentioned except for obs 5:

data Mock;

  informat medrec_no $9.;

  informat admit_dt discharge_dt mmddyy10.;

  format admit_dt discharge_dt mmddyy10.;

  informat first_nm last_nm $15.;

  input obs medrec_no     first_nm     last_nm      sex     event_code     admit_dt          discharge_dt;

  cards;

1 000000001     John          Smith             0     9925              04/15/2011          04/16/2011

2 000000001     John          Smith             0     9925              04/16/2011          04/20/2011

3 000000001     John          Smith             0     9924              04/23/2011          04/24/2011

4 000000002     Jane          Wilson           1     9920              05/12/2011          05/13/2011

5 000000002     Jane          Wilson           1     9920              05/16/2011          05/16/2011

6 000000003     Chris         Griffin             0     9923              06/04/2011          06/04/2011

7 000000003     Chris         Griffin             0     9923              06/05/2011          06/08/2011

8 000000003     Chris         Griffin             0     9923              06/08/2011          06/09/2011

9 000000002     Jane          Wilson           1     9920              08/02/2011          08/03/2011

;

proc sort data=mock;

  by medrec_no admit_dt;

run;

data want;

  set mock;

  by medrec_no;

  retain last_discharge_dt;

  if first.medrec_no then do;

    output;

    last_discharge_dt=discharge_dt;

  end;

  else do;

    if admit_dt gt last_discharge_dt+2 then do;

      last_discharge_dt=discharge_dt;

      output;

    end;

    else last_discharge_dt=discharge_dt;

  end;

run;

nolaness
Calcite | Level 5

Arthur and PG you are both correct. By my rules obs 5 would not be removed. Sorry for the typo! Also, thank you so much for the speedy responses. I will try these out when I get into the office tomorrow!

-Vanessa

PGStats
Opal | Level 21

I agree with Art, according to your rules, obs=5 should remain. The following code merges hospital bouts when they are separated by a discharge shorter than 3 days :

data Mock;
  informat medrec_no $9.;
  informat admit_dt discharge_dt mmddyy10.;
  format admit_dt discharge_dt mmddyy10.;
  informat first_nm last_nm $15.;
  input obs medrec_no first_nm last_nm sex event_code admit_dt discharge_dt;
  datalines;
1 000000001     John          Smith             0     9925              04/15/2011          04/16/2011
2 000000001     John          Smith             0     9925              04/16/2011          04/20/2011
3 000000001     John          Smith             0     9924              04/23/2011          04/24/2011
4 000000002     Jane          Wilson           1     9920              05/12/2011          05/13/2011
5 000000002     Jane          Wilson           1     9920              05/16/2011          05/16/2011

6 000000003     Chris         Griffin             0     9923              06/04/2011          06/04/2011

7 000000003     Chris         Griffin             0     9923              06/05/2011          06/08/2011

8 000000003     Chris         Griffin             0     9923              06/06/2011          06/06/2011

9 000000003     Chris         Griffin             0     9923              06/09/2011          06/09/2011

10 000000002     Jane          Wilson           1     9920              08/02/2011          08/03/2011

;


proc sort data=mock;
     by medrec_no admit_dt;
run;

data tmp(drop=prev_dt) / view=tmp;
do until(last.medrec_no);
     set mock; by medrec_no;
     if intnx("DAY",admit_dt,-2) > prev_dt then seq + 1;
     output;
     prev_dt = max(prev_dt, discharge_dt);
     end;
run;

proc sql;
create table want as
select min(obs) as obs,
     medrec_no, first_nm, last_nm, sex,
     min(event_code) as event_code,
     min(admit_dt) as admit_dt format=mmddyy10.,
     max(discharge_dt) as discharge_dt format=mmddyy10.,
     count(*) as nb_bouts
from tmp
group by medrec_no, first_nm, last_nm, sex, seq
order by obs;

drop view tmp;
quit;

PG

Message was edited by: PG - Added max for robustness.

PG
Haikuo
Onyx | Level 15

Am I missing something here? Why lag() is so ignored?

data Mock;

  informat medrec_no $9.;

  informat admit_dt discharge_dt mmddyy10.;

  format admit_dt discharge_dt mmddyy10.;

  informat first_nm last_nm $15.;

  input obs medrec_no     first_nm     last_nm      sex     event_code     admit_dt          discharge_dt;

  cards;

1 000000001     John          Smith             0     9925              04/15/2011          04/16/2011

2 000000001     John          Smith             0     9925              04/16/2011          04/20/2011

3 000000001     John          Smith             0     9924              04/23/2011          04/24/2011

4 000000002     Jane          Wilson           1     9920              05/12/2011          05/13/2011

5 000000002     Jane          Wilson           1     9920              05/16/2011          05/16/2011

6 000000003     Chris         Griffin             0     9923              06/04/2011          06/04/2011

7 000000003     Chris         Griffin             0     9923              06/05/2011          06/08/2011

8 000000003     Chris         Griffin             0     9923              06/08/2011          06/09/2011

9 000000002     Jane          Wilson           1     9920              08/02/2011          08/03/2011

;

proc sort data=mock;by medrec_no admit_dt;run;

data want;

  set mock;

    by medrec_no;

      dif=ifn(first.medrec_no,.,admit_dt-lag(discharge_dt));

      if .<dif<3 then delete;

      drop dif;

run;

proc print;run;

Haikuo

Ksharp
Super User
data Mock;
  informat medrec_no $9.;
  informat admit_dt discharge_dt mmddyy10.;
  format admit_dt discharge_dt mmddyy10.;
  informat first_nm last_nm $15.;
  input obs medrec_no     first_nm     last_nm      sex     event_code     admit_dt          discharge_dt;
  cards;
1 000000001     John          Smith             0     9925              04/15/2011          04/16/2011
2 000000001     John          Smith             0     9925              04/16/2011          04/20/2011
3 000000001     John          Smith             0     9924              04/23/2011          04/24/2011
4 000000002     Jane          Wilson           1     9920              05/12/2011          05/13/2011
5 000000002     Jane          Wilson           1     9920              05/16/2011          05/16/2011
6 000000003     Chris         Griffin             0     9923              06/04/2011          06/04/2011
7 000000003     Chris         Griffin             0     9923              06/05/2011          06/08/2011
8 000000003     Chris         Griffin             0     9923              06/08/2011          06/09/2011
9 000000002     Jane          Wilson           1     9920              08/02/2011          08/03/2011
;
run;
proc sort data=mock;by medrec_no admit_dt;run;
data want;
 set mock;
 if medrec_no eq lag(medrec_no) and (admit_dt - lag(discharge_dt)) lt 3 then delete;
run;

Ksharp

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1531 views
  • 3 likes
  • 5 in conversation