DATA Step, Macro, Functions and more

Creating a loop for duplicate identification?

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Creating a loop for duplicate identification?

Hey everybody! I am having trouble coming up with the language for a certain duplicate identification logic on multiple admissions.  I have a hospital discharge data set that contains a unique id, event information, and an admit date and discharge date.  We are considering any observations for the same individual where that individual has been readmitted in less than 48 hours to be a duplicate event/record.  So I think that what I need is to code language that will tell sas that when it finds matching unique IDs the program needs to calculate the difference between the discharge date of the original observation and the admit date of the second observation. Then if the difference between those two dates is less than 48hrs delete the second observation.  Any suggestions that you might have would be greatly appreciated! Thank you.

Cheers,

Vanessa


Accepted Solutions
Solution
‎10-29-2012 07:01 PM
PROC Star
Posts: 7,363

Re: Creating a loop for duplicate identification?

Why would obs 5 be deleted?  The following deletes all of the obs you mentioned except for obs 5:

data Mock;

  informat medrec_no $9.;

  informat admit_dt discharge_dt mmddyy10.;

  format admit_dt discharge_dt mmddyy10.;

  informat first_nm last_nm $15.;

  input obs medrec_no     first_nm     last_nm      sex     event_code     admit_dt          discharge_dt;

  cards;

1 000000001     John          Smith             0     9925              04/15/2011          04/16/2011

2 000000001     John          Smith             0     9925              04/16/2011          04/20/2011

3 000000001     John          Smith             0     9924              04/23/2011          04/24/2011

4 000000002     Jane          Wilson           1     9920              05/12/2011          05/13/2011

5 000000002     Jane          Wilson           1     9920              05/16/2011          05/16/2011

6 000000003     Chris         Griffin             0     9923              06/04/2011          06/04/2011

7 000000003     Chris         Griffin             0     9923              06/05/2011          06/08/2011

8 000000003     Chris         Griffin             0     9923              06/08/2011          06/09/2011

9 000000002     Jane          Wilson           1     9920              08/02/2011          08/03/2011

;

proc sort data=mock;

  by medrec_no admit_dt;

run;

data want;

  set mock;

  by medrec_no;

  retain last_discharge_dt;

  if first.medrec_no then do;

    output;

    last_discharge_dt=discharge_dt;

  end;

  else do;

    if admit_dt gt last_discharge_dt+2 then do;

      last_discharge_dt=discharge_dt;

      output;

    end;

    else last_discharge_dt=discharge_dt;

  end;

run;

View solution in original post


All Replies
PROC Star
Posts: 7,363

Re: Creating a loop for duplicate identification?

Are they admission and discharge dates or datetimes?  What about a case where, in three days, a person was discharged on day1, readmitted and discharged on day2, and then readmitted on day3?  Some example data would help although the task seems easy enough to accomplish.

New Contributor
Posts: 4

Re: Creating a loop for duplicate identification?

My admit and discharge dates are currently dates, I have another data set (emergency department data) that I need to do the same duplicate deletion on where the they are datetimes but I was planning on creating a new variable with datepart so that the coding is the same for both data sets.  In your three day example both day 2 and day 3 would be considered the same event as the event that occurred on day 1.

Mock data:

obs     medrec_no     first_nm     last_nm      sex     event_code     admit_dt          discharge_dt

1       000000001     John          Smith             0     9925              04/15/2011          04/16/2011

2       000000001     John          Smith             0     9925              04/16/2011          04/20/2011

3       000000001     John          Smith             0     9924              04/23/2011          04/24/2011

4       000000002     Jane          Wilson           1     9920              05/12/2011          05/13/2011

5       000000002     Jane          Wilson           1     9920              05/16/2011          05/16/2011

6       000000003     Chris         Griffin             0     9923              06/04/2011          06/04/2011

7       000000003     Chris         Griffin             0     9923              06/05/2011          06/08/2011

8       000000003     Chris         Griffin             0     9923              06/08/2011          06/09/2011

9       000000002     Jane          Wilson           1     9920              08/02/2011          08/03/2011

......

So in the above example I would want to delete obs 2, 5, 7, & 8.

Basically the duration of hospitalization is unimportant for my analysis.  I am trying to eventually look at the count and rate of the event allowing some readmission (readmission after 2 days) to be considered as separate events.

Thanks!

Solution
‎10-29-2012 07:01 PM
PROC Star
Posts: 7,363

Re: Creating a loop for duplicate identification?

Why would obs 5 be deleted?  The following deletes all of the obs you mentioned except for obs 5:

data Mock;

  informat medrec_no $9.;

  informat admit_dt discharge_dt mmddyy10.;

  format admit_dt discharge_dt mmddyy10.;

  informat first_nm last_nm $15.;

  input obs medrec_no     first_nm     last_nm      sex     event_code     admit_dt          discharge_dt;

  cards;

1 000000001     John          Smith             0     9925              04/15/2011          04/16/2011

2 000000001     John          Smith             0     9925              04/16/2011          04/20/2011

3 000000001     John          Smith             0     9924              04/23/2011          04/24/2011

4 000000002     Jane          Wilson           1     9920              05/12/2011          05/13/2011

5 000000002     Jane          Wilson           1     9920              05/16/2011          05/16/2011

6 000000003     Chris         Griffin             0     9923              06/04/2011          06/04/2011

7 000000003     Chris         Griffin             0     9923              06/05/2011          06/08/2011

8 000000003     Chris         Griffin             0     9923              06/08/2011          06/09/2011

9 000000002     Jane          Wilson           1     9920              08/02/2011          08/03/2011

;

proc sort data=mock;

  by medrec_no admit_dt;

run;

data want;

  set mock;

  by medrec_no;

  retain last_discharge_dt;

  if first.medrec_no then do;

    output;

    last_discharge_dt=discharge_dt;

  end;

  else do;

    if admit_dt gt last_discharge_dt+2 then do;

      last_discharge_dt=discharge_dt;

      output;

    end;

    else last_discharge_dt=discharge_dt;

  end;

run;

New Contributor
Posts: 4

Re: Creating a loop for duplicate identification?

Arthur and PG you are both correct. By my rules obs 5 would not be removed. Sorry for the typo! Also, thank you so much for the speedy responses. I will try these out when I get into the office tomorrow!

-Vanessa

Respected Advisor
Posts: 4,651

Re: Creating a loop for duplicate identification?

I agree with Art, according to your rules, obs=5 should remain. The following code merges hospital bouts when they are separated by a discharge shorter than 3 days :

data Mock;
  informat medrec_no $9.;
  informat admit_dt discharge_dt mmddyy10.;
  format admit_dt discharge_dt mmddyy10.;
  informat first_nm last_nm $15.;
  input obs medrec_no first_nm last_nm sex event_code admit_dt discharge_dt;
  datalines;
1 000000001     John          Smith             0     9925              04/15/2011          04/16/2011
2 000000001     John          Smith             0     9925              04/16/2011          04/20/2011
3 000000001     John          Smith             0     9924              04/23/2011          04/24/2011
4 000000002     Jane          Wilson           1     9920              05/12/2011          05/13/2011
5 000000002     Jane          Wilson           1     9920              05/16/2011          05/16/2011

6 000000003     Chris         Griffin             0     9923              06/04/2011          06/04/2011

7 000000003     Chris         Griffin             0     9923              06/05/2011          06/08/2011

8 000000003     Chris         Griffin             0     9923              06/06/2011          06/06/2011

9 000000003     Chris         Griffin             0     9923              06/09/2011          06/09/2011

10 000000002     Jane          Wilson           1     9920              08/02/2011          08/03/2011

;


proc sort data=mock;
     by medrec_no admit_dt;
run;

data tmp(drop=prev_dt) / view=tmp;
do until(last.medrec_no);
     set mock; by medrec_no;
     if intnx("DAY",admit_dt,-2) > prev_dt then seq + 1;
     output;
     prev_dt = max(prev_dt, discharge_dt);
     end;
run;

proc sql;
create table want as
select min(obs) as obs,
     medrec_no, first_nm, last_nm, sex,
     min(event_code) as event_code,
     min(admit_dt) as admit_dt format=mmddyy10.,
     max(discharge_dt) as discharge_dt format=mmddyy10.,
     count(*) as nb_bouts
from tmp
group by medrec_no, first_nm, last_nm, sex, seq
order by obs;

drop view tmp;
quit;

PG

Message was edited by: PG - Added max for robustness.

PG
Respected Advisor
Posts: 3,124

Re: Creating a loop for duplicate identification?

Am I missing something here? Why lag() is so ignored?

data Mock;

  informat medrec_no $9.;

  informat admit_dt discharge_dt mmddyy10.;

  format admit_dt discharge_dt mmddyy10.;

  informat first_nm last_nm $15.;

  input obs medrec_no     first_nm     last_nm      sex     event_code     admit_dt          discharge_dt;

  cards;

1 000000001     John          Smith             0     9925              04/15/2011          04/16/2011

2 000000001     John          Smith             0     9925              04/16/2011          04/20/2011

3 000000001     John          Smith             0     9924              04/23/2011          04/24/2011

4 000000002     Jane          Wilson           1     9920              05/12/2011          05/13/2011

5 000000002     Jane          Wilson           1     9920              05/16/2011          05/16/2011

6 000000003     Chris         Griffin             0     9923              06/04/2011          06/04/2011

7 000000003     Chris         Griffin             0     9923              06/05/2011          06/08/2011

8 000000003     Chris         Griffin             0     9923              06/08/2011          06/09/2011

9 000000002     Jane          Wilson           1     9920              08/02/2011          08/03/2011

;

proc sort data=mock;by medrec_no admit_dt;run;

data want;

  set mock;

    by medrec_no;

      dif=ifn(first.medrec_no,.,admit_dt-lag(discharge_dt));

      if .<dif<3 then delete;

      drop dif;

run;

proc print;run;

Haikuo

Super User
Posts: 9,682

Re: Creating a loop for duplicate identification?

data Mock;
  informat medrec_no $9.;
  informat admit_dt discharge_dt mmddyy10.;
  format admit_dt discharge_dt mmddyy10.;
  informat first_nm last_nm $15.;
  input obs medrec_no     first_nm     last_nm      sex     event_code     admit_dt          discharge_dt;
  cards;
1 000000001     John          Smith             0     9925              04/15/2011          04/16/2011
2 000000001     John          Smith             0     9925              04/16/2011          04/20/2011
3 000000001     John          Smith             0     9924              04/23/2011          04/24/2011
4 000000002     Jane          Wilson           1     9920              05/12/2011          05/13/2011
5 000000002     Jane          Wilson           1     9920              05/16/2011          05/16/2011
6 000000003     Chris         Griffin             0     9923              06/04/2011          06/04/2011
7 000000003     Chris         Griffin             0     9923              06/05/2011          06/08/2011
8 000000003     Chris         Griffin             0     9923              06/08/2011          06/09/2011
9 000000002     Jane          Wilson           1     9920              08/02/2011          08/03/2011
;
run;
proc sort data=mock;by medrec_no admit_dt;run;
data want;
 set mock;
 if medrec_no eq lag(medrec_no) and (admit_dt - lag(discharge_dt)) lt 3 then delete;
run;

Ksharp

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 347 views
  • 3 likes
  • 5 in conversation