Help using Base SAS procedures

remove duplicate records on the same day visit

Accepted Solution Solved
Reply
Super Contributor
Posts: 345
Accepted Solution

remove duplicate records on the same day visit


PROC SQL;
CREATE TABLE ta1 AS 
SELECT DISTINCT id, icd,   feq, date 

group by id , date,icd, 
order by id, date, icd;


QUIT;

Now if I want to remove some records if there are two records on same day, what should I do

I can not simply drop freq=2, because I do want to keep those have two visits on different days with same icd code

 

 


Accepted Solutions
Solution
‎11-21-2016 10:03 AM
Trusted Advisor
Posts: 1,022

Re: remove duplicate records on the same day visit

Assuming data HAVE is sorted by id/date, you can use the DIF function (where DIF(X) is defined as X-LAG(X)).

 

data want;
  set have;
  by id ;
  if first.id or dif(date)>7;
run;

 

 

Note: If FIRST.ID is true, it is logically superfluous to evaluate the DIF function, but SAS evaluates it anyhow-(which is good, because the DIF function needs to be evaluated for every record).  I did a test to demonstrate that it works as desired.  But if you're really nervous, the subsetting IF statement could be modified to

  if dif(date)>7  or first.id;

 

 

which would insure that DIF  is always evaluated (and the underlying lag queues are always updated.

 

Regards,

Mark

View solution in original post


All Replies
Super User
Posts: 5,441

Re: remove duplicate records on the same day visit

So what defines a duplicate? Id and date alone?

Removing duplicates usually makes the data less informative. Aggregation is preferred when possible (then you could have count column telling how many original records that contributed).

 

But, PROC SORT does this (remove duplicates using BY) that you is almost impossible to do in SQL.

Data never sleeps
Super Contributor
Posts: 345

Re: remove duplicate records on the same day visit

for this project, if there are two records with the same icd code, on the same day for the same patient, it might be a duplicate record that is not what I am interested in

But I want to include those records with two visits, freq=2, with different dates, with same id

sort is fine, then is there a function, to delete some records with same id, same icd, on the same day?

Trusted Advisor
Posts: 1,586

Re: remove duplicate records on the same day visit

Your code ask for   

DISTINCT id, icd,   feq, date

as much as I understand you will get duplicates only if ICD or FEQ differs between rows.

In such case, which one you want to delete ?

 

Run next code and check the output dataset (contains the duplicates) :

 

proc sort data=ta1; by ID DATE; run;

data dup;

  set ta1;

       by id  date;

           if not (first.date and last.date);

run;

 

 

Super Contributor
Posts: 345

Re: remove duplicate records on the same day visit

[ Edited ]

 I ran it, i still have same day records

 

please see table below, here only patient id 4, these two records are what i need, because these two visists are on two different dates, for id2  , because they have same icd code, same date, same patient id, i need to remove

 

by looking at the data, i notice, i also need to remove id 3, the billing date is 2 day difference, id5, one day difference, it might still be about the same visit, this is NOT i am interested in, so i need to remove those records with same id, same icd, same date or two billing date difference less than one week

 

thanks.

id

icd

freq

amount

billingdate

2

v234

2

78.91

1/1/2015

2

v234

2

96.05

1/1/2015

3

v44

2

43.74

2/3/2015

3

v44

2

22.04

2/5/2015

4

e34

2

52.26

3/6/2015

4

e34

2

11.27

9/5/2015

5

j23

2

379.08

4/1/2015

5

j23

2

217.12

4/2/2015

Trusted Advisor
Posts: 1,586

Re: remove duplicate records on the same day visit

Of course you get more than on row with same date.

I wanted to show you the dupliacte date records.

 

Look at ID=2. On 1st row amount=78.91 while on 2nd row amount=96.05.

 

Which row you prefer to delete ? or maybe you want to sum the amounts per ID ICD BILLINGDATE ? 

Super Contributor
Posts: 345

Re: remove duplicate records on the same day visit

Thank you and I am sorry I have not made it clear.

 

when the bill date difference is within 7 days, even it has three or records, I want to consider it as one visit only, there fore, I only want to keep the first day record, and delete all other billing records, which are after the first record, but not later than 7 days,

I need to create a variable, timelapase=time2-time1, and if timelapse > 0 but <=7, then all those later records will be deleted, only the first record will be kept

here, the record with 78.91  will be recorded

i sort it by id, date, and icd

then if first.id=1 and first.date=1 and first.icd=1, then timelaspse=0, time=date;

else timelapse=date(the next record)-time,, or date3(the records after that)=date-time

 

but I do not know how to do that

Trusted Advisor
Posts: 1,586

Re: remove duplicate records on the same day visit

Then try next code:

data want;
  set have;  
       by ID DATE TCD;   /* assumed data is sorted */
            retain prev_date; drop prev_date;
           
           if first.ID then do;
              prev_date = date;
              output;
           end; else
           if date - prev_date > 7 then do;
              prev_date = date;
              output;
           end; else delete;
run;

 

Super Contributor
Posts: 345

Re: remove duplicate records on the same day visit

Thanks. I did try your code, and I deleted all records after the first record for the same patient.

I do not know where the problem is and how to fix, I still want to keep later records, which are after 7 days

Trusted Advisor
Posts: 1,586

Re: remove duplicate records on the same day visit

[ Edited ]

@wenling - what code have you run ?

 

I have preceded a step to read your data and run next code:

data test;
  infile datalines dlm=',' truncover;
  input id icd $ amoumt date mmddyy10.;
  format date ddmmyy10.;
datalines;
2,v234,78.91,01/01/2015
2,v234,96.05,01/01/2015
3,v44,43.74,02/03/2015
3,v44,22.04,02/05/2015
4,e34,52.26,03/06/2015
4,e34,11.27,09/05/2015
5,j23,379.08,04/01/2015
5,j23,217.12,04/02/2015
; run;

proc sort data=test; by id icd date; run;

data want;
 set test;
  by id;
     retain prev_date; drop prev_date;
     if first.id then do;
        prev_date = date;
        output;
     end; else
     if date - prev_date > 7 then do;
        prev_date = date;
        output;
     end; else delete;
run;

then output contains next lines  (date format mmddyy10.):

2    v234   78.91  01/01/2015

3    v44     43.74  02/03/2015                        

4    e34     52.26  03/06/2015

4    e34     11.27  09/05/2015

5    j23    379.08  04/01/2015

 

 

 

Solution
‎11-21-2016 10:03 AM
Trusted Advisor
Posts: 1,022

Re: remove duplicate records on the same day visit

Assuming data HAVE is sorted by id/date, you can use the DIF function (where DIF(X) is defined as X-LAG(X)).

 

data want;
  set have;
  by id ;
  if first.id or dif(date)>7;
run;

 

 

Note: If FIRST.ID is true, it is logically superfluous to evaluate the DIF function, but SAS evaluates it anyhow-(which is good, because the DIF function needs to be evaluated for every record).  I did a test to demonstrate that it works as desired.  But if you're really nervous, the subsetting IF statement could be modified to

  if dif(date)>7  or first.id;

 

 

which would insure that DIF  is always evaluated (and the underlying lag queues are always updated.

 

Regards,

Mark

Trusted Advisor
Posts: 1,586

Re: remove duplicate records on the same day visit

If someone is visiting twice a week, then the 3rd or 4th visit will be after more than 7 days since first visit.

Would you like to delete it or save it ?

 

Using filter like  if first.id or dif(date) > 7 will delete the 3rd and the 4th visit,

 while 

data want;
  set have;  
       by ID DATE TCD;   /* assumed data is sorted */
            retain prev_date; drop prev_date;
           
           if first.ID then do;
              prev_date = date;
              output;
           end; else
           if date - prev_date > 7 then do;
              prev_date = date;
              output;
           end; else delete;
run;

will preserve, at least the 4th visit.

 

 

Trusted Advisor
Posts: 1,022

Re: remove duplicate records on the same day visit

And the avoidance of DIF in the alternative would also get the 7th, 10th, 13th dates if it was a multi-week series at two dates per week.  But it would not get visits 8,9,11,12.

 

In such a case a more likely objective would be to determine the first date (BEGDATE), last date (ENDDATE) and number of dates in each series - where a series is defined as a sequence of  dates with no internal  gaps over 7 days.  The example below produces such results using two DIF functions and a merge statement of a HAVE with itself (with the 2nd have using FIRSTOBS=2).  Note the merge does not have an associated BY statement.  If it did, the program would fail:

 

 

 

data want (drop=next_:);
  format begdate yymmddn8.;

  do ndates=1 by 1 until (dif(next_id)>0 or dif(next_date)>7);
    merge have (rename=(date=enddate))
          have (firstobs=2 keep=id date rename=(id=next_id date=next_date));
    if begdate=. then begdate=date;
  end;

run;

 

Of course,  in the original problem there are other variables that are being ignored in this new program.  But my response here is part of my campaign to make folks more informed, and more comfortable with the LAG and DIF functions.  I think they are often unnecessarily avoided.

 

regards,

Mark

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 12 replies
  • 451 views
  • 1 like
  • 4 in conversation