I am working with a medical claims dataset with variables ID (enrollee ID), claimID (a unique identifier for each claim), and the date of service. For this example, assume that claimID 1-5 are all claims for the same medical service, and I only want to count services *within an individual* that occurred during a 7 day period as one service (this is because there are sometimes duplicate claims with dates of service that are in close proximity).
For this example, claimID 1 would be kept and claimID 2 would be deleted because it's within 6 days of claimID 1. However, claimID 3 would be kept even though it's within 2 days of claimID 2 because claimID 2 will be deleted and claimID 3 is more than 7 days away from claimID 1. ClaimID 4 and ClaimID 5 would be deleted because they are within 7 days of ClaimID 3, which will be kept.
ID ClaimID Date
1 1 1/1/2015
1 2 1/7/2015
1 3 1/9/2015
1 4 1/10/2015
1 5 1/16/2015
I think the solution might be to create a loop that finds a claim that needs to be eliminated because it's within 7 days of the first claim; as soon as such a claim is detected, the loop exits and the claim is eliminated from the dataset. On the next iteration of the loop, SAS starts over from the first claim and finds if there is another claim that needs to be eliminated because it's within 7 days; if not, SAS will keep the next the next claim that needs to be deduplicated, etc. But I'm not sure exactly how to implement this. I'd appreciate any advice.
Create a new variable that captures the 7 day interval start and compare the dates to the variable to determine which records are kept and which are dropped. Based on your example, the following should work.
data have;
informat date mmddyy10.;
format date date9.;
input ID ClaimID Date ;
cards;
1 1 1/1/2015
1 2 1/7/2015
1 3 1/9/2015
1 4 1/10/2015
1 5 1/16/2015
;
run;
data want;
set have;
by id;
retain cutoff7;
if first.id then cutoff7=date;
else if date-cutoff7<=7 then delete;
else cutoff7=date;
run;
Create a new variable that captures the 7 day interval start and compare the dates to the variable to determine which records are kept and which are dropped. Based on your example, the following should work.
data have;
informat date mmddyy10.;
format date date9.;
input ID ClaimID Date ;
cards;
1 1 1/1/2015
1 2 1/7/2015
1 3 1/9/2015
1 4 1/10/2015
1 5 1/16/2015
;
run;
data want;
set have;
by id;
retain cutoff7;
if first.id then cutoff7=date;
else if date-cutoff7<=7 then delete;
else cutoff7=date;
run;
Thanks, Reeza - this worked. Also, thanks to mohamed_zaki - your solution appears to have worked as well.
data have;
format date MMDDYY10.;
input ID ClaimID Date MMDDYY10.;
cards;
1 1 1/1/2015
1 2 1/7/2015
1 3 1/9/2015
1 4 1/10/2015
1 5 1/16/2015
;
run;
data want;
set have;
by ID;
retain fdate ;
if first.ID then fdate=date;
else if intck('day',fdate,date) <= 7 then delete;
else fdate=date;
run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.