DATA Step, Macro, Functions and more

Eliminating observations with datasets in long form

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 82
Accepted Solution

Eliminating observations with datasets in long form

I am working with a medical claims dataset with variables ID (enrollee ID), claimID (a unique identifier for each claim), and the date of service. For this example, assume that claimID 1-5 are all claims for the same medical service, and I only want to count services *within an individual* that occurred during a 7 day period as one service (this is because there are sometimes duplicate claims with dates of service that are in close proximity).  

 

For this example, claimID 1 would be kept and claimID 2 would be deleted because it's within 6 days of claimID 1.  However, claimID 3 would be kept even though it's within 2 days of claimID 2 because claimID 2 will be deleted and claimID 3 is more than 7 days away from claimID 1.  ClaimID 4 and ClaimID 5 would be deleted because they are within 7 days of ClaimID 3, which will be kept.  

 

ID           ClaimID                Date               

1                     1                  1/1/2015         

1                     2                  1/7/2015         

1                     3                  1/9/2015         

1                     4                  1/10/2015       

1                     5                  1/16/2015       

 

I think the solution might be to create a loop that finds a claim that needs to be eliminated because it's within 7 days of the first claim; as soon as such a claim is detected, the loop exits and the claim is eliminated from the dataset.  On the next iteration of the loop, SAS starts over from the first claim and finds if there is another claim that needs to be eliminated because it's within 7 days; if not, SAS will keep the next the next claim that needs to be deduplicated, etc.  But I'm not sure exactly how to implement this.  I'd appreciate any advice.

 

 

 


Accepted Solutions
Solution
‎01-20-2016 11:34 AM
Super User
Posts: 19,768

Re: Eliminating observations with datasets in long form

Create a new variable that captures the 7 day interval start and compare the dates to the variable to determine which records are kept and which are dropped. Based on your example, the following should work. 

 

data have;
informat date mmddyy10.;
format date date9.;
input ID           ClaimID                Date ;
cards; 
1                     1                  1/1/2015         
1                     2                  1/7/2015         
1                     3                  1/9/2015         
1                     4                  1/10/2015       
1                     5                  1/16/2015 
;
run; 

data want;
set have;
by id;
retain cutoff7;

if first.id then cutoff7=date;
else if date-cutoff7<=7 then delete;
else cutoff7=date;
run;
	

View solution in original post


All Replies
Solution
‎01-20-2016 11:34 AM
Super User
Posts: 19,768

Re: Eliminating observations with datasets in long form

Create a new variable that captures the 7 day interval start and compare the dates to the variable to determine which records are kept and which are dropped. Based on your example, the following should work. 

 

data have;
informat date mmddyy10.;
format date date9.;
input ID           ClaimID                Date ;
cards; 
1                     1                  1/1/2015         
1                     2                  1/7/2015         
1                     3                  1/9/2015         
1                     4                  1/10/2015       
1                     5                  1/16/2015 
;
run; 

data want;
set have;
by id;
retain cutoff7;

if first.id then cutoff7=date;
else if date-cutoff7<=7 then delete;
else cutoff7=date;
run;
	
Frequent Contributor
Posts: 82

Re: Eliminating observations with datasets in long form

Thanks, Reeza - this worked.  Also, thanks to mohamed_zaki - your solution appears to have worked as well.

Super Contributor
Posts: 490

Re: Eliminating observations with datasets in long form

[ Edited ]
data have;
format date MMDDYY10.;
input ID ClaimID Date MMDDYY10.;
cards;
1 1 1/1/2015         
1 2 1/7/2015         
1 3 1/9/2015         
1 4 1/10/2015       
1 5 1/16/2015 
;
run;

data want;
set have;
by ID;
retain fdate ;
if first.ID then fdate=date; 
else if intck('day',fdate,date) <= 7 then delete;
else fdate=date;
run;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 214 views
  • 0 likes
  • 3 in conversation