DATA Step, Macro, Functions and more

re: data check

Accepted Solution Solved
Reply
Regular Contributor
Posts: 229
Accepted Solution

re: data check

Hi....I have a dataset that contains a list of unique records for customers who are approved for specific products during a specific time period. The Approval dataset looks like this:

 

Customer_ID Product_ID Effective_Date Expiration_Date
1232 1127654 20160101 20160430
1232 1200782 20160201 20160525
1345 1097862 20151223 20160325
1345 1097862 20160401 20160428
1345 1097862 20160515 20160615
1345 1163452 20160201 20160630

 

 

I want to be able to check another dataset (Claims_Data) that contains all the claims submitted for these same customers and obtain the following:

 

Customer_ID Product_ID Count_Approved Count_Not_Approved No_Claims
1232 1127654 3 1  
1232 1200782 4    
1345 1097862 5 2  
1345 1163452     1

 

Count_Approved is the number of claims where the claim_date from the Claims_Data dataset  is between the Effective_Date and Expiration Date for the respective customer for the each Product_ID.

Count_Not_Approved is the number of claims submitted where the claim_date is outside the range for the Effective_Date and Expiration Date for the respective customer for the each Product_ID.

As it is possible that customers may be approved for products and yet they may not have submitted any claims for the product. In this case, No_Claims=1 would be entered to keep track. Any suggestions. Thanks,


Accepted Solutions
Solution
‎06-21-2016 06:53 PM
Super User
Posts: 7,758

Re: re: data check

Data step solution:

data approval;
input
  customer_id
  product_id
  effective_date:yymmdd8.
  expiration_date:yymmdd8.
;
format
  effective_date
  expiration_date
    yymmddn8.
;
cards;
1232 1127654 20160101 20160430
1232 1200782 20160201 20160525
1345 1097862 20151223 20160325
1345 1097862 20160401 20160428
1345 1097862 20160515 20160615
1345 1163452 20160201 20160630
;
run;

data claims_data;
input
  customer_id
  product_id
  claim_date:yymmdd8.
;
format claim_date yymmddn8.;
cards;
1232 1127654 20160320
1232 1127654 20160517
;
run;

data want (
  keep=customer_id product_id count_approved count_not_approved no_claims
);
merge
  approval (in=a)
  claims_data (in=b)
;
by customer_id product_id;
if a;
if not b
then do;
  count_approved = 0;
  count_not_approved = 0;
  no_claims = 1;
  output;
end;
else do;
  if first.product_id
  then do;
    count_approved = 0;
    count_not_approved = 0;
    no_claims = .;
  end;
  if effective_date <= claim_date <= expiration_date
  then count_approved + 1;
  else count_not_approved + 1;
  if last.product_id then output;
end;
run;

You can expand the test data in claims_data to verify more test cases, but I think it solves your requirements.

With your data, it might be necessary to sort the datasets first by customer_id product_id.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers

View solution in original post


All Replies
Super User
Posts: 5,424

Re: re: data check

Perhaps you could get started with a SQL FULL JOIN with a BETWEEN - AND join criteria.
To get the aggregation use CASE within each SUM() to figure out if the joined result should contribute to respective measure.
Data never sleeps
Solution
‎06-21-2016 06:53 PM
Super User
Posts: 7,758

Re: re: data check

Data step solution:

data approval;
input
  customer_id
  product_id
  effective_date:yymmdd8.
  expiration_date:yymmdd8.
;
format
  effective_date
  expiration_date
    yymmddn8.
;
cards;
1232 1127654 20160101 20160430
1232 1200782 20160201 20160525
1345 1097862 20151223 20160325
1345 1097862 20160401 20160428
1345 1097862 20160515 20160615
1345 1163452 20160201 20160630
;
run;

data claims_data;
input
  customer_id
  product_id
  claim_date:yymmdd8.
;
format claim_date yymmddn8.;
cards;
1232 1127654 20160320
1232 1127654 20160517
;
run;

data want (
  keep=customer_id product_id count_approved count_not_approved no_claims
);
merge
  approval (in=a)
  claims_data (in=b)
;
by customer_id product_id;
if a;
if not b
then do;
  count_approved = 0;
  count_not_approved = 0;
  no_claims = 1;
  output;
end;
else do;
  if first.product_id
  then do;
    count_approved = 0;
    count_not_approved = 0;
    no_claims = .;
  end;
  if effective_date <= claim_date <= expiration_date
  then count_approved + 1;
  else count_not_approved + 1;
  if last.product_id then output;
end;
run;

You can expand the test data in claims_data to verify more test cases, but I think it solves your requirements.

With your data, it might be necessary to sort the datasets first by customer_id product_id.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 2 replies
  • 206 views
  • 0 likes
  • 3 in conversation