BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
twildone
Pyrite | Level 9

Hi....I have a dataset that contains a list of unique records for customers who are approved for specific products during a specific time period. The Approval dataset looks like this:

 

Customer_ID Product_ID Effective_Date Expiration_Date
1232 1127654 20160101 20160430
1232 1200782 20160201 20160525
1345 1097862 20151223 20160325
1345 1097862 20160401 20160428
1345 1097862 20160515 20160615
1345 1163452 20160201 20160630

 

 

I want to be able to check another dataset (Claims_Data) that contains all the claims submitted for these same customers and obtain the following:

 

Customer_ID Product_ID Count_Approved Count_Not_Approved No_Claims
1232 1127654 3 1  
1232 1200782 4    
1345 1097862 5 2  
1345 1163452     1

 

Count_Approved is the number of claims where the claim_date from the Claims_Data dataset  is between the Effective_Date and Expiration Date for the respective customer for the each Product_ID.

Count_Not_Approved is the number of claims submitted where the claim_date is outside the range for the Effective_Date and Expiration Date for the respective customer for the each Product_ID.

As it is possible that customers may be approved for products and yet they may not have submitted any claims for the product. In this case, No_Claims=1 would be entered to keep track. Any suggestions. Thanks,

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Data step solution:

data approval;
input
  customer_id
  product_id
  effective_date:yymmdd8.
  expiration_date:yymmdd8.
;
format
  effective_date
  expiration_date
    yymmddn8.
;
cards;
1232 1127654 20160101 20160430
1232 1200782 20160201 20160525
1345 1097862 20151223 20160325
1345 1097862 20160401 20160428
1345 1097862 20160515 20160615
1345 1163452 20160201 20160630
;
run;

data claims_data;
input
  customer_id
  product_id
  claim_date:yymmdd8.
;
format claim_date yymmddn8.;
cards;
1232 1127654 20160320
1232 1127654 20160517
;
run;

data want (
  keep=customer_id product_id count_approved count_not_approved no_claims
);
merge
  approval (in=a)
  claims_data (in=b)
;
by customer_id product_id;
if a;
if not b
then do;
  count_approved = 0;
  count_not_approved = 0;
  no_claims = 1;
  output;
end;
else do;
  if first.product_id
  then do;
    count_approved = 0;
    count_not_approved = 0;
    no_claims = .;
  end;
  if effective_date <= claim_date <= expiration_date
  then count_approved + 1;
  else count_not_approved + 1;
  if last.product_id then output;
end;
run;

You can expand the test data in claims_data to verify more test cases, but I think it solves your requirements.

With your data, it might be necessary to sort the datasets first by customer_id product_id.

View solution in original post

2 REPLIES 2
LinusH
Tourmaline | Level 20
Perhaps you could get started with a SQL FULL JOIN with a BETWEEN - AND join criteria.
To get the aggregation use CASE within each SUM() to figure out if the joined result should contribute to respective measure.
Data never sleeps
Kurt_Bremser
Super User

Data step solution:

data approval;
input
  customer_id
  product_id
  effective_date:yymmdd8.
  expiration_date:yymmdd8.
;
format
  effective_date
  expiration_date
    yymmddn8.
;
cards;
1232 1127654 20160101 20160430
1232 1200782 20160201 20160525
1345 1097862 20151223 20160325
1345 1097862 20160401 20160428
1345 1097862 20160515 20160615
1345 1163452 20160201 20160630
;
run;

data claims_data;
input
  customer_id
  product_id
  claim_date:yymmdd8.
;
format claim_date yymmddn8.;
cards;
1232 1127654 20160320
1232 1127654 20160517
;
run;

data want (
  keep=customer_id product_id count_approved count_not_approved no_claims
);
merge
  approval (in=a)
  claims_data (in=b)
;
by customer_id product_id;
if a;
if not b
then do;
  count_approved = 0;
  count_not_approved = 0;
  no_claims = 1;
  output;
end;
else do;
  if first.product_id
  then do;
    count_approved = 0;
    count_not_approved = 0;
    no_claims = .;
  end;
  if effective_date <= claim_date <= expiration_date
  then count_approved + 1;
  else count_not_approved + 1;
  if last.product_id then output;
end;
run;

You can expand the test data in claims_data to verify more test cases, but I think it solves your requirements.

With your data, it might be necessary to sort the datasets first by customer_id product_id.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1257 views
  • 0 likes
  • 3 in conversation