BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
twildone
Pyrite | Level 9

Hi....I have a dataset that contains a list of unique records for customers who are approved for specific products during a specific time period. The Approval dataset looks like this:

 

Customer_ID Product_ID Effective_Date Expiration_Date
1232 1127654 20160101 20160430
1232 1200782 20160201 20160525
1345 1097862 20151223 20160325
1345 1097862 20160401 20160428
1345 1097862 20160515 20160615
1345 1163452 20160201 20160630

 

 

I want to be able to check another dataset (Claims_Data) that contains all the claims submitted for these same customers and obtain the following:

 

Customer_ID Product_ID Count_Approved Count_Not_Approved No_Claims
1232 1127654 3 1  
1232 1200782 4    
1345 1097862 5 2  
1345 1163452     1

 

Count_Approved is the number of claims where the claim_date from the Claims_Data dataset  is between the Effective_Date and Expiration Date for the respective customer for the each Product_ID.

Count_Not_Approved is the number of claims submitted where the claim_date is outside the range for the Effective_Date and Expiration Date for the respective customer for the each Product_ID.

As it is possible that customers may be approved for products and yet they may not have submitted any claims for the product. In this case, No_Claims=1 would be entered to keep track. Any suggestions. Thanks,

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Data step solution:

data approval;
input
  customer_id
  product_id
  effective_date:yymmdd8.
  expiration_date:yymmdd8.
;
format
  effective_date
  expiration_date
    yymmddn8.
;
cards;
1232 1127654 20160101 20160430
1232 1200782 20160201 20160525
1345 1097862 20151223 20160325
1345 1097862 20160401 20160428
1345 1097862 20160515 20160615
1345 1163452 20160201 20160630
;
run;

data claims_data;
input
  customer_id
  product_id
  claim_date:yymmdd8.
;
format claim_date yymmddn8.;
cards;
1232 1127654 20160320
1232 1127654 20160517
;
run;

data want (
  keep=customer_id product_id count_approved count_not_approved no_claims
);
merge
  approval (in=a)
  claims_data (in=b)
;
by customer_id product_id;
if a;
if not b
then do;
  count_approved = 0;
  count_not_approved = 0;
  no_claims = 1;
  output;
end;
else do;
  if first.product_id
  then do;
    count_approved = 0;
    count_not_approved = 0;
    no_claims = .;
  end;
  if effective_date <= claim_date <= expiration_date
  then count_approved + 1;
  else count_not_approved + 1;
  if last.product_id then output;
end;
run;

You can expand the test data in claims_data to verify more test cases, but I think it solves your requirements.

With your data, it might be necessary to sort the datasets first by customer_id product_id.

View solution in original post

2 REPLIES 2
LinusH
Tourmaline | Level 20
Perhaps you could get started with a SQL FULL JOIN with a BETWEEN - AND join criteria.
To get the aggregation use CASE within each SUM() to figure out if the joined result should contribute to respective measure.
Data never sleeps
Kurt_Bremser
Super User

Data step solution:

data approval;
input
  customer_id
  product_id
  effective_date:yymmdd8.
  expiration_date:yymmdd8.
;
format
  effective_date
  expiration_date
    yymmddn8.
;
cards;
1232 1127654 20160101 20160430
1232 1200782 20160201 20160525
1345 1097862 20151223 20160325
1345 1097862 20160401 20160428
1345 1097862 20160515 20160615
1345 1163452 20160201 20160630
;
run;

data claims_data;
input
  customer_id
  product_id
  claim_date:yymmdd8.
;
format claim_date yymmddn8.;
cards;
1232 1127654 20160320
1232 1127654 20160517
;
run;

data want (
  keep=customer_id product_id count_approved count_not_approved no_claims
);
merge
  approval (in=a)
  claims_data (in=b)
;
by customer_id product_id;
if a;
if not b
then do;
  count_approved = 0;
  count_not_approved = 0;
  no_claims = 1;
  output;
end;
else do;
  if first.product_id
  then do;
    count_approved = 0;
    count_not_approved = 0;
    no_claims = .;
  end;
  if effective_date <= claim_date <= expiration_date
  then count_approved + 1;
  else count_not_approved + 1;
  if last.product_id then output;
end;
run;

You can expand the test data in claims_data to verify more test cases, but I think it solves your requirements.

With your data, it might be necessary to sort the datasets first by customer_id product_id.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 766 views
  • 0 likes
  • 3 in conversation