Solved: re: data check

twildone · Posted 06-20-2016 09:13 PM

Hi....I have a dataset that contains a list of unique records for customers who are approved for specific products during a specific time period. The Approval dataset looks like this:

Customer_ID	Product_ID	Effective_Date	Expiration_Date
1232	1127654	20160101	20160430
1232	1200782	20160201	20160525
1345	1097862	20151223	20160325
1345	1097862	20160401	20160428
1345	1097862	20160515	20160615
1345	1163452	20160201	20160630

I want to be able to check another dataset (Claims_Data) that contains all the claims submitted for these same customers and obtain the following:

Customer_ID	Product_ID	Count_Approved	Count_Not_Approved	No_Claims
1232	1127654	3	1
1232	1200782	4
1345	1097862	5	2
1345	1163452			1

Count_Approved is the number of claims where the claim_date from the Claims_Data dataset is between the Effective_Date and Expiration Date for the respective customer for the each Product_ID.

Count_Not_Approved is the number of claims submitted where the claim_date is outside the range for the Effective_Date and Expiration Date for the respective customer for the each Product_ID.

As it is possible that customers may be approved for products and yet they may not have submitted any claims for the product. In this case, No_Claims=1 would be entered to keep track. Any suggestions. Thanks,

Kurt_Bremser · Posted 06-21-2016 02:43 AM

Data step solution:

data approval;
input
  customer_id
  product_id
  effective_date:yymmdd8.
  expiration_date:yymmdd8.
;
format
  effective_date
  expiration_date
    yymmddn8.
;
cards;
1232 1127654 20160101 20160430
1232 1200782 20160201 20160525
1345 1097862 20151223 20160325
1345 1097862 20160401 20160428
1345 1097862 20160515 20160615
1345 1163452 20160201 20160630
;
run;

data claims_data;
input
  customer_id
  product_id
  claim_date:yymmdd8.
;
format claim_date yymmddn8.;
cards;
1232 1127654 20160320
1232 1127654 20160517
;
run;

data want (
  keep=customer_id product_id count_approved count_not_approved no_claims
);
merge
  approval (in=a)
  claims_data (in=b)
;
by customer_id product_id;
if a;
if not b
then do;
  count_approved = 0;
  count_not_approved = 0;
  no_claims = 1;
  output;
end;
else do;
  if first.product_id
  then do;
    count_approved = 0;
    count_not_approved = 0;
    no_claims = .;
  end;
  if effective_date <= claim_date <= expiration_date
  then count_approved + 1;
  else count_not_approved + 1;
  if last.product_id then output;
end;
run;

You can expand the test data in claims_data to verify more test cases, but I think it solves your requirements.

With your data, it might be necessary to sort the datasets first by customer_id product_id.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

LinusH · Posted 06-20-2016 10:23 PM

Perhaps you could get started with a SQL FULL JOIN with a BETWEEN - AND join criteria.
To get the aggregation use CASE within each SUM() to figure out if the joined result should contribute to respective measure.

Data never sleeps

Kurt_Bremser · Posted 06-21-2016 02:43 AM

Data step solution:

data approval;
input
  customer_id
  product_id
  effective_date:yymmdd8.
  expiration_date:yymmdd8.
;
format
  effective_date
  expiration_date
    yymmddn8.
;
cards;
1232 1127654 20160101 20160430
1232 1200782 20160201 20160525
1345 1097862 20151223 20160325
1345 1097862 20160401 20160428
1345 1097862 20160515 20160615
1345 1163452 20160201 20160630
;
run;

data claims_data;
input
  customer_id
  product_id
  claim_date:yymmdd8.
;
format claim_date yymmddn8.;
cards;
1232 1127654 20160320
1232 1127654 20160517
;
run;

data want (
  keep=customer_id product_id count_approved count_not_approved no_claims
);
merge
  approval (in=a)
  claims_data (in=b)
;
by customer_id product_id;
if a;
if not b
then do;
  count_approved = 0;
  count_not_approved = 0;
  no_claims = 1;
  output;
end;
else do;
  if first.product_id
  then do;
    count_approved = 0;
    count_not_approved = 0;
    no_claims = .;
  end;
  if effective_date <= claim_date <= expiration_date
  then count_approved + 1;
  else count_not_approved + 1;
  if last.product_id then output;
end;
run;

You can expand the test data in claims_data to verify more test cases, but I think it solves your requirements.

With your data, it might be necessary to sort the datasets first by customer_id product_id.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

re: data check

Re: re: data check

Re: re: data check

Re: re: data check

re: data check

Re: re: data check

Re: re: data check

Re: re: data check

Registration is open

SAS Training: Just a Click Away