Hi All,
I have two datasets A and B both are having more than 16 million records
I am trying to create a Flag based on the account_numbers e.g., if dataset A Account_number matches with Account_number from B dataset then the Flag should be 1 if not matches then Flag = 0
Matches Flag = 1
Mismatch Flag = 0
I want to see the percentage by Flag 1 & 0
for. eg., if i have total 10 records, out of which 8 matches in Dataset A and B then Flag 1 should be 80% and Flag 0 should be 20%
I am looking for outputs like Output 1 & Outpu 2
Dataset A
Account_number | Billing_Period |
12345678 | 201611 |
12345679 | 201611 |
12345680 | 201611 |
12345681 | 201611 |
12345682 | 201611 |
12345683 | 201611 |
12345684 | 201611 |
12345685 | 201611 |
12345686 | 201611 |
12345687 | 201611 |
Dataset B
Account_number | Billing_Period |
12345672 | 201611 |
12345679 | 201611 |
12345680 | 201611 |
12345681 | 201611 |
12345682 | 201611 |
12345683 | 201611 |
12345687 | 201611 |
12345685 | 201611 |
12345686 | 201611 |
12345687 | 201611 |
Expected Output 1:
Account_number | Account_number | Billing_Period | Flag |
12345678 | 12345672 | 201611 | 0 |
12345679 | 12345679 | 201611 | 1 |
12345680 | 12345680 | 201611 | 1 |
12345681 | 12345681 | 201611 | 1 |
12345682 | 12345682 | 201611 | 1 |
12345683 | 12345683 | 201611 | 1 |
12345684 | 12345688 | 201611 | 0 |
12345685 | 12345685 | 201611 | 1 |
12345686 | 12345686 | 201611 | 1 |
12345687 | 12345687 | 201611 | 1 |
Expected output 2:
Billing_Period | Flag | Percentage |
201611 | 1 | 80 |
201611 | 0 | 20 |
Thanks
nreddy
data ds_a;
input account_number $ billing_period :yymmn6.;
format billing_period yymmn6.;
datalines;
12345678 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345684 201611
12345685 201611
12345686 201611
12345687 201611
;
data ds_b;
input account_number $ billing_period :yymmn6.;
format billing_period yymmn6.;
datalines;
12345672 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345687 201611
12345685 201611
12345686 201611
12345687 201611
;
data want;
set ds_a;
if _n_ = 1
then do;
declare hash b (dataset:"ds_b");
b.definekey("account_number","billing_period");
b.definedone();
end;
flag = (b.check() = 0);
run;
proc freq data=want;
tables billing_period*flag / out=want2;
run;
Here is the flag part. Do you want the second output as a SAS data set or a report?
data a;
input Account_number $ Billing_Period $;
datalines;
12345678 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345684 201611
12345685 201611
12345686 201611
12345687 201611
;
data b;
input Account_number $ Billing_Period $;
datalines;
12345672 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345687 201611
12345685 201611
12345686 201611
12345687 201611
;
data want;
if _N_ = 1 then do;
dcl hash h(dataset : "b");
h.definekey("Account_number");
h.definedone();
end;
set a;
flag = (h.find() = 0);
run;
.
data ds_a;
input account_number $ billing_period :yymmn6.;
format billing_period yymmn6.;
datalines;
12345678 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345684 201611
12345685 201611
12345686 201611
12345687 201611
;
data ds_b;
input account_number $ billing_period :yymmn6.;
format billing_period yymmn6.;
datalines;
12345672 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345687 201611
12345685 201611
12345686 201611
12345687 201611
;
data want;
set ds_a;
if _n_ = 1
then do;
declare hash b (dataset:"ds_b");
b.definekey("account_number","billing_period");
b.definedone();
end;
flag = (b.check() = 0);
run;
proc freq data=want;
tables billing_period*flag / out=want2;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.