Hi All,
I have two datasets A and B both are having more than 16 million records
I am trying to create a Flag based on the account_numbers e.g., if dataset A Account_number matches with Account_number from B dataset then the Flag should be 1 if not matches then Flag = 0
Matches Flag = 1
Mismatch Flag = 0
I want to see the percentage by Flag 1 & 0
for. eg., if i have total 10 records, out of which 8 matches in Dataset A and B then Flag 1 should be 80% and Flag 0 should be 20%
I am looking for outputs like Output 1 & Outpu 2
Dataset A
| Account_number | Billing_Period |
| 12345678 | 201611 |
| 12345679 | 201611 |
| 12345680 | 201611 |
| 12345681 | 201611 |
| 12345682 | 201611 |
| 12345683 | 201611 |
| 12345684 | 201611 |
| 12345685 | 201611 |
| 12345686 | 201611 |
| 12345687 | 201611 |
Dataset B
| Account_number | Billing_Period |
| 12345672 | 201611 |
| 12345679 | 201611 |
| 12345680 | 201611 |
| 12345681 | 201611 |
| 12345682 | 201611 |
| 12345683 | 201611 |
| 12345687 | 201611 |
| 12345685 | 201611 |
| 12345686 | 201611 |
| 12345687 | 201611 |
Expected Output 1:
| Account_number | Account_number | Billing_Period | Flag |
| 12345678 | 12345672 | 201611 | 0 |
| 12345679 | 12345679 | 201611 | 1 |
| 12345680 | 12345680 | 201611 | 1 |
| 12345681 | 12345681 | 201611 | 1 |
| 12345682 | 12345682 | 201611 | 1 |
| 12345683 | 12345683 | 201611 | 1 |
| 12345684 | 12345688 | 201611 | 0 |
| 12345685 | 12345685 | 201611 | 1 |
| 12345686 | 12345686 | 201611 | 1 |
| 12345687 | 12345687 | 201611 | 1 |
Expected output 2:
| Billing_Period | Flag | Percentage |
| 201611 | 1 | 80 |
| 201611 | 0 | 20 |
Thanks
nreddy
data ds_a;
input account_number $ billing_period :yymmn6.;
format billing_period yymmn6.;
datalines;
12345678 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345684 201611
12345685 201611
12345686 201611
12345687 201611
;
data ds_b;
input account_number $ billing_period :yymmn6.;
format billing_period yymmn6.;
datalines;
12345672 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345687 201611
12345685 201611
12345686 201611
12345687 201611
;
data want;
set ds_a;
if _n_ = 1
then do;
declare hash b (dataset:"ds_b");
b.definekey("account_number","billing_period");
b.definedone();
end;
flag = (b.check() = 0);
run;
proc freq data=want;
tables billing_period*flag / out=want2;
run;
Here is the flag part. Do you want the second output as a SAS data set or a report?
data a;
input Account_number $ Billing_Period $;
datalines;
12345678 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345684 201611
12345685 201611
12345686 201611
12345687 201611
;
data b;
input Account_number $ Billing_Period $;
datalines;
12345672 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345687 201611
12345685 201611
12345686 201611
12345687 201611
;
data want;
if _N_ = 1 then do;
dcl hash h(dataset : "b");
h.definekey("Account_number");
h.definedone();
end;
set a;
flag = (h.find() = 0);
run;
.
data ds_a;
input account_number $ billing_period :yymmn6.;
format billing_period yymmn6.;
datalines;
12345678 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345684 201611
12345685 201611
12345686 201611
12345687 201611
;
data ds_b;
input account_number $ billing_period :yymmn6.;
format billing_period yymmn6.;
datalines;
12345672 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345687 201611
12345685 201611
12345686 201611
12345687 201611
;
data want;
set ds_a;
if _n_ = 1
then do;
declare hash b (dataset:"ds_b");
b.definekey("account_number","billing_period");
b.definedone();
end;
flag = (b.check() = 0);
run;
proc freq data=want;
tables billing_period*flag / out=want2;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and save with the early bird rate—just $795!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.