Hi All,
I have two datasets A and B both are having more than 16 million records
I am trying to create a Flag based on the account_numbers e.g., if dataset A Account_number matches with Account_number from B dataset then the Flag should be 1 if not matches then Flag = 0
Matches Flag = 1
Mismatch Flag = 0
I want to see the percentage by Flag 1 & 0
for. eg., if i have total 10 records, out of which 8 matches in Dataset A and B then Flag 1 should be 80% and Flag 0 should be 20%
I am looking for outputs like Output 1 & Outpu 2
Dataset A
| Account_number | Billing_Period |
| 12345678 | 201611 |
| 12345679 | 201611 |
| 12345680 | 201611 |
| 12345681 | 201611 |
| 12345682 | 201611 |
| 12345683 | 201611 |
| 12345684 | 201611 |
| 12345685 | 201611 |
| 12345686 | 201611 |
| 12345687 | 201611 |
Dataset B
| Account_number | Billing_Period |
| 12345672 | 201611 |
| 12345679 | 201611 |
| 12345680 | 201611 |
| 12345681 | 201611 |
| 12345682 | 201611 |
| 12345683 | 201611 |
| 12345687 | 201611 |
| 12345685 | 201611 |
| 12345686 | 201611 |
| 12345687 | 201611 |
Expected Output 1:
| Account_number | Account_number | Billing_Period | Flag |
| 12345678 | 12345672 | 201611 | 0 |
| 12345679 | 12345679 | 201611 | 1 |
| 12345680 | 12345680 | 201611 | 1 |
| 12345681 | 12345681 | 201611 | 1 |
| 12345682 | 12345682 | 201611 | 1 |
| 12345683 | 12345683 | 201611 | 1 |
| 12345684 | 12345688 | 201611 | 0 |
| 12345685 | 12345685 | 201611 | 1 |
| 12345686 | 12345686 | 201611 | 1 |
| 12345687 | 12345687 | 201611 | 1 |
Expected output 2:
| Billing_Period | Flag | Percentage |
| 201611 | 1 | 80 |
| 201611 | 0 | 20 |
Thanks
nreddy
data ds_a;
input account_number $ billing_period :yymmn6.;
format billing_period yymmn6.;
datalines;
12345678 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345684 201611
12345685 201611
12345686 201611
12345687 201611
;
data ds_b;
input account_number $ billing_period :yymmn6.;
format billing_period yymmn6.;
datalines;
12345672 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345687 201611
12345685 201611
12345686 201611
12345687 201611
;
data want;
set ds_a;
if _n_ = 1
then do;
declare hash b (dataset:"ds_b");
b.definekey("account_number","billing_period");
b.definedone();
end;
flag = (b.check() = 0);
run;
proc freq data=want;
tables billing_period*flag / out=want2;
run;
Here is the flag part. Do you want the second output as a SAS data set or a report?
data a;
input Account_number $ Billing_Period $;
datalines;
12345678 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345684 201611
12345685 201611
12345686 201611
12345687 201611
;
data b;
input Account_number $ Billing_Period $;
datalines;
12345672 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345687 201611
12345685 201611
12345686 201611
12345687 201611
;
data want;
if _N_ = 1 then do;
dcl hash h(dataset : "b");
h.definekey("Account_number");
h.definedone();
end;
set a;
flag = (h.find() = 0);
run;
.
data ds_a;
input account_number $ billing_period :yymmn6.;
format billing_period yymmn6.;
datalines;
12345678 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345684 201611
12345685 201611
12345686 201611
12345687 201611
;
data ds_b;
input account_number $ billing_period :yymmn6.;
format billing_period yymmn6.;
datalines;
12345672 201611
12345679 201611
12345680 201611
12345681 201611
12345682 201611
12345683 201611
12345687 201611
12345685 201611
12345686 201611
12345687 201611
;
data want;
set ds_a;
if _n_ = 1
then do;
declare hash b (dataset:"ds_b");
b.definekey("account_number","billing_period");
b.definedone();
end;
flag = (b.check() = 0);
run;
proc freq data=want;
tables billing_period*flag / out=want2;
run;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.