Hi all,
I have two datasets. Test1 contains all the account information in 131 repcodes. Test2 Datasets have all the linked accounts from 131 repcodes. So I want to find if the name in linked accounts (Test2) matches with Test1 data then output result should show 'Yes' else 'No'. Can you please suggest the best way to do it? I tried the case when statement but then it only gives me Yes/No information from Test1(only the accounts in Test1). I would like the output data to show all the accountnumbers (Test1 and Test2) and if the name and surname matches with the accountnumber in 131 repcode then Yes, if it doesn't then No.
Here are the two datasets:
Data Test_1;
infile cards expandtabs;
input icustomer accountnumber rep_code Name $ Surname $;
datalines ;
452646 24648463 131 Tom Thomas
523482 24647866 131 Diva Marl
25462 47561365 131 Amy Jackson
324648 24679591 131 Dave Harrison
;
run;
Data Test_2;
infile cards expandtabs;
input icustomer debt_code rep_code Dr_inits $ Name $;
datalines ;
452646 24648465 139 Tom Thomas
452646 24645647 241 Tom Thomas
523482 24647864 135 Diva Marl
564612 35464841 294 Alis Cook
646823 34625461 151 Joe Root
2468764 26574646 502 Martin Luthis
25462 47561364 168 Amy Jackson
324648 24679597 505 Dave Harrison
;
run;
Thanks
What is it you want to match on? Is it the ICUSTOMER variable?
data Test_1;
input icustomer accountnumber rep_code Name $ Surname $;
datalines ;
452646 24648463 131 Tom Thomas
523482 24647866 131 Diva Marl
25462 47561365 131 Amy Jackson
324648 24679591 131 Dave Harrison
;
data Test_2;
input icustomer debt_code rep_code Dr_inits $ Name $;
datalines ;
452646 24648465 139 Tom Thomas
452646 24645647 241 Tom Thomas
523482 24647864 135 Diva Marl
564612 35464841 294 Alis Cook
646823 34625461 151 Joe Root
2468764 26574646 502 Martin Luthis
25462 47561364 168 Amy Jackson
324648 24679597 505 Dave Harrison
;
proc sql;
create table want as
select coalesce(a.icustomer,b.icustomer) as icustomer
, a.accountnumber
, a.rep_code as rep_code1
, a.name as first_name1
, a.surname as surname as last_name1
, b.debt_code
, b.rep_code as rep_code2
, b.dr_inits as first_name2
, b.name as last_name2
, case when (a.surname=b.name and a.name=b.dr_inits) then 'YES' else 'NO' end as Match
from test_1 a
full join test_2 b
on a.icustomer=b.icustomer
and a.surname=b.name
and a.name=b.dr_inits
order by 1,2
;
quit;
proc print;
run;
rep_ first_ last_ debt_ rep_ first_ last_ Obs icustomer accountnumber code1 name1 name1 code code2 name2 name2 Match 1 25462 47561365 131 Amy Jackson 47561364 168 Amy Jackson YES 2 324648 24679591 131 Dave Harrison 24679597 505 Dave Harrison YES 3 452646 24648463 131 Tom Thomas 24648465 139 Tom Thomas YES 4 452646 24648463 131 Tom Thomas 24645647 241 Tom Thomas YES 5 523482 24647866 131 Diva Marl 24647864 135 Diva Marl YES 6 564612 . . 35464841 294 Alis Cook NO 7 646823 . . 34625461 151 Joe Root NO 8 2468764 . . 26574646 502 Martin Luthis NO
proc sql;
create table want as select
coalesce(a.icustomer,b.icustomer) as icustomer
,a.accountnumber
,b.debt_code
,coalesce(a.rep_code,b.rep_code) as rep_code
,coalesce(a.name,b.dr_inits) as name
,coalesce(a.surname,b.name) as surname
,case when a.name=b.dr_inits and a.surname=b.name then 'Yes' else 'No' end as compare
from test_1 as a full join test_2 as b
on a.name=b.dr_inits and a.surname=b.name;
quit;
What is it you want to match on? Is it the ICUSTOMER variable?
data Test_1;
input icustomer accountnumber rep_code Name $ Surname $;
datalines ;
452646 24648463 131 Tom Thomas
523482 24647866 131 Diva Marl
25462 47561365 131 Amy Jackson
324648 24679591 131 Dave Harrison
;
data Test_2;
input icustomer debt_code rep_code Dr_inits $ Name $;
datalines ;
452646 24648465 139 Tom Thomas
452646 24645647 241 Tom Thomas
523482 24647864 135 Diva Marl
564612 35464841 294 Alis Cook
646823 34625461 151 Joe Root
2468764 26574646 502 Martin Luthis
25462 47561364 168 Amy Jackson
324648 24679597 505 Dave Harrison
;
proc sql;
create table want as
select coalesce(a.icustomer,b.icustomer) as icustomer
, a.accountnumber
, a.rep_code as rep_code1
, a.name as first_name1
, a.surname as surname as last_name1
, b.debt_code
, b.rep_code as rep_code2
, b.dr_inits as first_name2
, b.name as last_name2
, case when (a.surname=b.name and a.name=b.dr_inits) then 'YES' else 'NO' end as Match
from test_1 a
full join test_2 b
on a.icustomer=b.icustomer
and a.surname=b.name
and a.name=b.dr_inits
order by 1,2
;
quit;
proc print;
run;
rep_ first_ last_ debt_ rep_ first_ last_ Obs icustomer accountnumber code1 name1 name1 code code2 name2 name2 Match 1 25462 47561365 131 Amy Jackson 47561364 168 Amy Jackson YES 2 324648 24679591 131 Dave Harrison 24679597 505 Dave Harrison YES 3 452646 24648463 131 Tom Thomas 24648465 139 Tom Thomas YES 4 452646 24648463 131 Tom Thomas 24645647 241 Tom Thomas YES 5 523482 24647866 131 Diva Marl 24647864 135 Diva Marl YES 6 564612 . . 35464841 294 Alis Cook NO 7 646823 . . 34625461 151 Joe Root NO 8 2468764 . . 26574646 502 Martin Luthis NO
That is the nature of an SQL join. It 2 observations from one dataset match 4 observations form the other the other you get 2*4 = 8 observations, not 2+4 or max(2,4) observations you might get with a SET or MERGE statement.
For example for this particular query if the same ICUSTOMER has two different names then you will get two different output observations.
If the issue is that observations from TEST_1 have the same other values except for ACCOUNTNUMBER and you just want to take the first ACCOUNTNUMBER then I would do that before trying to combine the datasets.
That is not something that SQL is very good at. It would be better to do that with a data step.
Thank you!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.