Solved: Finding common value from two datasets

Sandeep77 · Posted 06-05-2023 09:35 AM

Hi all,

I have two datasets. Test1 contains all the account information in 131 repcodes. Test2 Datasets have all the linked accounts from 131 repcodes. So I want to find if the name in linked accounts (Test2) matches with Test1 data then output result should show 'Yes' else 'No'. Can you please suggest the best way to do it? I tried the case when statement but then it only gives me Yes/No information from Test1(only the accounts in Test1). I would like the output data to show all the accountnumbers (Test1 and Test2) and if the name and surname matches with the accountnumber in 131 repcode then Yes, if it doesn't then No.

Here are the two datasets:

Data Test_1;
infile cards expandtabs;
input icustomer accountnumber rep_code Name $ Surname $;
datalines ;
452646 24648463 131 Tom Thomas
523482 24647866 131 Diva Marl
25462 47561365 131 Amy Jackson
324648 24679591 131 Dave Harrison
;
run;

Data Test_2;
infile cards expandtabs;
input icustomer debt_code rep_code Dr_inits $ Name $;
datalines ;
452646 24648465 139 Tom Thomas
452646 24645647 241 Tom Thomas
523482 24647864 135 Diva Marl
564612 35464841 294 Alis Cook
646823 34625461 151 Joe Root
2468764 26574646 502 Martin Luthis
25462 47561364 168 Amy Jackson
324648 24679597 505 Dave Harrison
;
run;

Thanks

Tom · Posted 06-05-2023 09:48 AM

What is it you want to match on? Is it the ICUSTOMER variable?

data Test_1;
  input icustomer accountnumber rep_code Name $ Surname $;
datalines ;
452646 24648463 131 Tom Thomas
523482 24647866 131 Diva Marl
25462  47561365 131 Amy Jackson
324648 24679591 131 Dave Harrison
;

data Test_2;
  input icustomer debt_code rep_code Dr_inits $ Name $;
datalines ;
452646  24648465 139 Tom Thomas
452646  24645647 241 Tom Thomas
523482  24647864 135 Diva Marl
564612  35464841 294 Alis Cook
646823  34625461 151 Joe Root
2468764 26574646 502 Martin Luthis
25462   47561364 168 Amy Jackson
324648  24679597 505 Dave Harrison
;

proc sql;
create table want as 
select coalesce(a.icustomer,b.icustomer) as icustomer
     , a.accountnumber
     , a.rep_code as rep_code1
     , a.name as first_name1
     , a.surname as surname as last_name1
     , b.debt_code
     , b.rep_code as rep_code2
     , b.dr_inits as first_name2
     , b.name as last_name2
     , case when (a.surname=b.name and a.name=b.dr_inits) then 'YES' else 'NO' end as Match
from test_1 a
full join test_2 b
   on a.icustomer=b.icustomer
  and a.surname=b.name
  and a.name=b.dr_inits
order by 1,2
;

quit;

proc print;
run;

                                      rep_    first_    last_         debt_      rep_    first_    last_
Obs    icustomer    accountnumber    code1    name1     name1         code      code2    name2     name2       Match

 1        25462        47561365       131      Amy      Jackson     47561364     168     Amy       Jackson      YES
 2       324648        24679591       131      Dave     Harrison    24679597     505     Dave      Harrison     YES
 3       452646        24648463       131      Tom      Thomas      24648465     139     Tom       Thomas       YES
 4       452646        24648463       131      Tom      Thomas      24645647     241     Tom       Thomas       YES
 5       523482        24647866       131      Diva     Marl        24647864     135     Diva      Marl         YES
 6       564612               .         .                           35464841     294     Alis      Cook         NO
 7       646823               .         .                           34625461     151     Joe       Root         NO
 8      2468764               .         .                           26574646     502     Martin    Luthis       NO

View solution in original post

PaigeMiller · Posted 06-05-2023 09:39 AM

proc sql;
    create table want as select 
        coalesce(a.icustomer,b.icustomer) as icustomer
        ,a.accountnumber
        ,b.debt_code
        ,coalesce(a.rep_code,b.rep_code) as rep_code
        ,coalesce(a.name,b.dr_inits) as name
        ,coalesce(a.surname,b.name) as surname
        ,case when a.name=b.dr_inits and a.surname=b.name then 'Yes' else 'No' end as compare
        from test_1 as a full join test_2 as b
        on a.name=b.dr_inits and a.surname=b.name;
quit;

--
Paige Miller

Tom · Posted 06-05-2023 09:48 AM

What is it you want to match on? Is it the ICUSTOMER variable?

data Test_1;
  input icustomer accountnumber rep_code Name $ Surname $;
datalines ;
452646 24648463 131 Tom Thomas
523482 24647866 131 Diva Marl
25462  47561365 131 Amy Jackson
324648 24679591 131 Dave Harrison
;

data Test_2;
  input icustomer debt_code rep_code Dr_inits $ Name $;
datalines ;
452646  24648465 139 Tom Thomas
452646  24645647 241 Tom Thomas
523482  24647864 135 Diva Marl
564612  35464841 294 Alis Cook
646823  34625461 151 Joe Root
2468764 26574646 502 Martin Luthis
25462   47561364 168 Amy Jackson
324648  24679597 505 Dave Harrison
;

proc sql;
create table want as 
select coalesce(a.icustomer,b.icustomer) as icustomer
     , a.accountnumber
     , a.rep_code as rep_code1
     , a.name as first_name1
     , a.surname as surname as last_name1
     , b.debt_code
     , b.rep_code as rep_code2
     , b.dr_inits as first_name2
     , b.name as last_name2
     , case when (a.surname=b.name and a.name=b.dr_inits) then 'YES' else 'NO' end as Match
from test_1 a
full join test_2 b
   on a.icustomer=b.icustomer
  and a.surname=b.name
  and a.name=b.dr_inits
order by 1,2
;

quit;

proc print;
run;

                                      rep_    first_    last_         debt_      rep_    first_    last_
Obs    icustomer    accountnumber    code1    name1     name1         code      code2    name2     name2       Match

 1        25462        47561365       131      Amy      Jackson     47561364     168     Amy       Jackson      YES
 2       324648        24679591       131      Dave     Harrison    24679597     505     Dave      Harrison     YES
 3       452646        24648463       131      Tom      Thomas      24648465     139     Tom       Thomas       YES
 4       452646        24648463       131      Tom      Thomas      24645647     241     Tom       Thomas       YES
 5       523482        24647866       131      Diva     Marl        24647864     135     Diva      Marl         YES
 6       564612               .         .                           35464841     294     Alis      Cook         NO
 7       646823               .         .                           34625461     151     Joe       Root         NO
 8      2468764               .         .                           26574646     502     Martin    Luthis       NO

Sandeep77 · Posted 06-05-2023 10:14 AM

Thank you @Tom. It worked but I don't know why it is showing more records. I have total 76k accounts but when I ran this code, it shows around 100K records. Do you know what could be the reason for this?

Tom · Posted 06-05-2023 10:26 AM

That is the nature of an SQL join. It 2 observations from one dataset match 4 observations form the other the other you get 2*4 = 8 observations, not 2+4 or max(2,4) observations you might get with a SET or MERGE statement.

For example for this particular query if the same ICUSTOMER has two different names then you will get two different output observations.

Sandeep77 · Posted 06-05-2023 10:30 AM

@Tom The reason for accounts to be repeated is the accountnumber is repeated for some records where they do not have linked accounts outside 131 repcode. I tried using distinct a.accountnumber to avoid this issue but it shows syntax error. Can you suggest how can I avoid duplicate accountnumber?

Tom · Posted 06-05-2023 10:35 AM

If the issue is that observations from TEST_1 have the same other values except for ACCOUNTNUMBER and you just want to take the first ACCOUNTNUMBER then I would do that before trying to combine the datasets.

That is not something that SQL is very good at. It would be better to do that with a data step.

Sandeep77 · Posted 06-05-2023 10:39 AM

Thank you!

Finding common value from two datasets

Re: Finding common value from two datasets

Re: Finding common value from two datasets

Re: Finding common value from two datasets

Re: Finding common value from two datasets

Re: Finding common value from two datasets

Re: Finding common value from two datasets

Re: Finding common value from two datasets

Re: Finding common value from two datasets

Registration is open

SAS Training: Just a Click Away