Solved: Re: amateur here. need to merge two tables where the common variables ...

du5ty_____ · Posted 08-18-2020 12:44 PM

I have 2 tables with several common variables but not all of them. Table A has a list of Account ID's with phone1, phone2, phone 3, and email variables. Table B is much larger in rows and has a list of Account ID's with phone1, phone2, phone3 and email variables as well. my goal is to search Table B by each observation, and see if any phone or email in table B matches any phone or email in Table A where the personID's are NOT the same (evident in the code below using ^=). I want to be able to see which Accounts IDs in Table A had a prior record of the same phone number or email used as accounts in Table B.

the code I have written so far gathers the data for both tables separately. I cannot "join" these tables together, without creating 10 different proc sql inner join statements, then a datastep to merge them all together. in theory this should work but its taking hours to run and cannot complete. im wondering is there a more efficient way to do this?

I have tableA named "strawlist2" and tableB named "priorapps". here is an example of 2 of the 10 proc sql's followed by the datastep at the end to append them all together. without disclosing too much info, each table has a unique ID to the person, a unique ID to the application, and a unique ID to the account. I do not want to merge the 2 on any of those variables, just phones and emails. ideally, I would create a new variable in the datastep that denotes which variable in that observation returned a match. ie. if TableA acct#X phone1 matched up with TableB acct#Y phone3, the final dataset I have would be the table A records and the match variable would show the Acct ID it was matched to via a phone or email.. I hope this makes sense

proc sql;

create table emailMatch as

select

*

from strawlist2 as a inner join priorapps as b

on a.email = b.email2

where a.borr_ssn_ein_lrm ^= b.SSN2;

quit;

proc sql;

create table phone1match as

select

*

from strawlist2 as a inner join priorapps as b

on a.phone_number = b.PhNm1

where a.borr_ssn_ein_lrm ^= b.SSN2;

quit;

data final;

set emailMatch phone1match phone1match2 phone1matchRel phone2match1 phone2match2 phone2matchRel RelmatchP1 RelMatchp2 RelmatchRel2;

run;

Shmuel · Posted 08-18-2020 02:02 PM

You did not supply any test data.

I hope next code will give you base to adaptation to your needs

data table1;
  infile datalines;
  input ID  phone1 $ phone2 $ phone3 $;
datalines;
1 aaa bbb ccc
5 ddd eee fff
7 xxx zzz yyy
run;

data table2;
  infile datalines;
  input ID  phone1 $ phone2 $ phone3 $;
datalines;
2 ggg aaa mmm
7 xxx zzz www
run;

data tmp1;
 set table1;
     array vx phone1-phone3;
	 do i=1 to dim(vx);
	    phone = vx(i);
		ouput;
		keep phone id;
     end;
run;
  
data tmp2;
 set table2;
     array vx phone1-phone3;
	 do i=1 to dim(vx);
	    phone = vx(i);
		ouput;
		keep phone id;
     end;
run;

proc sort data=tmp1; by phone id; run;
proc sort data=tmp2; by phone id; run;

proc sql;
  select a.phone, 
         a.id as id1,
         b.id as id2
  from tmp1 as a
  join tmp2 as b
  on a.phone = b.phone and
     a.id ^= b.id and b.id ^= .;
quit;

View solution in original post

Shmuel · Posted 08-18-2020 02:02 PM

You did not supply any test data.

I hope next code will give you base to adaptation to your needs

data table1;
  infile datalines;
  input ID  phone1 $ phone2 $ phone3 $;
datalines;
1 aaa bbb ccc
5 ddd eee fff
7 xxx zzz yyy
run;

data table2;
  infile datalines;
  input ID  phone1 $ phone2 $ phone3 $;
datalines;
2 ggg aaa mmm
7 xxx zzz www
run;

data tmp1;
 set table1;
     array vx phone1-phone3;
	 do i=1 to dim(vx);
	    phone = vx(i);
		ouput;
		keep phone id;
     end;
run;
  
data tmp2;
 set table2;
     array vx phone1-phone3;
	 do i=1 to dim(vx);
	    phone = vx(i);
		ouput;
		keep phone id;
     end;
run;

proc sort data=tmp1; by phone id; run;
proc sort data=tmp2; by phone id; run;

proc sql;
  select a.phone, 
         a.id as id1,
         b.id as id2
  from tmp1 as a
  join tmp2 as b
  on a.phone = b.phone and
     a.id ^= b.id and b.id ^= .;
quit;

du5ty_____ · Posted 08-18-2020 02:41 PM

thank you! can you explain how this is working? from what I can tell, you are talking the variables in each table and by loading it into an array then output, you are transposing it so that all variable instances are now in one column, which now duplicate ID #'s. then the joining in the proc sql sorts it out from there finding the matches.

Shmuel · Posted 08-18-2020 03:12 PM

I have used a data step to transpose the data, so that all phones in same column.

If you prefer you may use PROC TRNSPOSE for same function.

By using proc sql to select wanted observations you may skip the sorting steps

as sql will do it implicitly.

amateur here. need to merge two tables where the common variables are dynamic. (SAS EG 5.1)

Re: amateur here. need to merge two tables where the common variables are dynamic. (SAS EG 5.1)

Re: amateur here. need to merge two tables where the common variables are dynamic. (SAS EG 5.1)

Re: amateur here. need to merge two tables where the common variables are dynamic. (SAS EG 5.1)

Re: amateur here. need to merge two tables where the common variables are dynamic. (SAS EG 5.1)

amateur here. need to merge two tables where the common variables are dynamic. (SAS EG 5.1)

Re: amateur here. need to merge two tables where the common variables are dynamic. (SAS EG 5.1)

Re: amateur here. need to merge two tables where the common variables are dynamic. (SAS EG 5.1)

Re: amateur here. need to merge two tables where the common variables are dynamic. (SAS EG 5.1)

Re: amateur here. need to merge two tables where the common variables are dynamic. (SAS EG 5.1)

Ready to join fellow brilliant minds for the SAS Hackathon?

Click image to register for webinar

Classroom Training Available!