BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
du5ty_____
Calcite | Level 5

I have 2 tables with several common variables but not all of them. Table A has a list of Account ID's with phone1, phone2, phone 3, and email variables. Table B is much larger in rows and has a list of Account ID's with phone1, phone2, phone3 and email variables as well. my goal is to search Table B by each observation, and see if any phone or email in table B matches any phone or email in Table A where the personID's are NOT the same (evident in the code below using ^=). I want to be able to see which Accounts IDs in Table A had a prior record of the same phone number or email used as accounts in Table B.

 

the code I have written so far gathers the data for both tables separately. I cannot "join" these tables together, without creating 10 different proc sql inner join statements, then a datastep to merge them all together. in theory this should work but its taking hours to run and cannot complete. im wondering is there a more efficient way to do this?

 

I have tableA named "strawlist2" and tableB named "priorapps". here is an example of 2 of the 10 proc sql's followed by the datastep at the end to append them all together. without disclosing too much info, each table has a unique ID to the person, a unique ID to the application, and a unique ID to the account. I do not want to merge the 2 on any of those variables, just phones and emails. ideally, I would create a new variable in the datastep that denotes which variable in that observation returned a match. ie. if TableA acct#X phone1 matched up with TableB acct#Y phone3, the final dataset I have would be the table A records and the match variable would show the Acct ID it was matched to via a phone or email.. I hope this makes sense

 

proc sql;

create table emailMatch as

select

*

from strawlist2 as a inner join priorapps as b

on a.email = b.email2

where a.borr_ssn_ein_lrm ^= b.SSN2;

quit;

 

proc sql;

create table phone1match as

select

*

from strawlist2 as a inner join priorapps as b

on a.phone_number = b.PhNm1

where a.borr_ssn_ein_lrm ^= b.SSN2;

quit;

 

data final;

set emailMatch phone1match phone1match2 phone1matchRel phone2match1 phone2match2 phone2matchRel RelmatchP1 RelMatchp2 RelmatchRel2;

run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

You did not supply any test data.

I hope next code will give you base to adaptation to your needs

data table1;
  infile datalines;
  input ID  phone1 $ phone2 $ phone3 $;
datalines;
1 aaa bbb ccc
5 ddd eee fff
7 xxx zzz yyy
run;

data table2;
  infile datalines;
  input ID  phone1 $ phone2 $ phone3 $;
datalines;
2 ggg aaa mmm
7 xxx zzz www
run;

data tmp1;
 set table1;
     array vx phone1-phone3;
	 do i=1 to dim(vx);
	    phone = vx(i);
		ouput;
		keep phone id;
     end;
run;
  
data tmp2;
 set table2;
     array vx phone1-phone3;
	 do i=1 to dim(vx);
	    phone = vx(i);
		ouput;
		keep phone id;
     end;
run;

proc sort data=tmp1; by phone id; run;
proc sort data=tmp2; by phone id; run;

proc sql;
  select a.phone, 
         a.id as id1,
         b.id as id2
  from tmp1 as a
  join tmp2 as b
  on a.phone = b.phone and
     a.id ^= b.id and b.id ^= .;
quit;

View solution in original post

3 REPLIES 3
Shmuel
Garnet | Level 18

You did not supply any test data.

I hope next code will give you base to adaptation to your needs

data table1;
  infile datalines;
  input ID  phone1 $ phone2 $ phone3 $;
datalines;
1 aaa bbb ccc
5 ddd eee fff
7 xxx zzz yyy
run;

data table2;
  infile datalines;
  input ID  phone1 $ phone2 $ phone3 $;
datalines;
2 ggg aaa mmm
7 xxx zzz www
run;

data tmp1;
 set table1;
     array vx phone1-phone3;
	 do i=1 to dim(vx);
	    phone = vx(i);
		ouput;
		keep phone id;
     end;
run;
  
data tmp2;
 set table2;
     array vx phone1-phone3;
	 do i=1 to dim(vx);
	    phone = vx(i);
		ouput;
		keep phone id;
     end;
run;

proc sort data=tmp1; by phone id; run;
proc sort data=tmp2; by phone id; run;

proc sql;
  select a.phone, 
         a.id as id1,
         b.id as id2
  from tmp1 as a
  join tmp2 as b
  on a.phone = b.phone and
     a.id ^= b.id and b.id ^= .;
quit;
du5ty_____
Calcite | Level 5

thank you! can you explain how this is working? from what I can tell, you are talking the variables in each table and by loading it into an array then output, you are transposing it so that all variable instances are now in one column, which now duplicate ID #'s. then the joining in the proc sql sorts it out from there finding the matches.

Shmuel
Garnet | Level 18

I have used a data step to transpose the data, so that all phones in same column.

If you prefer you may use PROC TRNSPOSE for same function.

 

By using proc sql to select wanted observations you may skip the sorting steps

as sql will do it implicitly.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 797 views
  • 1 like
  • 2 in conversation