- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have 2 tables with several common variables but not all of them. Table A has a list of Account ID's with phone1, phone2, phone 3, and email variables. Table B is much larger in rows and has a list of Account ID's with phone1, phone2, phone3 and email variables as well. my goal is to search Table B by each observation, and see if any phone or email in table B matches any phone or email in Table A where the personID's are NOT the same (evident in the code below using ^=). I want to be able to see which Accounts IDs in Table A had a prior record of the same phone number or email used as accounts in Table B.
the code I have written so far gathers the data for both tables separately. I cannot "join" these tables together, without creating 10 different proc sql inner join statements, then a datastep to merge them all together. in theory this should work but its taking hours to run and cannot complete. im wondering is there a more efficient way to do this?
I have tableA named "strawlist2" and tableB named "priorapps". here is an example of 2 of the 10 proc sql's followed by the datastep at the end to append them all together. without disclosing too much info, each table has a unique ID to the person, a unique ID to the application, and a unique ID to the account. I do not want to merge the 2 on any of those variables, just phones and emails. ideally, I would create a new variable in the datastep that denotes which variable in that observation returned a match. ie. if TableA acct#X phone1 matched up with TableB acct#Y phone3, the final dataset I have would be the table A records and the match variable would show the Acct ID it was matched to via a phone or email.. I hope this makes sense
proc sql;
create table emailMatch as
select
*
from strawlist2 as a inner join priorapps as b
on a.email = b.email2
where a.borr_ssn_ein_lrm ^= b.SSN2;
quit;
proc sql;
create table phone1match as
select
*
from strawlist2 as a inner join priorapps as b
on a.phone_number = b.PhNm1
where a.borr_ssn_ein_lrm ^= b.SSN2;
quit;
data final;
set emailMatch phone1match phone1match2 phone1matchRel phone2match1 phone2match2 phone2matchRel RelmatchP1 RelMatchp2 RelmatchRel2;
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You did not supply any test data.
I hope next code will give you base to adaptation to your needs
data table1;
infile datalines;
input ID phone1 $ phone2 $ phone3 $;
datalines;
1 aaa bbb ccc
5 ddd eee fff
7 xxx zzz yyy
run;
data table2;
infile datalines;
input ID phone1 $ phone2 $ phone3 $;
datalines;
2 ggg aaa mmm
7 xxx zzz www
run;
data tmp1;
set table1;
array vx phone1-phone3;
do i=1 to dim(vx);
phone = vx(i);
ouput;
keep phone id;
end;
run;
data tmp2;
set table2;
array vx phone1-phone3;
do i=1 to dim(vx);
phone = vx(i);
ouput;
keep phone id;
end;
run;
proc sort data=tmp1; by phone id; run;
proc sort data=tmp2; by phone id; run;
proc sql;
select a.phone,
a.id as id1,
b.id as id2
from tmp1 as a
join tmp2 as b
on a.phone = b.phone and
a.id ^= b.id and b.id ^= .;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You did not supply any test data.
I hope next code will give you base to adaptation to your needs
data table1;
infile datalines;
input ID phone1 $ phone2 $ phone3 $;
datalines;
1 aaa bbb ccc
5 ddd eee fff
7 xxx zzz yyy
run;
data table2;
infile datalines;
input ID phone1 $ phone2 $ phone3 $;
datalines;
2 ggg aaa mmm
7 xxx zzz www
run;
data tmp1;
set table1;
array vx phone1-phone3;
do i=1 to dim(vx);
phone = vx(i);
ouput;
keep phone id;
end;
run;
data tmp2;
set table2;
array vx phone1-phone3;
do i=1 to dim(vx);
phone = vx(i);
ouput;
keep phone id;
end;
run;
proc sort data=tmp1; by phone id; run;
proc sort data=tmp2; by phone id; run;
proc sql;
select a.phone,
a.id as id1,
b.id as id2
from tmp1 as a
join tmp2 as b
on a.phone = b.phone and
a.id ^= b.id and b.id ^= .;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thank you! can you explain how this is working? from what I can tell, you are talking the variables in each table and by loading it into an array then output, you are transposing it so that all variable instances are now in one column, which now duplicate ID #'s. then the joining in the proc sql sorts it out from there finding the matches.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have used a data step to transpose the data, so that all phones in same column.
If you prefer you may use PROC TRNSPOSE for same function.
By using proc sql to select wanted observations you may skip the sorting steps
as sql will do it implicitly.