Hello,
I have a dataset name market DB which has four phone columns. I want to join this dataset with good_numbers on ID OR ID2 column and keep only matching phone numbers in my output.
So for example for ID2 56782322, I have a matching phone in good_numbers table so i want to bring in that records. For some ID's there will be more than one match(example -8799)
I am trying to use hashing but not getting any records back. Could someone please help? Thanks.
Code:
Data market_db;
input @1 ID :$4. @6 ID2 :$8. @15 py1 $10. @26 py22 $10. @37 py3 $10. @48 py4 $10.;
infile cards missover ;
cards;
1325 56782322 5872584458 2478569877 8987786548
5489 58654851 8658656767 2795003597
3287 29365474 9831765765 5458456844
8799 89745434 4897568698 3697489567 3265987465
;
run;
Data good_numbers;
input ID :$8. confirm_phn :$10.;
infile cards;
cards;
56782322 5872584458
5489 2795003597
89745434 4897568698
2587 5872584458
9876 9831765765
8799 3697489567
;
Quit;
data want;
set market_db;
array phn py1-py4;
if _N_=1 then do;
dcl hash H(dataset:'good_numbers');
H.definekey('ID','confirm_phn');
H.definedone();
if 0 then set good_numbers;
end;
do over phn;
confirm_phn=phn;
if H.find() = 0;
end;
run;
Desired output:
ID | ID2 | py1 | py2 | py3 | py4 |
1325 | 56782322 | 5872584458 | |||
5489 | 58654851 | 2795003597 | |||
8799 | 89745434 | 4897568698 | 3697489567 |
There are several problems.
data want;
set market_db;
array phn py1-py4;
if _N_=1 then do;
dcl hash H(dataset:'good_numbers');
H.definekey('ID','confirm_phn');
H.definedata('confirm_phn');
H.definedone();
if 0 then set good_numbers;
end;
do over phn;
if H.find(key:ID, key:phn) = 0 or h.find(key:id2, key:phn) = 0 then do;
output;
leave;
end;
end;
call missing(confirm_phn);
run;
@ketpt42 Thanks for your response. When i run your solution I am getting the below error message. Could you please help?
WARNING: Multiple lengths were specified for the variable ID by input data set(s). This can cause truncation of data.
NOTE: There were 6 observations read from the data set WORK.GOOD_NUMBERS.
ERROR: Argument length greater than length of key variable ID at line 1747 column 36.
ERROR: Keys and data must be specified in the same order and have the same types as given in DefineKey/DefineData at line 1747 column 36.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.MARKET_DB.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 7 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.
I fixed the errors however the output is bringing in the phone numbers that doesn't belong to the ID in the look up dataset(confirm_phn). The below red highlighted phone numbers shouldn't be part of the output, it's doing the exclusion at ID level if there is no match but not removing the phone numbers for the matching ID if it doesn't belong in the "confirm_phn" table.
ID | ID2 | py1 | py2 | py3 | py4 |
1325 | 56782322 | 5872584458 | 2478569877 | 8987786548 | |
5489 | 58654851 | 8658656767 | 2795003597 | ||
8799 | 89745434 | 4897568698 | 3697489567 | 3265987465 |
data want;
if 0 then set good_numbers;
if _N_=1 then do;
dcl hash H(dataset:'good_numbers');
H.definekey('ID','confirm_phn');
H.definedata('confirm_phn');
H.definedone();
end;
set market_db;
array phn py1-py4;
do over phn;
if H.find(key:ID, key:phn) = 0 or h.find(key:id2, key:phn) = 0 then do;
output;
leave;
end;
end;
call missing(confirm_phn);
drop confirm_phn;
run;
Make your life easier by using an intelligent data structure:
data market_db;
input @1 ID :$4. @6 ID2 :$8. @15 py1 $10. @26 py2 $10. @37 py3 $10. @48 py4 $10.;
infile cards missover ;
cards;
1325 56782322 5872584458 2478569877 8987786548
5489 58654851 8658656767 2795003597
3287 29365474 9831765765 5458456844
8799 89745434 4897568698 3697489567 3265987465
;
data good_numbers;
input ID :$8. confirm_phn :$10.;
infile cards;
cards;
56782322 5872584458
5489 2795003597
89745434 4897568698
2587 5872584458
9876 9831765765
8799 3697489567
;
proc transpose
data=market_db
out=long (
rename=(col1=phn)
where=(phn ne "")
)
;
by id notsorted id2;
var py:;
run;
data market_db_intelligent;
set long;
py_type = substr(_name_,3);
drop _name_;
run;
data want;
set market_db_intelligent;
if _n_ = 1
then do;
length cid $8;
declare hash good (dataset:"good_numbers (rename=(id=cid confirm_phn=phn))");
good.definekey('cid','phn');
good.definedone();
end;
if good.check(key:id2,key:phn) = 0 or good.check(key:id,key:phn) = 0;
drop cid;
run;
You can see how transposing to a long dataset layout makes coding simple.
Two issues:
1. Read your log. Read your log. Read your log. Read your log. Read your log. Read your log. Read your log. Read your log.
The length of ID in the WANT data set is 4. Move the set statement after the hash table creation.
Always read your log. Even if there is no warnings and nothing seems wrong. There is a warning here. Always.
2. The line if H.find() = 0; stops all processing for that observation if the test result is false. Because no value is found in PY1, PY2 is never tested.
You need to learn to debug this kind of things. For example you can add the line putlog PHN=; in the do over loop.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.