BookmarkSubscribeRSS Feed
vicky07
Quartz | Level 8

Hello,

 

I have a dataset name market DB which has four phone columns. I want to join this dataset with good_numbers on ID OR ID2 column and keep only matching phone numbers in my output.
So for example for ID2  56782322, I have a matching phone in good_numbers table so i want to bring in that records. For some ID's there will be more than one match(example -8799)

I am trying to use hashing but not getting any records back. Could someone please help? Thanks.

 

Code:

 

Data market_db;
input @1 ID :$4. @6 ID2 :$8. @15 py1 $10. @26 py22 $10. @37 py3 $10.  @48 py4 $10.;
infile cards missover ;
cards;
1325 56782322 5872584458 2478569877 8987786548
5489 58654851 8658656767                       2795003597  
3287 29365474 9831765765 5458456844           
8799 89745434 4897568698 3697489567 3265987465
;
run;	



Data good_numbers;
input ID :$8. confirm_phn :$10.;
infile cards;
cards;
56782322 5872584458
5489     2795003597
89745434 4897568698
2587     5872584458
9876     9831765765
8799     3697489567
;
Quit;


data want;
set market_db;
array phn py1-py4;
if _N_=1 then do;
dcl hash H(dataset:'good_numbers');
H.definekey('ID','confirm_phn');
H.definedone();
if 0 then set good_numbers;
end;
do over phn;
 confirm_phn=phn;
 if H.find() = 0;
end;
run;

Desired output: 

 

ID ID2 py1 py2 py3 py4
1325 56782322 5872584458      
5489 58654851       2795003597
8799 89745434 4897568698 3697489567    

 

7 REPLIES 7
ketpt42
Quartz | Level 8

There are several problems.

  1. You have to tell SAS which key values you're going to use to search in the hash object. Since you are searching both ID and ID2, you have to explicitly list the key values in the find() method.
  2. The length of both ID variables has to be the match the length of your ID variable in your hash object. I didn't include it below, but I set the length of ID in your market_db dataset to $8.
  3. Once you find a match, you need to tell the data step what to do. I used a output and leave statement.
data want;
set market_db;
array phn py1-py4;
if _N_=1 then do;
    dcl hash H(dataset:'good_numbers');
    H.definekey('ID','confirm_phn');
    H.definedata('confirm_phn');
    H.definedone();
    if 0 then set good_numbers;
end;
do over phn;
    if H.find(key:ID, key:phn) = 0 or h.find(key:id2, key:phn) = 0 then do;
        output;
        leave; 
    end;
end;
call missing(confirm_phn);
run;
vicky07
Quartz | Level 8

@ketpt42 Thanks for your response. When i run your solution I am getting the below error message. Could you please help?

 

 

WARNING: Multiple lengths were specified for the variable ID by input data set(s). This can cause truncation of data.
NOTE: There were 6 observations read from the data set WORK.GOOD_NUMBERS.
ERROR: Argument length greater than length of key variable ID at line 1747 column 36.
ERROR: Keys and data must be specified in the same order and have the same types as given in DefineKey/DefineData at line 1747 column 36.
ERROR: DATA STEP Component Object failure. Aborted during the EXECUTION phase.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1 observations read from the data set WORK.MARKET_DB.
WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 7 variables.
WARNING: Data set WORK.WANT was not replaced because this step was stopped.

ketpt42
Quartz | Level 8
Follow @ChrisNZ's advice #1 and my #2.
vicky07
Quartz | Level 8

I fixed the errors however the output is bringing in the phone numbers that doesn't belong to the ID in the look up dataset(confirm_phn). The below red highlighted phone numbers shouldn't be part of the output,  it's doing the  exclusion at ID level if there is no match but not removing the phone numbers for the matching ID if it doesn't belong in the "confirm_phn" table.

 

ID ID2 py1 py2 py3 py4
1325 56782322 5872584458 2478569877 8987786548  
5489 58654851 8658656767     2795003597
8799 89745434 4897568698 3697489567 3265987465  

 


data want;
if 0 then set good_numbers;

if _N_=1 then do;
dcl hash H(dataset:'good_numbers');
H.definekey('ID','confirm_phn');
H.definedata('confirm_phn');
H.definedone();

end;
set market_db;
array phn py1-py4;
do over phn;
if H.find(key:ID, key:phn) = 0 or h.find(key:id2, key:phn) = 0 then do;
output;
leave;
end;
end;
call missing(confirm_phn);
drop confirm_phn;
run;

ketpt42
Quartz | Level 8
So if your number is not found, set it to missing. You'll need to take out the leave statement and change the output so that it only happens on a line where at least one match was made.
Kurt_Bremser
Super User

Make your life easier by using an intelligent data structure:

data market_db;
input @1 ID :$4. @6 ID2 :$8. @15 py1 $10. @26 py2 $10. @37 py3 $10. @48 py4 $10.;
infile cards missover ;
cards;
1325 56782322 5872584458 2478569877 8987786548
5489 58654851 8658656767                       2795003597  
3287 29365474 9831765765 5458456844           
8799 89745434 4897568698 3697489567 3265987465
;

data good_numbers;
input ID :$8. confirm_phn :$10.;
infile cards;
cards;
56782322 5872584458
5489     2795003597
89745434 4897568698
2587     5872584458
9876     9831765765
8799     3697489567
;

proc transpose
  data=market_db
  out=long (
    rename=(col1=phn)
    where=(phn ne "")
  )
;
by id notsorted id2;
var py:;
run;

data market_db_intelligent;
set long;
py_type = substr(_name_,3);
drop _name_;
run;

data want;
set market_db_intelligent;
if _n_ = 1
then do;
  length cid $8;
  declare hash good (dataset:"good_numbers (rename=(id=cid confirm_phn=phn))");
  good.definekey('cid','phn');
  good.definedone();
end;
if good.check(key:id2,key:phn) = 0 or good.check(key:id,key:phn) = 0;
drop cid;
run;
  

You can see how transposing to a long dataset layout makes coding simple.

ChrisNZ
Tourmaline | Level 20

Two issues:

 

1. Read your log. Read your log. Read your log. Read your log. Read your log. Read your log. Read your log. Read your log.

 The length of ID in the WANT data set is 4. Move the  set  statement after the hash table creation.

 Always read your log. Even if there is no warnings and nothing seems wrong. There is a warning here. Always.

 

2. The line   if H.find() = 0;    stops all processing for that observation if the test result is false. Because no value is found in PY1, PY2 is never tested.

You need to learn to debug this kind of things. For example you can add the line  putlog PHN=;   in the  do over  loop.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2429 views
  • 0 likes
  • 4 in conversation