hHi All,
I am trying to do a Many to many merge using Hashing. The below code works fine:
data ae;
infile cards;
input ptnum $ 1-3 @5 date date9. event $ 15-35;
format date date9.;
cards;
ABC 16NOV2009 Nausea
DEF 16NOV2009 Heartburn
DEF 16NOV2009 Acid Indigestion
DEF 18NOV2009 Nausea
GHI 17NOV2009 Fever
GHI 18NOV2009 Fever
MNO 17NOV2009 Fever
;
run;
data cm;
infile cards;
input ptnum $ 1-3 @5 date date9. medication $ 15-35;
format date date9.;
cards;
ABC 16NOV2009 Dopamine
DEF 16NOV2009 Antacid
DEF 16NOV2009 Sodium bicarbonate
def 18NOV2009 Dopamine
ghi 18NOV2009 Asprin
JKL 19NOV2009 Asprin
mno 17NOV2009 Asprin
;
run;
Data ae_rspndt;
Set ae;
If _n_ = 1 then do;
if 0 then set cm ;
declare hash cm(dataset: "work.cm",hashexp:16,multidata:"Yes") ;
cm.defineKey("ptnum", 'date');
cm.defineData(ALL: 'YES');
cm.defineDone() ;
end;
rc = cm.find();
more = 0 ;
if rc = 0 then do;
output ;
cm.has_next(result: more) ;
do while ( more ) ;
rc = cm.find_next() ;
if rc = 0 then output ;
cm.has_next(result: more) ;
end;
end;
Run;
The problem is:
If you run this code then you will find that the CM dataset (which is assigned to Hash table) has some lower case values in ptnum variable. Due to this the code is not finding those keys and inturn not returning the corresponding records. Since my CM dataset is huge (regularly goes more than 20 GB), I am not preferring a second data step to do the Upcase/Compress operations otherwise the performance goes down.
What I want is, while finding the key values Hash searches ae dataset regardless of cases of ptnum and return the matched records. So in short, is there a way to compress/upcase character variables while defining keys in Hash?
Or any other way which does not use a second dataset (or downgrades the performance in significant ways).
Thanks in advance for all your help.
Regards,
Saugata Mukherjee.
Can't you load the hash your self. The you can modify any of the variables and or keys you like.
Hi data_null_;
This definitely works
And whether I load the hash using add method (as you have mentioned) or SAS automatically loads it (as metioned in my initial question), it should not affect the performance (speed) in any ways. Am I right?
Regards,
Saugata Mukherjee.
I know it works I tested the program with your data. :smileymischief:
It would seem performance should be equal but you can only answer those questions with testing.
While your testing you can test loading a view
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.