Hello,
I am trying to create a hash code, which does the following:
1) A table lookup with output showing matches,
2) Provide a count of the output table (duplicate and nonduplicate) and store value into a variable,
I have provided the code in proc sql format which works. THis is the output that i want.
I inserted the hash code as well, for any corrections/ changes.
I am really keen on learning Hash programming - would u recommend any website/ books with examples?
All help would be deeply appreciated.
my proc sql code that works:
data sat;
input client $;
datalines;
531
620
531
908
620
143
run;
data main;
input keys $;
datalines;
530
620
532
909
620
142
run;
proc sql;
create table duplicates as
select keys
from main
where keys in (select distinct client from sat);
select count(1) into :dupcnt
from duplicates;
run;
proc sql;
create table nonduplicates as
select distinct keys
from main
where keys in (select distinct client from sat);
select count(1) into :nondupcnt
from nonduplicates;
run;
%Put Duplicate count: &dupcnt.;
%Put nonduplicate count: &nondupcnt.;
My Hash code is as below: (not working)
data duplicates nonduplicates (keep=client);
if _n_=1 then
do;
if 0 then set sat;
declare hash w (dataset:"work.sat");
rc=w.definekey('client');
rc=w.definedata('client');
rc=w.definedone();
end;
set work.main;
rc= w.find(key:keys);
if rc eq 0 then output duplicates;
else output nonduplicates;
run;
many thanks.
regards,
Sebastian
Here are some sample code based on your data.
data sat; input client $; datalines; 531 620 531 908 143 run; data main; input keys $; datalines; 530 620 532 909 620 142 run; data _null_; if _n_=1 then do; dcl hash sat(dataset:'work.sat(keep=client)', multidata:'y'); sat.definekey('client'); sat.definedone(); length client $ 3; dcl hash uni(); uni.definekey('keys'); uni.definedone(); end; set main end=last; if sat.check(key:keys)=0 then do; rc=uni.check(); if rc ne 0 then do; unique+1; rc=uni.add(); end; duplicate+1; end; if last then do; call symputx('dupcnt',duplicate); call symputx('nondupcnt',unique); end; run; %Put Total Matched count: &dupcnt.; %Put Total Matched Distinct count: &nondupcnt.;
Notice that I have used "Total Matched count" and "Total Matched Distinct count" at the final output. Dup or nondup is a bit confusing. Good Luck and Happy Learning!
Hi,
Its not clear to me why you are going to all that trouble. This will create three tables, both has matches, p_sat has those only in sat, p_main only those in p_main. You can then run freq's and things off this data to get counts?
data sat; input client $; datalines; 531 620 531 908 620 143 ; run; data main; input keys $; datalines; 530 620 532 909 620 142 ; run; proc sort data=sat; by client; run; proc sort data=main (rename=(keys=client)); by client; run; data both p_sat p_main; merge sat (in=sat) main (in=main); by client; if sat and main then output both; else if sat then output p_sat; else output p_main; run;
Thank you for your reply RW9.
You are right in your post. However, I have multiple massive datasets, with require long waiting times. I could brew my coffee between each session 🙂
After browsing a few forums and doing a bit of reading, i found HASH tables to be a quicker solution.
I also, wanted to incoporate a min() and max() function to show the min max values of a given variable, with some further´if statements, but want to solve this one step at a time by myself, after i understand the basic code.
An inexpensive book with examples is SAS Hash Object Programming Made Easy by Michele M. Burlew. I am not a hash expert and am just learning myself but it's been helpful.
Best,
Mark
THank you Mark. I have already done the purchase on Amazon 🙂
Here are some sample code based on your data.
data sat; input client $; datalines; 531 620 531 908 143 run; data main; input keys $; datalines; 530 620 532 909 620 142 run; data _null_; if _n_=1 then do; dcl hash sat(dataset:'work.sat(keep=client)', multidata:'y'); sat.definekey('client'); sat.definedone(); length client $ 3; dcl hash uni(); uni.definekey('keys'); uni.definedone(); end; set main end=last; if sat.check(key:keys)=0 then do; rc=uni.check(); if rc ne 0 then do; unique+1; rc=uni.add(); end; duplicate+1; end; if last then do; call symputx('dupcnt',duplicate); call symputx('nondupcnt',unique); end; run; %Put Total Matched count: &dupcnt.; %Put Total Matched Distinct count: &nondupcnt.;
Notice that I have used "Total Matched count" and "Total Matched Distinct count" at the final output. Dup or nondup is a bit confusing. Good Luck and Happy Learning!
THank you for this solution Haikuo. Its friday, and more time to crack some extra SAS features 🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.