Hello,
I am trying to create a hash code, which does the following:
1) A table lookup with output showing matches,
2) Provide a count of the output table (duplicate and nonduplicate) and store value into a variable,
I have provided the code in proc sql format which works. THis is the output that i want.
I inserted the hash code as well, for any corrections/ changes.
I am really keen on learning Hash programming - would u recommend any website/ books with examples?
All help would be deeply appreciated.
my proc sql code that works:
data sat;
input client $;
datalines;
531
620
531
908
620
143
run;
data main;
input keys $;
datalines;
530
620
532
909
620
142
run;
proc sql;
create table duplicates as
select keys
from main
where keys in (select distinct client from sat);
select count(1) into :dupcnt
from duplicates;
run;
proc sql;
create table nonduplicates as
select distinct keys
from main
where keys in (select distinct client from sat);
select count(1) into :nondupcnt
from nonduplicates;
run;
%Put Duplicate count: &dupcnt.;
%Put nonduplicate count: &nondupcnt.;
My Hash code is as below: (not working)
data duplicates nonduplicates (keep=client);
if _n_=1 then
do;
if 0 then set sat;
declare hash w (dataset:"work.sat");
rc=w.definekey('client');
rc=w.definedata('client');
rc=w.definedone();
end;
set work.main;
rc= w.find(key:keys);
if rc eq 0 then output duplicates;
else output nonduplicates;
run;
many thanks.
regards,
Sebastian
Here are some sample code based on your data.
data sat;
input client $;
datalines;
531
620
531
908
143
run;
data main;
input keys $;
datalines;
530
620
532
909
620
142
run;
data _null_;
if _n_=1 then
do;
dcl hash sat(dataset:'work.sat(keep=client)', multidata:'y');
sat.definekey('client');
sat.definedone();
length client $ 3;
dcl hash uni();
uni.definekey('keys');
uni.definedone();
end;
set main end=last;
if sat.check(key:keys)=0 then
do;
rc=uni.check();
if rc ne 0 then
do;
unique+1;
rc=uni.add();
end;
duplicate+1;
end;
if last then
do;
call symputx('dupcnt',duplicate);
call symputx('nondupcnt',unique);
end;
run;
%Put Total Matched count: &dupcnt.;
%Put Total Matched Distinct count: &nondupcnt.;
Notice that I have used "Total Matched count" and "Total Matched Distinct count" at the final output. Dup or nondup is a bit confusing. Good Luck and Happy Learning!
Hi,
Its not clear to me why you are going to all that trouble. This will create three tables, both has matches, p_sat has those only in sat, p_main only those in p_main. You can then run freq's and things off this data to get counts?
data sat; input client $; datalines; 531 620 531 908 620 143 ; run; data main; input keys $; datalines; 530 620 532 909 620 142 ; run; proc sort data=sat; by client; run; proc sort data=main (rename=(keys=client)); by client; run; data both p_sat p_main; merge sat (in=sat) main (in=main); by client; if sat and main then output both; else if sat then output p_sat; else output p_main; run;
Thank you for your reply RW9.
You are right in your post. However, I have multiple massive datasets, with require long waiting times. I could brew my coffee between each session 🙂
After browsing a few forums and doing a bit of reading, i found HASH tables to be a quicker solution.
I also, wanted to incoporate a min() and max() function to show the min max values of a given variable, with some further´if statements, but want to solve this one step at a time by myself, after i understand the basic code.
An inexpensive book with examples is SAS Hash Object Programming Made Easy by Michele M. Burlew. I am not a hash expert and am just learning myself but it's been helpful.
Best,
Mark
THank you Mark. I have already done the purchase on Amazon 🙂
Here are some sample code based on your data.
data sat;
input client $;
datalines;
531
620
531
908
143
run;
data main;
input keys $;
datalines;
530
620
532
909
620
142
run;
data _null_;
if _n_=1 then
do;
dcl hash sat(dataset:'work.sat(keep=client)', multidata:'y');
sat.definekey('client');
sat.definedone();
length client $ 3;
dcl hash uni();
uni.definekey('keys');
uni.definedone();
end;
set main end=last;
if sat.check(key:keys)=0 then
do;
rc=uni.check();
if rc ne 0 then
do;
unique+1;
rc=uni.add();
end;
duplicate+1;
end;
if last then
do;
call symputx('dupcnt',duplicate);
call symputx('nondupcnt',unique);
end;
run;
%Put Total Matched count: &dupcnt.;
%Put Total Matched Distinct count: &nondupcnt.;
Notice that I have used "Total Matched count" and "Total Matched Distinct count" at the final output. Dup or nondup is a bit confusing. Good Luck and Happy Learning!
THank you for this solution Haikuo. Its friday, and more time to crack some extra SAS features 🙂
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.