Hello,
I am trying to create a hash code, which does the following:
1) A table lookup with output showing matches,
2) Provide a count of the output table (duplicate and nonduplicate) and store value into a variable,
I have provided the code in proc sql format which works. THis is the output that i want.
I inserted the hash code as well, for any corrections/ changes.
I am really keen on learning Hash programming - would u recommend any website/ books with examples?
All help would be deeply appreciated.
my proc sql code that works:
data sat;
input client $;
datalines;
531
620
531
908
620
143
run;
data main;
input keys $;
datalines;
530
620
532
909
620
142
run;
proc sql;
create table duplicates as
select keys
from main
where keys in (select distinct client from sat);
select count(1) into :dupcnt
from duplicates;
run;
proc sql;
create table nonduplicates as
select distinct keys
from main
where keys in (select distinct client from sat);
select count(1) into :nondupcnt
from nonduplicates;
run;
%Put Duplicate count: &dupcnt.;
%Put nonduplicate count: &nondupcnt.;
My Hash code is as below: (not working)
data duplicates nonduplicates (keep=client);
if _n_=1 then
do;
if 0 then set sat;
declare hash w (dataset:"work.sat");
rc=w.definekey('client');
rc=w.definedata('client');
rc=w.definedone();
end;
set work.main;
rc= w.find(key:keys);
if rc eq 0 then output duplicates;
else output nonduplicates;
run;
many thanks.
regards,
Sebastian
Here are some sample code based on your data.
data sat; input client $; datalines; 531 620 531 908 143 run; data main; input keys $; datalines; 530 620 532 909 620 142 run; data _null_; if _n_=1 then do; dcl hash sat(dataset:'work.sat(keep=client)', multidata:'y'); sat.definekey('client'); sat.definedone(); length client $ 3; dcl hash uni(); uni.definekey('keys'); uni.definedone(); end; set main end=last; if sat.check(key:keys)=0 then do; rc=uni.check(); if rc ne 0 then do; unique+1; rc=uni.add(); end; duplicate+1; end; if last then do; call symputx('dupcnt',duplicate); call symputx('nondupcnt',unique); end; run; %Put Total Matched count: &dupcnt.; %Put Total Matched Distinct count: &nondupcnt.;
Notice that I have used "Total Matched count" and "Total Matched Distinct count" at the final output. Dup or nondup is a bit confusing. Good Luck and Happy Learning!
Hi,
Its not clear to me why you are going to all that trouble. This will create three tables, both has matches, p_sat has those only in sat, p_main only those in p_main. You can then run freq's and things off this data to get counts?
data sat; input client $; datalines; 531 620 531 908 620 143 ; run; data main; input keys $; datalines; 530 620 532 909 620 142 ; run; proc sort data=sat; by client; run; proc sort data=main (rename=(keys=client)); by client; run; data both p_sat p_main; merge sat (in=sat) main (in=main); by client; if sat and main then output both; else if sat then output p_sat; else output p_main; run;
Thank you for your reply RW9.
You are right in your post. However, I have multiple massive datasets, with require long waiting times. I could brew my coffee between each session 🙂
After browsing a few forums and doing a bit of reading, i found HASH tables to be a quicker solution.
I also, wanted to incoporate a min() and max() function to show the min max values of a given variable, with some further´if statements, but want to solve this one step at a time by myself, after i understand the basic code.
An inexpensive book with examples is SAS Hash Object Programming Made Easy by Michele M. Burlew. I am not a hash expert and am just learning myself but it's been helpful.
Best,
Mark
THank you Mark. I have already done the purchase on Amazon 🙂
Here are some sample code based on your data.
data sat; input client $; datalines; 531 620 531 908 143 run; data main; input keys $; datalines; 530 620 532 909 620 142 run; data _null_; if _n_=1 then do; dcl hash sat(dataset:'work.sat(keep=client)', multidata:'y'); sat.definekey('client'); sat.definedone(); length client $ 3; dcl hash uni(); uni.definekey('keys'); uni.definedone(); end; set main end=last; if sat.check(key:keys)=0 then do; rc=uni.check(); if rc ne 0 then do; unique+1; rc=uni.add(); end; duplicate+1; end; if last then do; call symputx('dupcnt',duplicate); call symputx('nondupcnt',unique); end; run; %Put Total Matched count: &dupcnt.; %Put Total Matched Distinct count: &nondupcnt.;
Notice that I have used "Total Matched count" and "Total Matched Distinct count" at the final output. Dup or nondup is a bit confusing. Good Luck and Happy Learning!
THank you for this solution Haikuo. Its friday, and more time to crack some extra SAS features 🙂
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.