Hi. About a year ago, with the help of this community, I was able to complete my project on time. Since submitting it, I have been trying to understand the code that several programmers suggested. However, the code that uses the 'hash' function is hard to understand.
In this post, I asked how to cumulatively sum the largest value of each group. Ksharp, a Grand Advisor of this community, suggested using the 'hash' function. I have been reading some posts about it and understand its meaning a little, but some difficulties remain. Please help with the questions I ask below.
The code Ksharp suggested is this.
data want;
if _n_=1 then do;
if 0 then set have;
declare hash h();
declare hiter hi('h');
h.definekey('clinic');
h.definedata('clinic','count','countsquare');
h.definedone();
end;
set have;
by id;
if first.id then h.clear();
if h.find()=0 then do;count=count+1;countsquare=count**2;h.replace();end;
else do;count=1;countsquare=1;h.replace();end;
_clinic=clinic;_countsquare=countsquare;sum=0;
do while(hi.next()=0);
if _clinic ne clinic then sum+countsquare;
end;
sum+_countsquare ; clinic=_clinic;
drop _:;
run;
1) what is the meaning of '_n_'?
if _n_=1 then do;
if 0 then set have;
In the raw data, there was no 'n' variable. As far as I know, _n_ may mark the start of the 'hash' iteration. Then, what is 'if 0'? In the first line, _n_ is 1, so under this condition _n_ must remain 1, but in the second line it suddenly becomes 0.
2) what is the meaning of 'h.find()=0'?
if h.find()=0 then do;
This is the hardest code to understand. Maybe the 'find' function refers to the 'definedata' variables in the hash object, which are 'clinic', 'count' and 'countsquare'. But there was no code that set the 'definedata' variables to zero.
3) what is 'else do'?
else do;count=1;countsquare=1;h.replace();end;
This means 'h.find() ne 0'. So the definedata in the hash object is not 0, which means the total is above zero and this may not be the first row, right? Then why does 'count' become 1?
4) what is 'do while(hi.next()=0);'?
hi is the hiter of the hash object. As far as I know, this means it imports the previously defined hash object, 'h'. The 'next' function means the next value in 'definekey', so the loop runs until the value of the next key becomes positive, right?
5) what is 'if _clinic ne clinic'?
Earlier in the code, with '_clinic=clinic;', Ksharp set the '_clinic' variable equal to the 'clinic' variable. However, this 'if' statement tests whether _clinic is not equal to clinic. Given the earlier assignment '_clinic=clinic', there should be no case where _clinic is not equal to clinic. Yet this if statement worked well. What is the mechanism?
1. _n_ is an automatic variable which holds the number of the current iteration of the data step. This means that the hash object is created once when execution starts.
2. The find() method returns a zero when it finds a match.
3. else is part of the if statement. do/end are used to define code blocks, similar to begin/end in PASCAL or the curly brackets in C. The branch is entered when no object was found, and then the replace() method is executed.
4. The next() method returns a zero if a further object is found in the hash iterator, and sets all variables named in the definedata() method to their values from the hash object. This means that clinic can (and will) be changed, hence the test in your question 5.
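Points 2 and 4 can be tried in isolation. The following is only a minimal sketch, not part of the original solution: the two entries 'A' and 'B', their counts, and the variable rc are made up for illustration.

```sas
data _null_;
  length clinic $1 count 8;
  declare hash h();
  declare hiter hi('h');
  h.definekey('clinic');
  h.definedata('clinic', 'count');
  h.definedone();

  /* Load two hypothetical entries. */
  clinic = 'A'; count = 3; h.add();
  clinic = 'B'; count = 1; h.add();

  /* find() looks up the current value of the key variable CLINIC. */
  clinic = 'A';
  rc = h.find();   /* key present: rc is 0 and COUNT is overwritten from the hash */
  put 'find A: ' rc= count=;

  clinic = 'Z';
  rc = h.find();   /* key absent: rc is not 0 and COUNT keeps its previous value */
  put 'find Z: ' rc= count=;

  /* next() copies each stored item into CLINIC and COUNT in turn. */
  do while (hi.next() = 0);
    put 'next:   ' clinic= count=;
  end;
run;
```

The log lines written inside the loop show clinic taking the values stored in the hash, which is why the original code saves clinic in _clinic before iterating. The order in which the iterator visits the items is not guaranteed unless the hash is declared with an ordering option.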
Thanks for your kind answer. Let me ask a few more questions.
2. The find() method returns a zero when it finds a match.
I understood it this way. In that code, find()=0 means that SAS found the specific value of clinic (=definekey) in the raw data. So 'zero' means the match between the hash object and the raw data succeeded. Right?
3. I can't understand what "no object was found" means.
The hash object is made from the "have" dataset. So the definekey of the hash object is all of the 'clinic' values in the "have" dataset. The if statement is performed on the "have" dataset. So I think every object can be found in the "have" dataset. However, when I deleted this statement, all values turned out missing. Would you let me know what the mechanism is?
@km0927 wrote:
...
The hash object is made from the "have" dataset. So the definekey of the hash object is all of the 'clinic' values in the "have" dataset. The if statement is performed on the "have" dataset. So I think every object can be found in the "have" dataset. However, when I deleted this statement, all values turned out missing. Would you let me know what the mechanism is?
No. You are making the hash object as part of the data step that is making the WANT dataset. You did NOT tell it to load any data. In fact when you start a new by group (as indicated by the first.ID flag) you are emptying the hash object.
@km0927 wrote:
The hash object is made from the "have" dataset. So the definekey of the hash object is all of the 'clinic' values in the "have" dataset. The if statement is performed on the "have" dataset. So I think every object can be found in the "have" dataset. However, when I deleted this statement, all values turned out missing. Would you let me know what the mechanism is?
The hash object is created empty (no dataset parameter in the declare statement), cleared with every group change, and then slowly filled; calling the replace() method for an element that does not yet exist is equivalent to calling the add() method.
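The behaviour described here can be checked with a small sketch. It is an illustration only: the key value 'A' and the helper variables rc and n are hypothetical, and num_items is used merely to show how many items the table holds at each point.

```sas
data _null_;
  length clinic $1 count 8;
  declare hash h();   /* no dataset: argument, so the table starts empty */
  h.definekey('clinic');
  h.definedata('clinic', 'count');
  h.definedone();

  n = h.num_items;
  put 'after declare:        ' n=;

  clinic = 'A'; count = 1;
  rc = h.replace();   /* key absent: the item is inserted, as add() would do */
  n = h.num_items;
  put 'after first replace:  ' rc= n=;

  count = 2;
  rc = h.replace();   /* key present: the stored data is overwritten */
  n = h.num_items;
  put 'after second replace: ' rc= n=;

  rc = h.clear();     /* what the original step does at each first.id */
  n = h.num_items;
  put 'after clear:          ' n=;
run;
```

By contrast, a declaration such as declare hash h(dataset: 'have'); would load the table from the dataset when definedone() is called. The code in this thread does not do that, so at the start of each id the table holds nothing and find() cannot succeed.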
To make the code easier to understand, I would introduce some visual formatting:
data want;
if _n_ = 1
then do;
  if 0 then set have;
  declare hash h();
  declare hiter hi('h');
  h.definekey('clinic');
  h.definedata('clinic','count','countsquare');
  h.definedone();
end;
set have;
by id;
if first.id then h.clear();
if h.find() = 0
then do;
  count = count + 1;
  countsquare = count ** 2;
  h.replace();
end;
else do;
  count = 1;
  countsquare = 1;
  h.replace();
end;
_clinic = clinic;
_countsquare = countsquare;
sum = 0;
do while (hi.next() = 0);
  if _clinic ne clinic then sum + countsquare;
end;
sum + _countsquare;
clinic = _clinic;
drop _:;
run;
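To watch the step work, it can be run on a tiny dataset with a trace line added inside the iterator loop. This is a sketch under assumptions: the four rows of have are invented, and the put statement and the proc print are additions for tracing that were not in the suggested code.

```sas
/* Hypothetical sample data: one row per visit, sorted by id. */
data have;
  input id clinic $;
  datalines;
1 A
1 A
1 B
2 B
;
run;

data want;
  if _n_ = 1 then do;
    /* Never executed at run time; at compile time it adds the      */
    /* variables of HAVE to the program data vector.                */
    if 0 then set have;
    declare hash h();
    declare hiter hi('h');
    h.definekey('clinic');
    h.definedata('clinic', 'count', 'countsquare');
    h.definedone();
  end;
  set have;
  by id;
  if first.id then h.clear();          /* start each id with an empty table */
  if h.find() = 0 then do;             /* clinic already seen for this id */
    count = count + 1;
    countsquare = count ** 2;
    h.replace();
  end;
  else do;                             /* first visit to this clinic */
    count = 1;
    countsquare = 1;
    h.replace();
  end;
  _clinic = clinic;
  _countsquare = countsquare;
  sum = 0;
  do while (hi.next() = 0);            /* next() overwrites clinic, count, countsquare */
    put _n_= id= 'hash item: ' clinic= countsquare=;
    if _clinic ne clinic then sum + countsquare;
  end;
  sum + _countsquare;
  clinic = _clinic;
  drop _:;
run;

proc print data=want;
  var id clinic sum;
run;
```

Worked by hand, sum for the four rows is 1, 4, 5 and 1: the second visit to A gives 2**2 = 4, the first visit to B adds 1 to A's 4, and the new id starts again from an empty table. Only clinic is restored after the loop, so count and countsquare in want hold whatever item the iterator read last, which is why the proc print leaves them out.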
Maybe this will be my last question. I really appreciate all your help.
According to your answers, I think the hash code follows this process.
if first.id then h.clear();
: On the first row of each id, the 'definedata' gets cleared. So the 'clinic', 'count', and 'countsquare' values in the hash object will be deleted.
if h.find() = 0
: I think when the hash code runs for the first time, it reads the first row of definekey. For example, "Brad" got injured and went to clinic "A". So, in Brad's medical utilization data, the first row of clinic is "A". The hash code imports "A" into the clinic data. Afterwards, the hash code finds "A" in the "have" dataset and returns 0.
then do;
count = count + 1;
countsquare = count ** 2;
h.replace();
end;
: If "Brad" visits clinic "A" repeatedly, "h.find" returns 0 each time. So the "count+1 and count**2" process runs repeatedly.
else do;
count = 1;
countsquare = 1;
h.replace();
: This means SAS finds a new value of the definekey, 'clinic'. For example, Brad had visited clinic "A", but then he went to a new clinic, "B". This situation is the 'else' case.
(skipped)
do while (hi.next() = 0);
if _clinic ne clinic then sum + countsquare;
hi.next()=0 means that, if Brad visited clinics like this,
clinic : A -> A -> A -> B -> B -> B -> ...
next : 0 -> 0 -> not zero -> 0 -> 0 -> 0 ....
Forgive me, I have no time to elaborate on all of this for you.
But I think @hashman (an expert on hash tables) would like to help you.
Hi @km0927 ,
Let me recommend you some solid reading about hash tables (not hash functions, hash functions are something different [see MD5() function for example]):
1) https://www.lexjansen.com/sesug/2016/HOW-195_Final_PDF.pdf
2) https://support.sas.com/resources/papers/proceedings15/3024-2015.pdf
3) https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/236-30.pdf
and of course 4) "The Hash Book": https://www.sas.com/storefront/aux/en/sphashtables/69153_excerpt.pdf (it's a 50-page sample)
All the best
Bart
@km0927: Methinks you've got all the info you need from @yabwon and @Kurt_Bremser. I'll add that the only way to make the hash object a useful tool in one's personal SAS programming arsenal is to use it - and of course make and fix lots of mistakes on one's way to mastery. The end result, however, is more than worth the effort.
Kind regards
Paul D.