km0927
Obsidian | Level 7

Hi. About a year ago, with the help of this community, I was able to complete my project on time. Since submitting it, I have been trying to understand the code that several programmers suggested. However, the part that uses the 'hash' function is hard to understand.

 

https://communities.sas.com/t5/SAS-Programming/cumulative-sum-of-largest-value-of-each-group/m-p/498...

 

In that post, I asked how to cumulatively sum the largest value of each group. Ksharp, a Grand Advisor of this community, suggested using the 'hash' function. I have been reading some posts about it and understand it a little, but some difficulties remain. Please help with the questions I ask below.

 

The code Ksharp suggested is this.

 

data want;
if _n_=1 then do;
if 0 then set have;
declare hash h();
declare hiter hi('h');
h.definekey('clinic');
h.definedata('clinic','count','countsquare');
h.definedone();
end;
set have;
by id;
if first.id then h.clear();
if h.find()=0 then do;count=count+1;countsquare=count**2;h.replace();end;
else do;count=1;countsquare=1;h.replace();end;
_clinic=clinic;_countsquare=countsquare;sum=0;
do while(hi.next()=0);
if _clinic ne clinic then sum+countsquare;
end;
sum+_countsquare ; clinic=_clinic;
drop _:;
run;

 

1) what is the meaning of '_n_'?

 

if _n_=1 then do;
if 0 then set have;

 

In the raw data, there was no 'n' variable. As far as I know, _n_ may mark the start of the 'hash' iteration. Then, what is 'if 0'? In the first row, _n_ is 1, so under this condition _n_ must remain 1, but in the second row it suddenly becomes 0.

 

2) what is the meaning of 'h.find()=0'?

 

if h.find()=0 then do;

 

This is the hardest code to understand. Maybe the 'find' function refers to the 'definedata' variables of the hash object, which are 'clinic', 'count' and 'countsquare'. But there was no code that sets the 'definedata' variables to zero.

 

3) what is 'else do'?

 

else do;count=1;countsquare=1;h.replace();end;

 

This means 'h.find() ne 0'. So the definedata in the hash object is not 0, which means the total is above zero and this may not be the first row, right? Then why does 'count' become 1?

 

4) what is 'do while(hi.next()=0);'?

 

hi is the hiter of the hash object. As far as I know, this means it imports the previously defined hash object, 'h'. The 'next' function means the next value in 'definekey', so the do loop runs until the value of the next key becomes positive, right?

 

5) what is 'if _clinic ne clinic'?

 

In the preceding code, '_clinic=clinic;', Ksharp set the '_clinic' variable equal to the 'clinic' variable. However, this 'if' statement tests whether _clinic is not equal to clinic. Given the earlier assignment '_clinic=clinic', there should be no case in which _clinic differs from clinic. Yet this if statement worked well. What is the mechanism?

Accepted Solutions
Kurt_Bremser
Super User

1. _n_ is an automatic variable which holds the number of the current iteration of the data step. The condition _n_ = 1 is true only in the first iteration, so the hash object is created once, when execution starts.

 

2. The find() method returns a zero when it finds a match.

 

3. else is part of the if statement. do/end are used to define code blocks, similar to begin/end in PASCAL or the curly brackets in C. The branch is entered when no object was found, and then the replace() method is executed.

 

4. The next() method returns a zero if further objects in the iterative hash are found, and sets all variables from the definedata() method to the values from the hash object. This means that clinic can (and will) be changed, which is why the test in your question 5 is needed.
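
Points 1 to 3 can be watched in the log with a small sketch like the one below. The dataset and its values are made up for illustration and are not the original poster's data. The statement "if 0 then set have" never executes, because the condition is always false; its only effect happens at compile time, when SAS adds the variables of have to the program data vector.

```sas
/* Hypothetical sample data, not from the original question */
data have;
  input id clinic $;
  datalines;
1 A
1 A
1 B
;

data _null_;
  if _n_ = 1 then do;
    /* Executed only in the first iteration of the data step */
    put 'Setup block runs, _n_=' _n_;
    if 0 then set have;          /* never executes; compile-time effect only */
    declare hash h();            /* no dataset argument, so the hash starts empty */
    h.definekey('clinic');
    h.definedata('clinic', 'count');
    h.definedone();
  end;
  set have;
  rc = h.find();                 /* 0 = key already in the hash, nonzero = not there */
  if rc = 0 then count = count + 1;
  else count = 1;                /* first time this clinic is seen */
  h.replace();
  found = (rc = 0);
  put _n_= clinic= found= count=;
run;
```

With this sample data the setup message should appear once, and the three rows should show found=0 count=1, then found=1 count=2, then found=0 count=1.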

km0927
Obsidian | Level 7

Thanks for your kind answer. Let me ask a few more questions.

2. The find() method returns a zero when it finds a match.


I understood it this way. In that code, find()=0 means that SAS found the specific value of clinic (= definekey) in the raw data. So 'zero' means that the match between the hash object and the raw data succeeded. Right?


3. I can't understand what "no object was found" means.


The hash object is made from the "have" dataset. So the definekey of the hash object contains all of the 'clinic' values in the "have" dataset. The if statement is performed on the "have" dataset, so I think every object can be found in the "have" dataset. However, when I deleted this statement, all values turned out missing. Would you let me know what the mechanism is?

Tom
Super User

@km0927 wrote:

...
The hash object is made from the "have" dataset. So the definekey of the hash object contains all of the 'clinic' values in the "have" dataset. The if statement is performed on the "have" dataset, so I think every object can be found in the "have" dataset. However, when I deleted this statement, all values turned out missing. Would you let me know what the mechanism is?


No.  You are making the hash object as part of the data step that is making the WANT dataset.  You did NOT tell it to load any data.  In fact when you start a new by group (as indicated by the first.ID flag) you are emptying the hash object.
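
To see the difference between an empty hash object and one that is loaded from a dataset, here is a minimal sketch. The sample data and the names empty_h, loaded_h, n_empty and n_loaded are made up for illustration. The dataset: argument tag is what tells SAS to load data; the code in the question does not use it.

```sas
/* Hypothetical sample data, not from the original question */
data have;
  input id clinic $;
  datalines;
1 A
1 A
1 B
;

data _null_;
  if 0 then set have;                      /* puts clinic into the program data vector */

  declare hash empty_h();                  /* no dataset argument: nothing is loaded */
  empty_h.definekey('clinic');
  empty_h.definedone();

  declare hash loaded_h(dataset: 'have');  /* loaded from have at definedone() */
  loaded_h.definekey('clinic');
  loaded_h.definedone();

  n_empty  = empty_h.num_items;            /* number of items currently stored */
  n_loaded = loaded_h.num_items;
  put n_empty= n_loaded=;
  stop;                                    /* the SET never executes, so end the step explicitly */
run;
```

n_empty should be 0. n_loaded should be the number of distinct clinic values, because by default only the first row for each key is kept when loading.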

Kurt_Bremser
Super User

@km0927 wrote:



The hash object is made from the "have" dataset. So the definekey of the hash object contains all of the 'clinic' values in the "have" dataset. The if statement is performed on the "have" dataset, so I think every object can be found in the "have" dataset. However, when I deleted this statement, all values turned out missing. Would you let me know what the mechanism is?


The hash object is created empty (no dataset parameter in the declare statement), cleared with every group change, and then slowly filled; calling the replace() method for an element that does not yet exist is equivalent to calling the add() method.
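
The behavior of replace() compared with add() can be checked in isolation. This is a sketch with made-up values; the rc_ and n_ variable names are only for illustration.

```sas
data _null_;
  length clinic $1 count 8;
  declare hash h();
  h.definekey('clinic');
  h.definedata('clinic', 'count');
  h.definedone();

  clinic = 'A';
  count  = 1;
  rc_replace_new = h.replace();   /* key not present yet: the item is inserted */
  n_after_first  = h.num_items;

  count = 2;
  rc_add_dup     = h.add();       /* key already present: add() fails with a nonzero code */
  rc_replace_old = h.replace();   /* key already present: the stored data is overwritten */
  n_after_second = h.num_items;

  call missing(count);
  rc_find = h.find();             /* reads the stored data back into count */
  put rc_replace_new= n_after_first= rc_add_dup= rc_replace_old= n_after_second= count=;
run;
```

Both replace() calls should return 0, the item count should stay at 1, and count should come back as 2, while add() on the existing key returns a nonzero code.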

To make the code easier to understand, I would introduce some visual formatting:

data want;
if _n_ = 1
then do;
  if 0 then set have;
  declare hash h();
  declare hiter hi('h');
  h.definekey('clinic');
  h.definedata('clinic','count','countsquare');
  h.definedone();
end;
set have;
by id;
if first.id then h.clear();
if h.find() = 0
then do;
  count = count + 1;
  countsquare = count ** 2;
  h.replace();
end;
else do;
  count = 1;
  countsquare = 1;
  h.replace();
end;
_clinic = clinic;
_countsquare = countsquare;
sum = 0;
do while (hi.next() = 0);
  if _clinic ne clinic then sum + countsquare;
end;
sum + _countsquare;
clinic = _clinic;
drop _:;
run;
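
To watch the iterator at work, the same logic can be run against a few made-up rows with PUT statements added. The data below is hypothetical, the step writes to the log instead of creating want, and the two branches of the if are condensed slightly. The iterator walks the items stored in the hash (one per clinic seen so far within the id), not the rows of have, and each next() call overwrites clinic, count and countsquare.

```sas
/* Hypothetical data: patient 1 visits clinic A twice, then B; patient 2 visits A once */
data have;
  input id clinic $;
  datalines;
1 A
1 A
1 B
2 A
;

data _null_;
  if _n_ = 1 then do;
    if 0 then set have;
    declare hash h();
    declare hiter hi('h');
    h.definekey('clinic');
    h.definedata('clinic', 'count', 'countsquare');
    h.definedone();
  end;
  set have;
  by id;
  if first.id then h.clear();           /* start each id with an empty hash */
  if h.find() = 0 then count = count + 1;
  else count = 1;
  countsquare = count ** 2;
  h.replace();
  _clinic = clinic;                     /* save the current row's values ... */
  _countsquare = countsquare;
  sum = 0;
  put 'row ' _n_ 'current clinic ' _clinic;
  do while (hi.next() = 0);             /* ... because next() overwrites them */
    put '  item in hash: ' clinic= count= countsquare=;
    if _clinic ne clinic then sum + countsquare;
  end;
  sum + _countsquare;
  clinic = _clinic;                     /* restore the current row's clinic */
  put '  ' sum=;
run;
```

For these four rows, sum should come out as 1, 4, 5 and 1. The order in which the iterator returns the items is not something to rely on, but it does not affect the sum.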
km0927
Obsidian | Level 7

Maybe this will be my last question. I really appreciate all your help.

 

According to your answers, I think the hash code follows this process.

 

if first.id then h.clear();

: In the first row of each id, the 'definedata' gets cleared. So the 'clinic', 'count' and 'countsquare' values in the hash object will be deleted.

 

if h.find() = 0

: I think when the hash code runs for the first time, it reads the first row of definekey. For example, "Brad" got injured and went to clinic "A". So, in Brad's medical utilization data, the first row of clinic is "A". The hash code imports "A" into the clinic data. Afterwards, the hash code finds "A" in the "have" dataset, and it returns 0.

 

then do;
count = count + 1;
countsquare = count ** 2;
h.replace();
end;

: If "Brad" visits clinic "A" repeatedly, "h.find" returns 0 each time. So the "count+1 and count**2" process runs repeatedly.

 

else do;
count = 1;
countsquare = 1;
h.replace();

: This means SAS finds a new value of the definekey, 'clinic'. For example, Brad had visited clinic "A", but then went to a new clinic "B". This situation is the 'else' branch.

 

(skipped)

 

do while (hi.next() = 0);
if _clinic ne clinic then sum + countsquare;

hi.next()=0 means that, if Brad visited clinics like this,

clinic : A -> A -> A -> B -> B -> B -> ...
next : 0 -> 0 -> not zero -> 0 -> 0 -> 0 ....

Ksharp
Super User

Forgive me, I have no time to elaborate on all of this for you.

But I think @hashman (expert on hash tables) would like to help you.

yabwon
Onyx | Level 15

Hi @km0927 ,

 

Let me recommend some solid reading about hash tables (not hash functions; hash functions are something different, see the MD5() function for example):

1)

https://www.lexjansen.com/sesug/2016/HOW-195_Final_PDF.pdf

2)

https://support.sas.com/resources/papers/proceedings15/3024-2015.pdf

3)

https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/236-30.pdf

 

and of course 4) "The Hash Book":

https://www.sas.com/storefront/aux/en/sphashtables/69153_excerpt.pdf (it's a 50-page sample)
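
As a quick illustration of the distinction made above: a hash function such as MD5() simply maps a value to a fixed-length digest, while the hash object is an in-memory lookup table with methods like find() and replace(). A minimal sketch, with a made-up input value:

```sas
data _null_;
  length clinic $1 digest $32;
  clinic = 'A';
  /* MD5() returns a 16-byte binary digest; the $HEX32. format renders it as hex text */
  digest = put(md5(clinic), $hex32.);
  put clinic= digest=;
run;
```

Nothing is stored or looked up here; the function only computes a value from its argument.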

 

All the best

Bart


hashman
Ammonite | Level 13

@km0927: Methinks you've got all the info you need from @yabwon and @Kurt_Bremser. I'll add that the only way to make the hash object a useful tool in one's personal SAS programming arsenal is to use it - and of course make and fix lots of mistakes on one's way to mastery. The end result, however, is more than worth the effort.

 

Kind regards

Paul D. 
