The hash table chapter of the SAS Advanced certification prep guide has the following DATA step.
Can someone help me understand what exactly "if _N_=1 then do" does?
I really don't see the need for this statement.
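data work.difference (drop=goalamount);
   length goalamount 8;
   /* Runs only on the first iteration: build and load the lookup table once */
   if _N_ = 1 then do;
      declare hash goal();
      goal.definekey("QtrNum");
      goal.definedata("GoalAmount");
      goal.definedone();
      call missing(qtrnum, goalamount);   /* avoid notes about uninitialized variables */
      goal.add(key:'qtr1', data:10);
      goal.add(key:'qtr2', data:15);
      goal.add(key:'qtr3', data: 5);
      goal.add(key:'qtr4', data:15);
   end;
   /* Runs on every iteration: read one observation and look up its goal */
   set sasuser.contrib;
   goal.find();   /* copies the GoalAmount stored for this QtrNum into the PDV */
   Diff = amount - goalamount;
run;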
Hi,
This says that on the first iteration of the DATA step (before the first observation has been read), the hash object is declared and its properties are set, including the key variables and data variables.
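A minimal sketch of that timing, assuming the SASHELP.CLASS sample table that ships with SAS (it is not part of the original question): the block guarded by `_N_ = 1` runs before SET has read anything, and only on the first iteration.

```sas
data _null_;
   /* Executes only on the first iteration, before any observation is read */
   if _n_ = 1 then put 'Setup block, ' _n_=;
   /* Executes on every iteration: reads the next observation */
   set sashelp.class(obs=3);
   put 'Processing ' name= _n_=;
run;
```

The log should show the setup line once, followed by one "Processing" line per observation read.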
Can I remove this line? What effect will it have if I do?
You must keep it. If you remove it, SAS will rebuild this hash table on every iteration of the DATA step, which is not what you need.
Ksharp
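To see the rebuild Ksharp describes, here is a sketch with the `_N_ = 1` guard deliberately left out. It assumes the SASHELP.CLASS sample table, and the counter `builds` is a hypothetical variable added only for illustration.

```sas
data _null_;
   /* No IF _N_ = 1 guard: DECLARE executes on every iteration and
      creates a brand-new hash object each time */
   declare hash h();
   h.definekey('Name');
   h.definedata('Age');
   h.definedone();
   builds + 1;   /* sum statement: BUILDS is retained across iterations */

   set sashelp.class(obs=5) end=eof;
   if eof then put 'Hash object built ' builds 'times';
run;
```

The log should report one build per observation read. There is also one extra build on the final iteration, which stops at the SET statement before it reaches the PUT. With a table loaded by several ADD calls, all of that work would be repeated each time.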
Try removing it and then compare your logs; you will see what Ksharp means.
Haikuo
Ksharp, you said: "At every data loop, SAS will re-build this hash table"
I do not understand: where does the loop come from?
I removed the line and nothing happened; I still do not see the magic of this line here.
Ok, notice this line in your code:
set sasuser.contrib;
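The number of iterations is the total number of observations in 'sasuser.contrib' plus 1. It comes from the implicit loop of the DATA step: the step keeps iterating until the SET statement tries to read past the last observation, which happens on iteration n+1. Unless you stop (STOP or ABORT) or skip the loop somewhere in your downstream code, it will be n+1, n being the number of observations in 'sasuser.contrib'.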
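A sketch that makes the n+1 count visible, assuming the SASHELP.CLASS sample table limited to three observations:

```sas
data _null_;
   /* Runs at the top of every DATA step iteration */
   put 'Top of iteration ' _n_=;
   /* On the iteration after the last observation, SET finds end of file
      and the step stops here */
   set sashelp.class(obs=3);
run;
```

With three observations the log should show four "Top of iteration" lines, _N_=1 through _N_=4. The fourth iteration starts, executes the PUT, and then ends at the SET statement.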
Having said that, it would be somewhat different if you apply a DoW loop to the SET statement, such as:
do until (eof);   /* or your own condition */
   set sasuser.contrib end=eof;
   /* your statements here */
end;
Then the number of DATA step iterations will be the number of passes through the DoW loop plus 1.
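A sketch of the DoW case, again assuming SASHELP.CLASS with three observations. Here a single pass of the DO UNTIL loop reads the whole input:

```sas
data _null_;
   put 'DATA step iteration ' _n_=;
   /* DoW loop: one pass reads every observation */
   do until (eof);
      set sashelp.class(obs=3) end=eof;
      put '   read ' name=;
   end;
run;
```

The log should show _N_=1 followed by three "read" lines, then _N_=2. On the second iteration, the SET inside the loop hits end of file and the step stops. That is one DoW pass plus 1.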
Haikuo
Edit: if you remove the _N_ line, you will NOT see errors (assuming your original code has none). Instead you will see a bunch of notes saying that your hash object has been initialized, then a couple of lines later initialized again, and again.
Haikuo: Not on 9.2! That is why it is a good question. The result appears to stay the same but, without the IF statement, the processing time increases dramatically.
I see, Art. That is probably why the OP is so confused. Thanks for pointing it out! Learned!
In addition to the increased processing time, without the _N_ = 1 condition it won't work if the hash object needs to be modified dynamically during the step, because each iteration throws away the changes and rebuilds the table from scratch.
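Here is a sketch of such a dynamically modified hash object: a running count per key, assuming SASHELP.CLASS. The names `counts`, `n`, and `work.sex_counts` are hypothetical, chosen for illustration.

```sas
data _null_;
   if _n_ = 1 then do;
      /* Declared once, so updates survive from one iteration to the next */
      declare hash counts(ordered: 'a');
      counts.definekey('Sex');
      counts.definedata('Sex', 'n');
      counts.definedone();
   end;

   set sashelp.class end=eof;

   /* FIND copies the stored count into N; a key not yet seen starts at 0 */
   if counts.find() ne 0 then n = 0;
   n = n + 1;
   counts.replace();   /* write the updated count back into the table */

   /* After the last observation, save the table as a data set */
   if eof then counts.output(dataset: 'work.sex_counts');
run;
```

With the guard, WORK.SEX_COUNTS should hold one row per value of Sex with its count. Without the guard, every iteration would start from a fresh, empty table. Every count would then be 1, and the output would contain only a single row for the last student's group.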
ZRick wrote:
Ksharp, you said: "At every data loop, SAS will re-build this hash table"
I do not understand: where does the loop come from?
I removed the line and nothing happened; I still do not see the magic of this line here.
ZRick,
A comment like this leads one to believe that you do not understand how a DATA step works. Art provided a link that shows exactly how a DATA step loops and where _N_ comes from, but you dismissed his help entirely. Take 5 minutes to read the link; then maybe you will understand where the looping occurs and why. From that point, you may see why the hash object only needs to be declared and populated once, when _N_ = 1.
I am still trying to understand _N_ better: what exactly does _N_ = 1 contain?
Also, the code under _N_ = 1 only runs once, is that it?
If you are preparing for the certification exam, you will probably want to read (at least):
http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001290590.htm
Thank you for pointing me to the interesting link, but I am more focused on understanding the logic of the code behind it.
One way to think about how SAS processes a data step is to consider a simple step to calculate a new variable.
data new;
   set old;
   y = x*x;
run;
Now, if there are 100 observations in OLD, then SAS must execute the assignment statement that creates Y 100 times. The implied loop over all input data is what makes that happen.
This concept is one of the things that makes writing SAS programs so much simpler than the old FORTRAN or PL/I programs we had to use before SAS was developed, or, for that matter, programs in more modern languages such as Java or Excel.
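For comparison, here is a sketch of the same step with the loop written out by hand, using the POINT= and NOBS= options of the SET statement. OLD and X are the hypothetical names from the example above. It only exists to make the implied loop visible; the three-line version is what you would normally write.

```sas
data new;
   /* Explicit version of the implied loop: visit each observation by number */
   do i = 1 to nobs;
      set old point=i nobs=nobs;   /* NOBS= is assigned at compile time */
      y = x*x;
      output;                      /* write the observation explicitly */
   end;
   stop;   /* required with POINT=, because SET never reaches end of file */
run;
```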
_N_ is an automatic variable that SAS uses as a counter: it holds the number of the current DATA step iteration (1 on the first pass, 2 on the second, and so on).
The purpose of it in your example is to create and load the hash table only once, at the start of the first iteration of the step. It only needs to be done once.
If you removed this check, the hash table would be created and loaded again for every record the step processes!