ZRick
Obsidian | Level 7

In the hash table chapter of the advanced certification prep book, there is the following DATA step.

Can someone help me understand what exactly "if _N_=1 then do" does?

I really don't see the need for this statement.

data work.difference (drop= goalamount);

     length goalamount 8;

     if _N_ = 1 then do;

          declare hash goal( );

          goal.definekey("QtrNum");

          goal.definedata("GoalAmount");

          goal.definedone( );

          call missing(qtrnum, goalamount);

          goal.add(key:'qtr1', data:10 );

          goal.add(key:'qtr2', data:15 );

          goal.add(key:'qtr3', data: 5 );

          goal.add(key:'qtr4', data:15 );

     end;

     set sasuser.contrib;

     goal.find();

     Diff = amount - goalamount;

run;

manojinpec
Obsidian | Level 7

Hi ,

This says that when the DATA step is on its first iteration (reading the first observation), the hash object is declared and its properties are set, including the key variables and data variables.

ZRick
Obsidian | Level 7

Can I remove this line? What effect would removing it have?

Ksharp
Super User

You must keep it. If you remove it, SAS will rebuild this hash table on every iteration of the DATA step loop, which is not what you need.

Ksharp

Haikuo
Onyx | Level 15

Try removing it and then compare your logs; you will see what Ksharp means.

Haikuo

ZRick
Obsidian | Level 7

Ksharp, you said: "At every data loop, SAS will re-build this hash table"

I do not understand. Where does the loop come from?

I removed the line and nothing happened. I still do not see the magic of this line.

Haikuo
Onyx | Level 15

Ok, notice this line in your code:

set sasuser.contrib;

The number of loop iterations is the total number of obs in 'sasuser.contrib' plus 1. It comes from the implicit loop of the DATA step, which 'set' drives by reading one observation per pass; the extra pass is the one where 'set' finds no more observations and ends the step. Unless you stop (abort) or skip the loop somewhere in your downstream code, it will be n+1, n being the number of obs in 'sasuser.contrib'.

Having said that, it would be somewhat different if you wrap the 'set' statement in a DOW loop, such as:

do until (your conditions);

   set sasuser.contrib;

blah blah;

end;

Then the number of iterations of the implicit loop will be the number of times the DOW loop runs to completion, plus 1.

Haikuo

Edit: if you remove the _n_ line, you will NOT see errors if your original code has no errors. You will see a bunch of notes telling you the hash object has been initiated, then a couple of lines later, initiated again, and again.
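
The n+1 count described above can be checked with a small sketch. The data set name work.three_obs and its variable x are made up for illustration; they are not from the book. The first PUT sits before 'set', so it runs on every pass, including the last one where 'set' finds nothing left to read.

```sas
/* Hypothetical input with n = 3 observations */
data work.three_obs;
   do x = 1 to 3;
      output;
   end;
run;

data _null_;
   /* Runs at the top of every pass of the implicit loop */
   put "Top of iteration: " _n_=;

   /* Reads one observation per pass; ends the step when none are left */
   set work.three_obs;

   /* Runs only on passes where SET actually read an observation */
   put "   read observation: " x=;
run;
```

With three observations, the log should show the first message four times (_N_ from 1 to 4) and the second message three times.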

art297
Opal | Level 21

Hai.kuo: Not on 9.2!  Thus, it is a good question.  The result looks like it will stay the same but, without the if statement, the processing time will increase dramatically.

Haikuo
Onyx | Level 15

I see, Art. Probably that is why OP is so confused. Thanks for pointing it out! Learned!

In addition to increased processing time, without the _n_ = 1 check it won't work if the hash object needs to be dynamically modified during the course of the step.

FloydNevseta
Pyrite | Level 9

ZRick wrote:

Ksharp, you said: "At every data loop, SAS will re-build this hash table"

I do not understand, where does the loop come from?

I removed the line, nothing happened, I still do not see the magic of this line here.

ZRick,

You make a comment like this which leads one to believe you do not understand how a data step works. So Art provided a link that exactly shows how a data step loops and where _n_ comes from, but you totally dismissed his help. Take 5 minutes to read the link, then maybe you will understand where the looping occurs and why. From that point, maybe you will get insight into why the hash object only needs to be declared and populated once when _n_ = 1.

ZRick
Obsidian | Level 7

I'm still trying to understand this _N_ better. So what does _N_=1 contain?

In addition, under _N_=1, it only loops once, is that it?

art297
Opal | Level 21

If you are preparing for the certification exam, you will probably want to read (at least):

http://support.sas.com/documentation/cdl/en/basess/58133/HTML/default/viewer.htm#a001290590.htm

ZRick
Obsidian | Level 7

Thank you for pointing me to the interesting link, but I am more focused on understanding the logic behind the code.

Tom
Super User

One way to think about how SAS processes a data step is to consider a simple step to calculate a new variable.

data new;

   set old;

   y= x*x ;

run;

Now if there are 100 observations in OLD then SAS must execute the assignment statement that creates Y 100 times.  So the implied loop over all input data is what lets that happen.

This concept is one of the things that makes creating SAS programs so much simpler than the old FORTRAN or PL/I programs we had to use before SAS was developed, or for that matter more modern languages such as Java or Excel.
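
To make the implied loop concrete, here is a rough sketch that writes the same step both ways. The input data set and the name new_explicit are assumptions for illustration only, and the hand-written loop is just one way to spell out what the DATA step does automatically.

```sas
/* Hypothetical input: 100 observations with a single variable x */
data old;
   do x = 1 to 100;
      output;
   end;
run;

/* The step as written above, relying on the implied loop */
data new;
   set old;
   y = x*x;
run;

/* Roughly the same work with the loop written out by hand */
data new_explicit;
   do until (last);
      set old end=last;   /* last = 1 when the final observation is read */
      y = x*x;
      output;             /* the implied OUTPUT must now be explicit */
   end;
   stop;                  /* all rows are processed, so end the step */
run;
```

Both output data sets should contain the same 100 observations with x and y. The second form shows what the first one saves you from writing.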

SASKiwi
PROC Star

_N_ is an automatic SAS counter that can be used to find out how many times the DATA step has looped.

The purpose for it in your example is to only create and load the hash table once, at the start of the first loop through the step. It only needs to be done once.

If you removed this check the hash table would be created and loaded for every record the step is processing!
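
A minimal sketch of this, assuming a small made-up stand-in for sasuser.contrib (the name work.contrib_demo and its values are hypothetical). The PUT inside the DO block writes to the log each time the block runs, so you can see for yourself how often the hash table is loaded.

```sas
/* Hypothetical stand-in for sasuser.contrib */
data work.contrib_demo;
   length QtrNum $4;
   input QtrNum $ Amount;
   datalines;
qtr1 12
qtr2 10
qtr3 9
qtr4 20
;
run;

data work.difference (drop=goalamount rc);
   /* Define the key and data variables before the hash object uses them */
   length QtrNum $4 goalamount 8;

   if _n_ = 1 then do;
      /* Build and load the lookup table on the first iteration only */
      declare hash goal();
      goal.definekey("QtrNum");
      goal.definedata("GoalAmount");
      goal.definedone();
      call missing(qtrnum, goalamount);
      goal.add(key:'qtr1', data:10);
      goal.add(key:'qtr2', data:15);
      goal.add(key:'qtr3', data:5);
      goal.add(key:'qtr4', data:15);
      put "Hash table loaded on iteration: " _n_=;
   end;

   set work.contrib_demo;

   /* A return code of 0 means the key was found and goalamount was filled */
   rc = goal.find();
   if rc = 0 then Diff = amount - goalamount;
run;
```

With the _N_ check in place, the message should appear once in the log. The hash object stays in memory for the rest of the step, so every later iteration can still call goal.find().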

