vanness145
Fluorite | Level 6

 

data JE.KeywordMatchTemp1;
  if _n_ = 1 then do;
    do i = 1 by 1 until (eof);
      set JE.KeyWords end=eof;
      array keywords[100] $30 _temporary_;
      keywords[i] = Key_Words;
    end;
    last_i = i ;
    retain last_i ;
  end;
  set JE.JEMasterTemp;
  match = 0;
  do i = 1 to last_i while (match=0) ;
    if index(descr, trim(keywords[i]) ) then match = 1;
  end;
  drop i last_i;
run;

Hi all, I am having difficulty understanding what this line does:

if _n_ = 1 then do;

My understanding is that it creates a field that auto-increments by 1 for each record in the dataset.

I tried removing this statement. Logically speaking, the code should still run fine without it, right?

 

Instead, only 1 observation is read at this SET statement:

 set JE.JEMasterTemp;

 

I need help...

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

The data step is an implicit loop.

 

SAS creates an automatic variable _N_, which is a counter for that loop (iteration number).
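
To see the counter at work, here is a minimal sketch. The table work.demo and its variable x are made up for this illustration and are not part of the original program.

```sas
/* Hypothetical three-row input, created only for this illustration */
data work.demo;
  input x;
  datalines;
10
20
30
;
run;

data _null_;
  set work.demo;
  /* _N_ counts iterations of the implicit loop; each pass writes one log line */
  put _n_= x=;
run;
```

The log should show one line per iteration, with _N_ going from 1 to 3 alongside the matching value of x.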

 

The code

 

if _N_=1 then do ;

tests whether _N_=1, i.e. whether it is the first iteration of the data step.

 

In your example, on the first iteration of the data step, _N_=1 is true, therefore processing continues into the loop and all records from JE.KeyWords are read into a temporary array.

 

Then, still on _N_=1 control continues to the second SET statement and the first record is read from JE.JEMasterTemp, and it checks the value of DESCR to see if any of the keywords stored in the array appear in DESCR.

 

At the bottom of the DATA step, it outputs that first record to JE.KeywordMatchTemp1.

 

On the second iteration of the loop, _N_=2, so the first IF statement is false. Nothing is read from JE.KeyWords.  Control continues to read the second record from JE.JEMasterTemp, and it checks the value of DESCR to see if any of the keywords stored in the array appear in DESCR, and outputs.

 

The DATA step keeps looping until the second SET statement tries to read another record from JE.JEMasterTemp but it can't because there are no more records, at which point it reads the (logical) end of file marker and stops.

 

The point of if _N_=1 in your code is to say that SAS only needs to read the key words into the array once.  Temporary arrays are automatically retained.

 

If you remove the if _N_=1, then on the second iteration of the DATA step SAS will try to read a record from JE.KeyWords. Since it has already read all of the records, the SET statement will read the (logical) end of file marker and the DATA step will stop prematurely (without a warning or note). That is because one of the rules of the DATA step is that it stops when a SET statement reads an end of file marker.
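
The premature stop can be reproduced on a small scale. This is only a sketch: work.kw, work.master and their contents are invented stand-ins for JE.KeyWords and JE.JEMasterTemp.

```sas
/* Hypothetical sample tables, invented for this illustration */
data work.kw;
  input Key_Words $;
  datalines;
tax
rent
;
run;

data work.master;
  input descr $20.;
  datalines;
office rent
sales tax
travel
;
run;

/* Same structure as the original step, but without IF _N_ = 1 */
data work.no_guard;
  do i = 1 by 1 until (eof);
    /* On iteration 2 there is nothing left to read, so the step stops here */
    set work.kw end=eof;
  end;
  set work.master;
  drop i;
run;
```

work.no_guard should end up with 1 observation even though work.master has 3, which matches the behaviour described in the question.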

 

 



vanness145
Fluorite | Level 6
Thank you so much for the explanation!
ChrisNZ
Tourmaline | Level 20

This program loads table KEYWORDS into an array when the data step runs its first iteration.

Then for each iteration (of the data set MASTERTEMP) it tries to find a match.

If you try to load KEYWORDS each time:

1) it is wasteful

2) SAS stops, as you have seen, as you try to read past the last observation of KEYWORDS on the second iteration.

So the   if _N_=1   test is very necessary.

 

Note that 

  do i = 1 to last_i while (match=0) ;
    if index(descr, trim(keywords[i]) ) then match = 1;
  end;

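can be replaced with the following, but only when DESCR has to equal a keyword exactly (WHICHC compares whole values, whereas INDEX looks for a substring):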

  MATCH = whichc( DESCR, of KEYWORDS[*] ) > 0 ;
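
A quick way to compare the two functions before swapping one for the other is a check like this one. The sample values are made up for the illustration.

```sas
data _null_;
  descr = "office rent";
  /* INDEX looks for "rent" anywhere inside DESCR */
  by_index  = index(descr, "rent") > 0;
  /* WHICHC looks for an argument whose whole value equals DESCR */
  by_whichc = whichc(descr, "tax", "rent") > 0;
  put by_index= by_whichc=;
run;
```

With these values the log should show by_index=1 and by_whichc=0, because "rent" is part of DESCR but is not equal to it.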

 

Kurt_Bremser
Super User

Aside from what the others said, today we would probably make use of a hash object:

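data JE.KeywordMatchTemp1;
if _n_ = 1
then do;
  /* never executes; only makes Key_Words known so DEFINEKEY can reference it */
  if 0 then set JE.KeyWords;
  declare hash kw (dataset:"je.keywords");
  kw.definekey("key_words");
  kw.definedone();
end;
set JE.JEMasterTemp;
match = 0;
/* looks up each word of DESCR, so only whole-word keywords can match */
do i = 1 to countw(descr) while (match = 0);
  /* CHECK returns 0 when the key is found */
  if kw.check(key:scan(descr,i)) = 0 then match = 1;
end;
drop i Key_Words;
run;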
