vanness145
Fluorite | Level 6

 

data JE.KeywordMatchTemp1;
  if _n_ = 1 then do;
    do i = 1 by 1 until (eof);
      set JE.KeyWords end=eof;
      array keywords[100] $30 _temporary_;
      keywords[i] = Key_Words;
    end;
    last_i = i ;
    retain last_i ;
  end;
  set JE.JEMasterTemp;
  match = 0;
  do i = 1 to last_i while (match=0) ;
    if index(descr, trim(keywords[i]) ) then match = 1;
  end;
  drop i last_i;
run;

Hi all, I am having difficulty understanding what this line does:

if _n_ = 1 then do;

My understanding is that it creates an auto-incrementing field that increases by 1 for each record in the dataset.

I have tried removing this statement. Logically speaking, the code should run fine without it, right?

Instead, the output is capped at 1 obs at this SET statement:

 set JE.JEMasterTemp;

 

I need help...

 

 

 

 

1 ACCEPTED SOLUTION

Quentin
Super User

The data step is an implicit loop.

 

SAS creates an automatic variable _N_, which is a counter for that loop (iteration number).
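To see the implicit loop and _N_ in action, here is a minimal sketch. It assumes the SASHELP.CLASS sample table is available in your session (it is not part of the original program); any small dataset would do.

```sas
/* Assumption: SASHELP.CLASS is available; substitute any small dataset. */
data _null_;
  set sashelp.class (obs=3);  /* one record is read per iteration of the step */
  put _n_= name=;             /* writes the iteration number and NAME to the log */
run;
```

The log should show one line per record, with _N_ going 1, 2, 3.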

 

The code

if _N_=1 then do ;

tests whether _N_=1, i.e. whether it is the first iteration of the DATA step.

 

In your example, on the first iteration of the data step, _N_=1 is true, therefore processing continues into the loop and all records from JE.KeyWords are read into a temporary array.

 

Then, still on _N_=1 control continues to the second SET statement and the first record is read from JE.JEMasterTemp, and it checks the value of DESCR to see if any of the keywords stored in the array appear in DESCR.

 

At the bottom of the DATA step, it outputs that first record to JE.KeywordMatchTemp1.

 

On the second iteration of the loop, _N_=2, so the first IF statement is false. Nothing is read from JE.KeyWords.  Control continues to read the second record from JE.JEMasterTemp, and it checks the value of DESCR to see if any of the keywords stored in the array appear in DESCR, and outputs.

 

The DATA step keeps looping until the second SET statement tries to read another record from JE.JEMasterTemp but it can't because there are no more records, at which point it reads the (logical) end of file marker and stops.
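The flow described above can be traced with a small self-contained run. This is only a sketch: the WORK tables and their contents are made up for illustration and stand in for JE.KeyWords and JE.JEMasterTemp.

```sas
/* Made-up stand-in for JE.KeyWords */
data work.keywords;
  length Key_Words $30;
  input Key_Words $;
  datalines;
rent
salary
;
run;

/* Made-up stand-in for JE.JEMasterTemp */
data work.master;
  infile datalines truncover;
  input descr $50.;
  datalines;
office rent for May
coffee beans
salary payment
;
run;

data work.matched;
  if _n_ = 1 then do;
    /* First iteration only: read every keyword into the temporary array */
    do i = 1 by 1 until (eof);
      set work.keywords end=eof;
      array keywords[100] $30 _temporary_;
      keywords[i] = Key_Words;
    end;
    last_i = i;        /* number of keywords loaded */
    retain last_i;
  end;
  set work.master;     /* one record per iteration; the step ends when this runs out */
  match = 0;
  do i = 1 to last_i while (match = 0);
    if index(descr, trim(keywords[i])) then match = 1;
  end;
  put _n_= descr= match=;   /* trace each iteration in the log */
  drop i last_i Key_Words;
run;
```

With this toy data the log should show three iterations, with MATCH equal to 1, 0 and 1.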

 

The point of if _N_=1 in your code is to say that SAS only needs to read the key words into the array once.  Temporary arrays are automatically retained.
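A quick way to convince yourself that temporary array elements keep their values across iterations is the sketch below. It again assumes SASHELP.CLASS is available; the names T, FROM_ARRAY and PLAIN are made up for the example.

```sas
/* Assumption: SASHELP.CLASS is available; T, FROM_ARRAY and PLAIN are illustrative names. */
data _null_;
  array t[1] _temporary_;
  set sashelp.class (obs=3);
  if _n_ = 1 then t[1] = 100;    /* temporary array element, set once */
  if _n_ = 1 then plain = 100;   /* ordinary variable, set once and not retained */
  from_array = t[1];
  put _n_= from_array= plain=;
run;
```

FROM_ARRAY should be 100 on every iteration, while PLAIN goes back to missing from the second iteration on, because ordinary assigned variables are reset at the top of each iteration unless you RETAIN them.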

 

If you remove the if _N_=1, then on the second iteration of the DATA step SAS will try to read a record from JE.KeyWords, but since it has already read all of the records, the SET statement will hit the (logical) end-of-file marker and the DATA step will stop prematurely (without a warning or note). That is because one of the rules of the DATA step is that it stops when a SET statement reads an end-of-file marker.
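Here is a stripped-down sketch of that premature stop. The WORK tables are made up for illustration, and the keyword array is left out because only the two SET statements matter here.

```sas
/* Made-up two-row lookup table */
data work.keywords;
  length Key_Words $30;
  input Key_Words $;
  datalines;
rent
salary
;
run;

/* Made-up three-row master table */
data work.master;
  infile datalines truncover;
  input descr $50.;
  datalines;
office rent for May
coffee beans
salary payment
;
run;

/* No IF _N_=1 guard around the first SET */
data work.no_guard;
  do i = 1 by 1 until (eof);
    set work.keywords end=eof;  /* iteration 2: nothing left to read, so the step stops here */
  end;
  set work.master;
  drop i;
run;
```

WORK.NO_GUARD should end up with 1 observation even though WORK.MASTER has 3, which is the "capped at 1 obs" behaviour from the question.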

 

 


4 REPLIES
vanness145
Fluorite | Level 6
Thank you so much for the explanation!
ChrisNZ
Tourmaline | Level 20

This program loads table KEYWORDS into an array when the data step runs its first iteration.

Then for each iteration (of the data set MASTERTEMP) it tries to find a match.

If you try to load KEYWORDS each time:

1) it is wasteful

2) SAS stops, as you have seen, as you try to read past the last observation of KEYWORDS on the second iteration.

So the   if _N_=1   test is very necessary.

 

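Note that 

  do i = 1 to last_i while (match=0) ;
    if index(descr, trim(keywords[i]) ) then match = 1;
  end;

can be replaced with

  MATCH = whichc( DESCR, of KEYWORDS[*] ) > 0 ;

but only if DESCR is expected to equal a keyword exactly: WHICHC compares whole values, whereas INDEX finds a keyword anywhere inside DESCR.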
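To see how INDEX and WHICHC behave when the keyword is only part of DESCR, here is a quick sketch. The keyword list and the DESCR value are made up, as are the names BY_INDEX and BY_WHICHC.

```sas
/* Made-up values; BY_INDEX and BY_WHICHC are illustrative names. */
data _null_;
  array keywords[2] $30 _temporary_ ('rent', 'salary');
  descr = 'office rent for May';

  /* Substring search, as in the original loop */
  by_index = 0;
  do i = 1 to dim(keywords) while (by_index = 0);
    if index(descr, trim(keywords[i])) then by_index = 1;
  end;

  /* Whole-value comparison */
  by_whichc = whichc(descr, of keywords[*]) > 0;

  put by_index= by_whichc=;
run;
```

Here BY_INDEX should come out as 1 and BY_WHICHC as 0, since 'office rent for May' contains 'rent' but is not equal to it.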

 

Kurt_Bremser
Super User

Aside from what the others said, today we would probably make use of a hash object:

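data JE.KeywordMatchTemp1;
/* Put KEY_WORDS into the PDV (no rows are read) so the hash key has a host variable */
if 0 then set JE.KeyWords (keep=key_words);
if _n_ = 1
then do;
  /* Load all keywords into the hash object once, on the first iteration */
  declare hash kw (dataset:"je.keywords");
  kw.definekey("key_words");
  kw.definedone();
end;
set JE.JEMasterTemp;
match = 0;
/* Look up each word of DESCR (whole-word match); CHECK() returns 0 when the key is found */
do i = 1 to countw(descr) while (match = 0);
  if kw.check(key:scan(descr,i)) = 0 then match = 1;
end;
drop i key_words;
run;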
