data JE.KeywordMatchTemp1;
  if _n_ = 1 then do;
    do i = 1 by 1 until (eof);
      set JE.KeyWords end=eof;
      array keywords[100] $30 _temporary_;
      keywords[i] = Key_Words;
    end;
    last_i = i;
    retain last_i;
  end;
  set JE.JEMasterTemp;
  match = 0;
  do i = 1 to last_i while (match = 0);
    if index(descr, trim(keywords[i])) then match = 1;
  end;
  drop i last_i;
run;
Hi all, I am having difficulty understanding what this line does:
if _n_ = 1 then do;
I understand that it creates an auto-incrementing field that increases by 1 for each record in the dataset.
I tried removing this statement. Logically speaking, the code should still run fine without it, right?
Instead, the output is capped at 1 observation at this SET statement:
set JE.JEMasterTemp;
I need help...
The data step is an implicit loop.
SAS creates an automatic variable _N_, which is a counter for that loop (iteration number).
The code
if _N_=1 then do ;
tests whether _N_=1, i.e. whether it is the first iteration of the data step.
In your example, on the first iteration of the data step, _N_=1 is true, therefore processing continues into the loop and all records from JE.KeyWords are read into a temporary array.
Then, still on _N_=1, control continues to the second SET statement and the first record is read from JE.JEMasterTemp. The step checks the value of DESCR to see if any of the keywords stored in the array appear in it.
At the bottom of the DATA step, it outputs that first record to JE.KeywordMatchTemp1.
On the second iteration of the loop, _N_=2, so the first IF statement is false. Nothing is read from JE.KeyWords. Control continues to read the second record from JE.JEMasterTemp, and it checks the value of DESCR to see if any of the keywords stored in the array appear in DESCR, and outputs.
The DATA step keeps looping until the second SET statement tries to read another record from JE.JEMasterTemp but it can't because there are no more records, at which point it reads the (logical) end of file marker and stops.
The point of if _N_=1 in your code is to say that SAS only needs to read the key words into the array once. Temporary arrays are automatically retained.
If you remove the if _N_=1 test, then on the second iteration of the DATA step SAS will try to read a record from JE.KeyWords. Since it has already read all of the records, the SET statement will hit the (logical) end-of-file marker and the DATA step will stop prematurely, without a warning or note. This follows from one of the rules of the DATA step: it stops when a SET statement reads an end-of-file marker.
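To see this behaviour in isolation, here is a minimal sketch using two made-up data sets. WORK.KW and WORK.MASTER are hypothetical stand-ins for JE.KeyWords and JE.JEMasterTemp, and the values are invented for illustration. The first step keeps the _N_=1 guard and the second drops it, so the log should show three iterations for the guarded step and only one for the unguarded step.

```sas
/* Hypothetical toy data standing in for JE.KeyWords */
data work.kw;
  input Key_Words :$30.;
  datalines;
rent
tax
;

/* Hypothetical toy data standing in for JE.JEMasterTemp */
data work.master;
  infile datalines truncover;
  input descr $40.;
  datalines;
office rent for May
travel refund
sales tax payment
;

/* Guarded: WORK.KW is read only on the first iteration */
data work.guarded;
  if _n_ = 1 then do;
    do until (eof);
      set work.kw end=eof;
    end;
  end;
  set work.master;
  put 'guarded: ' _n_= descr=;
run;

/* Unguarded: on iteration 2 the SET on WORK.KW reads past its last */
/* observation, so the step stops after writing one observation     */
data work.unguarded;
  do until (eof);
    set work.kw end=eof;
  end;
  set work.master;
  put 'unguarded: ' _n_= descr=;
run;
```

Comparing the observation counts of WORK.GUARDED and WORK.UNGUARDED in the log is the quickest way to confirm which SET statement ended the step.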
This program loads table KEYWORDS into an array when the data step runs its first iteration.
Then, on every iteration (one per observation of JEMasterTemp), it tries to find a match.
If you try to load KEYWORDS each time:
1) it is wasteful
2) SAS stops, as you have seen, as you try to read past the last observation of KEYWORDS on the second iteration.
So the if _N_=1 test is very necessary.
Note that
do i = 1 to last_i while (match=0) ;
if index(descr, trim(keywords[i]) ) then match = 1;
end;
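can be replaced with the line below, but only when DESCR has to equal a keyword exactly: WHICHC tests for equality, whereas INDEX looks for the keyword anywhere inside DESCR.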
MATCH = whichc( DESCR, of KEYWORDS[*] ) > 0 ;
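A small sketch of how the two tests can differ, assuming a two-element keyword array and a made-up DESCR value (both are illustrative, not taken from the original data). WHICHC compares the whole of DESCR with each keyword, while the INDEX loop searches for each keyword inside DESCR.

```sas
data _null_;
  /* Hypothetical keywords and description, for illustration only */
  array keywords[2] $30 _temporary_ ('rent', 'tax');
  descr = 'office rent for May';

  /* Equality test: true only if DESCR is exactly one of the keywords */
  exact = whichc(descr, of keywords[*]) > 0;

  /* Substring test: true if any keyword occurs somewhere in DESCR */
  substring = 0;
  do i = 1 to dim(keywords) while (substring = 0);
    if index(descr, trim(keywords[i])) then substring = 1;
  end;

  put exact= substring=;
run;
```

With these values the log should report exact=0 and substring=1, so the one-line WHICHC form is only a drop-in replacement when DESCR holds a bare keyword.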
Aside from what the others said, today we would probably make use of a hash object:
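data JE.KeywordMatchTemp1;
  /* Never executes; only puts KEY_WORDS into the PDV so the hash key is defined */
  if 0 then set JE.KeyWords (keep=Key_Words);
  if _n_ = 1 then do;
    declare hash kw (dataset: "je.keywords");
    kw.definekey("key_words");
    kw.definedone();
  end;
  set JE.JEMasterTemp;
  match = 0;
  /* Look up each word of DESCR (whole-word matches only) */
  do i = 1 to countw(descr) while (match = 0);
    /* CHECK returns 0 when the key is found */
    if kw.check(key: scan(descr, i)) = 0 then match = 1;
  end;
  drop i Key_Words;
run;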