Could anyone explain this problem for me? Thanks


data a;
a=1;
data b;
if _n_=1 then set a;
put _all_;
run;

The log said:
12 data b;
13 if _n_=1 then set a;
14 put _all_;
15 run;

a=1 _ERROR_=0 _N_=1
a=1 _ERROR_=0 _N_=2
NOTE: DATA STEP stopped due to looping.
NOTE: There were 1 observations read from the data set WORK.A.
NOTE: The data set WORK.B has 2 observations and 1 variables.

Why does data set B have 2 observations?

Hi,
Your issue happens because, when the SET statement executes only when _N_ = 1, you have prevented SAS from reading the end-of-file marker in data set A. A DATA step that reads with SET normally stops when a SET statement tries to read past the last observation. The end-of-file marker is not detected on the first read of data set A; it would have been detected on the SECOND pass through the DATA step. You can prove this with the following code:
[pre]
data whenstop;
put '----| top of data step' _n_=;
set a;
put _all_;
put '----| bottom of data step' _n_=;
run;
[/pre]
which produces this output in the log:
[pre]
376 data whenstop;
377 put '----| top of data step' _n_=;
378 set a;
379 put _all_;
380 put '----| bottom of data step' _n_=;
381 run;


----| top of data step_N_=1
a=1 _ERROR_=0 _N_=1
----| bottom of data step_N_=1
----| top of data step_N_=2

NOTE: There were 1 observations read from the data set WORK.A.
NOTE: The data set WORK.WHENSTOP has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):

[/pre]

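As you can see in the output above, _N_ is actually 2 when the end-of-file marker is detected: SAS starts a second iteration, the SET statement finds nothing left to read, and the step stops right there. The bottom PUT for _N_=2 never executes and no second observation is written. You don't normally need to care about this, except when you prevent SAS from reading the end-of-file marker. In your code, the SET statement does not execute on the second iteration, so the end of file is never detected and the second iteration runs all the way to the bottom, writing a second observation. Because variables read with a SET statement are automatically retained, A still holds 1, which is why both observations are the same. SAS then detects that the step went through an iteration without reading any data and stops it, as the NOTE says.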
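To watch this happen in your own program, here is a sketch that adds the same top/bottom PUT statements used in the WHENSTOP example to your code (I also added a RUN after DATA A, which doesn't change the result). Based on the log you posted, I would expect both iterations (_N_=1 and _N_=2) to reach the bottom PUT, followed by the looping NOTE; the exact log text may differ by release.

```sas
data a;
  a = 1;
run;

data b;
  put '----| top of data step ' _n_=;
  if _n_ = 1 then set a;   /* executes on the first iteration only */
  put _all_;               /* A is retained, so it shows a=1 both times */
  put '----| bottom of data step ' _n_=;
run;                       /* implicit OUTPUT happens here each iteration */
```

Comparing this log with the WHENSTOP log shows the difference: there, the second iteration ends at the SET statement; here, it never reaches a SET statement, so it runs to the bottom and writes a second observation.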

If you code this:
[pre]
data a;
a=1;
run;

data b;
if _n_=1 then set a;
put _all_;
output;   /* write the observation explicitly */
stop;     /* end the step; STOP also skips the implicit OUTPUT */
run;
[/pre]

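Then you will get only 1 observation in data set B. STOP ends the step after the first iteration, and because STOP also suppresses the implicit output at the bottom of the step, the explicit OUTPUT statement is needed to write that observation. Since the observation is just a copy of the one in A, the code above doesn't serve much purpose, except to show that, without STOP, SAS only stops the DATA step when the end-of-file marker is reached.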
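Another way to see that "the last observation was read" and "the end-of-file marker was reached" are two different moments is the END= option of the SET statement. In this sketch (the data set name ENDFLAG and the variable name LAST are my own choices), LAST should already be 1 on the first iteration, yet SAS still starts a second iteration and stops only when SET finds nothing more to read, just as in the WHENSTOP log.

```sas
data a;
  a = 1;
run;

/* END=LAST creates a temporary variable that is set to 1 when   */
/* SET reads the last observation of A; it is not written to the */
/* output data set                                                */
data endflag;
  put '----| top of data step ' _n_=;
  set a end=last;
  put a= last=;
  put '----| bottom of data step ' _n_=;
run;
```

ENDFLAG should contain 1 observation and 1 variable (A), and the log should show a "top of data step" line for _N_=2 with no matching "bottom" line.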

I have usually seen this technique used in a different situation. For example, suppose data set A has a variable that I want attached to EVERY observation in SASHELP.CLASS. This code does that:
[pre]
data a;
a=999;
run;

proc print data=a;
title 'data set a';
run;

data b;
if _n_=1 then set a;
set sashelp.class;
put _all_;
run;

proc print data=b;
title 'data set b';
run;
[/pre]

In the above case, variable A is added to the program data vector when the step is compiled, and the SET statement for A, executed on the first iteration (when _N_ = 1), fills it with the value 999. The SET statement for SASHELP.CLASS executes on every iteration, reading one of its 19 observations each time. On the second and subsequent iterations, the SET statement for A is never executed again. That is OK: variables read with a SET statement are automatically retained, so A keeps the value 999 on every observation. And because SASHELP.CLASS is read on every iteration, its end-of-file marker is detected on the 20th iteration and the step stops normally, with no looping NOTE. Sometimes people use a slightly different version of this code for master/detail lookups; you'd have to consult the documentation or Tech Support for the correct way to code that technique.
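For contrast, here is what happens if the IF _N_=1 condition is left off (the data set name C is my own choice, and this assumes data set A with A=999 from the example above). Now SET A executes on every iteration, so it tries to read past A's only observation on the second iteration, and the whole step stops there, even though SASHELP.CLASS still has 18 observations left.

```sas
/* SET A now executes on every iteration, so it reaches A's     */
/* end of file on the 2nd iteration and stops the entire step   */
data c;
  set a;
  set sashelp.class;
run;

proc print data=c;
  title 'data set c';
run;
```

Data set C should have just 1 observation (the first student in SASHELP.CLASS, with A=999). Whichever SET statement hits its end of file first ends the step, which is why the lookup version reads A only once.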

For more information on how the Program Data Vector is used during a DATA step program, the documentation or these papers might come in handy:
http://www2.sas.com/proceedings/sugi28/189-28.pdf
http://www2.sas.com/proceedings/sugi31/246-31.pdf
http://www2.sas.com/proceedings/sugi30/251-30.pdf

cynthia