Hi all,
We are using SAS 9.1.3 on Mainframe z/OS.
I expected the following simplified code to produce 3 output observations, but instead it produces 4, the last of which is blank:
data test;
infile datalines eof=no_more_data;
input text $char3.;
return;
no_more_data:
putlog 'Finished';
*stop;
datalines;
AAA
BBB
CCC
;
Uncommenting the stop statement creates the desired output of 3 observations.
Inserting an extra return statement immediately before the datalines statement still produces 4 output observations.
I have the feeling it has something to do with the use of the eof= option along with the labelled section, but I couldn't find any evidence to back this up in the documentation or on the web.
Does anyone have an explanation for this behaviour, or have I just missed something?
Thanks & regards,
Amir.
As written, your data step includes an implied RETURN; as the last statement before cards. When control is transferred to the statement label, statements are executed until a RETURN, implied or explicit, is reached. Or STOP of course, but STOP is very strong medicine indeed. In the absence of an explicit OUTPUT, OUTPUT is implied with RETURN.
So what did you expect and what do you want to happen?
Hi all,
Thank you all for taking the time to respond. I would say you have confirmed my suspicions about what was happening in the data step, and I was glad to hear the suggestions on the use of an explicitly coded output statement, as this was one of the things I had tried. Another thing I tried was a delete statement in place of the stop, which also gave 3 output observations, although its use might be somewhat puzzling at first sight.
data _null_: In answer to your questions: "So what did you expect..." I expected 3 observations to be output; "...and what do you want to happen?" I wanted 3 observations to be output.
Thanks & regards,
Amir.
Delete works by accident here: by the time you hit it, you're already on the fourth iteration of the data step, so the required records have already been output. It deletes the nonexistent fourth record, and as you're at end of file, the data step terminates.
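For reference, the DELETE variant Amir describes was not posted in the thread. A minimal sketch of it, assuming the only change from the original step is DELETE in place of the commented-out STOP, might look like this:

```sas
data test;
  infile datalines eof=no_more_data;
  input text $char3.;
  return;            /* implied OUTPUT for each record read */
no_more_data:
  putlog 'Finished';
  delete;            /* drops the current (blank) observation instead of writing it */
datalines;
AAA
BBB
CCC
;
```

As reported above, this gives 3 observations, but STOP states the intent more directly.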
Amir
Wouldn't it be better to use the end= option like this:
data test;
infile datalines end=no_more_data;
input text $char3.;
if no_more_data then putlog 'Finished';
datalines;
AAA
BBB
CCC
;
It doesn't work.
2443 data test;
2444 infile datalines end=no_more_data;
WARNING: The value of the INFILE END= option cannot be set for CARDS or DATALINES input.
2445 input text $char3.;
2446 if no_more_data then putlog 'Finished';
2447 datalines;
That'll teach me to try it first!
You need to understand what's happening here.
SAS can't act on the EOF= option without going through the data step a fourth time, because it only detects end of file on the fourth execution of the input statement, when it finds nothing more to read.
So the implicit output statement referred to in the previous answers will be executed a fourth time, unless you specifically prevent it.
Stop is part of the language to terminate the datastep when automatic termination isn't appropriate. This is one time when stop is appropriate.
Return won't do it, cos it'll just send the processing back to the command after the call to no_more_data.
An alternative is to use an explicit output statement before the return. This will suppress the automatic/implied output statement at the end of the datastep, and as it's after the link to 'no_more_data' and there's no return after the label, it will only be executed three times, once for each input record found. There isn't an implicit return statement here. SAS just drops out of the datastep once eof is reached.
As a matter of clarity, always use return statements at the end of each link section and at the end of the code before the labels.
Not sure why 'Stop is very strong medicine indeed', unless it's being mixed up with endsas.
So your code should be:
data test;
/* better/clearer */
infile datalines eof=no_more_data;
input text $char3.;
return;
no_more_data:
putlog 'Finished';
stop;
return;
datalines;
AAA
BBB
CCC
;
OR:
data test;
/* will work */
infile datalines eof=no_more_data;
input text $char3.;
output;
return;
no_more_data:
putlog 'Finished';
return;
datalines;
AAA
BBB
CCC
;
Amir,
It sounds like you're looking for an explanation, so here's mine.
The normal end to a DATA step is to have the INPUT statement fail because it tries to read an extra line of data that doesn't exist. You can demonstrate that by adding this line both before and after the INPUT statement:
put _all_;
Adding EOF= changes that behavior. Instead of ending the DATA step when the INPUT statement fails, control is passed to the labeled section. The labeled section executes (again, try adding _all_ within the labeled PUT statement). However, the DATA step does not halt as it usually would. There is no failed INPUT statement. Instead, the fourth observation gets output (you can verify this in part by adding a RETAIN ID statement and seeing how the results change). Then the DATA step halts as the software "figures out" that it would have to loop forever if it were to continue.
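The diagnostics suggested above can be put together roughly as follows. This is only a sketch: the message prefixes are illustrative, and the STOP is deliberately left out so that the behaviour in question can be watched in the log.

```sas
data test;
  infile datalines eof=no_more_data;
  putlog 'Before INPUT: ' _all_;     /* includes the automatic variables _N_ and _ERROR_ */
  input text $char3.;
  putlog 'After INPUT: ' _all_;
  return;
no_more_data:
  putlog 'In EOF section: ' _all_;   /* reached when INPUT finds no more data lines */
datalines;
AAA
BBB
CCC
;
```

Comparing the _N_ values on the "Before INPUT" and "In EOF section" lines in the log should show on which iteration control passes to the label.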
That's my story and I'm sticking to it.
Good luck.