cheitzig
Calcite | Level 5

Hi,

 

I have two questions about the code below. It works fine; I am curious to understand why I need to write pieces of it the way I have. The code searches a set of .sas programs in a set of directories for a set of strings, and then stores the full path to each SAS program that contains each string, anywhere in the directory tree. I realize I could be more efficient about my "find" strategy (i.e., I could combine all of the strings in one grep statement with multiple -e parameters), but I wanted individual rows for each match for subsequent processing anyway. Plus, I didn't think of that alternate strategy until I'd already written this... and it works.

 

My questions are these:

  1. Why do I need the "stop" command after the last end? If I exclude the stop command, it becomes an infinite loop and just keeps executing on the four "rows" created by the 2x2 do loop.
    1. I suspect it has something to do with how SAS knows in general when to exit a data step, but the answer isn't obvious to me. In regular data step processing, when, for example, a set data set is being processed, SAS stops when it gets to EOF of the incoming data set(s). In this case, there are no incoming datasets, but we're creating just the four records to go through. 
  2. If I use a do until (i.e., do until (eof)) instead of a do while (i.e., do while (not eof)), the program exits without error the first time it encounters an infile pipe with no matching files.
    1. Again, it works fine with the do while loop. I understand that a do until loop executes once before checking the condition, but why does executing either the input, checking _infile_ value, or output cause the overall program to exit?
data ListOfFilePaths;
	length topOfFileTree $50 stringsToSearchFor $15 pipecmd $150 filePath $500;

	do topOfFileTree = "/firstTopLevelDirectory", "/secondTopLevelDirectory";
		do stringsToSearchFor =     "string1", "string2";
			pipecmd="find " || strip(topOfFileTree) || " -name '*.sas' -type f -exec grep -ilF " || strip(stringsToSearchFor) || " {} \;";

			infile dummy pipe filevar=pipecmd end=eof;

			do while (not eof); *if replace this with do until (eof), then data step exits on first infile pipe with no rows;
				input;
				filePath=strip(_infile_);
				output;
			end;

		end;
	end;

	stop; *if this isn't here, program seems to become an infinite loop. Not sure why;
run;
1 ACCEPTED SOLUTION

ChrisNZ
Tourmaline | Level 20

1. Why do I need the "stop" command after the last end?

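Because SAS never tries to read past an end-of-file, the data step never stops on its own. The do while (not eof) test exits each inner loop before an input statement can hit end-of-file, so SAS simply starts the next iteration of the data step.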

You can replace stop; with input; for the same effect.

 

2. Why does executing either the input, checking _infile_ value, or output cause the overall program to exit?

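Same reason, from the other side: with do until, the input statement runs once before eof is tested, so on a pipe that returns no rows SAS tries to read past end-of-file and stops then and there.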

 

Consider this data step:

data ListOfFilePaths;
  length PIPECMD $150 FILEPATH $500;
  do I=4, 99999999999999, 6;
    PIPECMD=cats("dir ""%sysfunc(pathname(work))"" | find """, I, """ ");
    infile dummy pipe filevar=PIPECMD end=EOF;
    do until(EOF);
      putlog 'before' I=;
      input;
      putlog 'after'  I=;
      FILEPATH=strip(_infile_);
      output;
    end;
  end;
  stop;
run;

Value 6 is never processed, and you can see in the log where the step stops (between 'before' and 'after').

Remove the unmatched value 99999999999999 and 6 is processed.
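
For contrast, here is a minimal sketch of the same step with do while (not EOF) in place of do until(EOF). It takes the original poster's report on trust that EOF is already true when a pipe returns no rows, so the inner loop is skipped and input never tries to read past end-of-file. The dir and find commands are the same Windows commands used above.

```sas
data ListOfFilePaths;
  length PIPECMD $150 FILEPATH $500;
  do I = 4, 99999999999999, 6;
    PIPECMD = cats("dir ""%sysfunc(pathname(work))"" | find """, I, """ ");
    infile dummy pipe filevar=PIPECMD end=EOF;
    /* Test EOF before reading, so an empty pipe skips the loop entirely */
    do while (not EOF);
      input;
      FILEPATH = strip(_infile_);
      output;
    end;
  end;
  /* No input ever reads past end-of-file, so the step must be ended explicitly */
  stop;
run;
```

Under that assumption, value 6 should be processed even though the middle value matches nothing, and the final stop; is what ends the step.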

3 REPLIES
cheitzig
Calcite | Level 5

Thanks, Chris. I'm still not sure I fully understand, though. I hope you'll entertain a follow-up thought or two (not really a question, I realize; more just repeating back what you said).

 

Your explanation of my #2 makes sense to me. You're saying that once SAS actually hits an EOF for any reason, it ends the data step right then. By using a do while (not eof), SAS isn't actually hitting the EOF on a statement, so it keeps going. That makes sense and is vaguely what I suspected.

 

The distinction I'm not getting as clearly, though, is why the two do loops don't end the step once SAS gets to EOF. I pasted a version below that has ONLY the do loops and an output, and this version ends fine without the stop. The same is true if it doesn't have the output statement at all. I think what's happening here is that after exiting the do loops, four records have been written, and then SAS realizes it's at EOF at the end of the data step, right?

 

I suspect the difference is that with the infile in the middle of the program, and with the code explicitly checking for EOF and preventing SAS from reaching it (using a do while (not eof)), SAS never encounters the EOF from anywhere. I guess I see that this makes sense, but I'm just confirming that we're in sync.

 

data ListOfFilePaths;
	length topOfFileTree $50 stringsToSearchFor $15;

	do topOfFileTree = "/firstTopLevelDirectory", "/secondTopLevelDirectory";
		do stringsToSearchFor =     "string1", "string2";
			output;
		end;
	end;

run;
ChrisNZ
Tourmaline | Level 20

In your new code, since you are not reading a table or a file, the data step does not iterate (and doesn't wait for the last record, since there isn't any).

It stops when reaching the end of the data step, identified by the run; statement.
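
A small sketch of that difference, using putlog and the automatic variable _N_ to count data step iterations. The variable names and datalines values are made up for illustration.

```sas
/* No SET, MERGE, or INPUT statement: the step body runs once and ends at RUN */
data _null_;
  do i = 1 to 2;
    putlog 'inside loop ' _n_= i=;
  end;
  putlog 'bottom of step ' _n_=;
run;

/* With an INPUT statement, the step iterates until INPUT hits end-of-file */
data _null_;
  input x;
  putlog _n_= x=;
datalines;
10
20
;
```

In the first step every log line should show _N_=1. In the second, _N_ increases with each record read, and the step ends when input finds no more data.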

 
