Hi,
I have two questions about the code below. It works fine; I am curious to understand why I need to write parts of it the way I did. The code searches the .sas programs anywhere in a set of directory trees for a set of strings, and stores the full path of each SAS program that contains each string. I realize I could make my "find" strategy more efficient (for example, by combining all of the strings in one grep command with multiple -e options), but I wanted a separate row for each match for subsequent processing anyway. Plus, I didn't think of that alternative until I'd already written this... and it works.
My questions are these:
data ListOfFilePaths;
   length topOfFileTree $50 stringsToSearchFor $15 pipecmd $150 filePath $500;
   do topOfFileTree = "/firstTopLevelDirectory", "/secondTopLevelDirectory";
      do stringsToSearchFor = "string1", "string2";
         /* list every .sas file under the tree that contains the string (grep -i: ignore case, -l: file names only, -F: fixed string) */
         pipecmd = "find " || strip(topOfFileTree) || " -name '*.sas' -type f -exec grep -ilF " || strip(stringsToSearchFor) || " {} \;";
         infile dummy pipe filevar=pipecmd end=eof;
         do while (not eof); *if I replace this with do until (eof), the data step exits on the first infile pipe with no rows;
            input;
            filePath = strip(_infile_);
            output;
         end;
      end;
   end;
   stop; *if this isn't here, the program seems to become an infinite loop. Not sure why;
run;
1. Why do I need the "stop" command after the last end?
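Because no statement in the step ever tries to read past an end-of-file, SAS never stops on its own. When execution reaches the run; statement, the data step simply starts its next iteration, runs both DO loops again (reopening each pipe), and so on forever.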
You can replace stop; with input; for the same effect.
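A minimal sketch of that suggestion, assuming a UNIX host with the XCMD system option enabled. The `printf` commands and the names `demo_input_end`, `cmd` and `line` are stand-ins (not from the thread) so the step runs without real directories. After the loops, the extra INPUT statement tries to read past the last record of the last pipe, and that is what ends the step.

```sas
/* Sketch: same loop structure as the question, but each pipe runs a
   stand-in printf command that prints two lines (assumes UNIX + XCMD). */
data demo_input_end;
   length pipecmd $100 line $50;
   do cmd = 1 to 2;
      pipecmd = cats("printf 'a", cmd, "\nb", cmd, "\n'");
      infile dummy pipe filevar=pipecmd end=eof;
      do while (not eof);
         input;
         line = _infile_;
         output;
      end;
   end;
   putlog "Loops finished on " _n_=;
   /* Reads past the end of the last pipe: SAS ends the step here,
      so this replaces the STOP statement. */
   input;
run;
```

The log should show the putlog line once, with `_N_=1`, and the data set should hold four rows (a1, b1, a2, b2).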
2. Why does executing the input statement, checking the _infile_ value, or the output statement cause the whole step to exit?
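Same rule, from the other side: as soon as an INPUT statement tries to read past the end of the file, SAS stops the data step then and there. With do until (eof), the loop body always runs at least once, so for a pipe that returns no rows the INPUT statement reads past end-of-file immediately.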
Consider this data step:
data ListOfFilePaths;
   length PIPECMD $150 FILEPATH $500;
   do I=4, 99999999999999, 6;
      /* Windows: list the WORK folder and keep only lines that contain the value of I */
      PIPECMD=cats("dir ""%sysfunc(pathname(work))"" | find """, I, """ ");
      infile dummy pipe filevar=PIPECMD end=EOF;
      do until(EOF);
         putlog 'before' I=;
         input;
         putlog 'after' I=;
         FILEPATH=strip(_infile_);
         output;
      end;
   end;
   stop;
run;
Value 6 is never processed, and you can see in the log where the step stops (between 'before' and 'after' for the unmatched value).
Remove the unmatched value 99999999999999 and 6 is processed.
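A UNIX analogue of the same experiment, as a sketch assuming the XCMD system option is enabled. The `true` command stands in for a search that matches nothing, and the names `pipe_demo` and `line` are hypothetical. Because this version uses do while (not eof), the empty pipe is skipped and the third command is still processed.

```sas
/* Sketch: the second command prints nothing, like a string that
   matches no file (assumes UNIX + XCMD). */
data pipe_demo;
   length pipecmd $100 line $50;
   do i = 1 to 3;
      if i = 2 then pipecmd = "true";                 /* no output at all */
      else pipecmd = cats("printf 'line", i, "\n'");  /* one line of output */
      infile dummy pipe filevar=pipecmd end=eof;
      /* DO WHILE tests EOF before the first INPUT, so the empty pipe is
         skipped. With DO UNTIL (eof), the INPUT for i=2 would read past
         end-of-file and SAS would end the step before i=3. */
      do while (not eof);
         input;
         line = _infile_;
         output;
      end;
   end;
   stop;  /* nothing above reads past end-of-file, so end the step explicitly */
run;
```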
Thanks, Chris. I'm still not sure I fully understand, though. I hope you'll entertain a follow-up thought or two (not really a question, I realize; more just repeating back what you said).
Your explanation of my #2 makes sense to me. You're saying that once SAS actually hits an EOF for any reason, it ends the data step right then. By using do while (not eof), SAS never actually hits the EOF on a statement, so it keeps going. That makes sense and is roughly what I suspected.
What I'm less clear on is why the step doesn't end once the two do loops finish. I pasted a version below that has ONLY the do loops and an output statement, and this version ends fine without the stop. The same is true if it has no output statement at all. I think what's happening here is that after the do loops finish, four records have been written, and then SAS realizes it's at EOF at the end of the data step, right?
I suspect the difference is that with the infile statement in the middle of the program, and with the code explicitly checking for and preventing SAS from reaching the EOF (using do while (not eof)), SAS never encounters the EOF anywhere. I think I see how that makes sense, but I'm just confirming that we're in sync.
data ListOfFilePaths;
   length topOfFileTree $50 stringsToSearchFor $15;
   do topOfFileTree = "/firstTopLevelDirectory", "/secondTopLevelDirectory";
      do stringsToSearchFor = "string1", "string2";
         output;
      end;
   end;
run;
In your new code, since you are not reading a table or a file, the data step does not iterate (and doesn't wait for a last record, since there isn't any).
It runs once and stops when it reaches the end of the data step, marked by the run; statement.
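For comparison, a minimal sketch (not from the thread) that lets the implicit data step loop drive the search instead of the outer DO loops. Each iteration reads one directory/string pair from DATALINES; when the pairs run out, that INPUT statement reads past end-of-file and SAS ends the step, so no STOP is needed. The directories and strings are the placeholders from the question, and a UNIX host with XCMD is assumed. The explicit `infile datalines;` is there so that each iteration reads its pair from DATALINES rather than from the pipe opened in the previous iteration.

```sas
/* Sketch: one directory/string pair per data step iteration. */
data ListOfFilePaths;
   length topOfFileTree $50 stringsToSearchFor $15 pipecmd $150 filePath $500;
   infile datalines;                              /* switch back to DATALINES */
   input topOfFileTree $ stringsToSearchFor $;   /* past the last pair: step ends */
   pipecmd = "find " || strip(topOfFileTree)
          || " -name '*.sas' -type f -exec grep -ilF "
          || strip(stringsToSearchFor) || " {} \;";
   infile dummy pipe filevar=pipecmd end=eof;
   do while (not eof);
      input;
      filePath = strip(_infile_);
      output;
   end;
   datalines;
/firstTopLevelDirectory string1
/firstTopLevelDirectory string2
/secondTopLevelDirectory string1
/secondTopLevelDirectory string2
;
run;
```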