Hi All,
I'm reading a list of text files, and would like a way to identify whether a record I am reading is the first record of a file or not, and whether it is the last record of a file or not. I read the options for the infile statement, but can't seem to get what I want.
Say I have three files like:
file1.txt
1
2
3
file2.txt
40
50
60
70
file3.txt
800
900
1000
WANT data like:
id    first  last
1     1      0
2     0      0
3     0      1
40    1      0
50    0      0
60    0      0
70    0      1
800   1      0
900   0      0
1000  0      1
I can get FIRST by comparing each filename to the lag of filename:
filename myfiles("d:\junk\file1.txt" "d:\junk\file2.txt" "d:\junk\file3.txt" );
data want;
length filename $20;
infile myfiles filename=filename;
input id;
first=filename ne lag(filename);
* last= ??? ;
run;
But how can I get LAST? Do I need to do some sort of lookahead?
Big picture, before I start reading from a new file, I want to do some setup stuff. After I have read the last record from a file I want to do some post processing stuff.
Thanks @data_null__,
That gives it a nice DoW loop structure to allow preprocessing and postprocessing of each file. Should be cleaner than all the RETAINING I have been doing.
Something like:
data want;
length myfile $ 300;
input myfile $ ;
infile dummy filevar=myfile end=done;
put "Pre-processing " myfile=;
do while(not done);
input id ;
put (id myfile)(=);
output;
end;
put "Post-processing " myfile=;
datalines;
d:\junk\file1.txt
d:\junk\file2.txt
d:\junk\file3.txt
;
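For reference, here is a minimal sketch of how the FIRST and LAST flags from the original question might be derived inside this same loop. It assumes the END= variable (done) becomes 1 as soon as INPUT has read the final record of the current file, which is the same behavior the DO WHILE condition already relies on, and that no file ends with a trailing blank line. The variable names first and last are taken from the question; the KEEP statement is an addition for illustration.

```sas
data want;
  length myfile $ 300;
  input myfile $;                         /* next file name from DATALINES */
  infile dummy filevar=myfile end=done;   /* open that file; DONE flags its last record */
  first = 1;                              /* the next record read is the first of this file */
  do while (not done);
    input id;
    last = done;                          /* assumed: DONE is 1 once the final record has been read */
    output;
    first = 0;
  end;
  keep id first last;
datalines;
d:\junk\file1.txt
d:\junk\file2.txt
d:\junk\file3.txt
;
```

Any per-file setup would go before the DO WHILE loop and any post-processing after its END, exactly as in the PUT statements above.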
Yes, a lot less fiddly. 🙂 You could even use an INFILE DUMMY PIPE FILEVAR='command to return file names like DIR' in place of the names you are reading from CARDS. Perhaps more useful if you have many files to read.
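A rough sketch of that idea follows. It is a variant rather than the exact syntax suggested: it puts the command in a FILENAME ... PIPE statement instead of a PIPE FILEVAR= value. The fileref dirlist, the Windows DIR command, and the d:\junk\*.txt pattern are assumptions for illustration, and the session must allow operating system commands (XCMD).

```sas
/* Assumed: Windows host, XCMD allowed; DIR /B /S lists full paths, one per line */
filename dirlist pipe 'dir /b /s "d:\junk\*.txt"';

data want;
  length myfile $ 300;
  infile dirlist truncover;               /* stream of file names from the command */
  input myfile $300.;                     /* one full path per record */
  infile dummy filevar=myfile end=done;   /* switch to the file just named */
  put "Pre-processing " myfile=;
  do while (not done);
    input id;
    output;
  end;
  put "Post-processing " myfile=;
run;
```

The step stops on its own when INPUT runs out of file names in dirlist, so no DATALINES block is needed.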
Although @data_null__'s reply is an elegant use of the FILEVAR= and END= options, there is a relatively simple way to look ahead for an upcoming file change.
Just add a second FILENAME statement with the same input as the primary FILENAME statement. Then, in the data step, use it in a second INFILE statement with the FIRSTOBS=2 option, followed by a dummy INPUT statement:
filename myfiles ('c:\temp\file1.txt','c:\temp\file2.txt','c:\temp\file3.txt');
filename myfiles2 (myfiles);
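data want ;
  length _fvar _fvar2 $24;
  /* Primary stream: read the current record; _fvar holds its source file name */
  infile myfiles filename=_fvar end=_end;
  input x ;
  /* Look-ahead stream: FIRSTOBS=2 keeps it one record ahead of the primary stream */
  infile myfiles2 filename=_fvar2 firstobs=2;
  if _end=0 then input ;        /* dummy INPUT only advances the look-ahead */
  else _fvar2=' ';              /* nothing follows the very last record */
  begin=(_fvar^=lag(_fvar));    /* file differs from the previous record's file */
  end=(_fvar^=_fvar2);          /* file differs from the next record's file */
run;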
Thanks @mkeintz.
I was thinking there must be an infile option that does what I had hoped. That not being there, your approach is what I was trying to come up with. Looks like the same as one of the "standard" look-ahead approaches for SAS datasets, I just failed to make it work with infile.
Sadly I missed your leads and lags talk at BASUG earlier this year. If I had seen it, I'm sure I would have learned this and more. : )
-Q.
@Quentin: I'm giving the Lags and Leads talk again at SESUG next month. If you're coming please find me.