
Hi All,

 

I'm reading a list of text files, and would like a way to identify whether the record I am reading is the first record of a file, and whether it is the last record of a file. I read through the options for the INFILE statement, but can't seem to get what I want.

 

Say I have three sample files like:

file1.txt
1
2
3

file2.txt
40
50
60
70

file3.txt
800
900
1000

WANT data like:

 

  id    first   last

   1      1       0
   2      0       0
   3      0       1
  40      1       0
  50      0       0
  60      0       0
  70      0       1
 800      1       0
 900      0       0
1000      0       1

 

 

I can get FIRST by comparing each filename to the lag of filename:

filename myfiles("d:\junk\file1.txt" "d:\junk\file2.txt" "d:\junk\file3.txt" );


data want;
  length filename $20;
  infile myfiles filename=filename;
  input id;
  first = filename ne lag(filename);
  * last= ??? ;
run;

 

 

But how can I get LAST?  Do I need to do some sort of lookahead?

 

Big picture, before I start reading from a new file, I want to do some setup stuff.  After I have read the last record from a file I want to do some post processing stuff.

BASUG is hosting free webinars Next up: Mike Sale presenting Data Warehousing with SAS April 10 at noon ET. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
data_null__
Jade | Level 19
Consider using the INFILE statement option FILEVAR=, with which you can use END= to detect the end of each file.
Quentin
Super User

Thanks @data_null__,

 

That gives it a nice DoW loop structure to allow preprocessing and postprocessing of each file.  Should be cleaner than all the RETAINING I have been doing.

 

Something like:

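data want;
   length myfile $ 300;
   input myfile $ ;                        /* next file name from DATALINES */
   infile dummy filevar=myfile end=done;   /* open that file; DONE=1 at its last record */

   put "Pre-processing " myfile=;          /* per-file setup goes here */
   do while(not done);                     /* read every record of the current file */
     input id ;
     put (id myfile)(=);
     output;
   end;
   put "Post-processing " myfile=;         /* per-file wrap-up goes here */

   datalines;
d:\junk\file1.txt
d:\junk\file2.txt
d:\junk\file3.txt
;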
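
If the FIRST and LAST flags from the original question are still wanted as variables, the same loop can set them. This is only a minimal sketch, assuming each file holds one numeric id per line with no trailing blank line. FIRST and LAST are the illustrative names from the question, and the KEEP statement is an addition of this sketch:

```sas
data want;
   length myfile $ 300;
   input myfile $ ;                        /* next file name from DATALINES */
   infile dummy filevar=myfile end=done;   /* DUMMY is only a placeholder fileref */

   first = 1;                              /* the next record read starts this file */
   do while (not done);
      input id;
      last = done;                         /* END= variable is 1 once the file's final record is read */
      output;
      first = 0;                           /* later records are not the first */
   end;
   keep id first last;                     /* assumption: only these three are wanted */

   datalines;
d:\junk\file1.txt
d:\junk\file2.txt
d:\junk\file3.txt
;
```

LAST is copied from DONE right after the INPUT statement, because the END= variable is set when the final record of the current file is read.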
data_null__
Jade | Level 19

Yes, a lot less fiddly. 🙂   You could even use an INFILE with the PIPE device type, running a command that returns file names (like DIR), in place of the names you are reading from CARDS.   That is perhaps more useful if you have many files to read.
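
A rough sketch of that pipe idea, assuming Windows, that the XCMD system option is enabled so SAS is allowed to run operating system commands, and that `dir /b /s` returns the full path of each matching file (it also searches subfolders). The command and the d:\junk path are only illustrations, not something taken from the posts above:

```sas
data want;
   length myfile $ 300;

   /* Assumed command: list the full path of every .txt file under d:\junk */
   infile 'dir /b /s "d:\junk\*.txt"' pipe truncover;
   input myfile $300.;                     /* one file name per line of command output */

   infile dummy filevar=myfile end=done;   /* switch to the file just named */
   put "Pre-processing " myfile=;
   do while (not done);
      input id;
      output;
   end;
   put "Post-processing " myfile=;
run;
```

The step stops on its own when the command has no more file names to return, so no explicit STOP is needed.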

mkeintz
PROC Star

Although @data_null__'s reply is an elegant use of the FILEVAR= and END= options, there is also a relatively simple way to look ahead for an incoming file change.

 

Just add a second FILENAME statement with the same input as the primary FILENAME statement.  Then, in the DATA step, use it in a second INFILE statement with the FIRSTOBS=2 option, followed by a dummy INPUT statement:

 

filename myfiles ('c:\temp\file1.txt','c:\temp\file2.txt','c:\temp\file3.txt');
filename myfiles2 (myfiles);

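data want ;
  length _fvar _fvar2 $24;
  infile myfiles filename=_fvar end=_end;      /* primary read */
  input x  ;

  infile myfiles2 filename=_fvar2 firstobs=2;  /* same files, one record ahead */
  if _end=0 then input ;                       /* dummy INPUT only advances the look-ahead */
  else _fvar2=' ';                             /* nothing follows the very last record */

  begin=(_fvar^=lag(_fvar));                   /* file name differs from the previous record */
  end=(_fvar^=_fvar2);                         /* file name differs on the next record */
run;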
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Quentin
Super User

Thanks @mkeintz

 

I was thinking there must be an INFILE option that does what I had hoped.  That not being there, your approach is what I was trying to come up with.  It looks like one of the "standard" look-ahead approaches for SAS data sets; I just failed to make it work with INFILE.

Sadly I missed your leads and lags talk at BASUG earlier this year.  If I had seen it, I'm sure I would have learned this and more.  : )

 

-Q.

mkeintz
PROC Star

@Quentin:  I'm giving the Lags and Leads talk again at SESUG next month.  If you're coming please find me.

 

Quentin
Super User
@mkeintz Excellent! I'll be there, and will definitely track you down.