BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Junyong
Pyrite | Level 9

I saw this post—Using FILENAME ZIP to unzip and read data files in SAS—that unzips a file inside a ZIP file as follows.

/* identify a temp folder in the WORK directory */
filename xl "%sysfunc(getoption(work))/sas_tech_talks_15.xlsx" ;
 
/* hat tip: "data _null_" on SAS-L */
data _null_;
   /* using member syntax here */
   infile inzip(sas_tech_talks_15.xlsx) 
       lrecl=256 recfm=F length=length eof=eof unbuf;
   file   xl lrecl=256 recfm=N;
   input;
   put _infile_ $varying256. length;
   return;
 eof:
   stop;
run;
 
proc import datafile=xl dbms=xlsx out=confirmed replace;
  sheet=confirmed;
run;

and I found that the following minimized code also works.

data _null_;
	infile zipfile(member) recfm=f;
	file target recfm=n;
	input;
	put _infile_;
run;

(1) I use an END= option sometimes, but don't use E:STOP; (with EOF=) and RETURN; in most cases. I found that the code above works as well without EOF, RETURN, STOP statements, but what are their functions here? And are they necessary to prevent potential errors?

(2) Also I hardly use RECFM=F and RECFM=N in other situations than unzipping, but I found that the code doesn't work without these RECFM statements. What are their exact functions in this context?

Thanks for your help.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Let's talk about the second issue first as it is the more useful.  Normally SAS will read and write text files as varying length lines. Where the end of the line is marked by an end of line character.  This normally a linefeed (aka LF aka '0A'x)  on Unix and a carriage return (aka CR aka '0D'x) and linefeed pair on Windows. (See the TERMSTR= option on the INFILE or FILE statement).

 

But an XLSX file is not a text file and does not have lines. You do not want to interpret bytes that contain  '0A'x as a linefeed or those that contain '0D'x as a carriage return.  By telling SAS to use RECFM=F you are telling it to instead read/write the file as records (that do not have any end of line or "separator" characters between them).  So if you ask it to read a file using RECFM=F and LRECL=256 it will read the first 256 bytes. then the next 256 bytes etc.  The RECFM=N is more general in that it does not read a fixed number of bytes but it also does not attempt to interpret anything as being end of line or record separators. Which to use probably depends on combination of what makes sense for you file (was it really created using fixed length records?) and which performs better.

 

What is the difference between the END= and EOF= options?  The END= option lets you name a variable that will set to true (1) when you have reached the end of file.  The EOF= option lets you give it the name of statement label where program control should jump when you reach the end of the file.  For most uses of the INFILE statement you don't need either. In that case what happens is when your INPUT statement tries to read past the end of the file the data step stops right there.  You probably don't need to use either for this application since you don't want to do anything other than stop when you hit the end of the file.

 

What is a RETURN statement?  It is like the end of a subroutine call. If there was no subroutine call then it will just end this iteration (sort of like returning to the top of the data step).  Having the return statement in that data step prevents you from running the STOP statement on every iteration of the data step. The only way to get to the STOP statement is to jump to the label EOF.  That is what the EOF=EOF option on the INFILE statement is telling SAS to do when the INPUT statement reads the end of the file.

View solution in original post

1 REPLY 1
Tom
Super User Tom
Super User

Let's talk about the second issue first as it is the more useful.  Normally SAS will read and write text files as varying length lines. Where the end of the line is marked by an end of line character.  This normally a linefeed (aka LF aka '0A'x)  on Unix and a carriage return (aka CR aka '0D'x) and linefeed pair on Windows. (See the TERMSTR= option on the INFILE or FILE statement).

 

But an XLSX file is not a text file and does not have lines. You do not want to interpret bytes that contain  '0A'x as a linefeed or those that contain '0D'x as a carriage return.  By telling SAS to use RECFM=F you are telling it to instead read/write the file as records (that do not have any end of line or "separator" characters between them).  So if you ask it to read a file using RECFM=F and LRECL=256 it will read the first 256 bytes. then the next 256 bytes etc.  The RECFM=N is more general in that it does not read a fixed number of bytes but it also does not attempt to interpret anything as being end of line or record separators. Which to use probably depends on combination of what makes sense for you file (was it really created using fixed length records?) and which performs better.

 

What is the difference between the END= and EOF= options?  The END= option lets you name a variable that will set to true (1) when you have reached the end of file.  The EOF= option lets you give it the name of statement label where program control should jump when you reach the end of the file.  For most uses of the INFILE statement you don't need either. In that case what happens is when your INPUT statement tries to read past the end of the file the data step stops right there.  You probably don't need to use either for this application since you don't want to do anything other than stop when you hit the end of the file.

 

What is a RETURN statement?  It is like the end of a subroutine call. If there was no subroutine call then it will just end this iteration (sort of like returning to the top of the data step).  Having the return statement in that data step prevents you from running the STOP statement on every iteration of the data step. The only way to get to the STOP statement is to jump to the label EOF.  That is what the EOF=EOF option on the INFILE statement is telling SAS to do when the INPUT statement reads the end of the file.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 1 reply
  • 653 views
  • 0 likes
  • 2 in conversation