I'm trying to unpack a ZIP file using FILENAME ZIP, similar to what was done in this SAS Blog post. Ideally, I would like to handle the filerefs and the copy automatically inside a single data step using the FILENAME function (not statement) and FCOPY, but this gives problems with larger file sizes.
As a reproducible example (note that I'm using a SAS dataset, but the file type could be anything; the goal is to do a binary copy):
%let workdir = /path/to/workdir;
ods package open nopf;
ods package add file=".../path/to/sashelp/plfips.sas7bdat";
ods package publish archive properties(archive_path="&workdir" archive_name="data.zip");
ods package close;
options lrecl=max;
/* Does not work */
data _null_;
  length S $1024 FI FO $8 RC 8;
  rc = filename(FI, "&workdir./data.zip", "zip", "recfm=n member='plfips.sas7bdat'");
  rc = filename(FO, "&workdir./plfips.sas7bdat", "disk", "recfm=n");
  rc = fcopy(FI, FO);
  if rc ne 0 then do;
    s = sysmsg();
    put s;
  end;
  rc = filename(FI);
  rc = filename(FO);
run;
WARNING: 0 records were truncated when the FCOPY function read from fileref #0000007.
769 records were truncated when the FCOPY function wrote to fileref #LN00536.
To prevent the truncation of records in future operations, you can increase the amount of space
needed to accommodate the records by using the LRECL= system option or the LRECL= option in the FILENAME statement.
The resulting file is corrupted. SAS tells me to increase LRECL=, but this is already maximum and shouldn't come into play with RECFM=N.
Strangely enough, the following (more in line with the blog post) does work:
/* Does work */
filename src ZIP "&workdir./data.zip" recfm=n;
filename tar DISK "&workdir./plfips.sas7bdat" recfm=n;
data _null_;
  infile src(plfips.sas7bdat);
  file tar;
  input;
  put _infile_;
run;
filename tar clear; /* flush the write */
However, apart from the use of FILENAME statements instead of functions, I don't see the difference. Is there any way to make the first approach work correctly?
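If the goal is mainly to avoid managing filerefs by hand, one option is to keep the statement-based DATA step copy that does work and wrap it in a macro. A minimal sketch, assuming the placeholder folder from the question; `%unzip_member` and the filerefs `_zsrc`/`_ztgt` are hypothetical names chosen for illustration, not anything SAS supplies:

```sas
/* Assumption: same placeholder folder as in the question */
%let workdir = /path/to/workdir;
options lrecl=max;

/* Hypothetical helper macro: copy one member out of a ZIP archive  */
/* using the FILENAME statements + DATA step copy from the question */
%macro unzip_member(zip=, member=, out=);
  /* MEMBER= selects the archive member; RECFM=N treats it as a byte stream */
  filename _zsrc ZIP "&zip" member="&member" recfm=n;
  filename _ztgt DISK "&out" recfm=n;

  data _null_;
    infile _zsrc;
    file _ztgt;
    input;
    put _infile_;
  run;

  /* Clear both filerefs; per the question, clearing the target flushes the write */
  filename _zsrc clear;
  filename _ztgt clear;
%mend unzip_member;

%unzip_member(zip=&workdir./data.zip, member=plfips.sas7bdat, out=&workdir./plfips.sas7bdat)
```

This does not make FCOPY itself behave differently; it only hides the fileref bookkeeping behind a single call.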
I usually just use recfm=f lrecl=512 when reading/writing/copying binary files.
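Applied to the FILENAME-function version from the question, this suggestion amounts to replacing `recfm=n` with `recfm=f lrecl=512` in both option strings. A sketch under that assumption, reusing the placeholder folder from the question; note that a file whose size is not a multiple of the LRECL may come out padded at the end:

```sas
/* Assumption: placeholder folder from the question */
%let workdir = /path/to/workdir;

data _null_;
  /* Blank 8-byte variables: FILENAME() generates a fileref name into them */
  length fi fo $8 msg $1024;
  rc = filename(fi, "&workdir./data.zip", "zip",
                "member='plfips.sas7bdat' recfm=f lrecl=512");
  rc = filename(fo, "&workdir./plfips.sas7bdat", "disk", "recfm=f lrecl=512");

  /* Copy in fixed 512-byte records instead of RECFM=N */
  rc = fcopy(fi, fo);
  if rc ne 0 then do;
    msg = sysmsg();
    put msg;
  end;

  /* Deassign both generated filerefs */
  rc = filename(fi);
  rc = filename(fo);
run;
```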
@Tom wrote: I usually just use recfm=f lrecl=512 when reading/writing/copying binary files.
Thanks! At first glance this seems to work with the FILENAME function as well; at least there are no more truncation warnings. However, the file sometimes appears to get padded, depending on the LRECL setting, and I see the following in the UNIX documentation (not an issue on Windows, but we use both):
Do not use RECFM=F for external files that contain carriage-control characters.
Would this not cause problems if the file being copied is an actual arbitrary binary stream? It was my impression that RECFM=N should make that a non-issue.
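One way to check whether a copy was padded is to compare the byte size of the output with the size of the original file. A minimal sketch using FOPEN and FINFO, assuming the `File Size (bytes)` information item; the available item names depend on the host, so if the value comes back blank, list them with FOPTNUM/FOPTNAME:

```sas
/* Assumption: placeholder folder from the question */
%let workdir = /path/to/workdir;

data _null_;
  length fref $8 size $64;
  /* Blank fileref variable: FILENAME() generates a fileref name */
  rc = filename(fref, "&workdir./plfips.sas7bdat");
  fid = fopen(fref);
  if fid > 0 then do;
    /* Assumption: this FINFO item name is supported on the host */
    size = finfo(fid, "File Size (bytes)");
    put "Size of copied file (bytes): " size;
    rc = fclose(fid);
  end;
  else put "Could not open file for fileref " fref;
  rc = filename(fref);
run;
```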
I use code like this in a loop:
/* Assign a fileref with the ZIP method */
filename inzip ZIP "&sourcedir/&&Name..zip";
/* Read Members from the zip file*/
data contents_tmp(keep=memname);
  length memname $200;
  fid=dopen("inzip");
  if fid=0 then
    stop;
  memcount=dnum(fid);
  do i=1 to memcount;
    memname=dread(fid,i);
    output;
  end;
  rc=dclose(fid);
run;
/* Append for each file to contents table */
PROC APPEND BASE=CONTENTS DATA=contents_tmp FORCE;
RUN;
data xmlfiles_tmp(keep=File);
length file $255.;
File = "&sourcedir/&&Name..xml";
run;
PROC APPEND BASE=XMLFILES DATA=xmlfiles_tmp FORCE;
RUN;
filename XML "&sourcedir/&&Name..xml";
data _null_;
  infile inzip(project.xml)
    encoding = "UTF-16LE"
    lrecl=2000
    recfm=F
    length=len
    eof=eof unbuf
  ;
  file XML lrecl=2000 recfm=N;
  input;
  put _infile_ $varying2000. len;
  return;
eof:
  stop;
run;
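The same LENGTH=/$VARYING pattern can be pointed at an arbitrary binary member: INFILE reads fixed-size chunks, LENGTH= reports how many bytes were actually read, and $VARYING. writes exactly that many bytes to a RECFM=N output, which is meant to keep the final partial chunk from being padded. A sketch adapted to the example from the question, assuming its placeholder folder, dropping the UTF-16LE ENCODING= option since no transcoding is wanted, and using an arbitrarily chosen 512-byte chunk size:

```sas
/* Assumption: placeholder folder from the question */
%let workdir = /path/to/workdir;

filename src ZIP "&workdir./data.zip" member="plfips.sas7bdat";
filename tar DISK "&workdir./plfips.sas7bdat";

data _null_;
  /* Read the member in fixed 512-byte chunks; LEN receives the */
  /* number of bytes actually read (the last chunk may be short) */
  infile src lrecl=512 recfm=F length=len eof=eof unbuf;
  /* RECFM=N: write a plain byte stream with no record delimiters */
  file tar lrecl=512 recfm=N;
  input;
  /* Write only the LEN bytes that were read */
  put _infile_ $varying512. len;
  return;
eof:
  stop;
run;

filename src clear;
filename tar clear;
```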
Are you able to post your example?
We use these macros for zipping / unzipping large folders with no issues:
https://core.sasjs.io/mp__zip_8sas.html