@ChrisNZ ,
Just to make things more interesting, I modified your command slightly as follows:
cls | dir "..\..\*.*" /s | find " "
This may not have been the world's smartest thing to do -- my drive is a 12 TB drive. The output scrolled by for what seemed like an eternity, but, in the end, it completed without issue. At least with this particular means of testing, there doesn't seem to be an issue at the OS level.
I won't have time today, but perhaps tomorrow I can write up the issue and send it to SAS.
Jim
@Patrick ,
Thank you for that interesting bit of ODS code. I was aware of the Zip Libname engine in SAS, but I was not aware of the "package()" functionality in ODS. I may have to play with that. If it were faster than wzzip.exe, then perhaps it would be a good alternative.
Now, if I really wanted to make zipping an entire directory fast, I would switch from the X command to SYSTASK with the NOWAIT option. Each file in the directory would be assigned a task name and a task-specific return code macro variable, and then, after all SYSTASKs had launched their instances of wzzip.exe, I would code a WAITFOR _ALL_. Once the wait is over, all return codes would be interrogated (and handled appropriately). Since all files would be being zipped concurrently, this would be a very fast way to zip an entire directory. It would take a bit of work to do it, but boy it would be fast.
I've got most of the code written in other programs; I'd just have to assemble the thing. Of course, parallel processing can be very tricky to debug, but, once debugged, the speed advantage over serial processing would be truly outstanding.
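For anyone who wants to try the SYSTASK approach described above, a minimal sketch with two files might look like the following. The C:\data and C:\archive paths are made up for illustration, the wzzip.exe location and the -a switch are the ones used elsewhere in this thread, and the XCMD system option must be enabled for SYSTASK to run host commands.

```sas
/* Sketch only: the C:\data and C:\archive paths are hypothetical.        */
/* Each SYSTASK gets its own task name and its own status macro variable. */
systask command
    '"E:\Program Files\WinZip\wzzip.exe" -a "C:\archive\a.zip" "C:\data\a.sas7bdat"'
    nowait taskname=zip_a status=rc_a;

systask command
    '"E:\Program Files\WinZip\wzzip.exe" -a "C:\archive\b.zip" "C:\data\b.sas7bdat"'
    nowait taskname=zip_b status=rc_b;

/* Suspend the SAS session until both background tasks have finished. */
waitfor _all_ zip_a zip_b;

/* Report the return code that each task stored in its status variable. */
%put NOTE: zip_a ended with rc=&rc_a;
%put NOTE: zip_b ended with rc=&rc_b;
```

In a real program the two SYSTASK statements would be generated in a loop, one per file, with the task names collected for the WAITFOR statement.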
Jim
>At least with this particular means of testing, there doesn't seem to be an issue at the OS level.
I'd replace the dir command with the zip command to compare apples to apples.
>Since all files would be being zipped concurrently, this would be a very fast way to zip an entire directory
Random I/Os scattered over many files may actually slow down the zipping, depending on whether the bottleneck is disk or CPU.
Or there might be a golden number of concurrent zips to get the balance right. Just saying.
@ChrisNZ ,
I ran the following command on a dataset that failed in the first version of my FILENAME PIPE code (before I redirected STDERR to a file):
"E:\Program Files\WinZip\wzzip.exe" -a "I:\Commercial\monthly_data\Cornerstone\archive\Stg\Rx_Archive_2020-03-05_0107.zip" "I:\Commercial\monthly_data\Cornerstone\rx.sas7bdat" 2>&1 | find " "
and it worked just fine. The pipe caused no problems. Note that I redirected STDERR to STDOUT, so both STDERR and STDOUT should be piped to the find command.
Good point about not submitting too many wzzip.exe jobs at a time. I tend to stick with three at a time, although I'm not sure there's any real magic in three.
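One simple way to cap the number of concurrent wzzip.exe jobs is to launch them in fixed-size batches. The following is an untested sketch under stated assumptions: the macro name, its parameters, and the three macro variables holding paths are all hypothetical, and batching with WAITFOR _ALL_ is a simplification, since each batch waits for its slowest member before the next one starts.

```sas
/* Hypothetical locations; adjust for your environment. */
%let zip_exe  = E:\Program Files\WinZip\wzzip.exe;
%let data_dir = C:\data;
%let arch_dir = C:\archive;

%macro zip_in_batches(members=, batch=3);
    %local i n member pending;
    %let n = %sysfunc(countw(&members, %str( )));
    %let pending = ;
    %do i = 1 %to &n;
        %let member = %scan(&members, &i, %str( ));
        /* Make the status variable global so it survives the macro. */
        %global zip_rc&i;
        /* Doubled quotes inside the string become literal quotes. */
        systask command
            """&zip_exe"" -a ""&arch_dir\&member..zip"" ""&data_dir\&member..sas7bdat"""
            nowait taskname=zip&i status=zip_rc&i;
        %let pending = &pending zip&i;
        /* After every &batch launches (or the last file), wait for the batch. */
        %if %sysfunc(mod(&i, &batch)) = 0 or &i = &n %then %do;
            waitfor _all_ &pending;
            %let pending = ;
        %end;
    %end;
    /* Report each task's return code. */
    %do i = 1 %to &n;
        %put NOTE: %scan(&members, &i, %str( )) rc=&&zip_rc&i;
    %end;
%mend zip_in_batches;

/* Usage: a space-separated list of dataset names, without extension. */
%zip_in_batches(members=rx claims members_2020, batch=3);
```

The dataset names in the usage line are placeholders. The batch size is a parameter precisely because the best value depends on whether disk or CPU is the bottleneck.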
Jim
Well, it looks like you have all the data SAS needs to replicate this strange limitation and create a defect or at least a UN.
The main reason I'd go for ODS package or the zip engine if there isn't a good reason to do otherwise is the removal of dependency on 3rd party tools. It also makes the code more OS independent (like for a migration from a Windows to a Linux server) and if there are any issues then I can contact a single vendor for support.
I guess your SYSTASK / rsubmit approach would also work for SAS code (with a bit more resource consumption, though). In any case, I would base the number of child tasks you create in parallel on the available CPU, and then loop using waitfor _any_ to start new tasks once existing ones finish.
If you've got SAS DI Studio, then the SAS code generated by the Loop Transformation (if "execute in parallel" is ticked) does this using rsubmit.
BTW: I just found a ready-made SAS macro for zipping all SAS datasets in a folder.
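As a rough illustration of the zip engine route, the sketch below copies one file into a ZIP archive as a byte stream using FILENAME ZIP and the FCOPY function, both available in SAS 9.4. The paths and member name are hypothetical, and this is only one way to use the engine.

```sas
/* Sketch only: hypothetical source file and archive location. */
/* RECFM=N treats both sides as a plain stream of bytes.       */
filename src "C:\data\rx.sas7bdat" recfm=n;
filename dst zip "C:\archive\rx.zip" member="rx.sas7bdat" recfm=n;

data _null_;
    length msg $ 256;
    /* FCOPY returns 0 on success. */
    rc = fcopy('src', 'dst');
    if rc ne 0 then do;
        msg = sysmsg();
        putlog 'ERROR: FCOPY failed: ' msg;
    end;
run;

filename src clear;
filename dst clear;
```

Because no external program is called, this runs the same way on Windows and Linux apart from the path syntax.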
@Patrick ,
Hmmm. That's a very good point about eliminating third party software. I had not considered that. I'll have to mull that one over.
That's an interesting zip-the-directory macro. I have something similar, although I just pipe the results of a dir command into my program. I think piping the results in from a dir command is just a bit simpler than using fileref, dopen, etc.
Here's the core of my macro (below). This macro then feeds each file name from the dir command to the macro that does the actual zipping.
%cd(&Data_Dir);

/* List matching files: /b = bare names only, /os = sorted by size, smallest first */
FILENAME Dir_List PIPE "dir *.sas* /b /os";

DATA _NULL_;
    LENGTH File_Name $256;
    LENGTH Command $256;
    INFILE Dir_List;
    INPUT File_Name;
    &Cmnt PUTLOG "&Nte2 " _N_ = ' ' File_Name;
    /* Build one %Archive_Data call per file name returned by dir */
    Command = CATS('%Archive_Data(', "&Data_Dir.,", File_Name,
                   ",&Arch_Dir.&Zip_SubFolder, &Zip_File_Name, ",
                   "Debug=&Debug, Width=&Width, Xmin=&Xmin)");
    &Cmnt PUTLOG "&Nte2 " Command=;
    /* Only queue the call for execution in production mode */
    IF UPCASE("&Run_Mode") = 'PROD' THEN
        DO;
            CALL EXECUTE(Command);
        END;
RUN;

%Check_RC(MsgLvl=&MsgLvl, ErrLvl=&ErrLvl);
%Reset_RC;
Jim
Why not just tell WinZip not to write that gibberish?
What options does it have? Most of those tools have a quiet option. If you must see what is in the file, use a second command to ask for the file contents.
SAS does not have a problem reading long lines, just use RECFM=N and treat it as a stream of bytes. But perhaps there is some interaction with PIPE and long lines?
@Tom ,
I originally looked for a "quiet" option but was unable to find one for Wzzip.exe. In my last shop, the zip software that we used there had a -qq option that suppressed just about all messages which was really useful for batch mode. Wzzip.exe does have a -ybc option which basically specifies that any prompts should be handled as an "OK" or "Yes", but there's no "quiet" option that I've been able to find.
That's an interesting idea to try to handle the STDERR as a stream of bytes. Unfortunately, SAS redirects the output from STDERR directly to the log when a PIPE is used, and one cannot specify anything like a RECFM.
Having the gibberish written to a text file isn't a big deal and seems to be the best way to deal with the situation. I think I'll stick with that unless I can find a "quiet" mode.
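The redirect-to-file workaround could be sketched as follows. The archive, data, and log paths are hypothetical; the point is only that 2> sends wzzip.exe's STDERR to a text file, so the pipe carries STDOUT alone. The quoting of the command string is an assumption and may need adjusting in your environment.

```sas
/* Sketch only: hypothetical paths. STDERR goes to a file instead of the SAS log. */
filename zipcmd pipe
    '"E:\Program Files\WinZip\wzzip.exe" -a "C:\archive\rx.zip" "C:\data\rx.sas7bdat" 2> "C:\temp\wzzip_stderr.txt"';

data _null_;
    /* TRUNCOVER keeps short lines from running on into the next record. */
    infile zipcmd truncover;
    input line $char256.;
    putlog line;
run;

filename zipcmd clear;
```

The STDERR file can then be inspected or deleted in a separate step once the zip has finished.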
Jim
Below is some sample code that uses ODS Package for zipping.
The source file, test, is around 12 GB.
ODS package requires less than 2 minutes on my not very powerful server (9.04.01M5P091317 under RHEL).
data test(drop=_: compress=no);
    length var $ 1024;
    var=repeat(' A BB C ',100);
    _n_row=1024 * 1024 * 12;
    do _i=1 to _n_row;
        output;
    end;
    stop;
run;

%put Start: %sysfunc(datetime(), datetime21.);

/* Creating a ZIP file with ODS PACKAGE */
ods package(newzip) open nopf;
ods package(newzip) add
    file="%sysfunc(pathname(work))/test.sas7bdat"
    path="data/";
ods package(newzip) publish archive
    properties(
        archive_name="test2.zip"
        archive_path="~"
    );
ods package(newzip) close;

%put Stop: %sysfunc(datetime(), datetime21.);

proc datasets lib=work nolist nowarn;
    delete test;
run;
quit;
Update: I've now also run the above for a 15 GB file. Zipping took 2 minutes 8 seconds in my environment.