02-23-2018 09:25 PM - edited 02-23-2018 09:27 PM
I'm using SAS to parse a Zip archive on a UNIX/AIX machine. Based on certain criteria, I select zero to many files from the zip archive to be unzipped. Sometimes the number of files can be as high as 60,000. I then format the file names of the selected files into a SYSTASK command to unzip each file in UNIX and CALL EXECUTE the SYSTASK.
Everything works, and all the right files get unzipped in a timely manner, but any subsequent SAS steps run slower than molasses in January in Maine. I run four SAS data steps after the unzips. Normally the total run time for all four steps combined is less than five minutes. However, if I run the unzips first and then the four SAS steps in the same SAS batch job, the four steps drag on and on, taking 10 minutes or more each. If I break the job in two, then the four steps run in mere minutes total.
Presumably, I'm eating up a lot of memory with all my CALL EXECUTEs, and the lack of memory is causing the dramatic slowness. Is there a way I can release the memory after I'm done with the CALL EXECUTEs? Each of the SYSTASKs is submitted with a task ID and a task RC, so I know when the unzipping of the files is complete.
I could just break the job in two, but the whole purpose of the job is to not have someone sitting there manually checking to see if all the unzips of the files have completed.
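For readers following along, here is a minimal sketch of the pattern described above. The dataset name ZIPFILES, the variable FILEPATH, and the archive path are hypothetical placeholders, not the poster's actual names:

```sas
/* Hypothetical names throughout: ZIPFILES (one row per selected member), */
/* FILEPATH (member name), and the archive path are placeholders.         */
data _null_;
  set zipfiles;
  length cmd $ 32767;
  cmd = catt('systask command "unzip -o /path/archive.zip ''',
             filepath, '''" taskname=unz', _n_, ' status=rc', _n_, ';');
  call execute(cmd);   /* queues one SYSTASK per selected file */
run;

/* Because each task was given a name, a WAITFOR on those names blocks
   until the extractions finish, e.g. for the first three tasks: */
waitfor _all_ unz1 unz2 unz3;
```

CALL EXECUTE pushes the generated SYSTASK statements onto the input stack, so they all run after the DATA step ends; with 60,000 rows that is 60,000 queued statements and 60,000 child processes.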
02-23-2018 09:31 PM - edited 02-23-2018 09:34 PM
If you think that CALL EXECUTE() is the problem then stop using it. Just write the code to a file and %INCLUDE the file.
filename code temp;

data _null_;
   set zipfiles;
   file code;
   put 'systask ' ...... ;
run;

%include code / source2;
But I suspect that using a lot of SYSTASK calls is more likely to be the cause.
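If the sheer number of concurrent SYSTASK child processes is the culprit, one mitigation is to launch them in small groups and WAITFOR each group before starting the next, so only a handful of unzip processes exist at any moment. This is a sketch, not the poster's code; ZIPFILES and FILEPATH are hypothetical placeholders, and SYSTASK COMMAND runs asynchronously by default, which is why the generated WAITFOR is needed:

```sas
/* Sketch: launch SYSTASKs in groups of 10, then WAITFOR the group. */
/* ZIPFILES and FILEPATH are hypothetical placeholders.             */
data _null_;
  set zipfiles end=last;
  call execute(catt('systask command "unzip -o /path/archive.zip ''',
                    filepath, '''" taskname=unz', _n_, ';'));
  /* Every 10th file (and at end-of-data), queue a WAITFOR naming   */
  /* the tasks in the current group.                                */
  if mod(_n_, 10) = 0 or last then do;
    length names $ 200;
    names = ' ';
    do i = 10*floor((_n_-1)/10) + 1 to _n_;
      names = catx(' ', names, cats('unz', i));
    end;
    call execute(cats('waitfor _all_ ', names, ';'));
  end;
run;
```

The queued code then alternates ten SYSTASK launches with one blocking WAITFOR, capping the number of simultaneous child processes at ten.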
02-23-2018 10:15 PM
Ah. Good idea. Thank you for that. Worth a try at least, although it could be that, as you suspect, the SYSTASKs are what is killing memory.
VAX/VMS used to have an UNLOAD option for SAS. No such luck with it in UNIX (not a valid option based on the message I got), and OPTION CLEANUP doesn't seem to help any.
Is there no way to release memory between steps?
02-23-2018 10:27 PM
Why don't you run your zip archive jobs in batch mode? Then it won't affect later EG session performance, and you can also split the batch job into as many sub-jobs as you like.
02-23-2018 10:47 PM
I may not be understanding your answer. I can submit my SAS code either via UNIX at the command prompt or via SAS EG. Either way, the performance is very poor if the unzips are launched from the same SAS job that I process the unzipped files in.
I could split the job, but I was hoping to have a single job that could be launched and have no manual intervention -- which I do except that it's extremely slow on those steps after the unzips.
02-23-2018 11:08 PM
@jimbarbour - What I was thinking is that you could have several batch jobs, each running the same code but processing a different batch of your zip archives, controlled by a job parameter that identifies which zip batch to run.
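One hedged sketch of that idea: in UNIX batch mode, a job parameter can be passed with `-sysparm` and read via the automatic macro variable &SYSPARM. The program name, the ZIPFILES dataset, and the BATCHNO variable below are hypothetical, assuming the archive list has been pre-assigned to numbered batches:

```sas
/* One program, launched several times in parallel, e.g.:            */
/*   sas unzip_batch.sas -sysparm 1                                  */
/*   sas unzip_batch.sas -sysparm 2                                  */
/* unzip_batch.sas, ZIPFILES, FILEPATH, and BATCHNO are hypothetical. */
%let batch = &sysparm;

data _null_;
  set zipfiles;
  where batchno = &batch;   /* only this job's share of the files */
  call execute(catt('systask command "unzip -o /path/archive.zip ''',
                    filepath, '''" taskname=unz', _n_, ';'));
run;
```

Each invocation then handles only its own slice of the 60,000 files, and the later DATA steps can run in a separate job that starts once all the batch jobs have finished.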
02-24-2018 12:36 AM