jimbarbour
Meteorite | Level 14

I'm using SAS to parse a Zip archive on a UNIX/AIX machine.  Based on certain criteria, I select zero to many files from the archive to be unzipped.  Sometimes the number of files can be as high as 60,000.  I then format the name of each selected file into a SYSTASK command that unzips the file in UNIX, and I CALL EXECUTE the SYSTASK.

Everything works, and all the right files get unzipped in a timely manner, but any subsequent SAS steps run slower than molasses in January in Maine.  I run four SAS data steps after the unzips.  Normally the total run time for all four steps combined is less than five minutes.  However, if I run the unzips first and then the four SAS steps in the same SAS batch job, the four steps drag on and on, taking 10 minutes or more each.  If I break the job in two, the four steps run in mere minutes total.

Presumably, I'm eating up a lot of memory with all my CALL EXECUTEs, and the lack of memory is causing the dramatic slowness.  Is there a way I can release the memory after I'm done with the CALL EXECUTEs?  Each of the SYSTASKs is submitted with a task ID and a task RC, so I know when the unzipping of the files is complete.

I could just break the job in two, but the whole purpose of the job is to avoid having someone sit there manually checking whether all the unzips have completed.

Jim

6 REPLIES
Tom
Super User

If you think that CALL EXECUTE() is the problem then stop using it.  Just write the code to a file and %INCLUDE the file.

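filename code temp;  /* temporary file that will hold the generated statements */
data _null_;
  set zipfiles ;
  file code ;
  /* write one complete SYSTASK statement per selected file */
  put 'systask ' ...... ;
run;
/* run the generated statements; SOURCE2 echoes them to the log */
%include code / source2 ;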

But I suspect that using a lot of SYSTASK calls is more likely to be the cause.
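
If the number of SYSTASK calls does turn out to be the problem, one way to test that suspicion is to launch a single task for the whole list of files instead of one per file. The following is only a rough sketch, not the original poster's code: the data set `work.zipfiles`, its variable `member`, and all paths are hypothetical placeholders, and it assumes `xargs` and Info-ZIP `unzip` are available on the AIX host. Member names containing blanks or wildcard characters would need extra handling.

```sas
/* Hypothetical inputs: WORK.ZIPFILES has one row per archive member */
/* to extract, in the character variable MEMBER. Paths are made up.  */
filename members "/tmp/members.txt";

/* Write the selected member names to a plain text file, one per line */
data _null_;
  set work.zipfiles;
  file members;
  put member;
run;

/* One asynchronous task for all files. SHELL asks SAS to run the    */
/* command through the operating system shell so the "<" input       */
/* redirection is interpreted. xargs appends the member names to the */
/* end of the unzip command line, in as many chunks as it needs.     */
systask command "xargs unzip -o -q -d /data/out /data/in/archive.zip < /tmp/members.txt"
        nowait shell taskname=unzipall status=unziprc;

/* Suspend the SAS session until the task has finished */
waitfor unzipall;

%put NOTE: unzip task finished with status &unziprc;
```

With this layout the SAS session tracks one task name and one status macro variable rather than tens of thousands, which makes it easier to tell whether the slowdown follows the SYSTASK count.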

 

jimbarbour
Meteorite | Level 14

Ah.  Good idea.  Thank you for that.  Worth a try at least, although it could be that, as you suspect, the SYSTASKs are what is killing memory.

VAX/VMS used to have an UNLOAD option for SAS.  No such luck in UNIX (not a valid option, based on the message I got), and OPTION CLEANUP doesn't seem to help.

Is there no way to release memory between steps?

SASKiwi
PROC Star

Why don't you run your zip archive jobs in batch mode? Then it won't affect later EG session performance, and you can also split the batch job into as many sub-jobs as you like.

jimbarbour
Meteorite | Level 14

@SASKiwi,

I may not be understanding your answer.  I can submit my SAS code either from the UNIX command prompt or via SAS EG.  Either way, the performance is very poor if the unzips are launched from the same SAS job that processes the unzipped files.

I could split the job, but I was hoping to have a single job that could be launched with no manual intervention.  I have that now, except that the steps after the unzips are extremely slow.

SASKiwi
PROC Star

@jimbarbour - What I was thinking is that you could have several batch jobs, each running the same code but processing a different batch of your zip archives, controlled by a job parameter that identifies the zip batch to run.
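
A job parameter of this kind could be passed with the `-sysparm` invocation option and read through the `&sysparm` macro variable. This is a minimal sketch under assumptions: the library path, the data set `ctl.zipfiles`, and the numeric column `batch_id` are hypothetical names, not anything from the original job.

```sas
/* Launched as, for example:                                  */
/*   sas -sysin unzip_batch.sas -sysparm 3 -log unzip_3.log   */
/* The program name and log name are placeholders.            */
%let batch = &sysparm;

/* Hypothetical permanent library holding the list of files to unzip */
libname ctl "/data/control";

/* Keep only the rows assigned to this job's batch.               */
/* BATCH_ID is an assumed numeric column. If -sysparm is omitted, */
/* &batch is empty and the WHERE statement will fail.             */
data work.this_batch;
  set ctl.zipfiles;
  where batch_id = &batch;
run;
```

Each batch job would then generate and run its unzip commands from `work.this_batch` only.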

jimbarbour
Meteorite | Level 14
OK, so I think I've got a workable idea here:
After my unzips complete, instead of running additional job steps, I submit one more SYSTASK. This SYSTASK launches a separate SAS job that contains the aforementioned four data steps. The four data steps execute in a separate memory space and should complete in a few minutes, while the main job sits in a wait state. When the subordinate job completes, the main job resumes execution and simply copies the subordinate job's log into its own log so that all log information is in a single location (in my case, typically Enterprise Guide). At the end of the main job, I can send out an email notification or what have you, with all steps in both jobs having been tracked by the main job.

A bit of a kludge, but I am certain that it will work. It would be a lot easier if I could just clear the memory.
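
The idea above might look roughly like the following. This is a sketch only: the program and log paths are hypothetical, and it assumes the `sas` command is on the PATH of the account running the main job.

```sas
/* Run the four data steps in a separate SAS process.           */
/* four_steps.sas and four_steps.log are placeholder names.     */
/* WAIT suspends the main job until the child process finishes. */
systask command "sas -sysin /home/jim/four_steps.sas -log /home/jim/four_steps.log -noterminal"
        wait taskname=child status=childrc;

/* Copy the child job's log into the main job's log, line by line */
data _null_;
  infile "/home/jim/four_steps.log" lrecl=32767;
  input;
  putlog _infile_;
run;

%put NOTE: child SAS job ended with status &childrc;
```

The `status=` macro variable gives the main job something to test before it sends a notification email, so a failed child run does not go unnoticed.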
