RobWartenhorst

I have been trying to reduce the time we need to wait when reading datasets from the repository in SDD 3.5. Looking at how the files are accessed, I have noticed the following:

1. When you define a macro variable as input folder, then every time the code (or input process) with that macro variable is executed the datasets selected for the input folder are copied from the repository, even if no libname is set up using the macro variable.

2. When you define a libname, a folder in the workspace is set up with a unique name and the datasets defined in the macro variable for the input folder are copied.

3. When you redefine a libname, a new folder in the workspace is set up with another unique name and datasets are copied again.

So when you have a setup file that defines the libnames, and this setup file is read in every time a program is run, then for every program run the datasets are copied from the repository into a unique folder in the workspace. This a) consumes a lot of network bandwidth and b) consumes a lot of disk space on the server.

Further, I have noticed that if you set up a libname in one process editor session, this libname can be used by any other process editor session. My latest approach to reducing the time to read in datasets is the following:

1. Set up a program that only assigns input libnames. Run this program and keep it active (but minimized).

2. All other programs that need to access the input datasets do so using the libnames defined in the program from 1). Data is available immediately.

3. When setting up a job that runs multiple programs, define the program from 1) as the initialization job, after which the input libraries are available to all programs in the job.

While this process dramatically reduces run time when working with large datasets that are used by multiple programs, it is a little cumbersome: accidentally closing the program that sets up the libnames means rerunning it to make the libraries available again, and I need to remember to run this program first. I was wondering if other users have come up with clever ways to assign libraries with minimal copying from the repository. Note that the mere presence of the macro variable for the input folder already starts the copy process. So checking whether the libname exists and conditionally including an input process with libnames based on that check does not make much difference in performance.
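
To illustrate the kind of libname check mentioned above, here is a minimal sketch. The libref `indata` and the macro variable `&input_folder` are placeholder names I made up for this example, not the names SDD generates, and the macro itself is only an assumption about how such a check could be written.

```sas
/* Sketch only: INDATA and &input_folder are hypothetical placeholder names. */
%macro assign_once(libref=, path=);
  /* LIBREF() returns 0 when the libref is already assigned */
  %if %sysfunc(libref(&libref)) ne 0 %then %do;
    libname &libref "&path" access=readonly;
  %end;
  %else %do;
    %put NOTE: Libref &libref is already assigned, skipping LIBNAME.;
  %end;
%mend assign_once;

/* The program still references the input-folder macro variable here, */
/* which, as described above, appears to be enough to start the copy. */
%assign_once(libref=indata, path=&input_folder)
```

This skips the repeated LIBNAME statement, but because the input-folder macro variable is still part of the program, it does not seem to avoid the copy itself.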

Thanks,

Rob

