Architecting, installing and maintaining your SAS environment

any libname/datastep optimizations that can be added?

Accepted Solution Solved
Reply
Contributor
Posts: 22
Accepted Solution

any libname/datastep optimizations that can be added?

 Hello, I recently made a post on optimizations for libname with procsql to dbms. For simple libnames/macros that use file share data are there similar optimizations? The sasv9.cfg file has several optimization flags built in. It seems that sas is using less than 10% cpu usage and < 100mb ram. My guess is it is reading 1 record at a time? The core logic are to load the tables then perform proc summarys. (set/keep/summary)

 

/* Enable additional logging, multithreading and SAS parallel processing */

OPTIONS MSGLEVEL=I;

OPTIONS THREADS = yes;

OPTIONS CPUCOUNT = ACTUAL;

OPTIONS dbsliceparm = all;

 

*OPTION OBS = 5000;

OPTION OBS = max;

 

PROC OPTIONS GROUP=performance;

proc options option=threads;

 

LIBNAME Source "L:\my folder\output" ACCESS = READONLY;

 %LET BaseFolder = L:\my other folder\v2;

LIBNAME Output "&BaseFolder.\Output";

LIBNAME Temp "&BaseFolder.\Temp (Delete When Done)";

 

 

similar thread:

https://communities.sas.com/t5/Base-SAS-Programming/Re-SAS-optimizations-with-MS-T-SQL-and-robust-sy...

 

sas9v.cfg:

-MEMSIZE 16G
-REALMEMSIZE 12G
-msymtabmax 1G
-MEXECSIZE 1G
-MVARSIZE = MAX
-SORTSIZE 8G
-sumsize 8G
-BUFSIZE=128k
-UBUFSIZE=128k
-IBUFSIZE=32767
-UBUFNO=20
-IBUFNO=20
-BUFNO=40
-CPUCOUNT = actual
-THREADS
-dbsliceparm=all
-ALIGNSASIOFILES


Accepted Solutions
Solution
‎01-18-2016 04:18 PM
Trusted Advisor
Posts: 3,211

Re: any libname/datastep optimizations that can be added?

You are obviously on a Windows system.

Optimizing performance is using the hardware resources (CPU IO/DASD memory) being in your system as optimal as possible.
For that knowing the bottlenecks wiht the processing is important.

1- How many dasd -drives (spinning) do you have? 

The IO is serialized at that point (OS/hardware limitation))  Do you have only one physical drive than putting more random load will increase the real time job proces. You want to decrease that.   In this case the SPD engine won't be helpfull it is putting more random load.


2- Do you have a SSD in your system?  Than put the most have IO related files there (SASWORK/Data)

 

3- What is your data-structure, do you have many  columns/variabels wiht a lot of spaces repeatable values.

Than test the processing with COMPRESS=binary (Ross compression on rows A simplified zip). Having a ratio over over 50% you can see gains in processing times (real time) and sometimes in execution time. That decrease of execution time has the only explanation that the system does have less work to process buffers to your storage.  With a SSD the trade off's are different as you are missing the rotation delays (typical 10 ms).

 

4/ How much memory do you have 8Gb 16Gb?  When the input data fits easily into memory eg 1gb data on a 12Gb system, than think on SASFILE usage.  http://support.sas.com/documentation/cdl/en/lestmtsref/68024/HTML/default/viewer.htm#n0osyhi338pfaan...

5/ Programming / coding solutions. Only the SAS datastep is reading one record at a time from IO buffers. Procs are doping different things.  Proc summary does have a class and by for the same logic. When all resulting data fits into memory than uses the class approach. It avoids any sorting needs. The by approach is able to process data beyond easy native processing but needs sorted data.
  
Going to use Proc SQL for reporting than you have a threaded approach several treads will do a part of the processing.
Remember the relational approach was designed for OLTP where the ordering of data in the storage is logical irrelevant. SQL coding is not using any assumed ordering. This differences to the datastep has advantages and disadvantages.     

Wiht multihreading all cpus in your system can be used at the same time.
Aside proc SQL thre are some other procs doing this. sort report summary means tabulate. Use one of those when possible

 http://support.sas.com/documentation/cdl/en/lrcon/68089/HTML/default/viewer.htm#n0czb9vxe72693n1lom0...

 

---->-- ja karman --<-----

View solution in original post


All Replies
Contributor
Posts: 22

Re: any libname/datastep optimizations that can be added?

my thought was to enable compression since 90% of the cpu is unused?
Super User
Posts: 5,424

Re: any libname/datastep optimizations that can be added?

The Base engine is single threaded. Use the SPDE engine instead if you think you can peed up the I/O (low CPU can point in that direction).

 

SAS reads a single record at the time - not really, but conceptually yes - if we are talking about data step processing.

SQL and many PROCs uses multi threading, and keeps lots of records in the memory, if necessary.

 

But I share your experience that it's hard to get a single SAS session to utilize CPU, many steps just do a little processing, more data movement.

 

Compression drives CPU, yes, but the execution time may not always benefit from this. If you have long records, with variable length character data - a good candidate. Do some testing on your data, and try to figure out a strategy that suits your environment/data.

Data never sleeps
Contributor
Posts: 22

Re: any libname/datastep optimizations that can be added?

spde seems familiar, any way to "Enable it" like a sasv9cfg or options? I did some light reading but not sure how I could leverage it to tell SAS to multithread the code.

 

https://support.sas.com/documentation/cdl/en/engspde/67961/PDF/default/engspde.pdf

 

do you just add spde into your libname statements?

 

LIBNAME Source spde  "L:\my folder\output" ACCESS = READONLY;

LIBNAME Output   spde "&BaseFolder.\Output";

LIBNAME Temp spde  "&BaseFolder.\Temp (Delete When Done)";

Contributor
Posts: 22

Re: any libname/datastep optimizations that can be added?

I added SPDE which worked for some sections and not others darn.

Super User
Posts: 5,424

Re: any libname/datastep optimizations that can be added?

Yes, you define them using libname statements.
Be aware that you need reload the data when changing engine, since the physical storage format is different.

What do you mean by that it didn't work for some sections?
Data never sleeps
Solution
‎01-18-2016 04:18 PM
Trusted Advisor
Posts: 3,211

Re: any libname/datastep optimizations that can be added?

You are obviously on a Windows system.

Optimizing performance is using the hardware resources (CPU IO/DASD memory) being in your system as optimal as possible.
For that knowing the bottlenecks wiht the processing is important.

1- How many dasd -drives (spinning) do you have? 

The IO is serialized at that point (OS/hardware limitation))  Do you have only one physical drive than putting more random load will increase the real time job proces. You want to decrease that.   In this case the SPD engine won't be helpfull it is putting more random load.


2- Do you have a SSD in your system?  Than put the most have IO related files there (SASWORK/Data)

 

3- What is your data-structure, do you have many  columns/variabels wiht a lot of spaces repeatable values.

Than test the processing with COMPRESS=binary (Ross compression on rows A simplified zip). Having a ratio over over 50% you can see gains in processing times (real time) and sometimes in execution time. That decrease of execution time has the only explanation that the system does have less work to process buffers to your storage.  With a SSD the trade off's are different as you are missing the rotation delays (typical 10 ms).

 

4/ How much memory do you have 8Gb 16Gb?  When the input data fits easily into memory eg 1gb data on a 12Gb system, than think on SASFILE usage.  http://support.sas.com/documentation/cdl/en/lestmtsref/68024/HTML/default/viewer.htm#n0osyhi338pfaan...

5/ Programming / coding solutions. Only the SAS datastep is reading one record at a time from IO buffers. Procs are doping different things.  Proc summary does have a class and by for the same logic. When all resulting data fits into memory than uses the class approach. It avoids any sorting needs. The by approach is able to process data beyond easy native processing but needs sorted data.
  
Going to use Proc SQL for reporting than you have a threaded approach several treads will do a part of the processing.
Remember the relational approach was designed for OLTP where the ordering of data in the storage is logical irrelevant. SQL coding is not using any assumed ordering. This differences to the datastep has advantages and disadvantages.     

Wiht multihreading all cpus in your system can be used at the same time.
Aside proc SQL thre are some other procs doing this. sort report summary means tabulate. Use one of those when possible

 http://support.sas.com/documentation/cdl/en/lrcon/68089/HTML/default/viewer.htm#n0czb9vxe72693n1lom0...

 

---->-- ja karman --<-----
Super User
Posts: 10,018

Re: any libname/datastep optimizations that can be added?

proc dbload could be faster . But that is very old PROC if you could still use it .

Super User
Posts: 5,424

Re: any libname/datastep optimizations that can be added?

Makes no sense for SAS engines. Does it even work?
Data never sleeps
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 615 views
  • 1 like
  • 4 in conversation