MelKunkel
Calcite | Level 5

Multi-core/multi-threads processing.

I have SAS 9.2/Enterprise Guide 4.3 running on a Windows 7 x64 machine with eight cores (+ Tesla C2075 GPU) and cannot get my program to run on more than one core at a time. The code runs nicely and produces great results, but it is very slow. I have tried adding the THREADS and CPUCOUNT options, and the log shows the cores, but they are not used. The code is below; any thoughts on getting it to run on more than one core would be greatly appreciated.

Thanks,

Mel

proc surveyselect data=data
     noprint
     seed=1
     out=boot1
     method=srs
     samprate=.45
     rep=250000;
run;

proc reg data=boot1 outest=est1(drop=_:) noprint;
     model Total = x1 x2 …… xn
           / selection=adjrsq cp start=7 stop=7 best=1;
     by replicate;
run;
quit;

proc means data=est1 N;
     var x1 x2 …… xn;
     output out=test1 N= ;
run;

Accepted Solution
jakarman
Barite | Level 11

There are several procedures within SAS that are multi-threaded.

The design of threading started with SAS 9 and is built on the threaded kernel (TK). A lot of it was already visible in 9.1.3, and it is still growing in 9.3 and 9.4.

SAS(R) 9.4 Language Reference: Concepts (Threading in Base SAS)

http://www2.sas.com/proceedings/sugi29/217-29.pdf

Some require an additional license:

SAS/STAT(R) 12.3 User's Guide: High-Performance Procedures

Some are more sophisticated and take more thought, like IML:

Multithreaded = more productive - The DO Loop

You have the MP CONNECT option within Enterprise Miner. You have the Grid approach as a service.

The procedures mentioned above have been reprogrammed by SAS to use multithreading.

You could do this yourself in Assembler, C, or R.

The catch is that the algorithm behind your data analysis must be suited to parallel execution. That is a human factor.

---->-- ja karman --<-----


14 Replies
art297
Opal | Level 21

Just out of curiosity, do you have a license for a machine with more than one core?  While I'm not overly familiar with the latest restrictions SAS has imposed on multicore processors, having only a single core license could be the reason you can't take full advantage of the option.

MelKunkel
Calcite | Level 5

I have a single-user license, but based upon what I have read on the website, I should be able to run a program across cores within the same machine. It is a good thought, though, as I have tried everything else I could think of.

Thanks,

Mel

AhmedAl_Attar
Ammonite | Level 13

Submit the following from your EG 4.3

Proc Options group=performance; run;

You need to see THREADS in the SAS log. If that doesn't work, then I would try adding the threading option explicitly on each procedure.

Note: Performance is not always CPU-bound; memory and I/O are contributing factors too. If you have the memory, make sure -SORTSIZE is set to 512MB and -MEMSIZE is set to a minimum of 1024MB.
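As a minimal sketch of the checks above: the snippet below turns threading on for the session and prints the relevant settings to the log. THREADS, CPUCOUNT=, and SORTSIZE= can be changed with an OPTIONS statement; MEMSIZE can only be set at SAS start-up (command line or configuration file), so it is only displayed here. These options merely permit threading, so a procedure that is not thread-enabled will still run on one core.

```sas
/* Allow thread-enabled procedures to use all detected cores. */
options threads cpucount=actual;

/* SORTSIZE can be changed in-session; 512M follows the suggestion above. */
options sortsize=512M;

/* Print the current values to the log. */
proc options option=threads;  run;
proc options option=cpucount; run;
proc options option=sortsize; run;

/* MEMSIZE is start-up only (for example -MEMSIZE 1024M), so just display it. */
proc options option=memsize;  run;
```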

Hope this helps,

Ahmed

MelKunkel
Calcite | Level 5

Thanks Ahmed,

I have tried what you suggested with no luck. I am beginning to believe that it is a limitation within PROC REG and its ability to parallelize the selection process.

Thanks,

Mel

art297
Opal | Level 21

According to the following paper:

SAS Programming for Data Mining Applications: Parallel Processing with Single-Thread SAS license

an additional license is necessary, but they offer an alternative that you might be able to take advantage of.

SASKiwi
PROC Star

I am assuming that you are running SAS on a Win 7 workstation (not server). It's my understanding that a SAS workstation licence is on a per-machine basis and there is no restriction on the number of cores (server licences are usually by number of cores). Your latest post indicates your issue is more with PROC REG. So are you seeing multi-core/thread use with other SAS procedures? What about the DATA step? It may be worth your while opening a tech support track to get official confirmation of what you are seeing.

GenDemo
Quartz | Level 8

Not sure if this will shed some light, but I got this url from sas support : Scalable SAS Procedures

I have a similar problem. I am using proc IML and OPTMODEL to do an optimisation technique. And I noticed on my CPU monitor that SAS only uses one core. I obtain very good results, but it takes an agonizing amount of time to run. But there does not seem to be a solution. They also suggested to add the following to my program:

     proc options option=threads;

     run;

ChrisMeaney
Calcite | Level 5

I encountered a similar problem recently with a large simulation. Basically, I wanted to do a whole bunch of independent computations on each replicate simulation run, much like your computations on each replicate sampled dataset. Consider the SAS code in the following URL:

SAS/CONNECT - Tips and Tricks

/** This tells SAS you want to do some parallel processing, so use your multiple cores **/
options sascmd="sas";

/** Break the job into multiple tasks to send to each core...you kind of have to balance the load yourself **/
/** It's ugly but I put the surveyselect and reg procedures in each "task chunk". Not sure how to pass your local dataset to each core otherwise **/
/** But since the replicate samples are independent anyways this shouldn't matter much **/

/** Start task chunk 1 **/
signon task1;
%syslput remvar1=somevalue;
rsubmit task1 wait=no log="task1.log" output="task1.lst";
   proc surveyselect out=chunk1 ;
   run ;
   /** PROC SURVEYSELECT names its replicate variable Replicate, so use that in the BY statement **/
   proc reg data=chunk1 ... ;
      by replicate ;
   run ;
endrsubmit;

/** Start task chunk 2 **/
signon task2;
%syslput remvar2=somevalue;
rsubmit task2 wait=no log="task2.log" output="task2.lst";
   proc surveyselect out=chunk2 ;
   run ;
   proc reg data=chunk2 ;
      by replicate ;
   run ;
endrsubmit;

/** Start task chunk n **/
signon taskn;
%syslput remvarn=somevalue;
rsubmit taskn wait=no log="taskn.log" output="taskn.lst";
   proc surveyselect out=chunk_n ;
   run ;
   proc reg data=chunk_n ;
      by replicate ;
   run ;
endrsubmit;

/** Tell SAS not to do anything until all of the parallel tasks have completed **/
waitfor _all_ task1 task2 ... taskn;

/* do some further local processing */

/** Sign out of the parallel SAS sessions **/
signoff task1;
signoff task2;
signoff taskn;

The surveyselect samples are independent of each other, so you want to feed these independent tasks to each of your cores. Put PROC SURVEYSELECT and PROC REG between the rsubmit and endrsubmit statements and they will run on different cores (assuming you use the wait=no argument).

I had trouble recovering the data from each core. They go to temporary work libraries and are lost when you signoff the core. You can see they live (temporarily) somewhere like here:

C:\Users\AppData\Local\Temp\SAS Temporary Files

It's kind of cool to see them created and then disappear during each spawned/parallel SAS session.

Some people suggest saving the libname directory in a macro variable. That didn't work for me: the value stored in the macro variable was not in fact the temporary file location created by the parallel session, and I'm not sure why. My workaround was to put a libname statement in each task chunk and send the work done on each core to a permanent file location that you know. Then reference this libname in your local SAS session to recover the work done in each chunk on each core.
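The libname workaround could look roughly like the sketch below. It is not the exact code I ran. The folder C:\temp\boot, the dataset perm.indata, the regressors x1 x2 x3, and the replicate counts are all hypothetical placeholders, and the folder must already exist and contain the input data. Each task uses its own seed so that the chunks do not draw identical samples.

```sas
/* Hypothetical permanent folder shared by the local and spawned sessions. */
%let outdir = C:\temp\boot;
options sascmd="sas";

/* Task 1: results go to a permanent library instead of the remote WORK. */
signon task1;
%syslput outdir=&outdir / remote=task1;
rsubmit task1 wait=no log="task1.log";
   libname perm "&outdir";
   proc surveyselect data=perm.indata noprint seed=1
        out=work.chunk1 method=srs samprate=.45 rep=1000;
   run;
   proc reg data=work.chunk1 outest=perm.est1 noprint;
      model Total = x1 x2 x3 / selection=adjrsq cp start=2 stop=2 best=1;
      by replicate;
   run;
   quit;
endrsubmit;

/* Task 2: same steps with a different seed and output dataset. */
signon task2;
%syslput outdir=&outdir / remote=task2;
rsubmit task2 wait=no log="task2.log";
   libname perm "&outdir";
   proc surveyselect data=perm.indata noprint seed=2
        out=work.chunk2 method=srs samprate=.45 rep=1000;
   run;
   proc reg data=work.chunk2 outest=perm.est2 noprint;
      model Total = x1 x2 x3 / selection=adjrsq cp start=2 stop=2 best=1;
      by replicate;
   run;
   quit;
endrsubmit;

/* Wait for both tasks, close the sessions, then collect the results locally. */
waitfor _all_ task1 task2;
signoff task1;
signoff task2;

libname perm "&outdir";
data work.est_all;
   set perm.est1 perm.est2;
run;
```

Because the estimates are written to perm rather than to each session's WORK library, they are still there after signoff.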

That's about it...in my opinion it is quite annoying. SAS makes the user do all the work of passing datasets, macro variables, etc. to the separate cores. The user must balance the load and control memory usage on their own. It might be easier with the %Distribute macro; however, I could not get it to work.

For your problem I think it would be much more feasible with sample(), lm(), and foreach() together with the doParallel and snow/parallel packages in R. Way nicer features than SAS and much better documentation.

Reeza
Super User

Would R be able to handle data that big or would you be required to also split it up? I'm pretty sure you'd need to split it up as well.

Reeza
Super User

I think this may be one of those cases that goes against the "DON'T BE LOOPY" mantra. Because your data set gets so big and you're working on a hard drive, the read/write time is high.

If you create a macro to loop through each replicate instead, it could be faster. I had to do this for several simulations, but it works. I'd also turn off my output listing, assuming you haven't already, to prevent the output from going to HTML or LISTING and help speed things up.
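A rough sketch of that kind of loop is below, assuming a hypothetical macro named %bootchunks, a hypothetical input dataset work.mydata, and placeholder regressors x1 x2 x3. Instead of one 250,000-replicate dataset, it draws small batches of replicates, fits each batch, and appends only the estimates. Using a different seed per batch is an assumption of this sketch and will not reproduce the exact samples of a single seed=1 run.

```sas
%macro bootchunks(nchunks=250, reps=1000, vars=x1 x2 x3);
   %local i;

   /* Start from an empty result set (NOWARN: no complaint if it is absent). */
   proc datasets lib=work nolist nowarn;
      delete est_all;
   quit;

   %do i = 1 %to &nchunks;
      /* Draw a small batch of replicates; the seed varies by batch. */
      proc surveyselect data=work.mydata noprint seed=&i
           out=work.boot_chunk method=srs samprate=.45 rep=&reps;
      run;

      /* Fit the batch; only the estimates are kept. */
      proc reg data=work.boot_chunk outest=work.est_chunk(drop=_:) noprint;
         model Total = &vars / selection=adjrsq cp start=2 stop=2 best=1;
         by replicate;
      run;
      quit;

      /* Accumulate the estimates from every batch. */
      proc append base=work.est_all data=work.est_chunk force;
      run;
   %end;
%mend bootchunks;

/* Keep the log small while the loop runs, then restore it. */
options nonotes;
%bootchunks(nchunks=250, reps=1000, vars=x1 x2 x3)
options notes;
```

The procedures here already use NOPRINT, so nothing goes to HTML or LISTING; for procedures without that option, closing the ODS destinations before the loop serves the same purpose.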

ChrisMeaney
Calcite | Level 5

Hi @Reeza. I would agree with pretty much all your comments; looping may be more efficient as you aren't forced to read/write massively large files. You can read/write much smaller files (looped over some arbitrarily chosen number...again, it doesn't matter since all these tasks are independent of each other). I also agree with turning off output/log printing. I found that it can really slow things down, as can writing these to file. So, just printing them to nowhere speeds things up.

In terms of big files in R, I think it depends on the machine you are running. I use 64-bit R and have 16GB of RAM, so I can store some pretty large objects in memory. The data.table package can also be an efficient way to work with some big files. I guess it depends on the exact task. But I think the old mantra that R is best suited for small datasets might be outdated too.

jakarman
Barite | Level 11

@chris, instead of tying your "big data" concept to a fixed size, I would stick to the definition as it is used in the hype.

Big data - Wikipedia, the free encyclopedia. It is an elastic definition.

By that definition, if you can do it easily on your own relatively small machine, it is "small data".

Why am I classifying your machine as small? The single machines/servers commonly used commercially for this purpose have up to 1 TB of RAM and 64 cores, and even more is possible. Clusters are then built from many of them. The data sizes involved are commonly petabyte scale, and that is not even counting the exceptional, really big environments.

---->-- ja karman --<-----
Rick_SAS
SAS Super FREQ

In addition to the advice you've already gotten, consider whether you can use PROC GLMSELECT instead of PROC REG.

PROC GLMSELECT supports the 

PERFORMANCE THREADS;

statement for parallel computation of BY-groups. The doc explains factors that contribute to enhanced or decreased performance. PROC GLMSELECT is state-of-the-art and custom made for variable selection analysis.

 

I might be wrong, but I don't think PROC REG had parallel BY-group analysis in SAS 9.2.  It did have multithreaded formation of the SSCP matrix, but since you have 250,000 BY groups, you probably want to distribute that computation as well.
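As a rough sketch of what that could look like for this problem (the regressors x1 x2 x3 and the output dataset name work.pe are placeholders): PROC GLMSELECT does not perform the all-subsets search that SELECTION=ADJRSQ does in PROC REG, so the forward selection below, which adds effects by adjusted R-square and chooses the final model by Cp, is a related but different analysis. PROC GLMSELECT has no OUTEST= option, so the estimates are captured with ODS OUTPUT instead.

```sas
/* Suppress displayed output for the many BY groups; ODS OUTPUT still captures tables. */
ods exclude all;

proc glmselect data=boot1;
   by replicate;
   /* Forward selection: add effects by adjusted R-square, choose the model by Cp. */
   model Total = x1 x2 x3 / selection=forward(select=adjrsq choose=cp);
   /* Request multithreaded computation, as described above. */
   performance threads;
   /* Save the parameter estimates of the selected model for each replicate. */
   ods output ParameterEstimates=work.pe;
run;

ods exclude none;
```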
