barb3126
Calcite | Level 5

I am trying to run SAS on an HPC with 48 cores, but my code is only using about 5% of the available CPUs. I have set CPUCOUNT to 48 in the config file, so I'm not sure why it isn't working. Has anyone else run into this problem?

 

9 REPLIES
SASKiwi
PROC Star

What OS is running on your HPC and what SAS products? My understanding is you need to use HPC-enabled SAS products like Visual Analytics for single SAS jobs to fully utilise HPC.

barb3126
Calcite | Level 5

The OS is Linux and this is a super computer at my academic institution - unfortunately, they provide little to no support to SAS users. I am running a large simulation that is taking hours to run. I have implemented parallel processing but am not seeing any improvement in run time and CPU usage isn't going up. I am an experienced SAS programmer but haven't used parallel processing before. The code runs pretty quickly on my laptop when I run a small number of simulations but the same number of simulations is much slower on the HPC. I am wondering if I have not set the options up correctly or am missing something. Thanks for any help you can give! 

Reeza
Super User

Do you have a SAS server license or just a desktop license? Did you also change the memory options to increase the available memory?

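
One way to confirm which values a session actually picked up from the config file is sketched below. It only reads options, so it is safe to run anywhere. Keep in mind that CPUCOUNT is used by thread-enabled procedures (PROC SORT, PROC MEANS and similar); a DATA step still executes on a single thread regardless of the setting.

```sas
/* Show the values the running session is actually using. */
proc options option=(cpucount threads memsize sortsize);
run;

/* The same values can be read programmatically with GETOPTION. */
%put NOTE: CPUCOUNT=%sysfunc(getoption(cpucount)) MEMSIZE=%sysfunc(getoption(memsize));
```

If the log shows something other than what was put in the config file, the session is probably reading a different config file or the value is being overridden at invocation.
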
SASKiwi
PROC Star

What SAS product are you using for the simulation? How are you doing your parallel processing? If you are using Base SAS then I guess you are using custom code for the parallel processing or are you using some other method?

 

Does your simulation use IO much? If so then that could be what is slowing you down.
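
A rough way to check whether a step is I/O bound is to turn on FULLSTIMER and compare real time with CPU time in the log. The DATA step below is a hypothetical stand-in (the data set name, seed and loop sizes are made up for illustration); the real simulation steps would go in its place.

```sas
options fullstimer;   /* log real time, user/system CPU time and memory for every step */

/* Hypothetical stand-in for one simulation step. */
data work.io_check;
    call streaminit(12345);
    do level_a_num = 1 to 1000;
        do rep = 1 to 1000;
            x = rand('normal');
            output;
        end;
    end;
run;

options nofullstimer;
```

If real time is much larger than user CPU time plus system CPU time for a step, that step is mostly waiting (typically on disk or network) rather than computing, and adding more sessions is unlikely to help it.
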

barb3126
Calcite | Level 5

I am using Base SAS (server license) and have increased the CPUCOUNT to 48 and the MEMSIZE to 2147483648 in the config file. I am using the code below to simulate C levels (~7000) within B levels (~500) within A levels (~5000). I have been testing the code using 960 A levels and it is taking ~ 5 hrs to run. Our R version of the model runs in about 20 minutes for all 5000 A levels. We want to run the model in SAS so we can track each A-B-C combination through the system (i.e. run an agent-based model), which our R model doesn't do. We also want to compare the SAS and R results. 

 

I start by randomly generating the A levels. I then split this dataset into sub-datasets that will be used in each thread. I originally didn't do this but tried it because I was concerned that each thread opening the original dataset was slowing things down. I then use parallel processing to simulate the B levels and the C levels using data steps, which do quite a number of calculations. From the log, it appears that the threads are running sequentially instead of asynchronously. I have attached the performance metrics during the run.

 

%do i=1 %to &numthreads;
    %let begin=%eval((&i-1)*(&num_sim_thread) + 1);
    %let end=%eval(&begin + (&num_sim_thread) - 1);
    %if &end>&num_sim %then %let end=%eval(&num_sim);

    data risk.level_a_sim_t&i;
        set risk.level_a_sim (where=(&begin <= level_a_num <= &end));
    run;
%end;

%macro parallel(thr);

signon task&thr inheritlib=(risk);
%syslput _all_ /remote=task&thr;
rsubmit wait=no;

%put THREAD START TIME: %sysfunc(datetime(),datetime18.1);
%put _all_;

/* data step to simulate level b within each level a… */

/* data step to simulate level c within each level b… */

/* proc datasets to create index for level a, b and c… */

/* data step to sample level c and calculate metrics… */

/* data step to merge metrics back onto overall dataset and finish simulation… */

%put THREAD END TIME: %sysfunc(datetime(),datetime18.1);
%put PROCESSING TIME:  %sysfunc(putn(%sysevalf(%sysfunc(TIME())-&datetime_start.),mmss.)) (mm:ss) ;

endrsubmit;

%mend parallel;

%do i=1 %to &numthreads;
  %parallel(&i);
%end;

waitfor _ALL_ %do i=1 %to &numthreads; task&i %end; ;

%do i=1 %to &numthreads;
      signoff task&i;
%end;
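
Since the log suggests the sessions run one after another, a small timing test can show whether the SAS/CONNECT sessions overlap at all, independent of the simulation code. The sketch below is not the original program: the macro name `run_tasks` and the 5-second sleep are placeholders, and it assumes SAS/CONNECT is licensed and that `sascmd="!sascmd"` is a valid way to start child sessions on this machine.

```sas
/* Assumption: child sessions can be launched with the same command as the parent. */
%macro run_tasks(numthreads);
    %local i t0;
    %let t0 = %sysfunc(datetime());

    /* Start all sessions first. */
    %do i = 1 %to &numthreads;
        signon task&i sascmd="!sascmd";
    %end;

    /* Name the session on RSUBMIT; WAIT=NO returns control immediately. */
    %do i = 1 %to &numthreads;
        rsubmit task&i wait=no;
            data _null_;
                started = datetime();
                put 'NOTE: work started at ' started datetime20.;
                rc = sleep(5, 1);   /* placeholder for real work: 5 seconds */
            run;
        endrsubmit;
    %end;

    waitfor _all_ %do i = 1 %to &numthreads; task&i %end; ;

    %put NOTE: elapsed seconds = %sysevalf(%sysfunc(datetime()) - &t0);

    %do i = 1 %to &numthreads;
        signoff task&i;
    %end;
%mend run_tasks;

%run_tasks(4)
```

If the sessions really run concurrently, the time spent in the RSUBMIT blocks should be close to one sleep interval rather than four. If it scales with the number of tasks, the problem is in how the sessions are launched rather than in the simulation steps.
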

ChrisNZ
Tourmaline | Level 20

A few notes:

- The network seems saturated. Are the disks local?

- You access the same library with 48 processes. Unless the disks are really fast that is bound to cause contention.

- If the same table is accessed again and again for the different simulations and the data set is small enough, you could try loading the table into memory with SASFILE after RSUBMIT. Maybe stagger the RSUBMITs a bit for this.

- What I would do: start with fewer processes (2, then 4, etc.) to see if there is a pattern. Hopefully 4 will be faster than 2, but 12 might be the fastest, and beyond that you'll see that one resource is overloaded.
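
The SASFILE suggestion above can be sketched as follows. The table `work.lookup` and the macro `repeat_reads` are hypothetical stand-ins for a small data set that each simulation pass re-reads; in the real program the SASFILE statements would sit inside the RSUBMIT block and point at the actual table.

```sas
/* Hypothetical small table that every simulation pass reads again. */
data work.lookup;
    do level_b_num = 1 to 500;
        rate = level_b_num / 500;
        output;
    end;
run;

sasfile work.lookup load;    /* read the whole table into memory once */

%macro repeat_reads(n);
    %local i;
    %do i = 1 %to &n;
        data _null_;
            set work.lookup;  /* served from the in-memory copy */
        run;
    %end;
%mend repeat_reads;
%repeat_reads(3)

sasfile work.lookup close;   /* release the memory when finished */
```

The in-memory copy belongs to the session that loaded it, so with 48 sessions there would be 48 copies. The table has to fit comfortably within each session's MEMSIZE for this to help.
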

barb3126
Calcite | Level 5

This is a super computer and the server I am using has multiple nodes with 48 cores each. I'm fairly new to parallel processing and have read up quite a bit about it but am still struggling with how to code it right so I appreciate the help. So far, I have only been using one node and running 48 processes to take advantage of the 48 cores. I could request to use more nodes but am not sure how to call them from SAS. Any suggestions on how to do that? In the meantime, I will try scaling up as you suggest. Thanks!

SASKiwi
PROC Star

Your use of multiple SAS sessions for parallel processing is a good approach. I would focus on the steps in your simulation that are the slowest to understand where your bottlenecks are. 

ChrisNZ
Tourmaline | Level 20

As @SASKiwi said, the structure of your program is fine. 

 

You have to look at how it runs, and find the bottlenecks (SAS steps to be optimised and/or hardware contentions).

 


Discussion stats
  • 9 replies
  • 1520 views
  • 0 likes
  • 4 in conversation