Eileen1496
Obsidian | Level 7

Hi all,

 

I'm new to using SAS/CONNECT to run code simultaneously on multiple cores and CPUs. Sometimes it works: it uses 90% of the CPU and things finish in a very short time. But sometimes it does not. I did not change the structure, so I was wondering whether this is related to the type of task I'm doing.

The case where it works for me: I have multiple datasets that contain lots of records for different user_ids, and I want to split the data by putting user_ids that start with the same 2 digits together. For instance, put the user_ids starting with 10 from all datasets together, and do the same for 11-99.

The case where it does not work for me: I try to do an inner join between 2 datasets. I think that since both tasks require computing (instead of just reading into memory, which I believe does not use CPU), they should both be sped up by using SAS/CONNECT.

I put my code for the second task (the one not working) below. Can you help me see what might be the reason? (Is it because of the task itself, or am I missing some code, which prevents SAS/CONNECT from working properly?)

Thanks a lot!

libname raw "D:\data\raw";
libname cleaned "D:\data\cleaned";
libname temp "D:\temp"; 
libname user2022 "D:\data\cleaned\user_ever";
libname root "D:\data\cleaned\uid_wide";
libname fake "E:\work_fake";

data fake.rcid_2022;
set cleaned.comp_rcid_crswalk_2022;
run;

options sascmd="sas";
options CPUCOUNT= 45;
%let _start_dt = %sysfunc(datetime());

/* Process 10 */
signon task10 inheritlib=(raw cleaned root user2022 fake);
rsubmit task10 wait=no;
proc sql threads;
    create table user2022.user_2022_10_ind_0 as
    select *
    from root.root_10_ind_0 as a
    inner join fake.root_2022 as b
    on a.root= b.root;
quit;
endrsubmit;

/* 89 more such processes */

waitfor _all_;
signoff _all_;
 
/* Print total duration */
data _null_;
   dur = datetime() - &_start_dt;
   put 30*'-' / ' TOTAL DURATION:' dur time13.2 / 30*'-';
run;

Accepted Solution
AhmedAl_Attar
Rhodochrosite | Level 12

Hi @Eileen1496 

There are a few fundamental elements you have to pay attention to, and you may need to rethink your design/approach accordingly:

  1. You are trying to run all concurrent processes (parent + forked SAS sessions) on your single desktop/server.
  2. Your single desktop/server has a fixed number of CPUs, GB of RAM, and disk space to work within.
  3. Every forked SAS session has its own default settings (-memsize, -cpucount, -sortsize), which are initialized from the sasv9.cfg file or the command line. To find out what your SAS defaults are, run the following in your parent SAS session:
    Proc Options group=performance; run;
    and check the log for the setting values.

You can manually specify how much each forked SAS session uses by changing your sascmd option. Example:

options sascmd="sas -memsize 4G -sortsize 3G -cpucount 8";

Having said that, you need to ensure that the total memory (memsize) and cpucount of all your concurrent SAS sessions (parent + forked/child) never exceed 80% of your machine's resources (#2 above). Otherwise everything will slow down, and processing gets queued while your desktop/server tries to make resources available!
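
One way to stay inside that budget is to cap how many child sessions run at once and size each one from the machine totals. The following is only a sketch: the machine totals, the session cap, and the macro name run_batches are hypothetical, and it assumes the tables follow the root.root_NN_ind_0 naming for every prefix from 10 to 99 (the thread only shows prefix 10).

```sas
/* Hypothetical machine totals: replace with your own */
%let total_ram_gb = 128;
%let total_cores  = 48;
%let max_sessions = 8;    /* child sessions allowed to run at once */

/* Share 80% of the machine across the children plus the parent session */
%let mem_gb = %sysevalf(0.8 * &total_ram_gb / (&max_sessions + 1), floor);
%let cpus   = %sysfunc(max(1, %sysevalf(0.8 * &total_cores / (&max_sessions + 1), floor)));
%put NOTE: per-session budget: memsize=&mem_gb.G cpucount=&cpus;

options sascmd="sas -memsize &mem_gb.G -cpucount &cpus";

%macro run_batches(first=10, last=99, batch=&max_sessions);
    %local i j stop tasks;
    %let i = &first;
    %do %while (&i <= &last);
        %let stop  = %sysfunc(min(&last, %eval(&i + &batch - 1)));
        %let tasks = ;
        /* Start one child session per prefix in this batch */
        %do j = &i %to &stop;
            %let tasks = &tasks task&j;
            signon task&j inheritlib=(root user2022 fake);
            rsubmit task&j wait=no;
                /* Inside a macro, &j is resolved locally before the code is sent */
                proc sql;
                    create table user2022.user_2022_&j._ind_0 as
                    select *
                    from root.root_&j._ind_0 as a
                    inner join fake.root_2022 as b
                    on a.root = b.root;
                quit;
            endrsubmit;
        %end;
        /* Block until every task in the batch is done, then free the sessions */
        waitfor _all_ &tasks;
        signoff _all_;
        %let i = %eval(&stop + 1);
    %end;
%mend run_batches;

%run_batches()
```

Each batch waits for its slowest task before the next one starts, so this trades some idle time for a predictable memory and CPU ceiling.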

 

Sadly, not every divide-and-conquer approach guarantees faster processing! Sometimes adjusting the session settings (-memsize, -cpucount, -sortsize, -work, -utilloc) along with efficient coding techniques lets you process your data in a single SAS session much faster.

 

You said


The case where it does not work for me: I try to do an inner join between 2 datasets. I think that since both tasks require computing (instead of just reading into memory, which I believe does not use CPU), they should both be sped up by using SAS/CONNECT.

Loading data into memory still requires compute to perform the search/lookup operation for the join, but it is much faster than reading from disk. So if you can allocate enough memory (-memsize) to your SAS session to load at least one of your two datasets into memory (hash object), you could process your data much faster and in a more straightforward way. It's just another way to look at the issue.
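
As a rough sketch of the hash-object idea, here is the join from the question rewritten as a single DATA step. It reuses the table names from the original post and rests on assumptions the thread does not confirm: that root is unique in fake.root_2022, that the table fits within -memsize, and that the two tables share no variable names other than root (otherwise the lookup values would overwrite the ones read from root.root_10_ind_0).

```sas
/* Sketch only: inner join via a hash lookup instead of PROC SQL */
data user2022.user_2022_10_ind_0;
    /* Define the lookup table's variables without reading any rows */
    if 0 then set fake.root_2022;

    /* Load the lookup table into memory once, on the first iteration */
    if _n_ = 1 then do;
        declare hash b(dataset: "fake.root_2022");
        b.defineKey("root");
        b.defineData(all: "yes");
        b.defineDone();
    end;

    /* Stream the other table from disk, one row at a time */
    set root.root_10_ind_0;

    /* find() returns 0 on a match; keeping only matches gives an inner join */
    if b.find() = 0;
run;
```

If root can repeat in the lookup table, this returns only one match per row, unlike the SQL join; the multidata: "yes" option of the hash object would be the place to look in that case.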

 

Just my 2 cents, hope this helps,

Ahmed

Eileen1496
Obsidian | Level 7

Hi Ahmed,

 

Thanks for the thoughts. I didn't know about the per-session memory settings before. I guess I submitted 99 tasks with 99 big datasets and did not have enough memory, so they started to wait in line. Thanks!
