Solved: Why my CONNECT is not working well?

Eileen1496 · Posted 03-29-2024 08:03 PM

Hi all,

I'm new to using SAS/CONNECT to run code simultaneously on multiple cores and CPUs. Sometimes it works: it uses 90% of the CPU and everything finishes in a very short time. But sometimes it does not. I did not change the structure, so I was wondering whether this is related to the type of task I'm doing.

The case where it works for me: I have multiple datasets that contain lots of records for different user_id values, and I want to split the data by grouping user_id values that start with the same 2 digits. For instance, put every user_id starting with 10 from all datasets together, then do the same for 11-99.

The case where it does not work for me: I try to do an inner join between 2 datasets. I thought that since both tasks require computing (rather than just reading data into memory, which I believe does not use CPU), both should be sped up by using SAS/CONNECT.

I have put my code for the second task (the one that is not working) below. Can you help me see what the reason might be? (For example, is it the task itself, or am I missing some code that prevents SAS/CONNECT from working properly?)

Thanks a lot!

```
libname raw "D:\data\raw";
libname cleaned "D:\data\cleaned";
libname temp "D:\temp";
libname user2022 "D:\data\cleaned\user_ever";
libname root "D:\data\cleaned\uid_wide";
libname fake "E:\work_fake";

data fake.rcid_2022;
set cleaned.comp_rcid_crswalk_2022;
run;

options sascmd="sas";
options CPUCOUNT= 45;
%let _start_dt = %sysfunc(datetime());

/* Process 10 */
signon task10 inheritlib=(raw cleaned root user2022 fake);
rsubmit task10 wait=no;
proc sql threads;
    create table user2022.user_2022_10_ind_0 as
    select *
    from root.root_10_ind_0 as a
    inner join fake.root_2022 as b
    on a.root= b.root;
quit;
endrsubmit;

/* 89 more such processes */

waitfor _all_;
signoff _all_;

/* Print total duration */
data _null_;
   dur = datetime() - &_start_dt;
   put 30*'-' / ' TOTAL DURATION:' dur time13.2 / 30*'-';
run;
```

AhmedAl_Attar · Posted 03-30-2024 08:16 AM

Hi @Eileen1496

There are a few fundamental elements you have to pay attention to, and you should rethink your design/approach accordingly:

1. You are trying to run all concurrent processes (Parent + forked SAS sessions) on your single Desktop/Server.
2. Your single Desktop/Server has a fixed number of CPUs, GB of RAM, and disk space to work within.
3. Every forked SAS session has its own default settings (-memsize, -cpucount, -sortsize) that are initialized from the sasv9.cfg file or the command line. To find out what your SAS defaults are, run the following in your Parent SAS session:
```
Proc Options group=performance; run;
```
Then check the log for the setting values.
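To see what a forked session actually starts with (rather than the Parent), one option is to ask a child session directly. This is only a sketch: the session name `probe` is made up for illustration, and it assumes the same `sascmd="sas"` setting used in the original post.

```sas
/* Assumption: "sas" launches a child session, as in the original post */
options sascmd="sas";

/* "probe" is a hypothetical session name used only for this check */
signon probe;
rsubmit probe;
    /* In open code, these macro statements run in the child session */
    %put NOTE: memsize=%sysfunc(getoption(memsize));
    %put NOTE: cpucount=%sysfunc(getoption(cpucount));
    %put NOTE: sortsize=%sysfunc(getoption(sortsize));
endrsubmit;
signoff probe;
```

The three NOTE lines in the log then show the values every forked session would inherit if you do not override them.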

You can manually specify/dictate how much each forked SAS session utilizes by changing your sascmd option. Example:

```
options sascmd="sas -memsize 4G -sortsize 3G -cpucount 8";
```

Having said that, you'll need to ensure that the total memory (memsize) and cpucount of all your concurrent SAS sessions (Parent + forked/child) never exceed 80% of your machine's resources (#2 above). Otherwise everything slows down and processing gets queued while your Desktop/Server tries to make resources available!
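As a rough illustration of that 80% budget, the per-session values can be derived with macro arithmetic. The machine size (48 cores, 256 GB) and the number of concurrent child sessions (8) below are made-up numbers, and giving the Parent the same share as each child is a simplifying assumption; substitute your own figures.

```sas
/* Hypothetical machine and workload figures: replace with your own */
%let total_cores  = 48;    /* assumed number of cores            */
%let total_mem_gb = 256;   /* assumed RAM in GB                  */
%let n_children   = 8;     /* child sessions running at one time */

/* Share 80% of the machine across the children plus the Parent */
%let cpu_each = %sysevalf(0.8 * &total_cores  / (&n_children + 1), floor);
%let mem_each = %sysevalf(0.8 * &total_mem_gb / (&n_children + 1), floor);

/* Every session started by SIGNON after this point uses these limits */
options sascmd="sas -memsize &mem_each.G -cpucount &cpu_each";

%put NOTE: each child session gets &cpu_each cores and &mem_each GB;
```

With limits like these, 90 tasks cannot all run at once; they would have to be submitted in groups of at most 8, waiting for one group to finish before starting the next.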

Sadly, not every divide & conquer approach guarantees faster processing! Sometimes adjusting the session settings (-memsize, -cpucount, -sortsize, -work, -utilloc) along with efficient coding techniques lets you process your data in a single SAS session much faster.

You said

The time when it does not work for me is: I try to do an inner join between 2 datasets. I think since both of them require computing (instead of just read into memory, which i believe does not use CPU), they should all be speed up by using SAS/CONNECT.

Loading data into memory still requires compute to perform the search/lookup for the join, but it is much faster than reading from disk. So if you can allocate enough memory (-memsize) to your SAS session to load at least one of your two datasets into memory (hash object), you can process your data much faster and in a more straightforward way. It's just another way to look at the issue.
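A minimal sketch of that hash object idea, using the dataset names from the original post. It assumes that fake.root_2022 fits within -memsize and that root is unique in it; if root repeats there, this keeps only one match per key and will not reproduce the SQL inner join exactly.

```sas
data user2022.user_2022_10_ind_0;
    /* Add the lookup table's variables to the PDV without reading rows */
    if 0 then set fake.root_2022;

    /* Load the lookup table into memory once, on the first iteration */
    if _n_ = 1 then do;
        declare hash h(dataset: "fake.root_2022");
        h.defineKey("root");
        h.defineData(all: "yes");
        h.defineDone();
    end;

    /* Stream the large table from disk one row at a time */
    set root.root_10_ind_0;

    /* find() returns 0 on a match and copies the lookup columns in;   */
    /* same-named non-key columns take the value from the lookup table */
    if h.find() = 0;
run;
```

Only rows whose root exists in both tables are written, which is the inner join behavior, and neither table needs to be sorted first.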

Just my 2 cents, hope this helps,

Ahmed

Eileen1496 · Posted 03-30-2024 08:27 AM

Hi Ahmed,

Thanks for the thoughts. I didn't know about the per-task memory settings before. I guess I submitted 99 tasks with 99 big datasets and did not have enough memory, so they started to wait in line. Thanks!
