Hi all,
I'm new to using SAS/CONNECT to run code simultaneously on multiple cores and CPUs. Sometimes it works: it uses 90% of the CPU and things finish in a very short time. But sometimes it does not. I did not change the structure, so I was wondering whether this is related to the type of task I'm doing.
The case where it works for me: I have multiple datasets that contain lots of records for different user_id values, and I want to split the data by putting user_ids that start with the same 2 digits together. For instance, put user_ids starting with 10 from all datasets together, then do the same for 11-99.
The case where it does not work for me: I try to do an inner join between 2 datasets. I think that since both tasks require computing (instead of just reading into memory, which I believe does not use CPU), they should both be sped up by using SAS/CONNECT.
I put my code for the second task (the one not working) below. Can you help me see what might be the reason? (Is it because of the task itself, or am I missing some code, which prevents SAS/CONNECT from working properly?)
Thanks a lot!
libname raw "D:\data\raw";
libname cleaned "D:\data\cleaned";
libname temp "D:\temp"; 
libname user2022 "D:\data\cleaned\user_ever";
libname root "D:\data\cleaned\uid_wide";
libname fake "E:\work_fake";
data fake.rcid_2022;
set cleaned.comp_rcid_crswalk_2022;
run;
options sascmd="sas";
options CPUCOUNT= 45;
%let _start_dt = %sysfunc(datetime());
/* Process 10 */
signon task10 inheritlib=(raw cleaned root user2022 fake);
rsubmit task10 wait=no;
proc sql threads;
    create table user2022.user_2022_10_ind_0 as
    select *
    from root.root_10_ind_0 as a
    inner join fake.root_2022 as b
    on a.root= b.root;
quit;
endrsubmit;
/* 89 more such processes */
waitfor _all_;
signoff _all_;
 
/* Print total duration */
data _null_;
   dur = datetime() - &_start_dt;
   put 30*'-' / ' TOTAL DURATION:' dur time13.2 / 30*'-';
run;
Hi @Eileen1496
There are a few fundamental elements you have to pay attention to, and you may need to rethink your design/approach accordingly.
Proc Options group=performance; run;
You can manually specify/dictate how much each forked SAS session utilizes by changing your sascmd option. Example:
options sascmd="sas -memsize 4G -sortsize 3G -cpucount 8";
Having said that, you'll need to ensure that the total memory (memsize) and cpucount of all your concurrent SAS sessions (parent + forked/child) never exceed 80% of your machine's resources (#2 from above); otherwise everything will slow down and processing gets queued while your desktop/server tries to make resources available!
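One way to stay inside that budget is to run the joins in small batches instead of starting all of them at once. The sketch below is only an illustration under several assumptions: the macro name run_batches, the batch size of 8, and the -memsize/-sortsize/-cpucount values are hypothetical and would need to be sized against the actual machine, and the dataset names simply follow the root_10_ind_0 / user_2022_10_ind_0 pattern from the question.

```sas
/* Hypothetical per-session limits: size these to your own machine */
options sascmd="sas -memsize 4G -sortsize 3G -cpucount 2";

%macro run_batches(first=10, last=99, batch=8);
    %local i j stop tasks;
    %do i = &first %to &last %by &batch;
        /* Last prefix in this batch, capped at &last */
        %let stop = %sysfunc(min(&last, %eval(&i + &batch - 1)));
        %let tasks = ;
        %do j = &i %to &stop;
            %let tasks = &tasks task&j;
            signon task&j inheritlib=(root user2022 fake);
            /* Inside a macro, &j is resolved in the parent session
               before the code is sent to the child session */
            rsubmit task&j wait=no;
                proc sql;
                    create table user2022.user_2022_&j._ind_0 as
                    select *
                    from root.root_&j._ind_0 as a
                    inner join fake.root_2022 as b
                    on a.root = b.root;
                quit;
            endrsubmit;
        %end;
        /* Wait for every task in this batch before starting the next one */
        waitfor _all_ &tasks;
        signoff _all_;
    %end;
%mend run_batches;

%run_batches(first=10, last=99, batch=8)
```

With these example numbers, at most 8 child sessions run at a time, so the children together are capped at roughly 32G of memory and 16 threads, rather than 90 sessions competing for everything at once.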
Sadly, not every divide-and-conquer approach guarantees faster processing! Sometimes adjusting the session settings (-memsize, -cpucount, -sortsize, -work, -utilloc) along with efficient advanced coding techniques lets you process your data much faster in a single SAS session.
You said
The time when it does not work for me is: I try to do an inner join between 2 datasets. I think since both of them require computing (instead of just read into memory, which i believe does not use CPU), they should all be speed up by using SAS/CONNECT.
Loading data into memory still requires compute to perform the search/lookup operation for the join, but it is much faster than reading from disk. So if you can allocate enough memory (-memsize) to your SAS session to load at least one of your two datasets into memory (hash object), you can process your data much faster and in a more straightforward way. It's just another way to look at the issue.
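As a rough sketch of the hash object idea, the DATA step below loads fake.root_2022 into memory and looks up each row of root.root_10_ind_0 by root. It assumes that root is unique in fake.root_2022 and that the table fits within the session's -memsize; neither is confirmed in the thread. Unlike the PROC SQL version, columns that exist in both tables take their values from the hash (lookup) table.

```sas
data user2022.user_2022_10_ind_0;
    /* Add the lookup table's variables to the PDV without reading any rows */
    if 0 then set fake.root_2022;

    /* Load the lookup table into memory once, keyed by root */
    if _n_ = 1 then do;
        declare hash h(dataset: "fake.root_2022");
        h.defineKey("root");
        h.defineData(all: "yes");
        h.defineDone();
    end;

    /* Stream the other table from disk, one row at a time */
    set root.root_10_ind_0;

    /* find() returns 0 on a match; keeping only matches mimics an inner join */
    if h.find() = 0;
run;
```

If root can repeat in the lookup table, the hash would need the multidata: "yes" argument plus a find_next() loop to reproduce all the rows an SQL inner join would return.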
Just my 2 cents, hope this helps,
Ahmed
Hi Ahmed,
Thanks for the thoughts. I didn't know about the memory allocated to each task before. I guess I submitted 99 tasks with 99 big datasets and did not have enough memory, so they started to wait in line. Thanks!
