🔒 This topic is solved and locked.

Hi,

I am trying to run the SAS code below in a SAS Viya CAS environment. It takes a long time because it is not using the CAS worker nodes: the log shows that the session is using 0 workers.

Code:

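/* Start a CAS session whose active caslib is CASUSER. */
cas MySession sessopts=(caslib=casuser);
/* Point the MYCAS libref at the CASUSER caslib. */
libname mycas cas caslib=casuser;

/* Load SASHELP.CARS (428 rows) into CASUSER as CARS. */
proc casutil;
load data=sashelp.cars replace;
run;

/* Replicate each row 150,000 times: 428 x 150,000 = 64,200,000 rows. */
data mycas.bigcars;
set mycas.cars;
do i=1 to 150000;
output;
end;
run;

/* Score every row and record which thread processed it. */
data mycas.bigcars_score;
set mycas.bigcars;
length myscore 8;
myscore=0.3*Invoice/(MSRP-Invoice)
+0.5*(EngineSize+Horsepower)/Weight + 0.2*(MPG_City+MPG_Highway);
Thread=_threadid_;
run;

Log: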

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
56
57
58 cas MySession sessopts=(caslib=casuser);
NOTE: The session MYSESSION connected successfully to Cloud Analytic Services rpclab03045.exnet.sas.com using port 5570. The UUID
is cd58cd44-5125-0445-b613-019d12ffb677. The user is viyauser and the active caslib is CASUSER(viyauser).
NOTE: The SAS option SESSREF was updated with the value MYSESSION.
NOTE: The SAS macro _SESSREF_ was updated with the value MYSESSION.
NOTE: The session is using 0 workers.
NOTE: 'CASUSER(viyauser)' is now the active caslib.
NOTE: The CAS statement request to update one or more session options for session MYSESSION completed.
59 libname mycas cas caslib=casuser;
NOTE: Libref MYCAS was successfully assigned as follows:
Engine: CAS
Physical Name: cd58cd44-5125-0445-b613-019d12ffb677
60
61 proc casutil;
NOTE: The UUID 'cd58cd44-5125-0445-b613-019d12ffb677' is connected using session MYSESSION.
62
62 ! load data=sashelp.cars replace;
NOTE: SASHELP.CARS was successfully added to the "CASUSER(viyauser)" caslib as "CARS".
63 run;
64
NOTE: PROCEDURE CASUTIL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

65 data mycas.bigcars;
66 set mycas.cars;
67 do i=1 to 150000;
68 output;
69 end;
70 run;

NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step will run in multiple threads.
NOTE: There were 428 observations read from the table CARS in caslib CASUSER(viyauser).
NOTE: The table bigcars in caslib CASUSER(viyauser) has 64200000 observations and 16 variables.
NOTE: DATA statement used (Total process time):
real time 45.17 seconds
cpu time 0.01 seconds

71
72 data mycas.bigcars_score;
73 set mycas.bigcars;
74 length myscore 8;
75 myscore=0.3*Invoice/(MSRP-Invoice)
76 +0.5*(EngineSize+Horsepower)/Weight + 0.2*(MPG_City+MPG_Highway);
77 Thread=_threadid_;
78 run;

NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step will run in multiple threads.
NOTE: There were 64200000 observations read from the table BIGCARS in caslib CASUSER(viyauser).
NOTE: The table bigcars_score in caslib CASUSER(viyauser) has 64200000 observations and 18 variables.
NOTE: DATA statement used (Total process time):
real time 23.69 seconds
cpu time 0.00 seconds
Accepted Solution

BrettWujek
SAS Employee

OK, yes: our Early Preview (EP) program gives you an SMP (symmetric multiprocessing) server, meaning everything runs on a single machine (though it is still multithreaded). For larger problems, such as working with a 64.2-million-observation table, you would definitely want an MPP (massively parallel processing) server with worker nodes. You don't have control over that in the EP program. If you would like to explore this further, I can see whether someone can work more closely with you on it.

As an example, I just ran the same code on an MPP server with 4 worker nodes. The times are much better: the step that builds BIGCARS took 15.67 seconds instead of 45.17 seconds (about 2.9 times faster), and the scoring step took 10.26 seconds instead of 23.69 seconds (about 2.3 times faster).

197 libname mycas cas caslib=casuserhdfs;
NOTE: Libref MYCAS was successfully assigned as follows:
Engine: CAS
Physical Name: cfc4294c-f22f-094b-9ad5-36f9b0950c66
198 proc casutil;
NOTE: The UUID 'cfc4294c-f22f-094b-9ad5-36f9b0950c66' is connected using session MYSESS.
199 load data=sashelp.cars replace;
NOTE: SASHELP.CARS was successfully added to the "CASUSERHDFS(brwuje)" caslib as "CARS".
200 run;


NOTE: PROCEDURE CASUTIL used (Total process time):
real time 0.15 seconds
cpu time 0.01 seconds


201 data mycas.bigcars;
202 set mycas.cars;
203 do i=1 to 150000;
204 output;
205 end;
206 run;

NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step will run in multiple threads.
NOTE: There were 428 observations read from the table CARS in caslib CASUSERHDFS(brwuje).
NOTE: The table bigcars in caslib CASUSERHDFS(brwuje) has 64200000 observations and 16
variables.
NOTE: DATA statement used (Total process time):
real time 15.67 seconds
cpu time 0.12 seconds


207 data mycas.bigcars_score;
208 set mycas.bigcars;
209 length myscore 8;
210 myscore=0.3*Invoice/(MSRP-Invoice)
211 +0.5*(EngineSize+Horsepower)/Weight + 0.2*(MPG_City+MPG_Highway);
212 Thread=_threadid_;
213 run;

NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step will run in multiple threads.
NOTE: There were 64200000 observations read from the table BIGCARS in caslib
CASUSERHDFS(brwuje).
NOTE: The table bigcars_score in caslib CASUSERHDFS(brwuje) has 64200000 observations and 18
variables.
NOTE: DATA statement used (Total process time):
real time 10.26 seconds
cpu time 0.03 seconds
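To confirm how the tables are actually spread across the server, here is a minimal sketch, assuming the CASUSER caslib and the BIGCARS and BIGCARS_SCORE tables created by the original program in the active session. It calls the `table.tableDetails` action with `level="node"` to report row counts per node, and the `simple.freq` action to count rows per value of the `Thread` column that the scoring step already stores. On the SMP Early Preview server we would expect a single node; on an MPP server like the 4-worker one above, the rows should be split across the workers. As we understand it, `_threadid_` numbers threads within a node, so on MPP the per-Thread counts are summed over all workers.

```sas
/* Sketch: inspect how the tables from the question are distributed. */
/* Assumes the CASUSER caslib and the BIGCARS / BIGCARS_SCORE tables */
/* created by the original program in the active CAS session.        */
proc cas;
   /* Row counts for BIGCARS, broken out per node. */
   table.tableDetails /
      caslib="casuser",
      name="bigcars",
      level="node";

   /* Number of rows handled per thread number in the scoring step. */
   simple.freq /
      table={caslib="casuser", name="bigcars_score"},
      inputs={"Thread"};
run;
quit;
```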


Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF

View now: on-demand content for SAS users


4 Replies
BrettWujek
SAS Employee

With 64.2 million observations, I am not surprised this takes a long time with this setup. It appears that your CAS server was started in SMP mode, meaning that everything runs on the same machine with no worker nodes. How was your CAS server started?

Some background for those who might not be aware: the CAS server establishes the distributed, in-memory execution environment that is available to you. You start a CAS session as your own isolated process on that server to govern the execution of your own jobs. The session environment can only be a subset of the server environment, so a session on a server that has no worker nodes also reports 0 workers.

So please tell us how your CAS server is started.

Thanks.
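If you are not sure how the server was started, you can usually ask it directly from your session. A minimal sketch, assuming an active CAS session such as MYSESSION from the original code: the `builtins.serverStatus` action prints server information, including the number of nodes and a node status table that lists each node and its role. For a server started in SMP mode we would expect a single node and no workers, consistent with the "The session is using 0 workers." note in the log.

```sas
/* Sketch: ask the CAS server how it is configured.              */
/* Assumes an active CAS session (for example MYSESSION above).  */
proc cas;
   /* Prints server details, including the node count and a     */
   /* per-node status table showing controller and worker roles. */
   builtins.serverStatus;
run;
quit;
```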



sivaram_veerabagu
Calcite | Level 5

I am using the SAS Viya Early Preview program environment, and I don't know how the CAS server is started.

Thanks.

BrettWujek
SAS Employee

(This reply is the accepted solution; its full text and the MPP log are quoted above.)

sivaram_veerabagu
Calcite | Level 5

Thanks for your explanation and for the logs from the MPP server.


Discussion stats
  • 4 replies
  • 1888 views
  • 2 likes
  • 2 in conversation