YHelie1
Fluorite | Level 6

Hi,

 

I'm fully aware of the great benefits of in-memory processing provided by CAS, but what exactly accounts for such a big difference in run time between the following DATA steps (first in CAS, then in typical SAS 9)?

 

OPTIONS FULLSTIMER;
proc cas;
	data casuser.junk;
		array a [100] a1-a100;
		do i=1 to 5000000;	/*5 million iterations*/
			j = 1 / i;
			k = i / j;
			do m = 1 to 10;
				a[m] = j * k * time();
			end;
			output;
		end;
	run;
run;
quit;

data junk;
	array a [100] a1-a100;
	do i=1 to 5000000;	/*5 million iterations*/
		j = 1 / i;
		k = i / j;
		do m = 1 to 10;
			a[m] = j * k * time();
		end;
		output;
	end;
run;

proc CAS process: Real Time: 8.15 seconds
Standard SAS    : Real Time: 28.48 seconds

 

???

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Everything in CAS is multi-CPU capable and can spread the load over the grid. SAS 9.4 uses only one computer, and only some procedures (and not the DATA step, AFAIK) are multithreaded.


7 REPLIES 7
PaigeMiller
Diamond | Level 26

@YHelie1 wrote:

Hi,

 

I'm fully aware of the great benefits of in-memory processing provided by CAS, but what exactly accounts for such a big difference in run time between the following DATA steps (first in CAS, then in typical SAS 9)?


The in-memory processing makes the difference. There is no disk write step in CAS as there is in SAS, and writing to disk takes longer than in-memory operations.

--
Paige Miller
yabwon
Onyx | Level 15

A side note. 

You don't have to wrap a CAS-enabled DATA step in PROC CAS. The DATA step will run perfectly fine on its own.

 

If you look into the log you will see something like:

93     proc cas;
NOTE: PROCEDURE CAS used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              10648.53k
      OS Memory           45696.00k
          
94      data casuser.junk;
95       array a [100] a1-a100;
96       do i=1 to 5000000;  /*5 million iterations*/
97         j = 1 / i;
98         k = i / j;
99         do m = 1 to 10;
100          a[m] = j * k * time();
101        end;
102        output;
103      end;
104     run;
NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step has no input data set and will run in a single thread.
NOTE: The table junk in caslib CASUSER(********************) has 5000000 observations and 104 variables.
NOTE: DATA statement used (Total process time):
      real time           10.73 seconds
      user cpu time       0.00 seconds
      system cpu time     0.01 seconds
      memory              1258.71k
      OS Memory           36888.00k

which means that PROC CAS ended before the DATA step ran: the DATA statement acts as a step boundary, so the DATA step executed on its own.
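
As a minimal sketch of that point, the same step can be submitted without the PROC CAS wrapper. It assumes, as in the original post, an active CAS session and a CASUSER libref bound to it; nothing else is changed.

```sas
/* Assumption: a CAS session is already started and the CASUSER libref
   points at it, as in the original post. */
options fullstimer;

/* No PROC CAS / QUIT around the step: the DATA statement would end
   PROC CAS anyway, so the wrapper adds nothing. */
data casuser.junk;
  array a [100] a1-a100;
  do i = 1 to 5000000;   /* 5 million iterations */
    j = 1 / i;
    k = i / j;
    do m = 1 to 10;
      a[m] = j * k * time();
    end;
    output;
  end;
run;
```

The log should still show the note "Running DATA step in Cloud Analytic Services." if the step is executed by CAS.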

 

A side note to the side note:

When I executed the plain "SAS" DATA step, it ran in about half the time:

82      data work.junk;
83       array a [100] a1-a100;
84       do i=1 to 5000000;  /*5 million iterations*/
85         j = 1 / i;
86         k = i / j;
87         do m = 1 to 10;
88           a[m] = j * k * time();
89         end;
90         output;
91       end;
92      run;
NOTE: The data set WORK.JUNK has 5000000 observations and 104 variables.
NOTE: DATA statement used (Total process time):
      real time           5.09 seconds
      user cpu time       3.21 seconds
      system cpu time     1.89 seconds
      memory              643.78k
      OS Memory           35604.00k

But it used more CPU time.

 

Bart

 

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Onyx | Level 15

One more side note. Both DATA steps, the CAS one and the SPRE one, run in a single thread here, so you won't see the potential of a "parallel" DATA step.
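
A rough sketch of how the test could be reshaped so that CAS has a chance to use several threads: the log above says a step with no input data set runs in a single thread, so the loop counter is first written to a CAS table and then read back with SET. The table names `seed` and `junk_mt` are made up for this illustration, and whether the second step really runs in multiple threads depends on the environment, so check the log notes.

```sas
/* Assumption: active CAS session with the CASUSER libref, as in the thread.
   Table names SEED and JUNK_MT are hypothetical. */

/* Step 1: build an input table in CAS. It has no input data set,
   so it is expected to run in a single thread. */
data casuser.seed;
  do i = 1 to 5000000;
    output;
  end;
run;

/* Step 2: the same arithmetic, but driven by an input CAS table,
   so the rows can be distributed across threads. */
data casuser.junk_mt;
  set casuser.seed;
  array a [100] a1-a100;
  j = 1 / i;
  k = i / j;
  do m = 1 to 10;
    a[m] = j * k * time();
  end;
  t = _threadid_;   /* automatic variable: id of the thread that processed the row */
run;
```

If the second step runs in CAS with multiple threads, the log should contain "NOTE: The DATA step will run in multiple threads." and the column `t` should take more than one value.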

 

Bart


yabwon
Onyx | Level 15

One more thing. CAS has a lot of cool new features and advantages (e.g. "parallelism"), but you also have to be aware that not everything from SAS (e.g. some functions) will run in CAS; see: https://blogs.sas.com/content/iml/2020/02/19/sas-functions-not-run-in-cas.html

 

Bart


Quentin
Super User

@yabwon wrote:

One more thing. CAS has a lot of cool new features and advantages (e.g. "parallelism"), but you also have to be aware that not everything from SAS (e.g. some functions) will run in CAS; see: https://blogs.sas.com/content/iml/2020/02/19/sas-functions-not-run-in-cas.html

 

Bart


And even more interestingly, things like RETAIN will work in CAS, but they work within a thread, not across threads.  There's good documentation on this, e.g. https://documentation.sas.com/doc/en/pgmsascdc/v_039/casdspgm/p0ujjmynr82tfsn1pyp475bhvaib.htm#n1ais...

 

And I think ( @yabwon  - I didn't know you had CAS, can you confirm?) that in CAS if you do:

data want;
  set have end=eof;
  if eof;
run;

and if it runs multi-threaded, want will have as many records as there are threads, because each thread will have one record where eof=1. My mental model is that each thread gets its own PDV. At least I think so; I haven't tested it, as I don't have access to CAS.

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
yabwon
Onyx | Level 15

Yes @Quentin, it works as you wrote: the output table has multiple observations (36 in the log below).

Code:

data casuser.have ;
  set sashelp.cars;
  do i=1 to 100;
    output;
  end;
run;


data casuser.want1;
  set casuser.have end=eof;
  if eof;
  sum+invoice;
  t=_threadid_;
run;

Log:

82     data casuser.have ;
83       set sashelp.cars;
84       do i=1 to 100;
85         output;
86       end;
87     run;
NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set CASUSER.HAVE has 42800 observations and 16 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      user cpu time       0.01 seconds
      system cpu time     0.02 seconds
      memory              1532.90k
      OS Memory           37404.00k

88     
89     
90     data casuser.want1;
91       set casuser.have end=eof;
92       if eof;
93       sum+invoice;
94       t=_threadid_;
95     run;
NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step will run in multiple threads.
NOTE: There were 42800 observations read from the table HAVE in caslib CASUSER(************************).
NOTE: The table want1 in caslib CASUSER(************************) has 36 observations and 18 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              1328.34k
      OS Memory           37400.00k
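
If a single grand total is what is actually wanted, one possible pattern (a sketch, not tested in this thread) is to let each thread write its partial sum and then combine the partial rows in a second step forced to run in one thread. The table names `partial` and `total` and the variable names are made up here, and the `single=yes` DATA statement option is assumed to be available in the CAS DATA step. Note that in the step above the subsetting IF comes before the sum statement, so `sum` holds only the last `invoice` value each thread saw; the sketch accumulates first and subsets afterwards.

```sas
/* Assumption: CASUSER.HAVE exists as created above.
   PARTIAL, TOTAL, PART_SUM and GRAND_TOTAL are hypothetical names. */

/* Step 1: each thread accumulates its own rows and outputs one record
   when it reaches the end of its share of the data. */
data casuser.partial;
  set casuser.have end=eof;
  part_sum + invoice;   /* sum statement: retained within the thread */
  t = _threadid_;
  if eof;               /* keep only the last record per thread */
  keep part_sum t;
run;

/* Step 2: combine the per-thread rows in a single thread. */
data casuser.total / single=yes;
  set casuser.partial end=eof;
  grand_total + part_sum;
  if eof;
  keep grand_total;
run;
```

The second step reads only a handful of rows, so running it in a single thread costs very little.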

 

Bart

 

