Solved: Re: CAS vs SAS

YvanHelie · Posted 05-18-2023 10:41 AM

Hi,

I'm fully aware of the great benefits of in-memory processing provided by CAS, but what exactly makes such a big difference between the following DATA steps (first in CAS, then in traditional SAS 9)?

OPTIONS FULLSTIMER;
proc cas;
	data casuser.junk;
		array a [100] a1-a100;
		do i=1 to 5000000;	/*5 million iterations*/
			j = 1 / i;
			k = i / j;
			do m = 1 to 10;
				a[m] = j * k * time();
			end;
			output;
		end;
	run;
run;
quit;

data junk;
	array a [100] a1-a100;
	do i=1 to 5000000;	/*5 million iterations*/
		j = 1 / i;
		k = i / j;
		do m = 1 to 10;
			a[m] = j * k * time();
		end;
		output;
	end;
run;

PROC CAS run: real time 8.15 seconds
Standard SAS 9 run: real time 28.48 seconds

???

Kurt_Bremser · Posted 05-18-2023 11:08 AM

Everything in CAS is multi-CPU capable and can spread the load over the grid. SAS 9.4 uses only one computer, and only some procedures (and not the DATA step, AFAIK) are multithreaded.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PaigeMiller · Posted 05-18-2023 10:58 AM

@YvanHelie wrote:

Hi,

I'm fully aware of the great benefits of in-memory processing provided by CAS, but what exactly makes such a big difference between the following DATA steps (first in CAS, then in traditional SAS 9)?

The in-memory processing makes the difference. There is no disk write step in CAS as there is in SAS. Disk write takes longer than in-memory operations.
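One way to check how much of the SAS 9 time is spent writing to disk, offered as a rough experiment rather than a benchmark: rerun the same loop as DATA _NULL_, which executes every statement but writes no data set. The output table in the original step is sizable: 5,000,000 rows × 104 numeric variables × 8 bytes ≈ 4.16 GB. The gap between this run and the original 28.48 seconds roughly approximates the cost of the write (WORK location and compression settings will affect it).

```sas
options fullstimer;

/* Same computation as the SAS 9 step, but DATA _NULL_ writes no output table, */
/* so the real time reported here is (roughly) compute time only.              */
data _null_;
  array a [100] a1-a100;
  do i = 1 to 5000000;        /* 5 million iterations */
    j = 1 / i;
    k = i / j;
    do m = 1 to 10;
      a[m] = j * k * time();
    end;
    /* no OUTPUT statement: nothing is written to disk */
  end;
run;
```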

--
Paige Miller

yabwon · Posted 05-18-2023 12:16 PM

A side note.

You don't have to put a DATA step that writes a CAS table inside PROC CAS. The DATA step will run perfectly fine on its own.

If you look at the log, you will see something like:

93     proc cas;
NOTE: PROCEDURE CAS used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              10648.53k
      OS Memory           45696.00k
          
94      data casuser.junk;
95       array a [100] a1-a100;
96       do i=1 to 5000000;  /*5 million iterations*/
97         j = 1 / i;
98         k = i / j;
99         do m = 1 to 10;
100          a[m] = j * k * time();
101        end;
102        output;
103      end;
104     run;
NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step has no input data set and will run in a single thread.
NOTE: The table junk in caslib CASUSER(********************) has 5000000 observations and 104 variables.
NOTE: DATA statement used (Total process time):
      real time           10.73 seconds
      user cpu time       0.00 seconds
      system cpu time     0.01 seconds
      memory              1258.71k
      OS Memory           36888.00k

which means that PROC CAS ended before the DATA step ran (the DATA statement is a step boundary), so the DATA step ran in CAS on its own.
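If you really do want to drive the DATA step from PROC CAS, a sketch (assuming an active CAS session and the CASUSER caslib; the variable name junkCode is only illustrative) is to store the code with the CASL SOURCE/ENDSOURCE statements and submit it with the dataStep.runCode action:

```sas
proc cas;
  /* SOURCE ... ENDSOURCE stores the DATA step text in the CASL variable junkCode */
  source junkCode;
    data casuser.junk;
      array a [100] a1-a100;
      do i = 1 to 5000000;    /* 5 million iterations */
        j = 1 / i;
        k = i / j;
        do m = 1 to 10;
          a[m] = j * k * time();
        end;
        output;
      end;
    run;
  endsource;

  /* The dataStep.runCode action runs that code on the CAS server */
  dataStep.runCode / code=junkCode;
run;
quit;
```

For a plain DATA step that reads and writes CAS tables, this wrapper adds nothing; it is only useful when the DATA step is part of a larger CASL program.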

A side note to the side note:

When I executed the "SAS" (non-CAS) DATA step, it ran in about half the time:

82      data work.junk;
83       array a [100] a1-a100;
84       do i=1 to 5000000;  /*5 million iterations*/
85         j = 1 / i;
86         k = i / j;
87         do m = 1 to 10;
88           a[m] = j * k * time();
89         end;
90         output;
91       end;
92      run;
NOTE: The data set WORK.JUNK has 5000000 observations and 104 variables.
NOTE: DATA statement used (Total process time):
      real time           5.09 seconds
      user cpu time       3.21 seconds
      system cpu time     1.89 seconds
      memory              643.78k
      OS Memory           35604.00k

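But it used more CPU time. (Keep in mind that the CPU times in the CAS step's log are for the SAS client session; the work done on the CAS server is not counted there.)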

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

yabwon · Posted 05-18-2023 12:20 PM

One more side note: both DATA steps, the CAS one and the SPRE (SAS Programming Runtime Environment) one, run in a single thread here, so you won't be able to see the potential of a "parallel" DATA step.
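To see the parallel DATA step at work, the step needs an input table that is already in CAS. A sketch, assuming an active CAS session with the CASUSER caslib assigned (the table names SEED and JUNK2 are hypothetical): first write the loop counter to a CAS table, then do the arithmetic in a second step that reads it with SET. The log for the second step should then say "The DATA step will run in multiple threads"; how many threads you get depends on the server configuration.

```sas
/* Step 1: build the input rows in CAS (no input data set, so this runs in a single thread) */
data casuser.seed;
  do i = 1 to 5000000;
    output;
  end;
run;

/* Step 2: reading a CAS table lets CAS split the rows across threads */
data casuser.junk2;
  set casuser.seed;
  array a [100] a1-a100;
  j = 1 / i;
  k = i / j;
  do m = 1 to 10;
    a[m] = j * k * time();
  end;
  thread = _threadid_;   /* ID of the thread that processed this row */
run;
```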

Bart

yabwon · Posted 05-18-2023 12:26 PM

One more thing: CAS has a lot of cool new features and advantages (e.g. "parallelism"), but you also have to be aware that not everything from SAS (e.g. some functions) will run in CAS, see: https://blogs.sas.com/content/iml/2020/02/19/sas-functions-not-run-in-cas.html

Bart

Quentin · Posted 05-18-2023 01:39 PM

@yabwon wrote:

One more thing: CAS has a lot of cool new features and advantages (e.g. "parallelism"), but you also have to be aware that not everything from SAS (e.g. some functions) will run in CAS, see: https://blogs.sas.com/content/iml/2020/02/19/sas-functions-not-run-in-cas.html

Bart

And even more interestingly, things like RETAIN will work in CAS, but they work within a thread, not across threads. There's good documentation on this, e.g. https://documentation.sas.com/doc/en/pgmsascdc/v_039/casdspgm/p0ujjmynr82tfsn1pyp475bhvaib.htm#n1ais...
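A small way to see RETAIN working per thread, sketched under the assumption that CASUSER.HAVE is a CAS table large enough to be processed in several threads (PERTHREAD is a hypothetical table name): a sum statement, which implicitly retains its variable, counts rows, and the END= flag writes the count once per thread. Each thread should report only the rows it saw, and the per-thread counts should add up to the total row count.

```sas
data casuser.perthread;
  set casuser.have end=eof;
  n + 1;                /* sum statement: N is retained, but separately in each thread */
  thread = _threadid_;  /* which thread (and so which PDV) this row belongs to */
  if eof then output;   /* END= is true once per thread, at that thread's last row */
  keep thread n;
run;

/* Adding up the per-thread counts should give the total number of rows in HAVE */
proc means data=casuser.perthread n sum;
  var n;
run;
```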

And I think (@yabwon, I didn't know you had CAS; can you confirm?) that in CAS, if you run:

data want;
  set have end=eof;
  if eof;
run;

and it runs multithreaded, WANT will have as many records as there are threads, because each thread will have one record where EOF=1. My mental model is that each thread gets its own PDV. That's my understanding, anyway; I haven't tested it, as I don't have access to CAS.
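If you need the classic SAS 9 result (a single EOF row for the whole table), one option is to ask CAS to run that step in a single thread. A sketch, assuming the SINGLE=YES option of the DATA statement for CAS processing (check the DATA statement documentation for your Viya release); it gives up the parallelism for this step.

```sas
/* SINGLE=YES runs the step in one thread: one PDV, so EOF is true only once */
data casuser.want / single=yes;
  set casuser.have end=eof;
  if eof;
run;
```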

yabwon · Posted 05-18-2023 03:59 PM

Yes @Quentin, it works as you wrote: the output does have multiple observations (one per thread).

Code:

data casuser.have ;
  set sashelp.cars;
  do i=1 to 100;
    output;
  end;
run;


data casuser.want1;
  set casuser.have end=eof;
  if eof;
  sum+invoice;
  t=_threadid_;
run;

Log:

82     data casuser.have ;
83       set sashelp.cars;
84       do i=1 to 100;
85         output;
86       end;
87     run;
NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set CASUSER.HAVE has 42800 observations and 16 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      user cpu time       0.01 seconds
      system cpu time     0.02 seconds
      memory              1532.90k
      OS Memory           37404.00k

88     
89     
90     data casuser.want1;
91       set casuser.have end=eof;
92       if eof;
93       sum+invoice;
94       t=_threadid_;
95     run;
NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step will run in multiple threads.
NOTE: There were 42800 observations read from the table HAVE in caslib CASUSER(************************).
NOTE: The table want1 in caslib CASUSER(************************) has 36 observations and 18 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              1328.34k
      OS Memory           37400.00k

Bart
