☑ This topic is solved.
YHelie1
Fluorite | Level 6

Hi,

 

I'm fully aware of the great benefits of the in-memory processing that CAS provides, but what exactly accounts for such a big difference in run time between the following DATA steps (first in CAS, then typical SAS 9)?

 

OPTIONS FULLSTIMER;
proc cas;
	data casuser.junk;
		array a [100] a1-a100;
		do i=1 to 5000000;	/*5 million iterations*/
			j = 1 / i;
			k = i / j;
			do m = 1 to 10;
				a[m] = j * k * time();
			end;
			output;
		end;
	run;
run;
quit;

data junk;
	array a [100] a1-a100;
	do i=1 to 5000000;	/*5 million iterations*/
		j = 1 / i;
		k = i / j;
		do m = 1 to 10;
			a[m] = j * k * time();
		end;
		output;
	end;
run;

proc CAS process: Real Time: 8.15 seconds
Standard SAS    : Real Time: 28.48 seconds

 

???

Accepted Solutions
Kurt_Bremser
Super User

Everything in CAS is multi-CPU capable and can spread the load over the grid. SAS 9.4 uses only one computer, and only some procedures (and not the DATA step, AFAIK) are multithreaded.

7 REPLIES
PaigeMiller
Diamond | Level 26

@YHelie1 wrote:

Hi,

 

I'm fully aware of the great benefits of the in-memory processing that CAS provides, but what exactly accounts for such a big difference in run time between the following DATA steps (first in CAS, then typical SAS 9)?


The in-memory processing makes the difference. There is no disk write step in CAS as there is in SAS. Disk write takes longer than in-memory operations.

--
Paige Miller
yabwon
Onyx | Level 15

A side note. 

You don't have to put a CAS-enabled DATA step inside PROC CAS. The DATA step will run perfectly fine on its own.

 

If you look into the log you will see something like:

93     proc cas;
NOTE: PROCEDURE CAS used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              10648.53k
      OS Memory           45696.00k
          
94      data casuser.junk;
95       array a [100] a1-a100;
96       do i=1 to 5000000;  /*5 million iterations*/
97         j = 1 / i;
98         k = i / j;
99         do m = 1 to 10;
100          a[m] = j * k * time();
101        end;
102        output;
103      end;
104     run;
NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step has no input data set and will run in a single thread.
NOTE: The table junk in caslib CASUSER(********************) has 5000000 observations and 104 variables.
NOTE: DATA statement used (Total process time):
      real time           10.73 seconds
      user cpu time       0.00 seconds
      system cpu time     0.01 seconds
      memory              1258.71k
      OS Memory           36888.00k

which means that PROC CAS terminated before the DATA step ran.
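
A minimal sketch of the same step without the PROC CAS wrapper. It assumes a CAS server is reachable from the SAS session; the session name mysess is hypothetical, and the LIBNAME statement is only needed if the casuser libref is not already assigned.

```sas
/* Assumption: a CAS server is reachable; "mysess" is a hypothetical session name. */
cas mysess;

/* Bind a libref to the CASUSER caslib through the CAS LIBNAME engine. */
libname casuser cas caslib=casuser;

/* No PROC CAS wrapper: the DATA step itself is sent to CAS.
   With no input table it still runs in a single thread, as the log notes. */
data casuser.junk;
	array a [100] a1-a100;
	do i = 1 to 5000000;	/* 5 million iterations */
		j = 1 / i;
		k = i / j;
		do m = 1 to 10;
			a[m] = j * k * time();
		end;
		output;
	end;
run;
```

The log should then contain only the DATA step notes, without the zero-second PROC CAS entry.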

 

A side note to the side note:

When I executed the "SAS" DATA step, it took about half the time:

82      data work.junk;
83       array a [100] a1-a100;
84       do i=1 to 5000000;  /*5 million iterations*/
85         j = 1 / i;
86         k = i / j;
87         do m = 1 to 10;
88           a[m] = j * k * time();
89         end;
90         output;
91       end;
92      run;
NOTE: The data set WORK.JUNK has 5000000 observations and 104 variables.
NOTE: DATA statement used (Total process time):
      real time           5.09 seconds
      user cpu time       3.21 seconds
      system cpu time     1.89 seconds
      memory              643.78k
      OS Memory           35604.00k

But it used more CPU time.

 

Bart

 

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Onyx | Level 15

One more side note: both DATA steps, the CAS one and the SPRE one, run in a single thread here, so you won't be able to see the potential of a "parallel" DATA step.
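
To actually observe multiple threads, the step needs an input table that already lives in CAS. A rough sketch, assuming the libref casuser points to a CAS caslib and that a table casuser.have has been loaded there; the output table names and the variable thread are hypothetical.

```sas
/* Assumption: casuser is a CAS libref and casuser.have exists in CAS. */

/* Input is a CAS table, so the step is eligible to run in multiple threads. */
data casuser.by_thread;
	set casuser.have;
	thread = _threadid_;	/* automatic variable: ID of the thread that processed the row */
run;

/* Same step forced into one thread, for comparison. */
data casuser.one_thread / single=yes;
	set casuser.have;
	thread = _threadid_;
run;

/* More than one distinct value of thread means the first step ran in parallel. */
proc freq data=casuser.by_thread;
	tables thread;
run;
```

Comparing the real time of the two DATA steps on a large enough table gives a fairer picture of CAS parallelism than the generated-data example in the question.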

 

Bart


yabwon
Onyx | Level 15

One more thing. CAS has a lot of cool new features and advantages (e.g. "parallelism"), but you also have to be aware that not everything from SAS (e.g. some functions) will run in CAS; see: https://blogs.sas.com/content/iml/2020/02/19/sas-functions-not-run-in-cas.html

 

Bart


Quentin
Super User

@yabwon wrote:

One more thing. CAS has a lot of cool new features and advantages (e.g. "parallelism"), but you also have to be aware that not everything from SAS (e.g. some functions) will run in CAS; see: https://blogs.sas.com/content/iml/2020/02/19/sas-functions-not-run-in-cas.html

 

Bart


And even more interestingly, things like RETAIN will work in CAS, but they work within a thread, not across threads.  There's good documentation on this, e.g. https://documentation.sas.com/doc/en/pgmsascdc/v_039/casdspgm/p0ujjmynr82tfsn1pyp475bhvaib.htm#n1ais...

 

And I think ( @yabwon  - I didn't know you had CAS, can you confirm?) that in CAS if you do:

data want;
  set have end=eof;
  if eof;
run;

and if it runs multi-threaded, want will have as many records as there are threads, because each thread will have one record where eof=1. My mental model is that each thread gets its own PDV. That's what I think; I haven't tested it, as I don't have access to CAS.

The Boston Area SAS Users Group (BASUG) is hosting our in person SAS Blowout on Oct 18!
This full-day event in Cambridge, Mass features four presenters from SAS, presenting on a range of SAS 9 programming topics. Pre-registration by Oct 15 is required.
Full details and registration info at https://www.basug.org/events.
yabwon
Onyx | Level 15

Yes @Quentin, it works as you described: the output table has multiple observations (36 in the log below).

Code:

data casuser.have ;
  set sashelp.cars;
  do i=1 to 100;
    output;
  end;
run;


data casuser.want1;
  set casuser.have end=eof;
  if eof;
  sum+invoice;
  t=_threadid_;
run;

Log:

82     data casuser.have ;
83       set sashelp.cars;
84       do i=1 to 100;
85         output;
86       end;
87     run;
NOTE: There were 428 observations read from the data set SASHELP.CARS.
NOTE: The data set CASUSER.HAVE has 42800 observations and 16 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      user cpu time       0.01 seconds
      system cpu time     0.02 seconds
      memory              1532.90k
      OS Memory           37404.00k

88     
89     
90     data casuser.want1;
91       set casuser.have end=eof;
92       if eof;
93       sum+invoice;
94       t=_threadid_;
95     run;
NOTE: Running DATA step in Cloud Analytic Services.
NOTE: The DATA step will run in multiple threads.
NOTE: There were 42800 observations read from the table HAVE in caslib CASUSER(************************).
NOTE: The table want1 in caslib CASUSER(************************) has 36 observations and 18 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              1328.34k
      OS Memory           37400.00k
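
If a single grand total is wanted rather than one row per thread, one common pattern is two steps: per-thread partial results first, then a single-threaded step that combines them. The sketch below assumes the same casuser.have table; the names partial, total, and partial_sum are hypothetical. Note that the sum statement is placed before the end-of-data check, because a sum statement that follows a subsetting IF only executes on rows that pass the IF.

```sas
/* Assumption: casuser.have exists in CAS, as created above. */

/* Step 1 (multi-threaded): each thread accumulates its own partial sum
   and writes one row when it reaches the end of its share of the data. */
data casuser.partial;
	set casuser.have end=eof;
	partial_sum + invoice;	/* runs on every row the thread reads */
	if eof then output;
	keep partial_sum;
run;

/* Step 2 (single thread): combine the per-thread rows into one total. */
data casuser.total / single=yes;
	set casuser.partial end=eof;
	total + partial_sum;
	if eof then output;
	keep total;
run;
```

The second step reads only one row per thread, so forcing it into a single thread should cost very little.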

 

Bart

 


Discussion stats
  • 7 replies
  • 1674 views
  • 4 likes
  • 5 in conversation