<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: DATA Step BY-Group Processing: Compute Server vs CAS in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/DATA-Step-BY-Group-Processing-Compute-Server-vs-CAS/m-p/821173#M324199</link>
    <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Questions&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Why is it that the BY-Group processing is so much faster on the Compute Server?&lt;BR /&gt;My guess would be that distributing rows across threads according to BY variables is very slow on the CAS Server.&lt;/LI&gt;
&lt;LI&gt;Why is it that the CPU time is so low on the CAS server compared to the real time and that increasing the amount of data almost only affects real time?&lt;BR /&gt;I even expected the CPU time to be larger than real time because of parallelization on the CAS server.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Luhan&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is your test a little too artificial to be informative? Your test dataset is very narrow, making the preparatory PROC SORT very cheap.&amp;nbsp; &amp;nbsp;I wonder whether the apparent real-time superiority of the SORT followed by the non-CAS data step would still show up with a fat file.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I've never used CAS, so this is said in complete ignorance.&amp;nbsp; For speed test purposes, why not run the CAS DATA step twice, once with and once without a BY statement, as in:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  set casuser.test;
  count+1;
run;

data _null_;
  set casuser.test;
  by name;
  count+1;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Yes, I know that it doesn't produce the results you want, but it does tell you the impact of a BY statement.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Question: does a BY statement cause CAS to pre-sort the unordered data, or does CAS just create threads based on BY values?&amp;nbsp; If it's the latter, then CAS will never satisfy any of the "&lt;EM&gt;&lt;STRONG&gt;if last.name;&lt;/STRONG&gt;&lt;/EM&gt;" filters until the end of the data set.&amp;nbsp; That's probably a lot of overhead that the single-threaded approach applied to a sorted data set doesn't need.&amp;nbsp; Yes, there are only three values for name in your sample, but maybe the fixed cost of maintaining a dynamic set of BY groups is big, no matter the cardinality.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 30 Jun 2022 16:20:49 GMT</pubDate>
    <dc:creator>mkeintz</dc:creator>
    <dc:date>2022-06-30T16:20:49Z</dc:date>
    <item>
      <title>DATA Step BY-Group Processing: Compute Server vs CAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/DATA-Step-BY-Group-Processing-Compute-Server-vs-CAS/m-p/821107#M324160</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I experience very long run times on the CAS Server when using BY-Group processing in a DATA Step (even with uniformly distributed values across few or many threads). Using FedSQL instead, or even running PROC SORT + DATA Step on the Compute Server, always seems to be faster. The issue can be replicated in the Virtual Lab of a SAS course as follows:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Step 1:&lt;/STRONG&gt; Create the same data set containing names with 10 million rows in both environments (work.test + casuser.test).&lt;/P&gt;&lt;PRE&gt;data
	work.test
	casuser.test
;
	call streaminit(100);
	do i = 1 to 10000000;
		random = rand("Integer", 1, 5);
		if 1 &amp;lt;= random &amp;lt;= 3 then name = "R. Pearlman";
		else if random  = 4 then name = "J. McNulty";
		else if random  = 5 then name = "C. Daniels";
		output; 
	end;
	keep name;
run;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Step 2:&lt;/STRONG&gt; Count the number of occurrences for each name on the Compute Server (this requires a SORT Step):&lt;/P&gt;&lt;PRE&gt;proc sort
	data = test
	out  = test2
;
	by name;
run;

data test2;
	set test2;
	by name;
	if first.name then count = 0;
	count +1;
	if last.name;
run;&lt;/PRE&gt;&lt;P&gt;In total, this code needed 1.53 seconds to run. The CPU time of 4.17 seconds tells me that some parallelization took place here:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Compute.png" style="width: 531px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/72849iDFBC95B1976A94AC/image-size/large?v=v2&amp;amp;px=999" role="button" title="Compute.png" alt="Compute.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Step 3:&lt;/STRONG&gt; Count the number of occurrences for each name on the CAS Server (no sorting required):&lt;/P&gt;&lt;PRE&gt;data casuser.test2;
	set casuser.test;
	by name;
	if first.name then count = 0;
	count +1;
	if last.name;
run;&lt;/PRE&gt;&lt;P&gt;This code took 11.28 seconds in total but used only 0.02 seconds of CPU time:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="CAS Server.png" style="width: 663px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/72850i223F5B73BE1F5D2D/image-size/large?v=v2&amp;amp;px=999" role="button" title="CAS Server.png" alt="CAS Server.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Questions&lt;/STRONG&gt;:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;Why is it that the BY-Group processing is so much faster on the Compute Server?&lt;BR /&gt;My guess would be that distributing rows across threads according to BY variables is very slow on the CAS Server.&lt;/LI&gt;&lt;LI&gt;Why is it that the CPU time is so low on the CAS server compared to the real time and that increasing the amount of data almost only affects real time?&lt;BR /&gt;I even expected the CPU time to be larger than real time because of parallelization on the CAS server.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Luhan&lt;/P&gt;</description>
      <pubDate>Thu, 30 Jun 2022 12:14:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/DATA-Step-BY-Group-Processing-Compute-Server-vs-CAS/m-p/821107#M324160</guid>
      <dc:creator>Luhan</dc:creator>
      <dc:date>2022-06-30T12:14:30Z</dc:date>
    </item>
    <item>
      <title>Re: DATA Step BY-Group Processing: Compute Server vs CAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/DATA-Step-BY-Group-Processing-Compute-Server-vs-CAS/m-p/821173#M324199</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Questions&lt;/STRONG&gt;:&lt;/P&gt;
&lt;UL&gt;
&lt;LI&gt;Why is it that the BY-Group processing is so much faster on the Compute Server?&lt;BR /&gt;My guess would be that distributing rows across threads according to BY variables is very slow on the CAS Server.&lt;/LI&gt;
&lt;LI&gt;Why is it that the CPU time is so low on the CAS server compared to the real time and that increasing the amount of data almost only affects real time?&lt;BR /&gt;I even expected the CPU time to be larger than real time because of parallelization on the CAS server.&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Luhan&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is your test a little too artificial to be informative? Your test dataset is very narrow, making the preparatory PROC SORT very cheap.&amp;nbsp; &amp;nbsp;I wonder whether the apparent real-time superiority of the SORT followed by the non-CAS data step would still show up with a fat file.&lt;/P&gt;
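&lt;P&gt;A minimal sketch of such a wider test table, assuming 20 filler columns of 50 bytes each and a reduced row count of 1 million to keep the size manageable. The table name test_wide, the pad1-pad20 columns, and the row count are hypothetical choices, not part of the original test:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data
	work.test_wide
	casuser.test_wide
;
	call streaminit(100);
	/* Assumed layout: the original name column plus 20 filler columns of 50 bytes */
	length name $11 pad1-pad20 $50;
	array pad{20} pad1-pad20;
	/* Fill the filler columns once; the values persist for every OUTPUT in the loop */
	do j = 1 to dim(pad);
		pad{j} = repeat("x", 49);
	end;
	do i = 1 to 1000000;
		random = rand("Integer", 1, 5);
		if 1 &amp;lt;= random &amp;lt;= 3 then name = "R. Pearlman";
		else if random = 4 then name = "J. McNulty";
		else name = "C. Daniels";
		output;
	end;
	drop i j random;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Each row is then roughly 1 KB wide, so rerunning the PROC SORT + DATA step and the CAS DATA step against test_wide should show whether the sort cost changes the comparison.&lt;/P&gt;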
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I've never used CAS, so this is said in complete ignorance.&amp;nbsp; For speed test purposes, why not run the CAS DATA step twice, once with and once without a BY statement, as in:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  set casuser.test;
  count+1;
run;

data _null_;
  set casuser.test;
  by name;
  count+1;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Yes, I know that it doesn't produce the results you want, but it does tell you the impact of a BY statement.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Question: does a BY statement cause CAS to pre-sort the unordered data, or does CAS just create threads based on BY values?&amp;nbsp; If it's the latter, then CAS will never satisfy any of the "&lt;EM&gt;&lt;STRONG&gt;if last.name;&lt;/STRONG&gt;&lt;/EM&gt;" filters until the end of the data set.&amp;nbsp; That's probably a lot of overhead that the single-threaded approach applied to a sorted data set doesn't need.&amp;nbsp; Yes, there are only three values for name in your sample, but maybe the fixed cost of maintaining a dynamic set of BY groups is big, no matter the cardinality.&lt;/P&gt;
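&lt;P&gt;One way to check this empirically might be to have each thread report the BY groups it finishes. The sketch below assumes the casuser.test table from the original post and an active CAS session, and it relies on the automatic variable _threadid_, which is only meaningful when the DATA step runs in CAS:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
	set casuser.test;
	by name;
	if first.name then count = 0;
	count + 1;
	/* One log line per BY group, as seen by the thread that processed it */
	if last.name then put "thread: " _threadid_ "name: " name "rows: " count;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;If each name shows up only once in the log, every BY group was delivered whole to a single thread. If a name shows up on several threads, the groups were split and the counts would still need to be combined.&lt;/P&gt;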
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 30 Jun 2022 16:20:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/DATA-Step-BY-Group-Processing-Compute-Server-vs-CAS/m-p/821173#M324199</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2022-06-30T16:20:49Z</dc:date>
    </item>
    <item>
      <title>Re: DATA Step BY-Group Processing: Compute Server vs CAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/DATA-Step-BY-Group-Processing-Compute-Server-vs-CAS/m-p/821198#M324206</link>
      <description>&lt;P&gt;Thanks for your reply and suggestion to isolate the BY statement effect. It seems to support my guess that DATA step processing on CAS slows down significantly when the data is distributed according to BY variables.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Step 1&lt;/STRONG&gt;: BY or not to BY with &lt;STRONG&gt;original sample data&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;Without a BY statement, CAS evenly distributes the data across the 32 available threads of my session in no time:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;data _null_;
	set casuser.test end=eof;

	if eof;
	put "thread:" _threadid_ " obs:" _n_;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="02 CAS Server 1 (2).png" style="width: 999px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/72861i9509593CC105877E/image-size/large?v=v2&amp;amp;px=999" role="button" title="02 CAS Server 1 (2).png" alt="02 CAS Server 1 (2).png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;When using a BY statement, the distribution is done across 3 threads (because the BY variable has 3 levels) and this slows things down significantly:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;data _null_;
	set casuser.test end=eof;

	by name;

	if eof;
	put "thread:" _threadid_ " obs:" _n_;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="02 CAS Server 2.png" style="width: 706px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/72860i7E177BCFCE6E2155/image-size/large?v=v2&amp;amp;px=999" role="button" title="02 CAS Server 2.png" alt="02 CAS Server 2.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Step 2&lt;/STRONG&gt;: BY or not to BY with &lt;STRONG&gt;uniformly distributed sample data&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I thought using a BY variable that is uniformly distributed with 32 levels would take full advantage of the 32 threads:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="sas"&gt;data casuser.test;
	call streaminit(100);
	do i=1 to 10000000;
		random = rand("Integer", 1, 32);
		name = catx(" ", "Name number", random);
		output;
	end;
run;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Surprisingly, the performance got even worse - even though (or because) 23 instead of 3 threads were used:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="sas"&gt;data _null_;
	set casuser.test end=eof;

	by name;

	if eof;
	put "thread:" _threadid_ " obs:" _n_;
run;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="03 CAS Server 2 (2).png" style="width: 999px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/72862i88A93674A89A2BFD/image-size/large?v=v2&amp;amp;px=999" role="button" title="03 CAS Server 2 (2).png" alt="03 CAS Server 2 (2).png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Step 3&lt;/STRONG&gt;: FedSQL&lt;/P&gt;&lt;P&gt;For comparison, this is the run time of a FedSQL step on the same data, grouping by name, applying a summary function, and generating a report:&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;proc fedsql sessref=casauto;
	select name, count(*)
	from casuser.test
	group by name;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Luhan_0-1656615416629.png" style="width: 400px;"&gt;&lt;img src="https://communities.sas.com/t5/image/serverpage/image-id/72863i207D4FDF0EFDF5B9/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Luhan_0-1656615416629.png" alt="Luhan_0-1656615416629.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Conclusion&lt;/STRONG&gt; (so far):&lt;/P&gt;&lt;P&gt;At least on my setups (Viya SMP), the performance of DATA steps decreases significantly when BY variables influence the distribution of the data. Using FedSQL or CAS actions instead seems to be the way to go for BY-Group processing on CAS.&lt;/P&gt;</description>
      <pubDate>Thu, 30 Jun 2022 19:33:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/DATA-Step-BY-Group-Processing-Compute-Server-vs-CAS/m-p/821198#M324206</guid>
      <dc:creator>Luhan</dc:creator>
      <dc:date>2022-06-30T19:33:35Z</dc:date>
    </item>
    <item>
      <title>Re: DATA Step BY-Group Processing: Compute Server vs CAS</title>
      <link>https://communities.sas.com/t5/SAS-Programming/DATA-Step-BY-Group-Processing-Compute-Server-vs-CAS/m-p/821202#M324209</link>
      <description>&lt;P&gt;What version of Viya are you using? Have you raised this issue with SAS Tech Support, and if so, what was the response? If not, I suggest you do so and then add their response to this thread.&lt;/P&gt;</description>
      <pubDate>Thu, 30 Jun 2022 20:02:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/DATA-Step-BY-Group-Processing-Compute-Server-vs-CAS/m-p/821202#M324209</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2022-06-30T20:02:45Z</dc:date>
    </item>
  </channel>
</rss>

