<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Select one customer in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955728#M373233</link>
    <description>&lt;P&gt;It's called &lt;STRONG&gt;&lt;A href="https://communities.sas.com/t5/SAS-Communities-Library/Maxims-of-Maximally-Efficient-SAS-Programmers/ta-p/352068/show-comments/false" target="_self"&gt;Maxim 4&lt;/A&gt;&lt;/STRONG&gt;, you should start to use it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;[EDIT:]&lt;/P&gt;
&lt;P&gt;I agree with&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13674"&gt;@LinusH&lt;/a&gt;&amp;nbsp;adding an index could be good idea.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;B.&lt;/P&gt;</description>
    <pubDate>Fri, 10 Jan 2025 11:22:10 GMT</pubDate>
    <dc:creator>yabwon</dc:creator>
    <dc:date>2025-01-10T11:22:10Z</dc:date>
    <item>
      <title>Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955710#M373230</link>
      <description>&lt;P&gt;Hello&lt;/P&gt;
&lt;P&gt;Lets say I have a data set with many rows and many columns and I want to select the rows of one specific customer.&lt;/P&gt;
&lt;P&gt;What is the most quick way to do it between these 4 ways or maybe there is a better way?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
Data Way1;
set r_r.t100millionrows(Where=(CustID=123456));
Run;

proc sql;
create table Way2 as
select *
from r_r.t100millionrows
where  lakoach=7550
;
quit;

proc sql;
create table Way3 as
select *
from r_r.t100millionrows(Where=(lakoach=7550))
;
quit;

proc sql;
create table Way4(Where=(lakoach=7550)) as
select *
from r_r.t100millionrows
;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 10 Jan 2025 10:20:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955710#M373230</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2025-01-10T10:20:12Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955726#M373231</link>
      <description>&lt;P&gt;They are pretty much the same, where clauses hit the I/O buffer, regardless of syntax used.&lt;/P&gt;
&lt;P&gt;Way 4 is probably slower since it acts on output data, rather than filtering the source table.&lt;/P&gt;
&lt;P&gt;What would speed up retrieval is using an index on lakoach/custid.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Jan 2025 11:13:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955726#M373231</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2025-01-10T11:13:12Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955728#M373233</link>
      <description>&lt;P&gt;It's called &lt;STRONG&gt;&lt;A href="https://communities.sas.com/t5/SAS-Communities-Library/Maxims-of-Maximally-Efficient-SAS-Programmers/ta-p/352068/show-comments/false" target="_self"&gt;Maxim 4&lt;/A&gt;&lt;/STRONG&gt;, you should start to use it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;[EDIT:]&lt;/P&gt;
&lt;P&gt;I agree with&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13674"&gt;@LinusH&lt;/a&gt;&amp;nbsp;adding an index could be good idea.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;B.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Jan 2025 11:22:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955728#M373233</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2025-01-10T11:22:10Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955743#M373236</link>
      <description>&lt;P&gt;Hey &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159549"&gt;@Ronein&lt;/a&gt;! I agree with &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/35763"&gt;@yabwon&lt;/a&gt;'s approach: test it out! One way you can test it is this timeit skeleton macro I threw together.&lt;/P&gt;
&lt;P&gt;&lt;A href="https://github.com/stu-code/sas/blob/master/utility-macros/timeit.sas" target="_blank" rel="noopener"&gt;https://github.com/stu-code/sas/blob/master/utility-macros/timeit.sas&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For convenience, here is your code put into this macro:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%macro timeit(trials=100);
    %local i n t start;
    %do i = 1 %to 10;
        %local time&amp;amp;i;
    %end;

    %do t = 1 %to &amp;amp;trials;
        
        %let n = 0; /* Do not change */

     /* Define your code chunks below 

        **** Important ****

        You must increment the variable n with each code chunk. To create 
        a code chunk to test, use this skeleton code:

        %let n = %eval(&amp;amp;n+1);
        %let start=%sysfunc(datetime());
            &amp;lt;code&amp;gt;
        %let time&amp;amp;n = %sysevalf(%sysfunc(datetime())-&amp;amp;start); 
     */

        /* Code 1 */
        %let n = %eval(&amp;amp;n+1);
        %let start=%sysfunc(datetime());
            Data Way1;
                set r_r.t100millionrows(Where=(CustID=123456));
            Run;
        %let time&amp;amp;n = %sysevalf(%sysfunc(datetime())-&amp;amp;start);

        /* Code 2 */
        %let n = %eval(&amp;amp;n+1);
        %let start=%sysfunc(datetime());
            proc sql;
                create table Way2 as
                select *
                from r_r.t100millionrows
                where  lakoach=7550
                ;
            quit;
        %let time&amp;amp;n = %sysevalf(%sysfunc(datetime())-&amp;amp;start);

        /* Code 3 */
        %let n = %eval(&amp;amp;n+1);
        %let start=%sysfunc(datetime());
            proc sql;
                create table Way3 as
                select *
                from r_r.t100millionrows(Where=(lakoach=7550))
            ;
            quit;
        %let time&amp;amp;n = %sysevalf(%sysfunc(datetime())-&amp;amp;start);

        /* Code 4 */
        %let n = %eval(&amp;amp;n+1);
        %let start=%sysfunc(datetime());
            proc sql;
                create table Way4(Where=(lakoach=7550)) as
                select *
                from r_r.t100millionrows
                ;
            quit;
        %let time&amp;amp;n = %sysevalf(%sysfunc(datetime())-&amp;amp;start);

        data time;
            %do i = 1 %to &amp;amp;n;
                time&amp;amp;i = &amp;amp;&amp;amp;time&amp;amp;i;
            %end;
        run;

        proc append base=times data=time;
        run;
    %end;

    proc sql;
        select mean(time1) as avg_time1 label="Avg: Method 1"
             , std(time1)  as std_time1 label="Std: Method 1"
             %do i = 1 %to &amp;amp;n;
             , mean(time&amp;amp;i) as avg_time&amp;amp;i label="Avg: Method &amp;amp;i"
             , std(time&amp;amp;i)  as std_time&amp;amp;i label="Std: Method &amp;amp;i"
             %end;

        from times;
    quit;

    proc datasets lib=work nolist;
        delete times;
    quit;

%mend;

%timeit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Fri, 10 Jan 2025 15:25:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955743#M373236</guid>
      <dc:creator>Stu_SAS</dc:creator>
      <dc:date>2025-01-10T15:25:05Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955785#M373245</link>
      <description>&lt;P&gt;Is the data sorted by CUSTID?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If so, and if there is no index of CUSTID, you can usually get something much faster than the WHERE filter by programming a binary search.&amp;nbsp; That is not to say I don't recommend generating an index, if the dataset is to be repeatedly accessed.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If there is only one record per CUSTID, then this simple binary search:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;%let srch=123456;
data want (drop=_:);
  _lo_pt=0;
  _hi_pt=nrecs+1;

  do until (custid=&amp;amp;srch);
    if _hi_pt = _lo_pt +1 then stop;  /*Search value not found*/

    p=floor(mean(_lo_pt,_hi_pt));
    set have nobs=nrecs point=p;
    if      custid&amp;lt;&amp;amp;srch then _lo_pt=p;
    else if custid&amp;gt;&amp;amp;srch then _hi_pt=p;
  end;

  output;
  stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If there can be multiple instances of a given CUSTID value, then after finding a qualifying obs, search backward to find the first instance of that value.&amp;nbsp; Then output consecutive obs until the value changes or the dataset ends:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want (drop=_:);
  _lo_pt=0;
  _hi_pt=nrecs+1;
  if 0 then set have nobs=nrecs;

  do until (custid=&amp;amp;srch);
    if _hi_pt = _lo_pt +1 then stop;  /*Search value not found*/

    p=floor(mean(_lo_pt,_hi_pt));
    set have point=p;
    if      custid&amp;lt;&amp;amp;srch then _lo_pt=p;
    else if custid&amp;gt;&amp;amp;srch then _hi_pt=p;
  end;
  
  if p&amp;gt;1 then do q=p-1 to _lo_pt+1 by -1 until (custid&amp;lt;&amp;amp;srch);
    set have point=q;
  end;
  do p=q+1 to _hi_pt-1;
    set have point=p;
    if custid=&amp;amp;srch then output;
    else leave;
  end;
  stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 11 Jan 2025 02:18:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955785#M373245</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2025-01-11T02:18:47Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955791#M373246</link>
      <description>Try this one :&lt;BR /&gt;&lt;BR /&gt;Data Way1;&lt;BR /&gt;set r_r.t100millionrows;&lt;BR /&gt;if CustID=123456 ;&lt;BR /&gt;Run;</description>
      <pubDate>Sat, 11 Jan 2025 07:10:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955791#M373246</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2025-01-11T07:10:20Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955799#M373253</link>
      <description>&lt;P&gt;To extract small subsets from large data, an index is really helpful. It uses optimized search methods (b-tree) internally.&lt;/P&gt;
&lt;P&gt;Just don't use it to process the &lt;U&gt;whole&lt;/U&gt;&amp;nbsp;(or large subsets of the) dataset in a particular order. For that, sorting is better.&lt;/P&gt;</description>
      <pubDate>Sat, 11 Jan 2025 11:27:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955799#M373253</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2025-01-11T11:27:12Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955874#M373293</link>
      <description>&lt;P&gt;If you have PROC DS2, this way is the most fast/efficient.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
 do i=1 to 1e6;
  age=rand('integer',1,100000);output;
 end;
run;

proc ds2;
data want(overwrite=yes);
method run();
 set have  ;
 if age=16;
end;
enddata;
run;quit;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 13 Jan 2025 07:35:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955874#M373293</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2025-01-13T07:35:27Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955943#M373302</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;If you have PROC DS2, this way is the most fast/efficient.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
 do i=1 to 1e6;
  age=rand('integer',1,100000);output;
 end;
run;

proc ds2;
data want(overwrite=yes);
method run();
 set have  ;
 if age=16;
end;
enddata;
run;quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I wonder if this is superior to WHERE techniques when the observations are fat, making for a long program data vector.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I believe the IF statement for regular SAS (and I presume also for IF within DS2) requires that the PDV be fully populated prior to filtering.&amp;nbsp; I don't think the WHERE statement supports that activity (otherwise how could WHERE be supported by various 3rd-party data engines?).&amp;nbsp;&amp;nbsp;When&amp;nbsp;there is just a single variable, use of IF won't be much of a burden.&amp;nbsp; But if there are (say) 300 variables (with lots of character vars of diverse lengths) I imagine there will be increasing differences between IF and WHERE.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jan 2025 17:24:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955943#M373302</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2025-01-13T17:24:31Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955946#M373304</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thing is that DS2 doesn't support WHERE filtering...&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;1    proc ds2;
2    data want(overwrite=yes);
3    method run();
4     set have;
5     where age=16;
6    end;
7    enddata;
8    run;
8  !     quit;
ERROR: Compilation error.
ERROR: Missing END statement for the method run.
ERROR: Parse encountered WHERE when expecting end of input.
ERROR: Line 5: Parse failed:  &amp;gt;&amp;gt;&amp;gt; where &amp;lt;&amp;lt;&amp;lt;  age=16;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE DS2 used (Total process time):
      real time           0.01 seconds
      user cpu time       0.01 seconds
      system cpu time     0.00 seconds
      memory              3864.96k
      OS Memory           21752.00k
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;We have to use {SQL subqueries}&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;1    proc ds2;
2    data want(overwrite=yes);
3    method run();
4     set { select * from have1 where age=16 };
5    end;
6    enddata;
7    run;
7  !     quit;
NOTE: Execution succeeded. 9 rows affected.

NOTE: PROCEDURE DS2 used (Total process time):
      real time           0.03 seconds
      user cpu time       0.03 seconds
      system cpu time     0.00 seconds
      memory              4628.15k
      OS Memory           22512.00k
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I did small experiment, on 10M observations:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have1 have2 have3 have4;
call streaminit(123);
 do i=1 to 1e7;
  age=rand('integer',1,100000);output;
 end;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Data Step WHERE seems to be winning (at least at my laptop)&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;10
11
12   proc ds2;
13   data want1(overwrite=yes);
14   method run();
15    set have1 ;
16    if age=16;
17   end;
18   enddata;
19   run;
19 !     quit;
NOTE: Execution succeeded. 96 rows affected.

NOTE: PROCEDURE DS2 used (Total process time):
      real time           0.13 seconds
      user cpu time       0.17 seconds
      system cpu time     0.00 seconds
      
      real time           0.12 seconds
      user cpu time       0.17 seconds
      system cpu time     0.03 seconds
      
      real time           0.13 seconds
      user cpu time       0.12 seconds
      system cpu time     0.04 seconds
      
      real time           0.13 seconds
      user cpu time       0.14 seconds
      system cpu time     0.03 seconds
      
      real time           0.12 seconds
      user cpu time       0.12 seconds
      system cpu time     0.07 seconds
      
      memory              4648.96k
      OS Memory           22256.00k


20
21   proc ds2;
22   data want2(overwrite=yes);
23   method run();
24    set { select * from have2 where age=16 };
25   end;
26   enddata;
27   run;
27 !     quit;
NOTE: Execution succeeded. 96 rows affected.

NOTE: PROCEDURE DS2 used (Total process time):
      real time           0.13 seconds
      user cpu time       0.11 seconds
      system cpu time     0.03 seconds
      
      real time           0.12 seconds
      user cpu time       0.11 seconds
      system cpu time     0.01 seconds
      
      real time           0.14 seconds
      user cpu time       0.12 seconds
      system cpu time     0.01 seconds
      
      real time           0.12 seconds
      user cpu time       0.09 seconds
      system cpu time     0.03 seconds
      
      real time           0.12 seconds
      user cpu time       0.11 seconds
      system cpu time     0.01 seconds
      
      memory              4628.23k
      OS Memory           22512.00k



28
29   data want3;
30    set have3 ;
31    if age=16;
32   run;

NOTE: There were 10000000 observations read from the data set WORK.HAVE3.
NOTE: The data set WORK.WANT3 has 96 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.16 seconds
      user cpu time       0.12 seconds
      system cpu time     0.04 seconds
      
      real time           0.16 seconds
      user cpu time       0.17 seconds
      system cpu time     0.00 seconds
      
      real time           0.15 seconds
      user cpu time       0.14 seconds
      system cpu time     0.01 seconds
      
      real time           0.15 seconds
      user cpu time       0.14 seconds
      system cpu time     0.00 seconds
      
      real time           0.14 seconds
      user cpu time       0.10 seconds
      system cpu time     0.03 seconds
      
      memory              624.21k
      OS Memory           20464.00k



33
34   data want4;
35    set have4 ;
36    where age=16;
37   run;

NOTE: There were 96 observations read from the data set WORK.HAVE4.
      WHERE age=16;
NOTE: The data set WORK.WANT4 has 96 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.10 seconds
      user cpu time       0.09 seconds
      system cpu time     0.01 seconds
      
      real time           0.10 seconds
      user cpu time       0.09 seconds
      system cpu time     0.00 seconds
      
      real time           0.10 seconds
      user cpu time       0.09 seconds
      system cpu time     0.01 seconds
      
      real time           0.10 seconds
      user cpu time       0.07 seconds
      system cpu time     0.03 seconds
      
      real time           0.10 seconds
      user cpu time       0.07 seconds
      system cpu time     0.01 seconds
      
      memory              641.87k
      OS Memory           20464.00k&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Bart&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 13 Jan 2025 17:58:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955946#M373304</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2025-01-13T17:58:07Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955983#M373315</link>
      <description>&lt;P&gt;PROC DS2 is multi-session , but data step is single session, so I think DS2 could save a lot of time . and DS2 is recent PROC and could be optimal for data processsing.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I understand what you mean. But IF is indeed more efficient than WHERE when table is big. Here is I tested code:&lt;/P&gt;
&lt;PRE&gt;148  data have;
149  array x{20} (20*1);
150   do i=1 to 1e7;
151    age=rand('integer',1,100000);output;
152   end;
153  run;

NOTE: The data set WORK.HAVE has 10000000 observations and 22 variables.
NOTE: DATA statement used (Total process time):
      real time           4.26 seconds
      cpu time            0.50 seconds


154
155
156  data w1;
157   set have(where=(age=16));
158  run;

NOTE: There were 105 observations read from the data set WORK.HAVE.
      WHERE age=16;
NOTE: The data set WORK.W1 has 105 observations and 22 variables.
NOTE: DATA statement used (Total process time):
      real time           &lt;STRONG&gt;4.32 seconds&lt;/STRONG&gt;
      cpu time            0.28 seconds


159
160
161  data w2;
162   set have;
163   if age=16;
164  run;

NOTE: There were 10000000 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.W2 has 105 observations and 22 variables.
NOTE: DATA statement used (Total process time):
      real time          &lt;STRONG&gt; 0.75 seconds&lt;/STRONG&gt;
      cpu time            0.71 seconds




&lt;/PRE&gt;</description>
      <pubDate>Tue, 14 Jan 2025 01:48:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955983#M373315</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2025-01-14T01:48:02Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955984#M373316</link>
      <description>&lt;P&gt;So did I . I think DS2 is most recent PROC and must have some advantages&amp;nbsp; and maybe it is under SAS/Viya multi-session&amp;nbsp; environment .&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;78   data have;
79   array x{20} (20*1);
80    do i=1 to 1e7;
81     age=rand('integer',1,100000);output;
82    end;
83   run;

NOTE: The data set WORK.HAVE has 10000000 observations and 22 variables.
NOTE: DATA statement used (Total process time):
      real time           4.43 seconds
      cpu time            0.43 seconds


84
85
86
87   proc ds2;
88   data want(overwrite=yes);
89   method run();
90    set have  ;
91    if age=16;
92   end;
93   enddata;
94   run;
94 !     quit;
NOTE: Execution succeeded. 84 rows affected.

NOTE: PROCEDURE DS2 used (Total process time):
      real time          &lt;STRONG&gt; 4.42 seconds&lt;/STRONG&gt;
      cpu time            0.48 seconds


95
96
97
98   data w;
99    set have;
100   if age=16;
101  run;

NOTE: There were 10000000 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.W has 84 observations and 22 variables.
NOTE: DATA statement used (Total process time):
      real time          &lt;STRONG&gt; 1.04 seconds&lt;/STRONG&gt;
      cpu time            1.03 seconds







&lt;/PRE&gt;</description>
      <pubDate>Tue, 14 Jan 2025 01:39:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/955984#M373316</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2025-01-14T01:39:28Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/956003#M373319</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You ran the IF filter after you ran the WHERE filter, against the same dataset.&amp;nbsp; I think the second filter benefited from caching generated by the first.&amp;nbsp; When I ran the WHERE after the IF, the timing results were equal.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So I ran your test (with 20 extra character veriables) using a different, but identical dataset for each filter (similar to&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/35763"&gt;@yabwon&lt;/a&gt;).&amp;nbsp; And the filter tests were DATA _NULL_ steps, which allows a narrower focus on the input filters.&amp;nbsp; IF and WHERE turned out about the same (which I admit I wasn't expecting).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;1    data have1 have2;
2    array x{20} (20*1); array chr{20} $30 (20*' ');
3     do i=1 to 1e7;
4      age=rand('integer',1,100000);
5      output;
6     end;
7    run;

NOTE: The data set WORK.HAVE1 has 10000000 observations and 42 variables.
NOTE: The data set WORK.HAVE2 has 10000000 observations and 42 variables.
NOTE: DATA statement used (Total process time):
      real time           27.78 seconds
      cpu time            0.62 seconds


8
9    data _null_;
10    set have1(where=(age=16));
11   run;

NOTE: There were 93 observations read from the data set WORK.HAVE1.
      WHERE age=16;
NOTE: DATA statement used (Total process time):
      real time           12.01 seconds
      cpu time            0.07 seconds


12   data _null_;
13    set have2;
14    if age=16;
15   run;

NOTE: There were 10000000 observations read from the data set WORK.HAVE2.
NOTE: DATA statement used (Total process time):
      real time           13.95 seconds
      cpu time            0.04 seconds

&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Question:&amp;nbsp; to get the multi-threading advantage of PROC DS2 in a DATA "step" wouldn't it be necessary to define threads?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 14 Jan 2025 04:40:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/956003#M373319</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2025-01-14T04:40:58Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/956007#M373322</link>
      <description>&lt;P&gt;OK .I am running the following two code separatedly ,and also got the same result.&lt;/P&gt;
&lt;PRE&gt;1
2      data have;
3       array x{20} (20*1);
4       call streaminit(123);
5        do i=1 to 1e7;
6         age=rand('integer',1,100000);output;
7        end;
8       run;

NOTE: 数据集 WORK.HAVE 有 10000000 个观测和 22 个变量。
NOTE: “DATA 语句”所用时间（总处理时间）:
      实际时间          4.04 秒
      CPU 时间          0.17 秒


9
10   data w1;
11    set have(where=(age=16));
12   run;

NOTE: 从数据集 WORK.HAVE. 读取了 96 个观测
      WHERE age=16;
NOTE: 数据集 WORK.W1 有 96 个观测和 22 个变量。
NOTE: “DATA 语句”所用时间（总处理时间）:
      实际时间         &lt;STRONG&gt; 0.65 秒&lt;/STRONG&gt;
      CPU 时间          0.56 秒

&lt;/PRE&gt;
&lt;PRE&gt;1
2      data have;
3       array x{20} (20*1);
4       call streaminit(123);
5        do i=1 to 1e7;
6         age=rand('integer',1,100000);output;
7        end;
8       run;

NOTE: 数据集 WORK.HAVE 有 10000000 个观测和 22 个变量。
NOTE: “DATA 语句”所用时间（总处理时间）:
      实际时间          3.88 秒
      CPU 时间          0.18 秒


9
10   data w1;
11    set have;
12    if age=16;
13   run;

NOTE: 从数据集 WORK.HAVE. 读取了 10000000 个观测
NOTE: 数据集 WORK.W1 有 96 个观测和 22 个变量。
NOTE: “DATA 语句”所用时间（总处理时间）:
      实际时间          &lt;STRONG&gt;0.78 秒&lt;/STRONG&gt;
      CPU 时间          0.73 秒
&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;"to get the multi-threading advantage of PROC DS2 in a DATA "step" wouldn't it be necessary to define threads?"&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Yes. You are right.But still got the same result.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt; I think to take DS2 advantage&amp;nbsp; you also need spread/split your big table into many nodes of sas server , just like SPDE engine or SPD Server.&lt;/SPAN&gt;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;
  data have;
   array x{20} (20*1);
   call streaminit(123);
    do i=1 to 1e7;
     age=rand('integer',1,100000);output;
    end;
   run;






proc ds2;
thread th/overwrite=yes;
method run();
 set have;
 if age=16;
end;
endthread;
run;
quit;
proc ds2;
data w2/overwrite=yes;
declare thread th th1;
method run();
 set from th1 threads=8;
end;
enddata;
run;
quit;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;1
2      data have;
3       array x{20} (20*1);
4       call streaminit(123);
5        do i=1 to 1e7;
6         age=rand('integer',1,100000);output;
7        end;
8       run;

NOTE: 数据集 WORK.HAVE 有 10000000 个观测和 22 个变量。
NOTE: “DATA 语句”所用时间（总处理时间）:
      实际时间          4.49 秒
      CPU 时间          0.29 秒


9
10
11
12
13
14
15   proc ds2;
NOTE: 正在写入 HTML Body（主体）文件: sashtml.htm
16   thread th/overwrite=yes;
17   method run();
18    set have;
19    if age=16;
20   end;
21   endthread;
22   run;
NOTE: 已在数据集 work.th 中创建 thread th。
NOTE: Execution succeeded. No rows affected.
23   quit;

NOTE: “PROCEDURE DS2”所用时间（总处理时间）:
      实际时间          0.41 秒
      CPU 时间          0.12 秒


24   proc ds2;
25   data w2/overwrite=yes;
26   declare thread th th1;
27   method run();
28    set from th1 threads=8;
29   end;
30   enddata;
31   run;
NOTE: Execution succeeded. 96 rows affected.
32   quit;

NOTE: “PROCEDURE DS2”所用时间（总处理时间）:
      实际时间         &lt;STRONG&gt; 0.84 秒&lt;/STRONG&gt;
      CPU 时间          0.84 秒


&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 14 Jan 2025 06:40:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/956007#M373322</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2025-01-14T06:40:40Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/956044#M373344</link>
      <description>&lt;P&gt;DS2 multi-threading is good for CPU intensive work, but data sourcing is also single threaded.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;There is pretty nice article:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.lexjansen.com/pharmasug/2019/AD/PharmaSUG-2019-AD-228.pdf" target="_self"&gt;&lt;STRONG&gt;https://www.lexjansen.com/pharmasug/2019/AD/PharmaSUG-2019-AD-228.pdf&lt;/STRONG&gt;&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;comparing DS2 and Data Step processing by Troy Martin Hughes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I think that&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13728"&gt;@SASJedi&lt;/a&gt;&amp;nbsp;could give us a hand here &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Bart&lt;/P&gt;</description>
      <pubDate>Tue, 14 Jan 2025 12:32:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/956044#M373344</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2025-01-14T12:32:23Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/956167#M373392</link>
      <description>&lt;P&gt;Should it be&amp;nbsp; quicker? why?&lt;/P&gt;</description>
      <pubDate>Wed, 15 Jan 2025 08:59:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/956167#M373392</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2025-01-15T08:59:13Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/956226#M373413</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;&amp;nbsp;I noticed that when you ran the programs in separate SAS sessions, the WHERE performed better than the subsetting IF. This is what I would have expected. WHERE processing rejects rows before loading them into the PDV. The subsetting IF rejects rows only after they are loaded into the PDV, so should always be slower than WHERE.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;From my own test on PC SAS:&lt;/P&gt;
&lt;PRE&gt;49         data w2;
50         	set have;
51         	if age=16;
52         run;

&lt;STRONG&gt;NOTE: There were 10000000 observations read from the data set WORK.HAVE.&lt;/STRONG&gt;
NOTE: The data set WORK.W2 has 110 observations and 22 variables.
NOTE: DATA statement used (Total process time):
      real time          &lt;FONT color="#FF0000"&gt;&lt;STRONG&gt; 3.07 seconds&lt;/STRONG&gt;&lt;/FONT&gt;
      cpu time            1.46 seconds
_________________________________________________________________________________&lt;BR /&gt;49         data w1;
50         	set have(where=(age=16));
51         run;

&lt;STRONG&gt;NOTE: There were 110 observations read from the data set WORK.HAVE.
      WHERE age=16;&lt;/STRONG&gt;
NOTE: The data set WORK.W1 has 110 observations and 22 variables.
NOTE: DATA statement used (Total process time):
      real time          &lt;STRONG&gt;&lt;FONT color="#FF0000"&gt; 1.92 seconds&lt;/FONT&gt;&lt;/STRONG&gt;
      cpu time            0.67 seconds&lt;/PRE&gt;
&lt;P&gt;When DS2 is running in Base SAS, all read-write operations occur on a single thread (to ensure proper distribution of the data to the compute threads), but there can be multiple compute threads.&amp;nbsp; This means that, in base SAS, you will only see a performance gain in DS2 if your process is CPU bound. YOu can tell that a process is CPU bound if the real time and CPU time are about the same. A look our example log shows that the real time is much longer than the CPU time - so this process is I/O bound, not CPU bound, and will not benefit from multi-threading in base SAS. Now, if you have access to a MPP environment that can run DS2 (like CAS, or one of the databases that support the in-database code accelerator) then you can get both parallel read-wrote and parallel compute, and DS2 should give you a nice boost over single-threaded DATA steps.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;DS2 does not include a WHERE statement because it supports reading directly from a FedSQL query on the SET statement. For example:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc ds2;
	thread th/overwrite=yes;
		method run();
			set {select * from have where age=16};
		end;
	endthread;
run;

	data w2/overwrite=yes;
		declare thread th th1;
		method run();
			set from th1 threads=8;
		end;
	enddata;
run;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;When I ran this, it took just about the same amount of time as the DATA step with the WHERE statement:&lt;/P&gt;
&lt;PRE&gt;NOTE: PROCEDURE DS2 used (Total process time):
      real time          &lt;FONT color="#FF0000"&gt;&lt;STRONG&gt; 1.79 seconds&lt;/STRONG&gt;&lt;/FONT&gt;
      cpu time            0.84 seconds&lt;/PRE&gt;
&lt;P&gt;You can see tips like this in my &lt;A href="https://www.youtube.com/watch?v=he9ecikOPCE" target="_blank"&gt;SAS Tutorial | 5 Ways to Make Your SAS Code Run Faster&lt;/A&gt;&amp;nbsp;- it's short and sweet and shows examples.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Hope this helps.&lt;/P&gt;</description>
      <pubDate>Wed, 15 Jan 2025 17:45:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/956226#M373413</guid>
      <dc:creator>SASJedi</dc:creator>
      <dc:date>2025-01-15T17:45:39Z</dc:date>
    </item>
    <item>
      <title>Re: Select one customer</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/956266#M373432</link>
      <description>Jedi,&lt;BR /&gt;&lt;BR /&gt;"when you ran the programs in separate SAS sessions, the WHERE performed better than the subsetting IF. "&lt;BR /&gt;Maybe you are right. But sometimes I noticed that IF is a little faster than WHERE , I don't know why ,maybe the environment around SAS installed is a key reason.&lt;BR /&gt;</description>
      <pubDate>Thu, 16 Jan 2025 01:15:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Select-one-customer/m-p/956266#M373432</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2025-01-16T01:15:29Z</dc:date>
    </item>
  </channel>
</rss>

