<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SET data set-Run speed in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/SET-data-set-Run-speed/m-p/959459#M374349</link>
    <description>&lt;P&gt;Maxim 4: Try It.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But from experience I prefer option 2, either with the WHERE= dataset option or a WHERE statement.&lt;/P&gt;
&lt;P&gt;Unless you read from an external database, where you might want (depending on data structure) to use explicit passthrough first.&lt;/P&gt;</description>
    <pubDate>Tue, 18 Feb 2025 09:57:43 GMT</pubDate>
    <dc:creator>Kurt_Bremser</dc:creator>
    <dc:date>2025-02-18T09:57:43Z</dc:date>
    <item>
      <title>SET data set-Run speed</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SET-data-set-Run-speed/m-p/959436#M374331</link>
      <description>&lt;P&gt;Hello&lt;/P&gt;
&lt;P&gt;Lets say I have a big data set (25 million&amp;nbsp; rows and 30 columns) in a permanent library and I want to select specific rows by criteria.&lt;/P&gt;
&lt;P&gt;I just used sashelp.class for example.&lt;/P&gt;
&lt;P&gt;From these 4 ways which way is best (Highest run speed) and which is worst(lowest run speed)?&lt;/P&gt;
&lt;P&gt;How would you rank these 4 methods by speed?(best to worst)?&lt;/P&gt;
&lt;P&gt;Is there any other way to do it?&lt;/P&gt;
&lt;P&gt;If I add Index then the 4 ways run more quickly?&lt;/P&gt;
&lt;P&gt;Note- in real life data set is much much bigger and sashelp.class is only an example&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
create table t1 as
select *
from  sashelp.class
where sex='F'
;
quit;

Data t2;
set sashelp.class(Where=( sex='F'));
Run;

Data t3(Where=(sex='F'));
set sashelp.class;
Run;

proc append data=sashelp.class(Where=( sex='F')) base=t4;quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 18 Feb 2025 06:06:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SET-data-set-Run-speed/m-p/959436#M374331</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2025-02-18T06:06:34Z</dc:date>
    </item>
    <item>
      <title>Re: SET data set-Run speed</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SET-data-set-Run-speed/m-p/959446#M374338</link>
      <description>&lt;P&gt;You could run a test script in your environment to get an idea.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options fullstimer compress=no;

data work.have;
  array vars{30} $100 (30*'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA');
  do i=1 to 1000000;
    if rand('integer',1,10)=1 then sex='F';
    else sex='M';
    output;
  end;
  stop;
run;

%macro test(compress=no);
  options compress=&amp;amp;compress;
  proc datasets lib=work nowarn nolist;
    delete t1 t2 t3 t4;
  quit;
  
  proc sql;
    create table t1 as select * from work.have where sex='F';
  quit;
  
  Data t2;
    set work.have(Where=(sex='F'));
  Run;
  
  Data t3(Where=(sex='F'));
    set work.have;
  Run;
  
  proc append data=work.have(Where=(sex='F')) base=t4;
  quit;
%mend;
%test();
%test();
%test(compress=yes);
%test(compress=yes);
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;I would expect that you'll get for all 4 methods comparable results.&amp;nbsp;&lt;BR /&gt;Your ways 1, 2 and 4 subset the data when reading from source, your way 3 only subsets the data when writing to target.&lt;/P&gt;
&lt;P&gt;Way 3 will process all rows from source. If you don't need them (i.e. for aggregations before you write to target) then this way 3 is sub-optimal and depending on what you do in the data step would consume more time than the other 3 methods.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Feb 2025 07:29:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SET-data-set-Run-speed/m-p/959446#M374338</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2025-02-18T07:29:18Z</dc:date>
    </item>
    <item>
      <title>Re: SET data set-Run speed</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SET-data-set-Run-speed/m-p/959453#M374344</link>
      <description>&lt;P&gt;I expect that WAY4 (t4) be the best, Am I right?&lt;/P&gt;</description>
      <pubDate>Tue, 18 Feb 2025 08:39:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SET-data-set-Run-speed/m-p/959453#M374344</guid>
      <dc:creator>Ronein</dc:creator>
      <dc:date>2025-02-18T08:39:51Z</dc:date>
    </item>
    <item>
      <title>Re: SET data set-Run speed</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SET-data-set-Run-speed/m-p/959456#M374346</link>
      <description>&lt;P&gt;is the data set sorted by that variable you want to filter out on?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Bart&lt;/P&gt;</description>
      <pubDate>Tue, 18 Feb 2025 08:50:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SET-data-set-Run-speed/m-p/959456#M374346</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2025-02-18T08:50:22Z</dc:date>
    </item>
    <item>
      <title>Re: SET data set-Run speed</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SET-data-set-Run-speed/m-p/959457#M374347</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159549"&gt;@Ronein&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I expect that WAY4 (t4) be the best, Am I right?&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I initially thought the same because of the way Proc Append writes data. But then running the performance test script in my environment didn't confirm the theory. May be that's due to the fact that Proc Append needs to read the source data by row for sub-setting and though can't just copy blocks from source to target.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Run the test script in your environment and compare the run times.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Feb 2025 08:56:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SET-data-set-Run-speed/m-p/959457#M374347</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2025-02-18T08:56:36Z</dc:date>
    </item>
    <item>
      <title>Re: SET data set-Run speed</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SET-data-set-Run-speed/m-p/959459#M374349</link>
      <description>&lt;P&gt;Maxim 4: Try It.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But from experience I prefer option 2, either with the WHERE= dataset option or a WHERE statement.&lt;/P&gt;
&lt;P&gt;Unless you read from an external database, where you might want (depending on data structure) to use explicit passthrough first.&lt;/P&gt;</description>
      <pubDate>Tue, 18 Feb 2025 09:57:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SET-data-set-Run-speed/m-p/959459#M374349</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2025-02-18T09:57:43Z</dc:date>
    </item>
  </channel>
</rss>

