<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SAS keeps running after it has read in all of the observations that match the subset condition in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928525#M365317</link>
    <description>I did include the run; command.</description>
    <pubDate>Wed, 15 May 2024 18:03:06 GMT</pubDate>
    <dc:creator>eiger</dc:creator>
    <dc:date>2024-05-15T18:03:06Z</dc:date>
    <item>
      <title>SAS keeps running after it has read in all of the observations that match the subset condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928517#M365311</link>
      <description>&lt;P&gt;When I subset a dataset in a DATA step, SAS will continue to run for much longer than expected, so long in fact that I have not seen it finish running. However, when I break the run and cancel the submitted statements, the log indicates that 1,271 observations were read, which is the number of observations that I expect to have in the subset. &lt;STRONG&gt;Why is it that SAS keeps running when all of the observations that match the WHERE condition have been read?&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In the DATA step I use a WHERE statement to subset for observations where the character variable SUB = '123'. The dataset is large (1.3M+ obs.), but as I mentioned, the resulting data set "filtered_items" should only have 1,271 observations.&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=""&gt;libname corpxin "\\filepath\folder";

data filtered_items;
	set corpxin.items_202001;
	where SUB = '123';
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 15 May 2024 17:26:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928517#M365311</guid>
      <dc:creator>eiger</dc:creator>
      <dc:date>2024-05-15T17:26:48Z</dc:date>
    </item>
    <item>
      <title>Re: SAS keeps running after it has read in all of the observations that match the subset condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928521#M365314</link>
      <description>&lt;P&gt;Where is the libref pointing?&lt;/P&gt;
&lt;P&gt;Is that an actual SAS dataset? Or is the libref pointing to some external database?&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Is the member an actual dataset (or database table) or is it a view?&lt;/P&gt;
&lt;P&gt;Is your where clause really that simple equality test? Or is it something more complex?&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Note that if the libref is pointing to a remote database and the where clause is something that cannot be passed to the remote database then the delay is probably from SAS having to copy ALL of the observations to the SAS server in order to apply the filter.&lt;/P&gt;</description>
      <pubDate>Wed, 15 May 2024 17:38:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928521#M365314</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-05-15T17:38:11Z</dc:date>
    </item>
    <item>
      <title>Re: SAS keeps running after it has read in all of the observations that match the subset condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928523#M365315</link>
      <description>&lt;P&gt;The library is a folder on a remote drive. The file corpxin.items_202001 is a .sas7bdat dataset.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Yes, the where clause is just a simple equality test.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 15 May 2024 17:47:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928523#M365315</guid>
      <dc:creator>eiger</dc:creator>
      <dc:date>2024-05-15T17:47:33Z</dc:date>
    </item>
    <item>
      <title>Re: SAS keeps running after it has read in all of the observations that match the subset condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928524#M365316</link>
      <description>&lt;BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;Why is it that SAS keeps running when all of the observations that match the WHERE condition have been read?&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I will guess that you did not submit the &lt;FONT face="courier new,courier"&gt;run;&lt;/FONT&gt; command.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Or guess #2 — SAS is still crunching through the 1.3 million records. SAS can find the 1271 of interest and still have 1.2 million records to search through; it doesn't know there are no more matches.&lt;/P&gt;</description>
      <pubDate>Wed, 15 May 2024 18:00:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928524#M365316</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2024-05-15T18:00:41Z</dc:date>
    </item>
    <item>
      <title>Re: SAS keeps running after it has read in all of the observations that match the subset condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928525#M365317</link>
      <description>I did include the run; command.</description>
      <pubDate>Wed, 15 May 2024 18:03:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928525#M365317</guid>
      <dc:creator>eiger</dc:creator>
      <dc:date>2024-05-15T18:03:06Z</dc:date>
    </item>
    <item>
      <title>Re: SAS keeps running after it has read in all of the observations that match the subset condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928527#M365319</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Or guess #2 — SAS is still crunching through the 1.3 million records. SAS can find the 1271 of interest and still have 1.2 million records to search through; it doesn't know there are no more matches.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 15 May 2024 18:17:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928527#M365319</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2024-05-15T18:17:11Z</dc:date>
    </item>
    <item>
      <title>Re: SAS keeps running after it has read in all of the observations that match the subset condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928535#M365322</link>
      <description>&lt;P&gt;Local SAS probably doesn't know that the library is on a remote drive.&amp;nbsp; That is masked by the operating system, or the remote-access file system.&amp;nbsp; So the data engine (and the "where" filter) is probably running on the local system, meaning, as&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/159"&gt;@Tom&lt;/a&gt;&amp;nbsp;suggested, all the data is being transported to the client system (where SAS is running) before the filter is applied.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Since the data file is a&amp;nbsp;&lt;U&gt;&lt;STRONG&gt;.sas7bdat&lt;/STRONG&gt;&lt;/U&gt; file, you would need a way to have SAS run a data engine on the remote system to apply the filter prior to transport to the local system.&amp;nbsp; I have done this with SAS/CONNECT, but that requires a SAS license on both the client and server systems.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
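&lt;P&gt;A rough sketch of the SAS/CONNECT approach, assuming a SAS session can be started on the machine that hosts the file. The session name REMHOST, the sign-on details, and the remote path are placeholders for illustration, not details from this thread.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* REMHOST is a hypothetical session name; it must resolve to your server's address */
signon remhost;

rsubmit remhost;
  /* Path as seen from the remote machine (placeholder) */
  libname corpxin "/path/on/server";

  /* The filter runs on the remote side, next to the data */
  data filtered_items;
    set corpxin.items_202001;
    where SUB = '123';
  run;

  /* Send only the subset back to the local WORK library */
  proc download data=filtered_items out=filtered_items;
  run;
endrsubmit;

signoff remhost;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With this layout only the rows that pass the filter cross the network, instead of the whole file.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;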
&lt;P&gt;In the absence of the above, you can at least measure your progress with something like this, which reports elapsed time for the 1st, 101st, 201st, etc. obs that passes the where filter.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
  t0=put(time(),time8.0);
  call symput("T0",t0);
 run;
%put &amp;amp;=t0;

data filtered_items;
	set corpxin.items_202001;
	where SUB = '123';

  if mod(_n_,100)=1 then do;  
    t1=time();
    elapsed=t1-"&amp;amp;t0"t;
    put _n_=comma6.0  t1=time8.0  elapsed=time8.0;
  end;
  drop t1 elapsed;
run;
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 15 May 2024 18:43:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928535#M365322</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2024-05-15T18:43:35Z</dc:date>
    </item>
    <item>
      <title>Re: SAS keeps running after it has read in all of the observations that match the subset condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928538#M365323</link>
      <description>&lt;P&gt;Also, if the OP is certain that only 1,271 observations satisfy the filter, the program can be made to stop after outputting those 1,271, even if there remains a large part of the data file yet to be submitted to the filter.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data filtered_items;
	set corpxin.items_202001;
	where SUB = '123';

  output;
  if _n_=1271 then stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 15 May 2024 18:57:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928538#M365323</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2024-05-15T18:57:38Z</dc:date>
    </item>
    <item>
      <title>Re: SAS keeps running after it has read in all of the observations that match the subset condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928539#M365324</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/465981"&gt;@eiger&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;The library is a folder on a remote drive. The file corpxin.items_202001 is a .sas7bdat dataset.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Yes, the where clause is just a simple equality test.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;SAS will have to read the WHOLE file to know whether or not there are any more observations that meet the where condition.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You could probably improve the performance of that query in a couple of different ways.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You could add an index on that variable.&amp;nbsp; Then SAS could read the index file (which should be much smaller) and find out which parts of the dataset need to be read.&lt;/P&gt;
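&lt;P&gt;As a minimal sketch of the index idea, assuming you have write access to the folder behind CORPXIN, something like this might work. Building the index itself takes one full pass through the data, so it only pays off if the file is queried repeatedly. The MSGLEVEL=I option is only there so the log reports whether the index was actually used.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options msglevel=i;  /* log an INFO note when an index is used for WHERE processing */

/* Create a simple index on SUB; this needs write access to the library */
proc datasets library=corpxin nolist;
  modify items_202001;
  index create SUB;
quit;

/* Same subset as before; SAS decides on its own whether to use the index */
data filtered_items;
  set corpxin.items_202001;
  where SUB = '123';
run;&lt;/CODE&gt;&lt;/PRE&gt;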
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If the dataset was sorted by that variable (and the values you want are near the start) then you could use an IF statement instead of a where statement and add a test to stop when you have passed the values of interest.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;libname corpxin "\\filepath\folder";

data filtered_items;
  set corpxin.items_202001;
  by sub;
  if SUB = '123' then output;
  if sub &amp;gt; '123' then stop;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 15 May 2024 18:58:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928539#M365324</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2024-05-15T18:58:54Z</dc:date>
    </item>
    <item>
      <title>Re: SAS keeps running after it has read in all of the observations that match the subset condition</title>
      <link>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928543#M365325</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/465981"&gt;@eiger&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;The library is a folder on a remote drive. The file corpxin.items_202001 is a .sas7bdat dataset.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Yes, the where clause is just a simple equality test.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I am going to say a combination of the data step having to read every observation coupled with your network bandwidth or traffic.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Several years ago I copied network data sets to my local drive because the amount of time to access a data set with 5,000 observations was excessive. For example, running PROC FREQ on a couple of variables could take 15 minutes because of the amount of network traffic on that drive. Running against the local copy took a few seconds. I can imagine that if your data set with a million-plus observations were on that drive under similar network conditions, it could take hours to complete.&lt;/P&gt;</description>
      <pubDate>Wed, 15 May 2024 19:08:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/SAS-keeps-running-after-it-has-read-in-all-of-the-observations/m-p/928543#M365325</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2024-05-15T19:08:00Z</dc:date>
    </item>
  </channel>
</rss>

