<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Filtering a dataset on an indexed date field takes long in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Filtering-a-dataset-on-an-indexed-date-field-takes-long/m-p/862018#M340476</link>
    <description>&lt;P&gt;Writing 2.7 million observations simply takes longer than 1 or 37 observations (both just need one dataset page, so the same physical amount of data is written).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 02 Mar 2023 19:52:19 GMT</pubDate>
    <dc:creator>Kurt_Bremser</dc:creator>
    <dc:date>2023-03-02T19:52:19Z</dc:date>
    <item>
      <title>Filtering a dataset on an indexed date field takes long</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Filtering-a-dataset-on-an-indexed-date-field-takes-long/m-p/861910#M340428</link>
      <description>&lt;P&gt;I have two very large SAS datasets that have indexes on the most commonly used columns used in filters.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Dataset1&lt;/P&gt;
&lt;P&gt;99,209,789 records&lt;BR /&gt;850 columns&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Dataset2&lt;/P&gt;
&lt;P&gt;758,618,653&amp;nbsp;records&lt;BR /&gt;375 columns&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When I select records based on a text formatted indexed column, the results are returned in under 1 second.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When I select records based on a date formatted indexed column, the results are returned in about 1 minute 22 seconds.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;When I select records based on a text formatted and the date formatted indexed column, the results are returned in under 1 second.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Does anyone know why the selection on the date column, that has been indexed, takes much longer (on its own) than when combined with a character indexed column or just a character indexed column.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Could it be the format of the date in the "where" clause?&lt;/P&gt;
&lt;P&gt;Data credit_sasma;&lt;BR /&gt;set SASMAHIS.SASMA_ACCOUNT_HISTORY (where=(Obs_Date = '12JAN2023'd));&lt;BR /&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Date format in the SAS Datasets:&lt;/P&gt;
&lt;DIV&gt;
&lt;DIV&gt;
&lt;DIV id="IDX2"&gt;
&lt;TABLE class="table" aria-label="Variables"&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TH class="r rowheader" scope="row"&gt;3&lt;/TH&gt;
&lt;TD class="data"&gt;OBS_DATE&lt;/TD&gt;
&lt;TD class="data"&gt;Num&lt;/TD&gt;
&lt;TD class="r data"&gt;8&lt;/TD&gt;
&lt;TD class="data"&gt;DATE9.&lt;/TD&gt;
&lt;TD class="data"&gt;DATE9.&lt;/TD&gt;
&lt;TD class="data"&gt;OBS_DATE&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;See attached document for the code and the log.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;</description>
      <pubDate>Thu, 02 Mar 2023 08:27:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Filtering-a-dataset-on-an-indexed-date-field-takes-long/m-p/861910#M340428</guid>
      <dc:creator>DarrylLawrence</dc:creator>
      <dc:date>2023-03-02T08:27:57Z</dc:date>
    </item>
    <item>
      <title>Re: Filtering a dataset on an indexed date field takes long</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Filtering-a-dataset-on-an-indexed-date-field-takes-long/m-p/861913#M340430</link>
      <description>&lt;P&gt;That is not so strange. The queries using the text column returns very few rows. The query with the date column alone returns 2.7 million records. That has to take more time.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Mar 2023 09:28:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Filtering-a-dataset-on-an-indexed-date-field-takes-long/m-p/861913#M340430</guid>
      <dc:creator>s_lassen</dc:creator>
      <dc:date>2023-03-02T09:28:21Z</dc:date>
    </item>
    <item>
      <title>Re: Filtering a dataset on an indexed date field takes long</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Filtering-a-dataset-on-an-indexed-date-field-takes-long/m-p/861914#M340431</link>
      <description>&lt;P&gt;Could you run your code with the following options on:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;options fullstimer msglevel=I;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;so we could see details of processing times and information which indexes were used by SAS to process data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To force SAS to use particular index you could try the&amp;nbsp;&lt;SPAN&gt;IDXNAME= data set option.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;[EDIT:] One more thing, indexes are executed against raw data, so format won't affect processing.&lt;/P&gt;
&lt;P&gt;You probably saw this article by Billy Clifford about SAS Indexes:&amp;nbsp;&lt;A href="https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/008-30.pdf" target="_blank"&gt;https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/008-30.pdf&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Bart&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Mar 2023 10:05:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Filtering-a-dataset-on-an-indexed-date-field-takes-long/m-p/861914#M340431</guid>
      <dc:creator>yabwon</dc:creator>
      <dc:date>2023-03-02T10:05:24Z</dc:date>
    </item>
    <item>
      <title>Re: Filtering a dataset on an indexed date field takes long</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Filtering-a-dataset-on-an-indexed-date-field-takes-long/m-p/861926#M340436</link>
      <description>&lt;P&gt;I added the index option and the options -&amp;nbsp;options fullstimer msglevel=I;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Log: No difference in the runtime&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;28 Data credit_sasma;&lt;BR /&gt;29 set SASMAHIS.SASMA_ACCOUNT_HISTORY (idxname=obs_date where=(Obs_Date = '12JAN2023'd));&lt;BR /&gt;INFO: Index OBS_DATE selected for WHERE clause optimization.&lt;BR /&gt;30 run;&lt;/P&gt;
&lt;P&gt;NOTE: There were 2789161 observations read from the data set SASMAHIS.SASMA_ACCOUNT_HISTORY.&lt;BR /&gt;WHERE Obs_Date='12JAN2023'D;&lt;BR /&gt;NOTE: The data set WORK.CREDIT_SASMA has 2789161 observations and 850 variables.&lt;BR /&gt;NOTE: Compressing data set WORK.CREDIT_SASMA decreased size by 67.65 percent. &lt;BR /&gt;Compressed is 60148 pages; un-compressed would require 185945 pages.&lt;BR /&gt;NOTE: DATA statement used (Total process time):&lt;BR /&gt;real time 1:20.43&lt;BR /&gt;user cpu time 1:12.15&lt;BR /&gt;system cpu time 7.75 seconds&lt;BR /&gt;memory 5980.75k&lt;BR /&gt;OS Memory 31400.00k&lt;BR /&gt;Timestamp 03/02/2023 12:16:33 PM&lt;BR /&gt;Step Count 17 Switch Count 133&lt;BR /&gt;Page Faults 2&lt;BR /&gt;Page Reclaims 1127&lt;BR /&gt;Page Swaps 0&lt;BR /&gt;Voluntary Context Switches 412&lt;BR /&gt;Involuntary Context Switches 535&lt;BR /&gt;Block Input Operations 12532969&lt;BR /&gt;Block Output Operations 15409176&lt;/P&gt;</description>
      <pubDate>Thu, 02 Mar 2023 12:33:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Filtering-a-dataset-on-an-indexed-date-field-takes-long/m-p/861926#M340436</guid>
      <dc:creator>DarrylLawrence</dc:creator>
      <dc:date>2023-03-02T12:33:36Z</dc:date>
    </item>
    <item>
      <title>Re: Filtering a dataset on an indexed date field takes long</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Filtering-a-dataset-on-an-indexed-date-field-takes-long/m-p/861981#M340451</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/17486"&gt;@DarrylLawrence&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It would be interesting to se the result of rerunning with a data _null_ step instead, so no time is used for compressing and writing output, just to see if the where clause is the real problem here.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Mar 2023 16:58:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Filtering-a-dataset-on-an-indexed-date-field-takes-long/m-p/861981#M340451</guid>
      <dc:creator>ErikLund_Jensen</dc:creator>
      <dc:date>2023-03-02T16:58:23Z</dc:date>
    </item>
    <item>
      <title>Re: Filtering a dataset on an indexed date field takes long</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Filtering-a-dataset-on-an-indexed-date-field-takes-long/m-p/861983#M340452</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/12887"&gt;@ErikLund_Jensen&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/17486"&gt;@DarrylLawrence&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It would be interesting to se the result of rerunning with a data _null_ step instead, so no time is used for compressing and writing output, just to see if the where clause is the real problem here.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Or maybe just KEEPing 2 or 3 variables instead of 850.&lt;/P&gt;</description>
      <pubDate>Thu, 02 Mar 2023 17:18:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Filtering-a-dataset-on-an-indexed-date-field-takes-long/m-p/861983#M340452</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2023-03-02T17:18:41Z</dc:date>
    </item>
    <item>
      <title>Re: Filtering a dataset on an indexed date field takes long</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Filtering-a-dataset-on-an-indexed-date-field-takes-long/m-p/862018#M340476</link>
      <description>&lt;P&gt;Writing 2.7 million observations simply takes longer than 1 or 37 observations (both just need one dataset page, so the same physical amount of data is written).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 02 Mar 2023 19:52:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Filtering-a-dataset-on-an-indexed-date-field-takes-long/m-p/862018#M340476</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2023-03-02T19:52:19Z</dc:date>
    </item>
  </channel>
</rss>

