<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How to improve performance in Data step where clause in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693391#M211404</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am currently using SAS Data step.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Data test1; /* loading approx 17 lakhs records */&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;Set source (keep=var1 var2 var3); /* source data set has&amp;nbsp;322596778 obs */&lt;/P&gt;&lt;P&gt;where DATEPART(START_from_dtm)="31JAN2020"D &amp;lt;=datepart(START_to_dtm);&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;The above data step is running for 25 minutes. Is there any solution to execute the query faster.&lt;/P&gt;&lt;P&gt;START_from_dtm has index on it. So tried by keeping only&amp;nbsp;START_from_dtm column but same time.&lt;/P&gt;&lt;P&gt;I am using the same query multiple times and the execution is increasing.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance.&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 22 Oct 2020 05:25:33 GMT</pubDate>
    <dc:creator>Learn_SAS1</dc:creator>
    <dc:date>2020-10-22T05:25:33Z</dc:date>
    <item>
      <title>How to improve performance in Data step where clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693391#M211404</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am currently using SAS Data step.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Data test1; /* loading approx 17 lakhs records */&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp;Set source (keep=var1 var2 var3); /* source data set has&amp;nbsp;322596778 obs */&lt;/P&gt;&lt;P&gt;where DATEPART(START_from_dtm)="31JAN2020"D &amp;lt;=datepart(START_to_dtm);&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;The above data step is running for 25 minutes. Is there any solution to execute the query faster.&lt;/P&gt;&lt;P&gt;START_from_dtm has index on it. So tried by keeping only&amp;nbsp;START_from_dtm column but same time.&lt;/P&gt;&lt;P&gt;I am using the same query multiple times and the execution is increasing.&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Oct 2020 05:25:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693391#M211404</guid>
      <dc:creator>Learn_SAS1</dc:creator>
      <dc:date>2020-10-22T05:25:33Z</dc:date>
    </item>
    <item>
      <title>Re: How to improve performance in Data step where clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693392#M211405</link>
      <description>&lt;P&gt;Do your datetimes have a time, or is the time just 00:00:00?&lt;/P&gt;
&lt;P&gt;Is your source table indexed or sorted?&lt;/P&gt;
&lt;P&gt;How many observations do you have, and how many different dates?&lt;/P&gt;</description>
      <pubDate>Thu, 22 Oct 2020 05:31:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693392#M211405</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-10-22T05:31:34Z</dc:date>
    </item>
    <item>
      <title>Re: How to improve performance in Data step where clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693393#M211406</link>
      <description>&lt;P&gt;effective_from_dtm has values like DDMONYYYY:00:00:00.000.&lt;/P&gt;&lt;P&gt;As I have observed most of the values has&amp;nbsp; 00 as time.&lt;/P&gt;&lt;P&gt;Index, I have mentioned in my original post, effective_from_dtm has index and no other indexes.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I can't create Index on that table as it is owned by other teams.&amp;nbsp;&lt;/P&gt;&lt;P&gt;That data set is not sorted.&lt;/P&gt;&lt;P&gt;Thanks&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Oct 2020 05:55:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693393#M211406</guid>
      <dc:creator>Learn_SAS1</dc:creator>
      <dc:date>2020-10-22T05:55:11Z</dc:date>
    </item>
    <item>
      <title>Re: How to improve performance in Data step where clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693395#M211408</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/349066"&gt;@Learn_SAS1&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;I am currently using SAS Data step.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Data test1; /* loading approx 17 lakhs records */&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;Set source (&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;keep&lt;/FONT&gt;&lt;/STRONG&gt;=var1 var2 var3); /* source data set has&amp;nbsp;322.596.778 obs */&lt;/P&gt;
&lt;P&gt;where DATEPART(&lt;STRONG&gt;&lt;FONT color="#FF0000"&gt;START_from_dtm&lt;/FONT&gt;&lt;/STRONG&gt;)&lt;FONT color="#3366FF"&gt;=&lt;/FONT&gt;"31JAN2020"D &amp;lt;=datepart(&lt;FONT color="#FF0000"&gt;&lt;STRONG&gt;START_to_dtm&lt;/STRONG&gt;&lt;/FONT&gt;);&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;The above data step is running for 25 minutes. Is there any solution to execute the query faster.&lt;/P&gt;
&lt;P&gt;START_from_dtm has index on it. So tried by keeping only&amp;nbsp;START_from_dtm column but same time.&lt;/P&gt;
&lt;P&gt;I am using the same query multiple times and the execution is increasing.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks in advance.&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;In your original-code the variable used in the where-statement are also in the keep-list, right?&lt;/P&gt;
&lt;P&gt;Reading a dataset with 332 million obs needs some time. My before-coffee-calculation showed, that about 200000 obs are processed per second, that is not that bad at all.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Oct 2020 06:06:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693395#M211408</guid>
      <dc:creator>andreas_lds</dc:creator>
      <dc:date>2020-10-22T06:06:52Z</dc:date>
    </item>
    <item>
      <title>Re: How to improve performance in Data step where clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693397#M211410</link>
      <description>&lt;P&gt;This might be a little more efficient:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;where START_from_dtm = "31JAN2020:00:00:00"DT &amp;lt;= START_to_dtm&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 22 Oct 2020 06:16:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693397#M211410</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2020-10-22T06:16:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to improve performance in Data step where clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693398#M211411</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13976"&gt;@SASKiwi&lt;/a&gt;&amp;nbsp;'s syntax will make all he difference in the world:&lt;/P&gt;
&lt;P&gt;- No function to slow down the filtering&lt;/P&gt;
&lt;P&gt;- &lt;STRIKE&gt;The index will be used&lt;/STRIKE&gt;&lt;/P&gt;
&lt;P&gt;Edit: No index. I confused the variable names, sorry.&lt;/P&gt;
&lt;P&gt;Still, using no function will make a significant difference.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Oct 2020 07:22:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693398#M211411</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-10-22T07:22:58Z</dc:date>
    </item>
    <item>
      <title>Re: How to improve performance in Data step where clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693406#M211413</link>
      <description>&lt;P&gt;Hi Saskiwi,&lt;/P&gt;&lt;P&gt;I have used that one too. but it hasn't changed much in the performance time.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Thu, 22 Oct 2020 06:56:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693406#M211413</guid>
      <dc:creator>Learn_SAS1</dc:creator>
      <dc:date>2020-10-22T06:56:31Z</dc:date>
    </item>
    <item>
      <title>Re: How to improve performance in Data step where clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693408#M211415</link>
      <description>&lt;P&gt;Please post the complete log from that step, so we can see the time summary; use the &amp;lt;/&amp;gt; button to post the log.&lt;/P&gt;</description>
      <pubDate>Thu, 22 Oct 2020 07:15:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693408#M211415</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-10-22T07:15:57Z</dc:date>
    </item>
    <item>
      <title>Re: How to improve performance in Data step where clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693435#M211435</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/349066"&gt;@Learn_SAS1&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;effective_from_dtm has values like DDMONYYYY:00:00:00.000.&lt;/P&gt;
&lt;P&gt;As I have observed most of the values has&amp;nbsp; 00 as time.&lt;/P&gt;
&lt;P&gt;Index, I have mentioned in my original post, effective_from_dtm has index and no other indexes.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I can't create Index on that table as it is owned by other teams.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;That data set is not sorted.&lt;/P&gt;
&lt;P&gt;Thanks&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;You've got a source dataset with&amp;nbsp;&lt;SPAN&gt;322,596,778 records, and need to filter on start_from_dtm &amp;lt;= some_static_date &amp;lt;= start_to_dtm.&amp;nbsp; Typical SCD2 type filtering.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Yet your source dataset doesn't have an index on that column.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Have you have explained the performance issue to your "other teams", requested an index be created on that column, and they refused?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;That's going to be your best approach to improving the performance of this query.&amp;nbsp; Google how indexes work if you need further details.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Do you have the disk space to (as a proof-of-concept) create a copy of that dataset, create your own index on start_from_dtm start_to_dtm, then test the performance results?&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If you do, you may also want to test performance if the data is stored using the SPDE engine.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 22 Oct 2020 08:47:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693435#M211435</guid>
      <dc:creator>ScottBass</dc:creator>
      <dc:date>2020-10-22T08:47:33Z</dc:date>
    </item>
    <item>
      <title>Re: How to improve performance in Data step where clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693647#M211546</link>
      <description>&lt;P&gt;&amp;gt;&amp;nbsp;&lt;SPAN&gt;I have used that one too. but it hasn't changed much in the performance time.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;Well you need to help your self here. We still haven't seen the actual code (the code posted can't work), or the log or know the row length.&lt;/P&gt;
&lt;P&gt;Review the questions asked and answer them.&lt;/P&gt;</description>
      <pubDate>Fri, 23 Oct 2020 03:21:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/693647#M211546</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-10-23T03:21:46Z</dc:date>
    </item>
    <item>
      <title>Re: How to improve performance in Data step where clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/694047#M211646</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/349066"&gt;@Learn_SAS1&lt;/a&gt;&amp;nbsp; - As already suggested by others please post your complete SAS log including the notes about run times.&lt;/P&gt;</description>
      <pubDate>Sat, 24 Oct 2020 22:19:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-in-Data-step-where-clause/m-p/694047#M211646</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2020-10-24T22:19:15Z</dc:date>
    </item>
  </channel>
</rss>

