<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Coding Efficiency Where Clause in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286054#M58600</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a large dataset and I am looking to run a query on it using a variable which includes inputted comments by an operational based team. I want to keep all records which have a specific word within Description&amp;nbsp;(for example everything with the word 'Test'). The word does not sit in&amp;nbsp;a certain postion and I have chosen to use Contains within the Where clause. There are&amp;nbsp;no other variables i can filter the data down further on.&lt;/P&gt;&lt;P&gt;This has created performance issues and the code takes a considerable amount of time to run. Is there a more efficient method then using the contains function? I have kept the minimum number of variables possible in my outputted dataset (see current code below).&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;PROC SQL;&lt;BR /&gt;CREATE TABLE WORK.QUERY AS&lt;BR /&gt;SELECT t1.Identifier,&lt;BR /&gt;t1.date,&lt;BR /&gt;t1.var,&lt;BR /&gt;t1.DESCRIPTION&lt;BR /&gt;FROM data.set&amp;nbsp;t1&lt;BR /&gt;WHERE t1.DESCRIPTION CONTAINS 'Test';&lt;BR /&gt;QUIT;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Thu, 21 Jul 2016 08:32:54 GMT</pubDate>
    <dc:creator>mk131190</dc:creator>
    <dc:date>2016-07-21T08:32:54Z</dc:date>
    <item>
      <title>Coding Efficiency Where Clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286054#M58600</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a large dataset and I am looking to run a query on it using a variable which includes inputted comments by an operational based team. I want to keep all records which have a specific word within Description&amp;nbsp;(for example everything with the word 'Test'). The word does not sit in&amp;nbsp;a certain postion and I have chosen to use Contains within the Where clause. There are&amp;nbsp;no other variables i can filter the data down further on.&lt;/P&gt;&lt;P&gt;This has created performance issues and the code takes a considerable amount of time to run. Is there a more efficient method then using the contains function? I have kept the minimum number of variables possible in my outputted dataset (see current code below).&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;PROC SQL;&lt;BR /&gt;CREATE TABLE WORK.QUERY AS&lt;BR /&gt;SELECT t1.Identifier,&lt;BR /&gt;t1.date,&lt;BR /&gt;t1.var,&lt;BR /&gt;t1.DESCRIPTION&lt;BR /&gt;FROM data.set&amp;nbsp;t1&lt;BR /&gt;WHERE t1.DESCRIPTION CONTAINS 'Test';&lt;BR /&gt;QUIT;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jul 2016 08:32:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286054#M58600</guid>
      <dc:creator>mk131190</dc:creator>
      <dc:date>2016-07-21T08:32:54Z</dc:date>
    </item>
    <item>
      <title>Re: Coding Efficiency Where Clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286056#M58601</link>
      <description>&lt;P&gt;Try using Base SAS, SQL has certain resource issues when dealing with large data:&lt;/P&gt;
&lt;PRE&gt;data want;
  set have (where=(index(description,"Test") keep=identifier date var description);
run;&lt;/PRE&gt;
&lt;P&gt;Alternatively you don't need to create a dataset, you can create a view - this is basically storing only the code to view results:&lt;/P&gt;
&lt;PRE&gt;proc sql;
&amp;nbsp; create view WANT as
&amp;nbsp; select...;
quit;
&lt;/PRE&gt;</description>
      <pubDate>Thu, 21 Jul 2016 08:40:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286056#M58601</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2016-07-21T08:40:54Z</dc:date>
    </item>
    <item>
      <title>Re: Coding Efficiency Where Clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286099#M58616</link>
      <description>A alternative to CONTAINS is the scan function, but I would be surprised if that would be much more efficient. &lt;BR /&gt;Unlike &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/45151"&gt;@RW9&lt;/a&gt; I don't think that a straight SQL with selct-from-where is any less efficient than a data step, but it doesn't hurt to compare. &lt;BR /&gt;Are you sure that the CONTAINS operation is the resource monster? Adding &lt;BR /&gt;Options fullstimer;&lt;BR /&gt;will hint you.</description>
      <pubDate>Thu, 21 Jul 2016 12:28:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286099#M58616</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2016-07-21T12:28:25Z</dc:date>
    </item>
    <item>
      <title>Re: Coding Efficiency Where Clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286126#M58632</link>
      <description>&lt;P&gt;Just ran a test with two similar sized datasets (to avoid performance advantages caused by data residing in cache after the first read) and found that a straight SQL with contains is only marginally slower than a data step with a where= dataset option. And that may have been caused by different loads on the SAS server during the two steps. Call it "within statistical noise".&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;So, to dig further:&lt;/P&gt;
&lt;P&gt;What is the (logical and physical) size of your input dataset?&lt;/P&gt;
&lt;P&gt;How long does a simple data _null_ step take that just reads the dataset?&lt;/P&gt;
&lt;P&gt;Where does your dataset reside (locally, remote, in a DBMS)?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jul 2016 13:45:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286126#M58632</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-07-21T13:45:16Z</dc:date>
    </item>
    <item>
      <title>Re: Coding Efficiency Where Clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286152#M58648</link>
      <description>&lt;P&gt;Thanks for this.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have run other steps off of this dataset. For example a SQL step using case whens and a simple where statement restricting on specific components of a different variable&amp;nbsp;runs in around 10 minutes.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The SAS dataset is stored in a DBMS. I am not sure how to work out the size of it as it does not come up in contents and dataset procedures the number 9.0071993E15 came up when I looked for observations.&lt;/P&gt;</description>
      <pubDate>Thu, 21 Jul 2016 14:58:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286152#M58648</guid>
      <dc:creator>mk131190</dc:creator>
      <dc:date>2016-07-21T14:58:10Z</dc:date>
    </item>
    <item>
      <title>Re: Coding Efficiency Where Clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286326#M58705</link>
      <description>&lt;P&gt;So the dataset is stored in a DBMS. As these often reside on other hosts, you may have to consider network bandwidth.&lt;/P&gt;
&lt;P&gt;One other thing to consider when using a DBMS is: what part of the query can be implicitly handed over to the DBMS? If you use something SAS-specific, SAS will have to read &lt;U&gt;all&lt;/U&gt; the data from the DBMS before applying conditions and functions.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jul 2016 05:12:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286326#M58705</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2016-07-22T05:12:58Z</dc:date>
    </item>
    <item>
      <title>Re: Coding Efficiency Where Clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286335#M58710</link>
      <description>&lt;P&gt;If the data is not in SAS the most important performance aspect is to ensure that only the necessary data is passed on to SAS.&lt;/P&gt;
&lt;P&gt;So you must ensure that the where clasue you use will be passed on the the DBMS, not run in SAS.&lt;/P&gt;
&lt;P&gt;The most universally supported&amp;nbsp;SQL operator must be:&lt;/P&gt;
&lt;P&gt;where DESCRIPTION like '%Test%'&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jul 2016 06:08:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286335#M58710</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2016-07-22T06:08:26Z</dc:date>
    </item>
    <item>
      <title>Re: Coding Efficiency Where Clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286355#M58712</link>
      <description>&lt;P&gt;I agree with previous replies: firs of all ensure, that the WHERE part of the query runs in the database (aka SQL pushdown, SQL pass-through).&lt;/P&gt;
&lt;P&gt;SAS is able to pussh down the query if you put it into an SQL WHERE clause, or into a data step WHERE statement.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;There are SAS options that you can use to track, what parts of the querz was pushed down. (I don't remember the name of the otion right now.)&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I suppose you (or users) will want to run different queries using different words.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Even if the query runs in database, it is still very expensive. If you have a simpleindex built on that coulumn, it is still not much help, since the query could be optimized only if you search for the beginning of the column.&lt;/P&gt;
&lt;P&gt;On the other hand there are thechnologies that support and accelerate this kind of query. Maybe your database supports it.&lt;/P&gt;
&lt;P&gt;Or you have to swich to another database.&lt;/P&gt;
&lt;P&gt;Keyword: inverted index&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jul 2016 11:05:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/286355#M58712</guid>
      <dc:creator>gergely_batho</dc:creator>
      <dc:date>2016-07-22T11:05:17Z</dc:date>
    </item>
    <item>
      <title>Re: Coding Efficiency Where Clause</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/289788#M59874</link>
      <description>&lt;P&gt;Using like instead of contains made it run in 2 minutes. Thanks for your help.&lt;/P&gt;</description>
      <pubDate>Fri, 05 Aug 2016 13:12:07 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Coding-Efficiency-Where-Clause/m-p/289788#M59874</guid>
      <dc:creator>mk131190</dc:creator>
      <dc:date>2016-08-05T13:12:07Z</dc:date>
    </item>
  </channel>
</rss>

