<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to reduce system processing time, Index on Specific Variables (on huge data) in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149388#M29521</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;One more thought ... you mention that you already processed the data once, so it looks like you want to reduce the processing time for future runs.&amp;nbsp; Why not add a field to the original database: a 0/1 flag indicating a commercial name?&amp;nbsp; Even if that takes a while to run, all your subsequent runs can pull the proper records quickly.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Tue, 05 Aug 2014 15:20:06 GMT</pubDate>
    <dc:creator>Astounding</dc:creator>
    <dc:date>2014-08-05T15:20:06Z</dc:date>
    <item>
      <title>How to reduce system processing time, Index on Specific Variables (on huge data)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149380#M29513</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Dear All,&lt;/P&gt;&lt;P&gt;I have millions of records in a dataset and need to select the names in the Name field that contain certain words, using CONTAINS or LIKE in PROC SQL.&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;Name&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;XXXXX ACADEMY&lt;/P&gt;&lt;P&gt;YYYY AGENCES YYYY&lt;/P&gt;&lt;P&gt;ZZZZ ZZZZZ ZZZZ&lt;/P&gt;&lt;P&gt;AAA COMPANY&lt;/P&gt;&lt;P&gt;BBBB BBBB&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;The output should be:&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;Name&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;XXXXX ACADEMY&lt;/P&gt;&lt;P&gt;YYYY AGENCES YYYY&lt;/P&gt;&lt;P&gt;AAA COMPANY&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;&lt;STRONG&gt;For this query I am using:&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;create table want as select Name&lt;/P&gt;&lt;P&gt;from have&lt;/P&gt;&lt;P&gt;where Name contains "AGENCES" or&lt;/P&gt;&lt;P&gt;Name contains "COMPANY" or&lt;/P&gt;&lt;P&gt;Name contains "ACADEMY" or&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Name contains " AGENCES " or&lt;/P&gt;&lt;P&gt;Name contains " COMPANY " or&lt;/P&gt;&lt;P&gt;Name contains " ACADEMY " or&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;Name contains " AGENCES" or&lt;/P&gt;&lt;P&gt;Name contains " COMPANY" or&lt;/P&gt;&lt;P&gt;Name contains " ACADEMY"&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;The original Name field has 10,000,000 observations; the selected (CONTAINS) name count is 5,000.&lt;/P&gt;&lt;P&gt;Instead of CONTAINS I could use LIKE, but then the log window reports that macro names are not resolved for some specific patterns, e.g. "%X", "%SYS".&lt;/P&gt;&lt;P&gt;Please help me: how can I create an index on the Name field and reduce the processing time?&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 05 Aug 2014 05:50:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149380#M29513</guid>
      <dc:creator>sas_lak</dc:creator>
      <dc:date>2014-08-05T05:50:43Z</dc:date>
    </item>
    <item>
      <title>Re: Index on Specific Variables (on huge data)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149381#M29514</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Forget SQL, go for NoSQL... ; &amp;lt;)&amp;nbsp; &lt;/P&gt;&lt;P&gt;The reasons:&lt;BR /&gt;- Indexing only works for fixed values, not for fuzzy ones.&lt;/P&gt;&lt;P&gt;- You want to process all records, not just a subset. SQL (OLTP) is designed for simple subsets.&lt;/P&gt;&lt;P&gt;How to proceed:&lt;/P&gt;&lt;P&gt;- If your source is an RDBMS, check whether the problem can be solved in its native SQL dialect, using SQL pass-through.&lt;/P&gt;&lt;P&gt;&amp;nbsp; If not, all the data must be copied to SAS (which costs time and resources).&lt;/P&gt;&lt;P&gt;- Within SAS, reducing processing time means checking that the system is tuned for your process (BUFSIZE, MEMSIZE and more).&lt;/P&gt;&lt;P&gt;- Go for sequential processing when more evaluation logic is needed (CONTAINS, LIKE, Perl regular expressions); this is data cleansing.&lt;/P&gt;&lt;P&gt;Sequential I/O is much faster than random access: when more than about 20% of the data has to be evaluated, sequential will win. The DATA step processes sequentially, PROC SQL randomly. With more than one processor, you can consider splitting a big DATA step accordingly and combining the results, the same concept as Hadoop MapReduce.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 05 Aug 2014 07:42:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149381#M29514</guid>
      <dc:creator>jakarman</dc:creator>
      <dc:date>2014-08-05T07:42:03Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce system processing time, Index on Specific Variables (on huge data)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149382#M29515</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Make a dataset to hold these index values, and then use the CONTAINS operator.&lt;/P&gt;&lt;P&gt;data ind;&lt;/P&gt;&lt;P&gt;input ind : $40.;&lt;/P&gt;&lt;P&gt;cards;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;COMPANY&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;ACADEMY&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt;AGENCES &lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;run;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;proc sql;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;select * from have,ind&amp;nbsp; where &lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;Name contains strip(ind) ;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;quit;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;&lt;/P&gt;&lt;P style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;Xia Keshan&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 05 Aug 2014 12:25:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149382#M29515</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2014-08-05T12:25:56Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce system processing time, Index on Specific Variables (on huge data)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149383#M29516</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Xia, please also explain what will happen when this runs.&lt;BR /&gt;It is always good to have several approaches to choose from and then evaluate them on readability, general applicability and execution performance.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 05 Aug 2014 12:45:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149383#M29516</guid>
      <dc:creator>jakarman</dc:creator>
      <dc:date>2014-08-05T12:45:50Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce system processing time, Index on Specific Variables (on huge data)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149384#M29517</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;At a bare minimum, you can remove two-thirds of your conditions.&amp;nbsp; The first set (no leading or trailing blanks) will identify all the observations you are looking for, because any name that contains " ACADEMY " or " ACADEMY" also contains "ACADEMY".&amp;nbsp; The other two sets of CONTAINS conditions are just not needed.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 05 Aug 2014 13:41:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149384#M29517</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2014-08-05T13:41:13Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce system processing time, Index on Specific Variables (on huge data)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149385#M29518</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I believe (correct me if I am wrong) he is joining IND to HAVE wherever the text is found, so there would be an additional column in the output.&amp;nbsp; The join keeps only rows where both match.&amp;nbsp; Nice.&amp;nbsp; I have run a couple of tests, though, and a straightforward WHERE clause does seem to run a fair bit faster than the join approach:&lt;/P&gt;&lt;P&gt;data have;&lt;BR /&gt;&amp;nbsp; /* set the length up front so longer values are not truncated to 13 characters */&lt;BR /&gt;&amp;nbsp; length mytext $40;&lt;BR /&gt;&amp;nbsp; do aval=1 to 1000000;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; mytext="XXXXX ACADEMY"; output;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; mytext="YYYY AGENCES YYYY"; output;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; mytext="ZZZZ ZZZZZ ZZZZ"; output;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; mytext="AAA COMPANY"; output;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; mytext="BBBB BBBB"; output;&lt;BR /&gt;&amp;nbsp; end;&lt;BR /&gt;run;&lt;BR /&gt;data ind;&lt;BR /&gt;input ind : $40.;&lt;BR /&gt;cards;&lt;BR /&gt;COMPANY&lt;BR /&gt;ACADEMY&lt;BR /&gt;AGENCES &lt;BR /&gt;COMPANY&lt;BR /&gt;;&lt;BR /&gt;run;&lt;/P&gt;&lt;P&gt;proc sql;&lt;BR /&gt;create table WANT as&lt;BR /&gt;select mytext from have&amp;nbsp; where mytext like '%COMPANY%' or mytext like '%ACADEMY%' or mytext like '%AGENCES%' or mytext like '%COMPANY%';&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;proc sql;&lt;BR /&gt;create table WANT as&lt;BR /&gt;select mytext,ind from have,ind&amp;nbsp; where mytext contains strip(ind);&lt;BR /&gt;quit;&lt;/P&gt;&lt;P&gt;Running these a few times, I saw a difference of about 3 seconds real time and 1.9 seconds user CPU time in favour of the WHERE clause.&lt;/P&gt;&lt;P&gt;Note to the OP: you can't use LIKE "%ACADEMY%", because within double quotes a % followed by a name is treated as a macro call.&amp;nbsp; For LIKE to work you need to use single quotes.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 05 Aug 2014 14:22:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149385#M29518</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2014-08-05T14:22:53Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce system processing time, Index on Specific Variables (on huge data)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149386#M29519</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi All,&lt;/P&gt;&lt;P&gt;Thank you very much for your responses.&lt;/P&gt;&lt;P&gt;1) I have 10,000,000 observations of NAME.&lt;/P&gt;&lt;P&gt;2) I need to find out how many of these are commercial names (there are no more than 5,000 commercial names).&lt;/P&gt;&lt;P&gt;Example values of Name:&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; background-color: #ffffff;"&gt;XXXXX ACADEMY&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: #ffffff; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;"&gt;XXXXXACADEMYXXXXX&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: #ffffff; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;"&gt;ACADEMYXXXXXXX&amp;nbsp;&amp;nbsp; --&amp;gt; ACADEMY with no spaces&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="background-color: #ffffff; font-family: 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;"&gt;XXXXX ACADEMY XXXXXX&amp;nbsp;&amp;nbsp; --&amp;gt; ACADEMY with leading and trailing blanks&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;3) We can use LIKE with "%" (e.g. "%ACADEMY%" or "%ACADEMY"), but after running the program the log shows the error "MAXIMUM MACRO PARAMETERS ARE USED and some MACRO names are not resolved".&lt;/P&gt;&lt;P&gt;4) So we can only use CONTAINS, with and without leading and trailing blanks, for the commercial names.&lt;/P&gt;&lt;P&gt;5) But it takes 2 to 3 hours to produce the output without an index on the Name field (at least 2 hours with an index on Name).&lt;/P&gt;&lt;P&gt;How can I reduce the processing time?&lt;/P&gt;&lt;P&gt;Thank you,&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 05 Aug 2014 14:56:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149386#M29519</guid>
      <dc:creator>sas_lak</dc:creator>
      <dc:date>2014-08-05T14:56:54Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce system processing time, Index on Specific Variables (on huge data)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149387#M29520</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Check out my post: you cannot use the "%" syntax inside double quotes.&amp;nbsp; SAS treats "%ACADEMY%" as a macro call, so it tries to find a macro named ACADEMY and does not find it.&amp;nbsp; To use a percent sign like that, you need to put it within single quotes: '%ACADEMY%'.&lt;/P&gt;&lt;P&gt;That said, if you have 5,000 commercial names, the LIKE syntax is not going to work, as there are limitations on how many you can put in.&amp;nbsp; Try Ksharp's code.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 05 Aug 2014 15:09:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149387#M29520</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2014-08-05T15:09:19Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce system processing time, Index on Specific Variables (on huge data)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149388#M29521</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;One more thought ... you mention that you already processed the data once, so it looks like you want to reduce the processing time for future runs.&amp;nbsp; Why not add a field to the original database: a 0/1 flag indicating a commercial name?&amp;nbsp; Even if that takes a while to run, all your subsequent runs can pull the proper records quickly.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 05 Aug 2014 15:20:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149388#M29521</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2014-08-05T15:20:06Z</dc:date>
    </item>
    <item>
      <title>Re: How to reduce system processing time, Index on Specific Variables (on huge data)</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149389#M29522</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;As this means analyzing all the data for a lot of strings in the Name variable, use the DATA step.&lt;BR /&gt;PRX (Perl regular expression) matching may be one option for complicated string testing, assuming you have 5,000 strings that would indicate a commercial name.&lt;/P&gt;&lt;P&gt;You will probably misclassify some people as commercial that way.&lt;/P&gt;&lt;P&gt;This is the description of what it does: &lt;A href="http://support.sas.com/documentation/cdl/en/lefunctionsref/67239/HTML/default/viewer.htm#n13as9vjfj7aokn1syvfyrpaj7z5.htm" title="http://support.sas.com/documentation/cdl/en/lefunctionsref/67239/HTML/default/viewer.htm#n13as9vjfj7aokn1syvfyrpaj7z5.htm"&gt;SAS(R) 9.4 Functions and CALL Routines: Reference, Second Edition&lt;/A&gt;&lt;/P&gt;&lt;P&gt;This is the PRXMATCH function reference, which includes some examples: &lt;A href="http://support.sas.com/documentation/cdl/en/lefunctionsref/67239/HTML/default/viewer.htm#n0bj9p4401w3n9n1gmv6tfshit9m.htm" title="http://support.sas.com/documentation/cdl/en/lefunctionsref/67239/HTML/default/viewer.htm#n0bj9p4401w3n9n1gmv6tfshit9m.htm"&gt;SAS(R) 9.4 Functions and CALL Routines: Reference, Second Edition&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If you want us to try some code for you, you have to come up with more detailed data examples and the wanted results.&lt;BR /&gt;Right now we only see one name field with several variants of the string and one string to detect (ACADEMY). Upper case or lower case?&lt;/P&gt;&lt;P&gt;I did a test with RW9's example. See the log.&lt;/P&gt;&lt;P&gt;The pattern is just a copy of the 3 possible values; the next choice is a context-driven decision.&lt;/P&gt;&lt;P&gt;- It could be enhanced with wildcards or other PRX options.&lt;BR /&gt;- It could be enhanced by using a macro to generate that string.&lt;/P&gt;&lt;P&gt;&amp;nbsp; (Or both combined.) I do not know the limitations on the length of the PRXMATCH pattern string. The preceding document is the more helpful of the two SAS docs.&lt;/P&gt;&lt;P&gt;Processing speed is no issue for these 50M records.&lt;/P&gt;&lt;P&gt;data hv_commerce(keep=name)&lt;/P&gt;&lt;P class="sasSource"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; hv_persons(keep=name);&lt;/P&gt;&lt;P class="sasSource"&gt;&amp;nbsp; set have;&lt;/P&gt;&lt;P class="sasSource"&gt;&amp;nbsp; if _N_=1 then do;&lt;/P&gt;&lt;P class="sasSource"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; retain PerlExpression;&lt;/P&gt;&lt;P class="sasSource"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; pattern="/academy|agences|company/i";&lt;/P&gt;&lt;P class="sasSource"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; PerlExpression=prxparse(pattern);&lt;/P&gt;&lt;P class="sasSource"&gt;&amp;nbsp; end;&lt;/P&gt;&lt;P class="sasSource"&gt;&amp;nbsp; if prxmatch(PerlExpression, name) then output hv_commerce;&lt;/P&gt;&lt;P class="sasSource"&gt;&amp;nbsp; else output hv_persons;&lt;/P&gt;&lt;P class="sasSource"&gt;run;&lt;/P&gt;&lt;P class="sasNote" id="sasLogNote1_1407254314633"&gt; NOTE: There were 50000000 observations read from the data set WORK.HAVE.&lt;/P&gt;&lt;P class="sasNote" id="sasLogNote2_1407254314633"&gt; NOTE: The data set WORK.HV_COMMERCE has 30000000 observations and 1 variables.&lt;/P&gt;&lt;P class="sasNote" id="sasLogNote3_1407254314633"&gt; NOTE: The data set WORK.HV_PERSONS has 20000000 observations and 1 variables.&lt;/P&gt;&lt;P class="sasNote" id="sasLogNote4_1407254314633"&gt; NOTE: DATA statement used (Total process time):&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; real time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1:23.74&lt;/P&gt;&lt;P class="sasNote"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; cpu time&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 1:19.19&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Tue, 05 Aug 2014 15:21:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-reduce-system-processing-time-Index-on-Specific-Variables/m-p/149389#M29522</guid>
      <dc:creator>jakarman</dc:creator>
      <dc:date>2014-08-05T15:21:39Z</dc:date>
    </item>
  </channel>
</rss>

