<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: proc sql or data step in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152528#M298402</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Quite often, it's the joining process that takes time (whether in MERGE or in SQL).&amp;nbsp; Since you don't need any variables from the second table, other than to see if a match exists, it might be faster to create a format.&amp;nbsp; The limiting factor would be whether you have enough memory to load the format.&amp;nbsp; For example:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data b2;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; set b (keep=test_var rename=(test_var=start));&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; retain label 'Found a match'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; fmtname '$match';&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc format cntlin=b2;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; set a;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; if put(test_var, $match.) = 'Found a match';&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If TEST_VAR is numeric rather than character, the name of the format would be MATCH instead of $MATCH.&amp;nbsp; Note that PROC FORMAT will reject the CNTLIN data set if START contains duplicate values, so remove any duplicate TEST_VAR values from b first.&amp;nbsp; The resources required would not be very different from creating a hash table from b and then using the CHECK method to see if a match can be found.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Mon, 24 Mar 2014 17:29:52 GMT</pubDate>
    <dc:creator>Astounding</dc:creator>
    <dc:date>2014-03-24T17:29:52Z</dc:date>
    <item>
      <title>proc sql or data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152521#M298395</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi friends, I am confused about which to use and which would run faster: PROC SQL or a DATA step.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I am doing an inner join between two SAS datasets on one variable, and it is running for a very long time, presumably because I have around 10 million records in each dataset. I am using the following PROC SQL;&lt;/P&gt;&lt;P&gt;please let me know if there is any alternative.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks in advance.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;proc sql;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;create table test_a as select * from work.a inner join work.b&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;on a.test_var=b.test_var;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;quit;&lt;/STRONG&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 21 Mar 2014 17:17:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152521#M298395</guid>
      <dc:creator>woo</dc:creator>
      <dc:date>2014-03-21T17:17:40Z</dc:date>
    </item>
    <item>
      <title>Re: proc sql or data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152522#M298396</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;What sorts/indexes are available?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 21 Mar 2014 17:34:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152522#M298396</guid>
      <dc:creator>DBailey</dc:creator>
      <dc:date>2014-03-21T17:34:54Z</dc:date>
    </item>
    <item>
      <title>Re: proc sql or data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152523#M298397</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Nope...&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 21 Mar 2014 20:34:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152523#M298397</guid>
      <dc:creator>woo</dc:creator>
      <dc:date>2014-03-21T20:34:13Z</dc:date>
    </item>
    <item>
      <title>Re: proc sql or data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152524#M298398</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Depends. Can you explain more? Specifically:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;1. Are the sizes of the two data sets different?&lt;/P&gt;&lt;P&gt;2. How many variables are there per data set?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Fri, 21 Mar 2014 22:12:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152524#M298398</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2014-03-21T22:12:15Z</dc:date>
    </item>
    <item>
      <title>Re: proc sql or data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152525#M298399</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Keep your data sorted (or at least indexed) by the key variable and use a data step.&amp;nbsp; This will run in time linear in the size of the datasets (that is, 10 million obs should take about twice as long as 5 million obs).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data test_a ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; merge a(in=ina) b(in=inb);&lt;/P&gt;&lt;P&gt;&amp;nbsp; by test_var ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if ina and inb ;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sat, 22 Mar 2014 01:46:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152525#M298399</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2014-03-22T01:46:03Z</dc:date>
    </item>
    <item>
      <title>Re: proc sql or data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152526#M298400</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Sorry Tom, but I have never come across a situation where the combination of an index and BY is a good solution; it's really a performance killer...&lt;/P&gt;&lt;P&gt;As usual, we know too little about this situation to give any good advice.&lt;/P&gt;&lt;P&gt;For starters, what is the "hit rate" in the join? If almost all records are included in the join result, there's little that can be done in the code. Pre-sorting the tables would improve performance, but whether that is worthwhile depends on how and how often the source tables are updated, and on how often this join (and similar ones) will run.&lt;/P&gt;&lt;P&gt;Moving the source tables to an engine that supports multi-threading could be one idea. That means SPDE or SPD Server if you want to keep the data in the SAS domain. That will give you multi-threaded (parallel) disk reads and implicit BY sorting.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Sat, 22 Mar 2014 09:37:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152526#M298400</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2014-03-22T09:37:42Z</dc:date>
    </item>
    <item>
      <title>Re: proc sql or data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152527#M298401</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I am sorry if I have not provided enough info, but here it is:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The same 8 variables in both datasets&lt;/P&gt;&lt;P&gt;At least &lt;STRONG&gt;9-10 million&lt;/STRONG&gt; obs in each SAS dataset&lt;/P&gt;&lt;P&gt;Both are sorted by the key variable, let's say test_var&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;For now I am using Tom's alternative, and it is working really well so far. I had in mind that PROC SQL would run the query faster than a DATA step, but that seems to be proven wrong in this case; I am not sure if I am missing anything. Both datasets have all the same variables, so I renamed the variables in one dataset, except the key variable test_var, and then merged as Tom showed here. It gives me output faster than my PROC SQL step above, but I am pretty sure there is some SQL alternative we could use here that I am not aware of.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 24 Mar 2014 17:02:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152527#M298401</guid>
      <dc:creator>woo</dc:creator>
      <dc:date>2014-03-24T17:02:21Z</dc:date>
    </item>
    <item>
      <title>Re: proc sql or data step</title>
      <link>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152528#M298402</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Quite often, it's the joining process that takes time (whether in MERGE or in SQL).&amp;nbsp; Since you don't need any variables from the second table, other than to see if a match exists, it might be faster to create a format.&amp;nbsp; The limiting factor would be whether you have enough memory to load the format.&amp;nbsp; For example:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data b2;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; set b (keep=test_var rename=(test_var=start));&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; retain label 'Found a match'&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; fmtname '$match';&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc format cntlin=b2;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; set a;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; if put(test_var, $match.) = 'Found a match';&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If TEST_VAR is numeric rather than character, the name of the format would be MATCH instead of $MATCH.&amp;nbsp; Note that PROC FORMAT will reject the CNTLIN data set if START contains duplicate values, so remove any duplicate TEST_VAR values from b first.&amp;nbsp; The resources required would not be very different from creating a hash table from b and then using the CHECK method to see if a match can be found.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 24 Mar 2014 17:29:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/proc-sql-or-data-step/m-p/152528#M298402</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2014-03-24T17:29:52Z</dc:date>
    </item>
  </channel>
</rss>

