<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Reading in a large dataset using Proc SQL in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Reading-in-a-large-dataset-using-Proc-SQL/m-p/627512#M185271</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm looking for a way to filter a large dataset that is being read in, using proc sql. I'd ideally like to only read in appointment numbers that match appointment numbers in another dataset. As it stands now, the dataset takes over an hour to read in, so I'm trying to speed it up. In the past I've used something like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Proc SQL;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Create Table New as&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Select *&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; From All_Scores as A&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;,DI_SCR (where=(DI=&amp;amp;DIid.)) as B&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;,OrgCodes as C&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Where A.APPT_ID=B.APPT_ID&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; AND B.DI_ORG_ID=C.ORG_ID;&lt;/P&gt;&lt;P&gt;Quit;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The "where=" in the from statement significantly cut down the processing time. Is there a way to do something similar, that only reads in the APPT_IDs from B, if the APPT_ID is also in A?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Wed, 26 Feb 2020 14:54:36 GMT</pubDate>
    <dc:creator>NR13</dc:creator>
    <dc:date>2020-02-26T14:54:36Z</dc:date>
    <item>
      <title>Reading in a large dataset using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reading-in-a-large-dataset-using-Proc-SQL/m-p/627512#M185271</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm looking for a way to filter a large dataset that is being read in, using proc sql. I'd ideally like to only read in appointment numbers that match appointment numbers in another dataset. As it stands now, the dataset takes over an hour to read in, so I'm trying to speed it up. In the past I've used something like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Proc SQL;&lt;BR /&gt;&amp;nbsp; &amp;nbsp; Create Table New as&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Select *&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; From All_Scores as A&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;,DI_SCR (where=(DI=&amp;amp;DIid.)) as B&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;,OrgCodes as C&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; Where A.APPT_ID=B.APPT_ID&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; AND B.DI_ORG_ID=C.ORG_ID;&lt;/P&gt;&lt;P&gt;Quit;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The "where=" in the from statement significantly cut down the processing time. Is there a way to do something similar, that only reads in the APPT_IDs from B, if the APPT_ID is also in A?&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 26 Feb 2020 14:54:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reading-in-a-large-dataset-using-Proc-SQL/m-p/627512#M185271</guid>
      <dc:creator>NR13</dc:creator>
      <dc:date>2020-02-26T14:54:36Z</dc:date>
    </item>
    <item>
      <title>Re: Reading in a large dataset using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reading-in-a-large-dataset-using-Proc-SQL/m-p/627529#M185277</link>
      <description>&lt;P&gt;How many obs are in DI_SCR and OrgCodes?&lt;/P&gt;
&lt;P&gt;The optimal solution might be a data step with hash objects used for the lookup.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Feb 2020 15:05:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reading-in-a-large-dataset-using-Proc-SQL/m-p/627529#M185277</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-02-26T15:05:19Z</dc:date>
    </item>
    <item>
      <title>Re: Reading in a large dataset using Proc SQL</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Reading-in-a-large-dataset-using-Proc-SQL/m-p/627648#M185324</link>
      <description>&lt;P&gt;Are these SAS data sets?&lt;/P&gt;
&lt;P&gt;Are the tables sorted? Or indexed? Can they easily be kept sorted and/or indexed, or are they refreshed too often?&lt;/P&gt;
&lt;P&gt;How many records in each?&lt;/P&gt;
&lt;P&gt;You use&lt;FONT face="courier new,courier"&gt; select *&lt;/FONT&gt;. Do you actually need all columns from all tables?&lt;/P&gt;
&lt;P&gt;Please add the &lt;FONT face="courier new,courier"&gt;_method&lt;/FONT&gt; option after &lt;FONT face="courier new,courier"&gt;proc sql&lt;/FONT&gt; so we can see how SAS executes the joins.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Feb 2020 20:16:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Reading-in-a-large-dataset-using-Proc-SQL/m-p/627648#M185324</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-02-26T20:16:32Z</dc:date>
    </item>
  </channel>
</rss>

