<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Anything wrong with this simple proc sql code? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Anything-wrong-with-this-simple-proc-sql-code/m-p/295918#M61901</link>
    <description>&lt;PRE&gt;proc sql;
create table new_sample as
    select a.*
	    from total_sample as a,
		     selected_3000
	    where a.ID in (select ID from selected_3000 );
quit;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;total_sample has 79 million rows from 79,000 unique IDs. The file size is 90G.&lt;/P&gt;
&lt;P&gt;selected_3000 only has 3000 rows with 3000 unique IDs.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now I want to select those whose IDs are in selected_3000 from the total_sample, using the above proc sql code.&lt;/P&gt;
&lt;P&gt;However, it generated a huge file (&amp;gt;200G) and I had to terminate the procedure. When I checked the output file, I found the same row repeated many times.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What could be the problem in this proc sql code?&lt;/P&gt;</description>
    <pubDate>Thu, 01 Sep 2016 16:07:44 GMT</pubDate>
    <dc:creator>fengyuwuzu</dc:creator>
    <dc:date>2016-09-01T16:07:44Z</dc:date>
    <item>
      <title>Anything wrong with this simple proc sql code?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Anything-wrong-with-this-simple-proc-sql-code/m-p/295918#M61901</link>
      <description>&lt;PRE&gt;proc sql;
create table new_sample as
    select a.*
	    from total_sample as a,
		     selected_3000
	    where a.ID in (select ID from selected_3000 );
quit;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;total_sample has 79 million rows from 79,000 unique IDs. The file size is 90G.&lt;/P&gt;
&lt;P&gt;selected_3000 only has 3000 rows with 3000 unique IDs.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Now I want to select those whose IDs are in selected_3000 from the total_sample, using the above proc sql code.&lt;/P&gt;
&lt;P&gt;However, it generated a huge file (&amp;gt;200G) and I had to terminate the procedure. When I checked the output file, I found the same row repeated many times.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What could be the problem in this proc sql code?&lt;/P&gt;</description>
      <pubDate>Thu, 01 Sep 2016 16:07:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Anything-wrong-with-this-simple-proc-sql-code/m-p/295918#M61901</guid>
      <dc:creator>fengyuwuzu</dc:creator>
      <dc:date>2016-09-01T16:07:44Z</dc:date>
    </item>
    <item>
      <title>Re: Anything wrong with this simple proc sql code?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Anything-wrong-with-this-simple-proc-sql-code/m-p/295925#M61905</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Try this:&lt;/P&gt;&lt;PRE&gt;proc sql;
create table new_sample as
    select *
	    from total_sample
	    where ID in (select ID from selected_3000 );
quit;&lt;/PRE&gt;&lt;P&gt;You don't need selected_3000 in both the main query and the subquery. Listing it in the FROM clause of the main query with no join condition creates a Cartesian product: every matching row of total_sample is returned once for each row of selected_3000, so the query returns 3000 times as many rows as you need.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 01 Sep 2016 16:23:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Anything-wrong-with-this-simple-proc-sql-code/m-p/295925#M61905</guid>
      <dc:creator>set_all__</dc:creator>
      <dc:date>2016-09-01T16:23:50Z</dc:date>
    </item>
  </channel>
</rss>

