<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Proc SQL vs Data step advice in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Proc-SQL-vs-Data-step-advice/m-p/403181#M97962</link>
    <description>&lt;P&gt;I have a table called &lt;EM&gt;all_data&lt;/EM&gt;&amp;nbsp;read into SAS that consists of 166 variables, of which column M1 is the primary key column, and consists of 50 million records - so as you can imagine the table is quite large, takes up about 55GB hard drive space.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a macro which selects the primary key M1 column, and one of the other variables (so two columns in total) and reads it into a temporary table which the macro uses to compute. This macro will run for all the variables in the table so I am looking to save time.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;At the moment, I am using a Data step to single out my two columns at the start of the macro. The variable that differs each time is read in as the parameter &lt;EM&gt;var &lt;/EM&gt;declared in the macro:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data table_temp;
	set all_data (keep=M1 &amp;amp;var.);
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I currently have this running for the first variable, but this data step alone takes about 40 minutes, which is not ideal because there is a lot of processing after that.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I was wondering if using a Proc SQL statement instead of a Data step would speed up this process? Something like below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
          create table table_temp as
                 select M1, &amp;amp;var.
                        from all_data;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Or possibly using Proc SQL and selecting all and using a KEEP in the FROM statement?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If anyone could offer advice on the speediest way to do this, with an explanation, it would be very much appreciated - thankyou!&lt;/P&gt;</description>
    <pubDate>Wed, 11 Oct 2017 14:22:39 GMT</pubDate>
    <dc:creator>C20325A</dc:creator>
    <dc:date>2017-10-11T14:22:39Z</dc:date>
    <item>
      <title>Proc SQL vs Data step advice</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-SQL-vs-Data-step-advice/m-p/403181#M97962</link>
      <description>&lt;P&gt;I have a table called &lt;EM&gt;all_data&lt;/EM&gt;&amp;nbsp;read into SAS that consists of 166 variables, of which column M1 is the primary key column, and consists of 50 million records - so as you can imagine the table is quite large, takes up about 55GB hard drive space.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a macro which selects the primary key M1 column, and one of the other variables (so two columns in total) and reads it into a temporary table which the macro uses to compute. This macro will run for all the variables in the table so I am looking to save time.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;At the moment, I am using a Data step to single out my two columns at the start of the macro. The variable that differs each time is read in as the parameter &lt;EM&gt;var &lt;/EM&gt;declared in the macro:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data table_temp;
	set all_data (keep=M1 &amp;amp;var.);
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I currently have this running for the first variable, but this data step alone takes about 40 minutes, which is not ideal because there is a lot of processing after that.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I was wondering if using a Proc SQL statement instead of a Data step would speed up this process? Something like below:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;PROC SQL;
          create table table_temp as
                 select M1, &amp;amp;var.
                        from all_data;
quit;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Or possibly using Proc SQL and selecting all and using a KEEP in the FROM statement?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If anyone could offer advice on the speediest way to do this, with an explanation, it would be very much appreciated - thankyou!&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 14:22:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-SQL-vs-Data-step-advice/m-p/403181#M97962</guid>
      <dc:creator>C20325A</dc:creator>
      <dc:date>2017-10-11T14:22:39Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL vs Data step advice</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-SQL-vs-Data-step-advice/m-p/403184#M97963</link>
      <description>&lt;P&gt;Which is faster? I'm going to&amp;nbsp;&lt;EM&gt;guess&lt;/EM&gt; that the data step is faster ... but ... you could try them both and then you'd know, you wouldn't have to rely on someone's guess.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But since we don't know what you are doing as far as the next step after selecting the proper columns from all_data — I have a suggestion that might work (again depending on what you are doing). The suggestion is to NOT do this one variable at a time, but to do the next step on ALL variables; for example if you are computing some summary statistics via PROC MEANS or PROC SUMMARY, do this on ALL variables, for example:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;proc summary data=all_data;
    class m1;
    var firstvarname--lastvarname; /* or some legal list of variable names */
    output out=stats mean= stddev=/autoname;
run;&lt;/PRE&gt;
&lt;P&gt;I would certainly pursue this method and take it as far as possible, rather than a macro with loops.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 14:30:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-SQL-vs-Data-step-advice/m-p/403184#M97963</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2017-10-11T14:30:25Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL vs Data step advice</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-SQL-vs-Data-step-advice/m-p/403206#M97971</link>
      <description>&lt;P&gt;I'll echo the advice: do _not_ do this the way you're suggesting.&amp;nbsp; 150 or so reads&amp;nbsp;from this dataset, and then writes of ~1GB data files, and then (further processing), will be slow, slow, slow.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;What else can you do?&amp;nbsp; Well, you could turn this into a vertical dataset, meaning ID + variable name/key + variable value.&amp;nbsp; This will be a net-larger dataset (maybe 3 times as large, at most).&amp;nbsp; But you can now use BY group processing to perform your analysis, and the increase in size may not be significant.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can also use views to process your work, which at least avoids the step of writing out the 150 1GB datasets.&amp;nbsp; You still do 150 reads of the dataset, which is still slow, but perhaps it's not that bad - depending on what you're&amp;nbsp;doing after this it's possible you're not going to have that much of an issue here.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Ultimately any&amp;nbsp;better advice needs more details as to what you're doing next.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 15:24:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-SQL-vs-Data-step-advice/m-p/403206#M97971</guid>
      <dc:creator>snoopy369</dc:creator>
      <dc:date>2017-10-11T15:24:51Z</dc:date>
    </item>
    <item>
      <title>Re: Proc SQL vs Data step advice</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Proc-SQL-vs-Data-step-advice/m-p/403276#M97990</link>
      <description>&lt;P&gt;Change your process - whatever it is to process all your columns at once rather than two at a time. Otherwise you're reading your data too many times.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Without more details that's the best we can offer.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You can review the Don't be Loopy paper by David Cassell for more details on why you shouldn't use macros.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Oct 2017 17:40:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Proc-SQL-vs-Data-step-advice/m-p/403276#M97990</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2017-10-11T17:40:14Z</dc:date>
    </item>
  </channel>
</rss>

