<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to handle large data in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-handle-large-data/m-p/188564#M35635</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Try to sort your existing dataset first.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sort&lt;/P&gt;&lt;P&gt;&amp;nbsp; data=rmdeoap.gets_dw_eoa_flt_dccca_v (&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; where=(RNH = 'CSX' and 'occur date'n ge '01Nov2014'd)&lt;/P&gt;&lt;P&gt;&amp;nbsp; )&lt;/P&gt;&lt;P&gt;&amp;nbsp; out=tempdata&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;by RNH RN 'occur date'n;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If that succeeds, the data step (or proc means) to build the average is a breeze.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Note that the where condition on the input data set reduces the size of the utility file built while sorting, so you may escape your disk-full problem. proc sql has the nasty habit of throwing all the data into one big utility file and doing all its stuff there, causing a tankerload of competing disk accesses that tend to slow systems to a crawl and often cause unnecessary disk space problems.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sort can be made faster and more stable by using the UTILLOC= system option to put the utility files on a separate physical disk.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Thu, 27 Nov 2014 06:25:32 GMT</pubDate>
    <dc:creator>Kurt_Bremser</dc:creator>
    <dc:date>2014-11-27T06:25:32Z</dc:date>
    <item>
      <title>How to handle large data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-handle-large-data/m-p/188563#M35634</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello, I am working with a very large data set, somewhere in the range of 50 million+ records.&amp;nbsp; The code that I have attached works OK when I subset out a month, but will not work on a much larger chunk of the data. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What is the best way for me to be able to do this on the entire dataset consisting of 5 years?&amp;nbsp; &lt;/P&gt;&lt;P&gt;Is sorting and then using a data step my best option?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thank you so much for all your expertise.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; create table ambient_temp_dc as&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; select RNH, RN, datepart('occur date'n) as date, hour('occur date'n) as hour,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; avg(basic_AT) as avg_ambient&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from rmdeoap.gets_dw_eoa_flt_dccca_v&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; where RNH = 'CSX' and 'occur date'n ge '01Nov2014'd&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; group by RNH, RN, date, hour&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; ;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 27 Nov 2014 05:24:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-handle-large-data/m-p/188563#M35634</guid>
      <dc:creator>dsbihill</dc:creator>
      <dc:date>2014-11-27T05:24:44Z</dc:date>
    </item>
    <item>
      <title>Re: How to handle large data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-handle-large-data/m-p/188564#M35635</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Try to sort your existing dataset first.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sort&lt;/P&gt;&lt;P&gt;&amp;nbsp; data=rmdeoap.gets_dw_eoa_flt_dccca_v (&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; where=(RNH = 'CSX' and 'occur date'n ge '01Nov2014'd)&lt;/P&gt;&lt;P&gt;&amp;nbsp; )&lt;/P&gt;&lt;P&gt;&amp;nbsp; out=tempdata&lt;/P&gt;&lt;P&gt;;&lt;/P&gt;&lt;P&gt;by RNH RN 'occur date'n;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If that succeeds, the data step (or proc means) to build the average is a breeze.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Note that the where condition on the input data set reduces the size of the utility file built while sorting, so you may escape your disk-full problem. proc sql has the nasty habit of throwing all the data into one big utility file and doing all its stuff there, causing a tankerload of competing disk accesses that tend to slow systems to a crawl and often cause unnecessary disk space problems.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sort can be made faster and more stable by using the UTILLOC= system option to put the utility files on a separate physical disk.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 27 Nov 2014 06:25:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-handle-large-data/m-p/188564#M35635</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2014-11-27T06:25:32Z</dc:date>
    </item>
    <item>
      <title>Re: How to handle large data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-handle-large-data/m-p/188565#M35636</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;50 million records isn't that big, so there are various ways to handle this. I'd start off by trying a standard proc means even without the sort, though you may need it. &lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You can use a format, datetime12., to group the data by hour rather than calculating the hour and date, and then separate them out afterwards if really desired. &lt;/P&gt;&lt;P&gt;If none of the above solutions work, you can always process the data a month at a time.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P style="font-size: 10px; font-family: 'Courier New';"&gt;&lt;SPAN style="color: #011993;"&gt;&lt;STRONG&gt;proc&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN style="color: #011993;"&gt;&lt;STRONG&gt;means&lt;/STRONG&gt;&lt;/SPAN&gt; &lt;SPAN style="color: #0433ff;"&gt;data&lt;/SPAN&gt;=rmdeoap.gets_dw_eoa_flt_dccca_v &lt;/P&gt;&lt;P style="font-size: 10px; font-family: 'Courier New';"&gt;&amp;nbsp; (where = (RNH=&lt;SPAN style="color: #942193;"&gt;'CSX'&lt;/SPAN&gt;) keep=RNH RN &lt;SPAN style="color: #942193;"&gt;'occur date'n&lt;/SPAN&gt; basic_AT) &lt;SPAN style="color: #0433ff;"&gt;noprint&lt;/SPAN&gt;;&lt;/P&gt;&lt;P style="font-size: 10px; font-family: 'Courier New'; color: #942193;"&gt;&lt;SPAN style="color: #000000;"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="color: #0433ff;"&gt;class&lt;/SPAN&gt;&lt;SPAN style="color: #000000;"&gt; RNH RN &lt;/SPAN&gt;'occur date'n&lt;SPAN style="color: #000000;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-size: 10px; font-family: 'Courier New'; color: #942193;"&gt;&lt;SPAN style="color: #000000;"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="color: #0433ff;"&gt;types&lt;/SPAN&gt;&lt;SPAN style="color: #000000;"&gt; RNH*RN*&lt;/SPAN&gt;'occur date'n&lt;SPAN style="color: #000000;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-size: 10px; font-family: 'Courier New'; color: #942193;"&gt;&lt;SPAN style="color: #000000;"&gt;&amp;nbsp; &lt;/SPAN&gt;&lt;SPAN style="color: #0433ff;"&gt;format&lt;/SPAN&gt;&lt;SPAN style="color: #000000;"&gt; &lt;/SPAN&gt;'occur date'n&lt;SPAN style="color: #000000;"&gt; &lt;/SPAN&gt;&lt;SPAN style="color: #009193;"&gt;datetime12.&lt;/SPAN&gt;&lt;SPAN style="color: #000000;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;P style="font-size: 10px; font-family: 'Courier New';"&gt;&amp;nbsp; &lt;SPAN style="color: #0433ff;"&gt;output&lt;/SPAN&gt; &lt;SPAN style="color: #0433ff;"&gt;out&lt;/SPAN&gt;=ambient_temp_dc &lt;SPAN style="color: #0433ff;"&gt;mean&lt;/SPAN&gt;(basic_at)=avg_ambient;&lt;/P&gt;&lt;P style="font-size: 10px; font-family: 'Courier New'; color: #011993;"&gt;&lt;STRONG&gt;run&lt;/STRONG&gt;&lt;SPAN style="color: #000000;"&gt;;&lt;/SPAN&gt;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 27 Nov 2014 07:32:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-handle-large-data/m-p/188565#M35636</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2014-11-27T07:32:10Z</dc:date>
    </item>
    <item>
      <title>Re: How to handle large data</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-handle-large-data/m-p/188566#M35637</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I prefer a data step with arrays.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 27 Nov 2014 13:11:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-handle-large-data/m-p/188566#M35637</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2014-11-27T13:11:10Z</dc:date>
    </item>
  </channel>
</rss>

