<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Lag performance over 400 million sorted records in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507598#M136273</link>
    <description>&lt;P&gt;Thanks to everyone who replied.&amp;nbsp; As usual, I'm immensely grateful for the intelligent responses I received.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My largest concern was that for each record we were creating 15 lagged variables whether or not they were needed for the calculations. The number of lagged variables needed depends on loan_count: if loan_count=2 then we need 2 lagged variables, and only if it is more than 6 do we need all 15.&amp;nbsp; As a short-term fix, until I can further investigate the temporary array approach, I'm using this code that lags only the fields needed for the calculation.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;if loan_count = 2 then do;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp; ** here create only the lagged variables we need. ;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;dpd_lag1 =&amp;nbsp; lag(dpd);&amp;nbsp; pay_lag1 =&amp;nbsp; lag(payment_amount);&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; max_dpd = max(dpd,dpd_lag1);&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; max_dpd_2 = dpd_lag1;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if dq_monthly_payment &amp;gt; 0 then do;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CHRVA421 = (100 * payment) / dq_monthly_payment;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CHRVA421P = (100 * sum(payment_amount,pay_lag1)) / (dq_monthly_payment * 2);&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;else if 
loan_count = 4 then do;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;dpd_lag1 =&amp;nbsp; lag(dpd); &amp;nbsp; &amp;nbsp; pay_lag1 =&amp;nbsp; lag(payment_amount);&amp;nbsp; prin_lag1 =&amp;nbsp; lag(payment);&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;dpd_lag2 = lag2(dpd);&amp;nbsp; pay_lag2 = lag2(payment_amount);&amp;nbsp; prin_lag2 = lag2(payment);&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;dpd_lag3 = lag3(dpd);&amp;nbsp; pay_lag3 = lag3(payment_amount); &amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; max_dpd = max(dpd,dpd_lag1,dpd_lag2,dpd_lag3);&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; max_dpd_2 = max(dpd_lag1,dpd_lag2,dpd_lag3);&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if dq_monthly_payment &amp;gt; 0 then do;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;CHRVA421 = (100 * sum(payment,prin_lag1,prin_lag2)) / (dq_monthly_payment * 3);&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;CHRVA421P = (100 * sum(payment_amount,pay_lag1,pay_lag2,pay_lag3))/(dq_monthly_payment * 4);&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;end;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;else if loan_count &amp;gt; 6 then do;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;dpd_lag1 =&amp;nbsp; lag(dpd); &amp;nbsp; pay_lag1 =&amp;nbsp; lag(payment_amount);&amp;nbsp; prin_lag1 =&amp;nbsp; lag(payment);&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;dpd_lag2 = lag2(dpd);&amp;nbsp; pay_lag2 = lag2(payment_amount);&amp;nbsp; prin_lag2 = 
lag2(payment);&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;dpd_lag3 = lag3(dpd);&amp;nbsp; pay_lag3 = lag3(payment_amount);&amp;nbsp; prin_lag3 = lag3(payment);&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;dpd_lag4 = lag4(dpd);&amp;nbsp; pay_lag4 = lag4(payment_amount);&amp;nbsp; prin_lag4 = lag4(payment);&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;dpd_lag5 = lag5(dpd);&amp;nbsp; pay_lag5 = lag5(payment_amount);&amp;nbsp; prin_lag5 = lag5(payment);&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp; max_dpd = max(dpd,dpd_lag1,dpd_lag2,dpd_lag3,dpd_lag4,dpd_lag5);&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp; max_dpd_2 = max(dpd_lag1,dpd_lag2,dpd_lag3,dpd_lag4,dpd_lag5);&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp; if dq_monthly_payment &amp;gt; 0 then do;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CHRVA421 = (100 * sum(payment,prin_lag1,prin_lag2,prin_lag3,prin_lag4,prin_lag5))/(dq_monthly_payment*6);&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; CHRVA421P = (100 * sum(payment_amount,pay_lag1,pay_lag2,pay_lag3,pay_lag4,pay_lag5))/(dq_monthly_payment*6);&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT size="2"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="2"&gt;&amp;nbsp;end;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="3"&gt;&amp;nbsp;&lt;/FONT&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 25 Oct 2018 21:31:26 GMT</pubDate>
    <dc:creator>bentleyj1</dc:creator>
    <dc:date>2018-10-25T21:31:26Z</dc:date>
    <item>
      <title>Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507021#M135996</link>
      <description>&lt;P&gt;I'm reworking a program that has a data set with about 400 million records, sorted by account number, with one record per account number per time period.&amp;nbsp; There are about 300 time periods, but each account appears in the data set only about&amp;nbsp;20 times.&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;We have to process by account_number because we need to derive running totals for each account number across all time periods.&amp;nbsp;&amp;nbsp;&lt;BR /&gt;Some derived variables require lagging.&amp;nbsp; Here's the existing code.&amp;nbsp; For each record, a series of lagged variables is created and then referenced later in the code.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;dpd_lag1=lag(dpd);&amp;nbsp; pay_lag1=lag(payment_amount);&amp;nbsp; prin_lag1=lag(payment);&lt;BR /&gt;&amp;nbsp;dpd_lag2=lag2(dpd);&amp;nbsp; pay_lag2=lag2(payment_amount);&amp;nbsp; prin_lag2=lag2(payment);&lt;BR /&gt;&amp;nbsp;dpd_lag3=lag3(dpd);&amp;nbsp; pay_lag3=lag3(payment_amount);&amp;nbsp; prin_lag3=lag3(payment);&lt;BR /&gt;&amp;nbsp;dpd_lag4=lag4(dpd);&amp;nbsp; pay_lag4=lag4(payment_amount);&amp;nbsp; prin_lag4=lag4(payment);&lt;BR /&gt;&amp;nbsp;dpd_lag5=lag5(dpd);&amp;nbsp; pay_lag5=lag5(payment_amount);&amp;nbsp; prin_lag5=lag5(payment);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;if loan_count=2 then do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; max_dpd=max(dpd,dpd_lag1);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; max_dpd_2=dpd_lag1;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if dq_monthly_payment &amp;gt; 0 then do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; chrva421=(100*payment)/dq_monthly_payment;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
chrva421p=(100*sum(payment_amount,pay_lag1))/(dq_monthly_payment*2);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; else do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; chrva421=.;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; chrva421p=.;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp; else if loan_count=3 then do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; max_dpd=max(dpd,dpd_lag1,dpd_lag2);&lt;BR /&gt;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; max_dpd_2=max(dpd_lag1,dpd_lag2);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;if dq_monthly_payment &amp;gt; 0 then do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; chrva421=(100*sum(payment,prin_lag1))/(dq_monthly_payment*2);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; chrva421p=(100*sum(payment_amount,pay_lag1,pay_lag2))/(dq_monthly_payment*3);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; else do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; chrva421=.;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; chrva421p=.;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;BR /&gt;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp; ** continues up to loan_count=6 ;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The program takes hours to run on our server.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My question is: do you think it would be faster to lag the variables 'on the fly' instead of creating them as we're doing here?&amp;nbsp; We'd remove the lag assignment statements, and the code would look like this 
with LAG functions used on-the-fly.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;if loan_count=2 then do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; max_dpd=max(dpd,lag(dpd));&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;max_dpd_2=lag(dpd);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if dq_monthly_payment &amp;gt; 0 then do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;chrva421=(100*payment)/dq_monthly_payment;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; chrva421p=(100*sum(payment_amount,lag(payment_amount)))/(dq_monthly_payment*2);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; else do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; chrva421=.;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; chrva421p=.;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp; else if loan_count=3 then do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; max_dpd=max(dpd,lag(dpd),lag2(dpd));&lt;BR /&gt;&amp;nbsp; &amp;nbsp;max_dpd_2=max(lag(dpd),lag2(dpd));&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if dq_monthly_payment &amp;gt; 0 then do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;chrva421=(100*sum(payment,lag(payment)))/(dq_monthly_payment*2);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;chrva421p=(100*sum(payment_amount,lag(payment_amount),lag2(payment_amount)))/(dq_monthly_payment*3);&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; else do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; chrva421=.;&lt;BR 
/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; chrva421p=.;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;BR /&gt;&amp;nbsp; end;&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;I'm also thinking of putting this into a macro do-loop.&lt;BR /&gt;&amp;nbsp;&lt;BR /&gt;Thanks in advance for your thoughts.&lt;BR /&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;John&lt;/P&gt;</description>
      <pubDate>Tue, 23 Oct 2018 21:37:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507021#M135996</guid>
      <dc:creator>bentleyj1</dc:creator>
      <dc:date>2018-10-23T21:37:20Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507022#M135997</link>
      <description>If your data is sorted, what about a temporary array instead? If you know there are at most twenty records per ID, that's the size you'd need, and SAS drops temporary array elements from the output automatically.</description>
      <pubDate>Tue, 23 Oct 2018 21:39:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507022#M135997</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-10-23T21:39:21Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507026#M136000</link>
      <description>&lt;P&gt;I've thought of using an array but figured I'd ask for opinions on this solution first.&amp;nbsp; The issue with a _temporary_ array is that the values are retained: "Temporary data element values are always automatically retained, rather than being reset to missing at the beginning of the next iteration of the DATA step."&amp;nbsp; &lt;A href="http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000201956.htm" target="_blank"&gt;http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000201956.htm&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I need the array to be refreshed/reloaded with each iteration of the DATA step because an account can have more than 5 records.&amp;nbsp; So&amp;nbsp;would anything be gained from an array over assignment statements?&lt;/P&gt;</description>
      <pubDate>Tue, 23 Oct 2018 21:51:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507026#M136000</guid>
      <dc:creator>bentleyj1</dc:creator>
      <dc:date>2018-10-23T21:51:54Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507027#M136001</link>
      <description>&lt;P&gt;To counter&amp;nbsp;&lt;SPAN&gt;"The issue with a _temporary_ array is that the values are retained: 'Temporary data element values are always automatically retained, rather than being reset to missing at the beginning of the next iteration of the DATA step.'", you can clear the array yourself with&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;array temp(9999) _temporary_;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;call missing(of temp(*));&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;at the top of each iteration&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
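&lt;P&gt;A minimal sketch of how the reset can fit a DOW loop, using made-up sample data. The names dpd_hist, seq and k are illustrative, and the 20 slots assume no account has more than 20 records. Because the DO UNTIL loop reads a whole account per DATA step iteration, a CALL MISSING at the top of the iteration clears the array once per account rather than once per record. Strictly, the reset is only a safety net here, since the loop reads just the slots already filled for the current account.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* hypothetical sample data: two accounts, sorted by account_number */
data have;
  input account_number dpd;
  datalines;
1 10
1 30
1 20
2 5
2 0
;

data want;
  array dpd_hist(20) _temporary_;   /* assumes at most 20 records per account */
  call missing(of dpd_hist(*));     /* runs once per account, see DO UNTIL below */
  do seq = 1 by 1 until (last.account_number);
    set have;
    by account_number;
    dpd_hist(seq) = dpd;
    /* highest DPD over the current record and up to 5 earlier ones */
    max_dpd = dpd;
    do k = max(1, seq - 5) to seq - 1;
      max_dpd = max(max_dpd, dpd_hist(k));
    end;
    output;
  end;
  drop seq k;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;With this data, MAX_DPD comes out as 10, 30, 30 for account 1 and 5, 5 for account 2. The second account never sees the first account's values.&lt;/P&gt;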
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Oct 2018 22:02:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507027#M136001</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2018-10-23T22:02:53Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507029#M136003</link>
      <description>&lt;P&gt;Brilliant.&amp;nbsp; Thanks.&amp;nbsp; I might go with the temporary array solution... it's got to be faster than adding 15 variables to each record and then dropping them at the end of the step.&lt;/P&gt;</description>
      <pubDate>Tue, 23 Oct 2018 22:05:26 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507029#M136003</guid>
      <dc:creator>bentleyj1</dc:creator>
      <dc:date>2018-10-23T22:05:26Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507030#M136004</link>
      <description>&lt;P&gt;Yes, sir. I concur with&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13879"&gt;@Reeza&lt;/a&gt;&amp;nbsp;'s idea too, as a temporary array adds no variables to the output data set, and its elements are contiguous and so easy to peek at and compute with. Let us know if you need any further help.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Oct 2018 22:09:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507030#M136004</guid>
      <dc:creator>novinosrin</dc:creator>
      <dc:date>2018-10-23T22:09:28Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507033#M136005</link>
      <description>&lt;P&gt;How did you determine the LAG functions are causing the performance issue?&lt;/P&gt;
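&lt;P&gt;One way to find out is to time the LAG calls in isolation. The following is a minimal sketch on hypothetical generated data rather than the real table. It turns on the FULLSTIMER system option, then compares a step that only reads the data with the same step plus the LAG assignments. The difference between the two steps' CPU and real times in the log roughly approximates what the LAGs cost.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* hypothetical test data: 100,000 accounts with 20 periods each */
data have;
  do account_number = 1 to 100000;
    do period = 1 to 20;
      dpd = mod(period, 7);
      output;
    end;
  end;
run;

/* write extra resource statistics to the log for each step */
options fullstimer;

/* baseline: read the data only */
data _null_;
  set have;
run;

/* the same pass plus five lagged assignments */
data _null_;
  set have;
  dpd_lag1 = lag(dpd);
  dpd_lag2 = lag2(dpd);
  dpd_lag3 = lag3(dpd);
  dpd_lag4 = lag4(dpd);
  dpd_lag5 = lag5(dpd);
run;&lt;/CODE&gt;&lt;/PRE&gt;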
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If you are executing LAG&amp;nbsp;or any other function&amp;nbsp;unnecessarily, then not doing so should improve performance.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Will the LAGs be right when they are executed as you suggest, i.e. the conditional LAG problem?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
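&lt;P&gt;A small sketch of the conditional LAG problem, using made-up data. Each LAG call keeps its own queue, and that queue is only updated when the call actually executes. On the third observation, LAG_ALWAYS returns 2 (the previous record), but LAG_COND returns 1, the value from the last time its IF condition was true.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* hypothetical sample data */
data have;
  input x flag;
  datalines;
1 1
2 0
3 1
4 1
;

data demo;
  set have;
  lag_always = lag(x);       /* this queue is updated on every observation */
  if flag = 1 then
    lag_cond = lag(x);       /* separate queue, updated only when flag = 1 */
run;&lt;/CODE&gt;&lt;/PRE&gt;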
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 23 Oct 2018 22:15:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507033#M136005</guid>
      <dc:creator>data_null__</dc:creator>
      <dc:date>2018-10-23T22:15:09Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507034#M136006</link>
      <description>It seems to me you'd need to reset it at the beginning of each new ID, wouldn't you? If you're only going up to 5, though, and that makes sense, you can just overwrite the values, can't you? You can easily set it to missing with CALL MISSING().</description>
      <pubDate>Tue, 23 Oct 2018 22:25:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507034#M136006</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2018-10-23T22:25:39Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507044#M136009</link>
      <description>&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;1. Arrays&amp;nbsp;are a good option to explore&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;2. Your suggestion&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN&gt;if loan_count=2 then do;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;BR /&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; max_dpd=max(dpd,lag(dpd));&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;would&lt;SPAN&gt;&amp;nbsp;not work. The LAG queue must be updated with each observation, and the IF test prevents&amp;nbsp;that.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
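&lt;P&gt;A minimal sketch of the usual fix, using made-up sample data and assuming that loan_count numbers the records within each account, as the thread implies: execute every LAG unconditionally, then refer to the lagged variables only inside the branches that need them.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* hypothetical sample data; loan_count assumed to number records within an account */
data have;
  input account_number loan_count dpd payment_amount dq_monthly_payment;
  datalines;
1 1 0 100 100
1 2 30 50 100
2 1 10 80 100
2 2 0 120 100
;

data want;
  set have;
  /* LAG runs on every observation, outside any IF, so its queue stays in step */
  dpd_lag1 = lag(dpd);
  pay_lag1 = lag(payment_amount);
  /* the lagged values are only used where the loan_count branch needs them */
  if loan_count = 2 then do;
    max_dpd = max(dpd, dpd_lag1);
    if dq_monthly_payment gt 0 then
      chrva421p = (100 * sum(payment_amount, pay_lag1)) / (dq_monthly_payment * 2);
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Here the second record of account 2 gets MAX_DPD = 10 and CHRVA421P = 100, both built from its own first record.&lt;/P&gt;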
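&lt;P&gt;&lt;SPAN&gt;3. I don't think you need the block below, because CHRVA421 and CHRVA421P are created by assignment statements and not retained, so they are already missing at the start of each iteration:&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;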
&lt;P&gt;&lt;FONT face="courier new,courier"&gt;&lt;SPAN&gt;&amp;nbsp; else do;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; chrva421=.;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; chrva421p=.;&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; end;&lt;/SPAN&gt;&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;4. Temporary arrays are much faster than named ones.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 24 Oct 2018 00:58:05 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507044#M136009</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-10-24T00:58:05Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507057#M136012</link>
      <description>&lt;P&gt;One more thought: it could be that the LAG function is slowing the program.&amp;nbsp; Note that these two programs generate identical results:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want1;
set have;
pay_lag1 = lag(payment);
pay_lag2 = lag2(payment);
run;

data want2;
/* these assignments run before SET, so PAYMENT (retained */
/* automatically because it comes from SET) still holds   */
/* the value read on the previous iteration               */
pay_lag2 = pay_lag1;
pay_lag1 = payment;
set have;
retain pay_lag1;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
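&lt;P&gt;Extending the same RETAIN idea to the five lags in the original question might look like the sketch below. It uses made-up sample data, and an array shifts the history one slot per observation before SET reads the next record. Like LAG, it carries values across account boundaries and does not reset per account.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* hypothetical sample data: seven payments */
data have;
  input payment @@;
  datalines;
1 2 3 4 5 6 7
;

data want;
  array prin_lag(5) prin_lag1-prin_lag5;
  retain prin_lag1-prin_lag5;
  /* shift the history one slot; this runs before SET, so PAYMENT */
  /* still holds the value read on the previous iteration         */
  do k = 5 to 2 by -1;
    prin_lag(k) = prin_lag(k - 1);
  end;
  prin_lag(1) = payment;
  set have;
  drop k;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;On the last record, PRIN_LAG1 through PRIN_LAG5 hold 6, 5, 4, 3 and 2, matching LAG through LAG5.&lt;/P&gt;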
&lt;P&gt;It's less work to do this with numeric variables.&amp;nbsp; With character variables, you would have to set the length first.&amp;nbsp; But it could well be significantly faster than using LAG.&lt;/P&gt;</description>
      <pubDate>Wed, 24 Oct 2018 03:17:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507057#M136012</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2018-10-24T03:17:54Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507059#M136014</link>
      <description>&lt;P&gt;Two thoughts come to mind to complement what has been said:&lt;/P&gt;&lt;P&gt;- It looks like you only need the first &lt;EM&gt;n-1&lt;/EM&gt; lags in the current &lt;EM&gt;loan_count&lt;/EM&gt; if-clause.&lt;BR /&gt;- You may want to check whether you can make use of the &lt;A href="http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a001500739.htm" target="_blank"&gt;SASFILE&lt;/A&gt; statement,&amp;nbsp;perhaps by splitting the huge data set by time period and account number before performing the operations.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 24 Oct 2018 06:49:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507059#M136014</guid>
      <dc:creator>Oligolas</dc:creator>
      <dc:date>2018-10-24T06:49:31Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507070#M136023</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/4954"&gt;@Astounding&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;One more thought: it could be that the LAG function is slowing the program.&amp;nbsp; Note that these two programs generate identical results:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data want1;&lt;/P&gt;
&lt;P&gt;set have;&lt;/P&gt;
&lt;P&gt;pay_lag1 = lag(payment);&lt;/P&gt;
&lt;P&gt;pay_lag2 = lag2(payment);&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data want2;&lt;/P&gt;
&lt;P&gt;pay_lag2 = pay_lag1;&lt;/P&gt;
&lt;P&gt;pay_lag1 = payment;&lt;/P&gt;
&lt;P&gt;set have;&lt;/P&gt;
&lt;P&gt;retain pay_lag1;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It's less work to do this with numeric variables.&amp;nbsp; With character variables, you would have to set the length first.&amp;nbsp; But it could well be significantly faster than using LAG.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;I just ran a similar test against one of our larger datasets (100+ million obs, ~14 GB compressed size):&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data ww0.test2 (keep=x1 x2);
set source;
length x1 x2 4;
x1 = invar;
x2 = lag(invar);
run;

data ww0.test3 (keep=x1 x2);
set source;
length x1 x2 4;
retain x2;
x1 = invar;
output;
x2 = invar;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;In terms of CPU time, both methods performed similarly, with only marginal differences. Outside factors (concurrent load, especially I/O) were much more significant with regard to the real time needed.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;My personal bottom line: I expect a Base SAS function (and one that's been around as long as lag()) to be optimized as much as possible, and not to be outdone by manually written code that does essentially the same thing (fill and move a FIFO chain).&lt;/P&gt;</description>
      <pubDate>Wed, 24 Oct 2018 07:42:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507070#M136023</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2018-10-24T07:42:06Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507095#M136037</link>
      <description>&lt;P&gt;I think the temporary array idea can work, and I do not think you have to initialize the array for each account. Just keep track of the number of records you have read from that account, something like&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  array dpd_lag(100) 8 _temporary_;
  array pay_lag(100) 8 _temporary_;
  array prin_lag(100) 8 _temporary_;
  do _N_=1 by 1 until(last.account_number);
    set have;
    by account_number;
    /* assuming that you want to do the calculation for loan_count=1 as well, with no lag, I put the current variables in also */
    dpd_lag(_N_)=dpd;
    pay_lag(_N_)=payment_amount;
    prin_lag(_N_)=payment;
    if loan_count&amp;gt;_N_ then /* would be the same as getting the lag from a previous account */
      error 'Wrong loan count';
    else do;
      lag_index=_N_-loan_count+1;
      /* do the calculation here, using dpd_lag(lag_index) etc. as the lagged vars */
      /* I assume the calculation is basically the same for all loan counts */
      output;
      end;
    end;
run;
    &lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Wed, 24 Oct 2018 10:45:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507095#M136037</guid>
      <dc:creator>s_lassen</dc:creator>
      <dc:date>2018-10-24T10:45:30Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507341#M136152</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;My experience differs: using LAG() takes 40% longer.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data TEST; 
  do PAYMENT = 1 to 1e8;
    output;
   end;
run;

sasfile TEST load;
 
data _null_;        * 7s CPU/Elapse;
  set TEST;
  PAY_LAG1 = lag (PAYMENT);
  PAY_LAG2 = lag2(PAYMENT);
run;

data _null_;        * 5s CPU/Elapse;
  retain PAY_LAG1;
  PAY_LAG2 = PAY_LAG1;
  PAY_LAG1 = PAYMENT;
  set TEST;
run;

sasfile TEST close;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&amp;gt;&amp;nbsp;I expect a Base SAS function (and one that's been around as long as lag()) to be optimized as much as can be&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Considering&lt;SPAN&gt;&amp;nbsp;how badly functions&amp;nbsp;perform when used in WHERE clauses compared to IF clauses, my optimism regarding optimisation doesn't reach the levels of yours. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 24 Oct 2018 22:25:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507341#M136152</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-10-24T22:25:37Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507408#M136201</link>
      <description>&lt;P&gt;Interesting. This is the log when I ran your test on my server:&lt;/P&gt;
&lt;PRE&gt;35         data _null_;        * 7s CPU/Elapse;
36           set TEST;
37           PAY_LAG1 = lag (PAYMENT);
38           PAY_LAG2 = lag2(PAYMENT);
39         run;

NOTE: There were 100000000 observations read from the data set WORK.TEST.
NOTE: DATA statement used (Total process time):
      real time           44.22 seconds
      cpu time            5.75 seconds
      

40         
41         data _null_;        * 5s CPU/Elapse;
2                                                          Das SAS System                           08:26 Thursday, October 25, 2018

42           retain PAY_LAG1;
43           PAY_LAG2 = PAY_LAG1;
44           PAY_LAG1 = PAYMENT;
45           set TEST;
46         run;

NOTE: There were 100000000 observations read from the data set WORK.TEST.
NOTE: DATA statement used (Total process time):
      real time           46.69 seconds
      cpu time            4.88 seconds
&lt;/PRE&gt;
&lt;P&gt;SAS 9.4M5, AIX 7.1, 2 POWER8 cores, MEMSIZE 256M (that's why SASFILE won't work).&lt;/P&gt;
&lt;P&gt;All this with a rather constant run queue of around 8, and one can see that the potential gain in real time is negligible (actually, in this particular test there wasn't one), as the I/O factor is much bigger.&lt;/P&gt;
&lt;P&gt;When reducing the obs count to 1e7, sasfile was possible, and I got this:&lt;/P&gt;
&lt;PRE&gt;35         data _null_;        * 7s CPU/Elapse;
36           set TEST;
37           PAY_LAG1 = lag (PAYMENT);
38           PAY_LAG2 = lag2(PAYMENT);
39         run;

NOTE: There were 10000000 observations read from the data set WORK.TEST.
NOTE: DATA statement used (Total process time):
      real time           3.97 seconds
      cpu time            0.54 seconds
      

40         
41         data _null_;        * 5s CPU/Elapse;
42           retain PAY_LAG1;
43           PAY_LAG2 = PAY_LAG1;
44           PAY_LAG1 = PAYMENT;
45           set TEST;
46         run;

NOTE: There were 10000000 observations read from the data set WORK.TEST.
NOTE: DATA statement used (Total process time):
      real time           3.53 seconds
      cpu time            0.44 seconds
&lt;/PRE&gt;
&lt;P&gt;Once again, the difference in CPU seconds lies in the 10-15% range.&lt;/P&gt;
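&lt;P&gt;For anyone who wants to repeat the comparison, here is a minimal sketch of the setup. It assumes a table WORK.TEST with a numeric PAYMENT column, as in the logs above. How the test data is generated is an assumption and not necessarily the original test step. SASFILE LOAD only works if MEMSIZE is large enough to hold the whole table.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical test data: 1e7 rows with a random PAYMENT */
data work.test;
  call streaminit(42);
  do i = 1 to 1e7;
    payment = rand('uniform') * 1000;
    output;
  end;
  drop i;
run;

options fullstimer;        /* more detailed resource notes in the log */
sasfile work.test load;    /* pin the table in memory to take I/O out of the timing */

/* Variant 1: LAG functions */
data _null_;
  set work.test;
  pay_lag1 = lag(payment);
  pay_lag2 = lag2(payment);
run;

/* Variant 2: copy the values before SET overwrites PAYMENT */
data _null_;
  retain pay_lag1;
  pay_lag2 = pay_lag1;
  pay_lag1 = payment;
  set work.test;
run;

sasfile work.test close;   /* release the memory held by SASFILE */&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Comparing the CPU times the log reports for the two DATA _NULL_ steps gives the relative cost of LAG versus RETAIN. The real times will still depend on system load.&lt;/P&gt;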
&lt;P&gt;What environment do you use for SAS?&lt;/P&gt;</description>
      <pubDate>Thu, 25 Oct 2018 06:44:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507408#M136201</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2018-10-25T06:44:32Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507413#M136205</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;I am testing under Windows. I can post the log tomorrow, but I don't think there is anything to be learnt from it.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&amp;gt;as the factor I/O is much bigger&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;The idea is to test whether retrieving previous values with LAG() is slower than saving values before they are overwritten by SET.&lt;/P&gt;
&lt;P&gt;It may well be that, depending on other factors, this difference in efficiency does not matter in many cases.&lt;/P&gt;
&lt;P&gt;That is beside the point. That's why I used &lt;FONT face="courier new,courier"&gt;data _null_&lt;/FONT&gt; and &lt;FONT face="courier new,courier"&gt;sasfile&lt;/FONT&gt;: to isolate the process we are measuring.&lt;/P&gt;
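&lt;P&gt;Before timing the two approaches, it may be worth confirming that they really produce the same values. This is a minimal sketch, assuming the WORK.TEST table with a PAYMENT column from the tests above. The output table names are made up for illustration.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data via_lag;
  set work.test;
  pay_lag1 = lag(payment);
  pay_lag2 = lag2(payment);
run;

data via_retain;
  retain pay_lag1;       /* keep the previous value across iterations */
  pay_lag2 = pay_lag1;   /* the old PAY_LAG1 becomes PAY_LAG2 */
  pay_lag1 = payment;    /* PAYMENT still holds the previous row at this point */
  set work.test;         /* only now is the current row read */
run;

/* Compare the lagged columns observation by observation */
proc compare base=via_lag compare=via_retain;
  var pay_lag1 pay_lag2;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Both steps leave PAY_LAG1 and PAY_LAG2 missing on the first row, and PAY_LAG2 missing on the second. PROC COMPARE should therefore report no unequal values.&lt;/P&gt;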
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;Once again, the difference in CPU seconds lies in the 10-15% range.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Your numbers show an increase in CPU time of 17% and 22% when using LAG. That is not as high as in my test (maybe due to the high load on your server?), but it is enough to conclude that LAG is less efficient, IMHO.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;MEMSIZE 256M&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Oh wow that's low! So much CPU power so little RAM space!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 25 Oct 2018 07:27:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507413#M136205</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-10-25T07:27:42Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507419#M136208</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;&lt;/P&gt;
&lt;BR /&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;MEMSIZE 256M&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;Oh wow that's low! So much CPU power so little RAM space!&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;This follows a recommendation from SAS itself. If you give SAS lots of memory in a heavily multi-user setup, SAS will "waste" memory by caching SAS dataset data internally, when the operating system already does that on its own. This happens in most cases with data steps and "simple" procedures that do not need much memory themselves. In the extreme case, the OS then has to page out program pages that contain data already present in the portion of memory the OS reserves for caching disk data. Double whammy. Restricting SAS memory prevents that and makes the most of the operating system's capabilities in terms of read-ahead, sharing cached data and the like.&lt;/P&gt;
&lt;P&gt;So we work with the least memory our code needs to run.&lt;/P&gt;
&lt;P&gt;When I ran my tests, our monthly closing was in progress, so the accountants and actuaries were riding the server hardest, pushing up the run queue and the I/O load.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The batch programs I develop are also meant to run in parallel, so using memory economically is a must.&lt;/P&gt;
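&lt;P&gt;For reference, MEMSIZE can only be set when SAS starts, either in the configuration file or on the command line. It cannot be changed with an OPTIONS statement during a session. In the minimal sketch below, the 256M value simply mirrors the setup described here. The configuration file name and location depend on the installation.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* In the configuration file (for example sasv9.cfg) or on the command line:
     -MEMSIZE 256M
*/

/* Show the value in effect for the current session */
proc options option=memsize value;
run;

/* Or write it to the log from a macro expression */
%put MEMSIZE is %sysfunc(getoption(memsize));&lt;/CODE&gt;&lt;/PRE&gt;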
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 25 Oct 2018 08:23:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507419#M136208</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2018-10-25T08:23:54Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507470#M136230</link>
      <description>&lt;P&gt;Remember, in this application there are several variables that need to be lagged 5 times.&amp;nbsp; This might be a more appropriate test of the savings:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* LAG queues */
data want1;
set have;
pay_lag1 = lag(payment);
pay_lag2 = lag2(payment);
pay_lag3 = lag3(payment);
pay_lag4 = lag4(payment);
pay_lag5 = lag5(payment);
run;

/* Retained copies, shifted before SET reads the next observation */
data want2;
pay_lag5 = pay_lag4;
pay_lag4 = pay_lag3;
pay_lag3 = pay_lag2;
pay_lag2 = pay_lag1;
pay_lag1 = payment;
set have;
retain pay_lag1 - pay_lag5; /* RETAIN is a compile-time statement, so placing it after SET still works */
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Yes, I know I don't need to retain PAY_LAG5.&amp;nbsp; But retaining it can only help the program run a hair faster, since it then no longer has to be reset to missing at the top of each iteration.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Oct 2018 13:05:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507470#M136230</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2018-10-25T13:05:17Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507479#M136236</link>
      <description>&lt;P&gt;With 5 lags vs. 5 retains, the difference becomes significant. With 3e7 observations I got&lt;/P&gt;
&lt;PRE&gt;      real time           3.83 seconds
      cpu time            2.13 seconds
&lt;/PRE&gt;
&lt;P&gt;for the lag() vs.&lt;/P&gt;
&lt;PRE&gt;      real time           2.32 seconds
      cpu time            1.29 seconds
&lt;/PRE&gt;
&lt;P&gt;for the retain version. All tests ran from a dataset loaded with sasfile and used in a data _null_ step, so I/O was mostly removed as a factor.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I hereby retract my earlier statement about SAS functions being optimized.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I then added a step with a temporary array:&lt;/P&gt;
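&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
set have;
array pay_lag{5} _temporary_; /* _TEMPORARY_ elements are retained automatically */
do i = 1 to 4;                /* shift the older values down one slot */
  pay_lag{i} = pay_lag{i+1};
end;
pay_lag{5} = payment;         /* the newest value goes into the last slot */
run;&lt;/CODE&gt;&lt;/PRE&gt;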
&lt;P&gt;and this turned out to be the slowest of all, more than double the CPU time of the retain version.&lt;/P&gt;
&lt;P&gt;Who's the culprit here? It turns out to be the do loop, because (after digging out the old C programmer in me, who uses #DEFINEs to speed up the compiled code)&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;
set have;
array pay_lag{5} _temporary_;
%macro mymac;
%do i = 1 %to 4;
  pay_lag{&amp;amp;i} = pay_lag{%eval(&amp;amp;i+1)};
%end;
%mend;
%mymac
pay_lag{5} = payment;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;performed as well as the retain version.&lt;/P&gt;
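&lt;P&gt;Along the same lines, a small macro could generate the retain version for several variables and any depth, so the unrolled statements do not have to be typed by hand. This is only a sketch. The macro name and parameters are made up for illustration, and DPD is assumed here as a second variable to lag alongside PAYMENT.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;/* Hypothetical helper: for VAR=payment, PREFIX=pay_lag, DEPTH=5 it generates
   pay_lag5 = pay_lag4; ... pay_lag2 = pay_lag1; pay_lag1 = payment;        */
%macro shift_lags(var, prefix, depth);
  %local k;
  %do k = &amp;amp;depth %to 2 %by -1;
    &amp;amp;prefix&amp;amp;k = &amp;amp;prefix%eval(&amp;amp;k - 1);
  %end;
  &amp;amp;prefix.1 = &amp;amp;var;
%mend shift_lags;

data want;
  retain pay_lag1-pay_lag5 dpd_lag1-dpd_lag5;
  %shift_lags(payment, pay_lag, 5)
  %shift_lags(dpd, dpd_lag, 5)
  set have;   /* SET comes last, so PAYMENT and DPD above still hold the previous row */
run;&lt;/CODE&gt;&lt;/PRE&gt;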
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Bottom line: if you engage in optimizing code in this way, creating lots of statements with a macro can save significantly in runtime.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Oct 2018 13:38:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507479#M136236</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2018-10-25T13:38:31Z</dc:date>
    </item>
    <item>
      <title>Re: Lag performance over 400 million sorted records</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507585#M136272</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1.&lt;EM&gt; if you engage in optimizing code in this way, creating lots of statements with a macro can save significantly in runtime.&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Top-notch optimisation research here, Kurt. Well done! &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;2.&lt;EM&gt;SAS will "waste" memory by caching SAS dataset data internally, when the operating system already does that on its own&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;- I've never heard of this before. As far as I know, a step allocates&amp;nbsp;virtual memory based on a best guess when it starts, then requests more as needed until roughly the MEMSIZE value is reached. After that, it uses utility&amp;nbsp;files if the step supports them. At the end of the step, all the virtual memory is released.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;- The latest default value &lt;A href="https://documentation.sas.com/?docsetId=hostunx&amp;amp;docsetTarget=n09y5anvvpzrmnn0ztkyf59qgzvr.htm&amp;amp;docsetVersion=9.4&amp;amp;locale=en" target="_self"&gt;seems to be a more reasonable 2G&lt;/A&gt; (it was &lt;A href="http://support.sas.com/documentation/cdl/en/hostunx/63053/HTML/default/viewer.htm#n09y5anvvpzrmnn0ztkyf59qgzvr.htm" target="_self"&gt;512M in version 9.3&lt;/A&gt;).&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 25 Oct 2018 21:07:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Lag-performance-over-400-million-sorted-records/m-p/507585#M136272</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2018-10-25T21:07:57Z</dc:date>
    </item>
  </channel>
</rss>

