<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: working with large data sets in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98774#M27795</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;seems to be working, need to run some more checks&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Mon, 30 Apr 2012 18:04:28 GMT</pubDate>
    <dc:creator>skipper</dc:creator>
    <dc:date>2012-04-30T18:04:28Z</dc:date>
    <item>
      <title>working with large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98763#M27784</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hi, I have code to compute 'N' means; in my code N=200.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data data_set (drop = i);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; do i = 1 to 200;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; X = rand('NORMAL',0,1);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;select mean(x) into :mx from data_set;&lt;/P&gt;&lt;P&gt;select count(x) into :cx from data_set;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;%macro compute;&lt;/P&gt;&lt;P&gt;data data_set;&lt;/P&gt;&lt;P&gt;&amp;nbsp; set data_set;&lt;/P&gt;&lt;P&gt;&amp;nbsp; %do n=1 %to &amp;amp;cx -1 ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; r&amp;amp;n = (X - &amp;amp;mx)*(lag&amp;amp;n(X) - &amp;amp;mx) ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; %end;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;%mend;&lt;/P&gt;&lt;P&gt;%compute&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;PROC MEANS DATA=data_set mean;&lt;/P&gt;&lt;P&gt;OUTPUT OUT=want;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;**************************************&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But in my real data, N=5,000,000!&lt;/P&gt;&lt;P&gt;(The above code becomes too slow with N&amp;gt;20,000.)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;How can this be done for such a large 'N'?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Apr 2012 13:31:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98763#M27784</guid>
      <dc:creator>skipper</dc:creator>
      <dc:date>2012-04-30T13:31:33Z</dc:date>
    </item>
    <item>
      <title>Re: working with large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98764#M27785</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Skipper,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Tell us what you are trying to do.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This looks like you are creating "N" new columns in data_set.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;%do n=1 %to &amp;amp;cx -1 ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; r&amp;amp;n = (X - &amp;amp;mx)*(lag&amp;amp;n(X) - &amp;amp;mx) ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; %end;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If my reading of your code is correct, SAS will not create a dataset with 5,000,000 columns, so this approach won't work.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You will probably be better off creating a dataset with lots of short rows and grouping somehow, but without knowing more about the goal, it is hard to help.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Doc Muhlbaier&lt;/P&gt;&lt;P&gt;Duke&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Apr 2012 13:56:35 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98764#M27785</guid>
      <dc:creator>Doc_Duke</dc:creator>
      <dc:date>2012-04-30T13:56:35Z</dc:date>
    </item>
    <item>
      <title>Re: working with large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98765#M27786</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I am trying to compute the autocorrelations for a time series data set: 5 million observations of a single variable X.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;R(T) = SUM( (X - Mean(X)) * (LagT(X) - Mean(X)) ) / (N - T)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This must be computed and summed for each lag time T = 1, 2, 3, 4, ..., N, hence the 5 million columns.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I don't want to create 5,000,000 columns; I just don't know how to code a better way.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But I need to compute the above equation for R(1), R(2), ..., R(5,000,000).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;So all I really need is a two-column data set with the variables 'X' and 'R(T)' and 5 million observations.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any ideas how this could be done?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Apr 2012 16:49:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98765#M27786</guid>
      <dc:creator>skipper</dc:creator>
      <dc:date>2012-04-30T16:49:58Z</dc:date>
    </item>
    <item>
      <title>Re: working with large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98766#M27787</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Do you have SAS/ETS?&lt;/P&gt;&lt;P&gt;Can you use PROC TIMESERIES and then work with its output from there, rather than a DATA step and macro code?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Apr 2012 17:11:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98766#M27787</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2012-04-30T17:11:33Z</dc:date>
    </item>
    <item>
      <title>Re: working with large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98767#M27788</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Thanks, it seems that&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data data_set (drop = i);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; do i = 1 to 200;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; X = rand('NORMAL',0,1);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;data data_set;&lt;/P&gt;&lt;P&gt;set data_set;&lt;/P&gt;&lt;P&gt;obsN = _N_;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc timeseries data=data_set&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; out=out&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; outcorr=timedomain;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; id obsN interval=day;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; var X;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;works well, but it only gives the first 20 rows in the outcorr=timedomain data set, whereas I need all of the rows.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I need it to find R(1), R(2), ..., R(200).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Is there some way to specify how many rows 'outcorr=&amp;lt;&amp;gt;' creates in the data set?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Apr 2012 17:33:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98767#M27788</guid>
      <dc:creator>skipper</dc:creator>
      <dc:date>2012-04-30T17:33:38Z</dc:date>
    </item>
    <item>
      <title>Re: working with large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98768#M27789</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;skipper,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;First, you have to realize you will need some powerful hardware.&amp;nbsp; This problem requires trillions of computations.&amp;nbsp; Assuming you have room to store the results, this approach creates a data set with trillions of observations but only 2 variables:&amp;nbsp; T and the contribution to R(T).&amp;nbsp; From that point, the data will need to get summed up.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Compute your macro variables as before (&amp;amp;MX and &amp;amp;CX).&amp;nbsp; Then take each relevant pair of observations:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want (keep=T RT);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; set have (keep=x);&lt;BR /&gt;&amp;nbsp;&amp;nbsp; if _n_ &amp;gt; 1 then do i = 1 to _n_ - 1;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set have (rename=(x = lagged_x)) point=i;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; T = _n_ - i;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RT = (x - &amp;amp;mx) * (lagged_x - &amp;amp;mx) / (&amp;amp;cx - T);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I think I got the formulas right.&amp;nbsp; Naturally, you'll want to test this on a small data set first!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Good luck.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Apr 2012 17:36:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98768#M27789</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2012-04-30T17:36:21Z</dc:date>
    </item>
    <item>
      <title>Re: working with large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98769#M27790</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;An important afterthought ...&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;If all you need is the final sum, you don't have to store the trillions of observations.&amp;nbsp; Instead:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want (keep=T RT) / view=want;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The rest of the DATA step is the same.&amp;nbsp; Then:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc means data=want sum;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; var RT;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Or, add an OUTPUT statement to the PROC MEANS to save the result in a data set.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But this might not be applicable if you need to save the separate R(T) for each value of T.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Apr 2012 17:40:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98769#M27790</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2012-04-30T17:40:58Z</dc:date>
    </item>
    <item>
      <title>Re: working with large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98770#M27791</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Looks interesting, but&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data data_set (drop = i);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; do i = 1 to 200;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; X = rand('NORMAL',0,1);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;select mean(x) into :mx from data_set;&lt;/P&gt;&lt;P&gt;select count(x) into :cx from data_set;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want (keep=T RT);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; set data_set (keep=x);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; if _n_ &amp;gt; 1 then do i = 1 to _n_ - 1;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set data_set (rename=(x = lagged_x)) point=i;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; T = _n_ - i;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RT = (x - &amp;amp;mx) * (lagged_x - &amp;amp;mx) / (&amp;amp;cx - T);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;creates a table with 19,900 observations!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It should have only 200 observations: the vector X has size 200, so it should be T = 1, 2, 3, 4, ..., 200 with R(1), R(2), ..., R(200).&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Apr 2012 17:48:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98770#M27791</guid>
      <dc:creator>skipper</dc:creator>
      <dc:date>2012-04-30T17:48:25Z</dc:date>
    </item>
    <item>
      <title>Re: working with large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98771#M27792</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Yes, that sounds about right.&amp;nbsp; This is getting the contribution to R(T) for each pair of observations.&amp;nbsp; You still need to sum it up.&amp;nbsp; For your small data set, you could use:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc means data=want sum;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; var RT;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; class T;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;With larger numbers of T values, it won't be so easy.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Apr 2012 17:53:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98771#M27792</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2012-04-30T17:53:10Z</dc:date>
    </item>
    <item>
      <title>Re: working with large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98772#M27793</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;I don't understand. For example:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data data_set (drop = i);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; do i = 1 to 200;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; X = rand('NORMAL',0,1);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;select mean(x) into :mx from data_set;&lt;/P&gt;&lt;P&gt;select count(x) into :cx from data_set;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want2 (keep=T RT) / view=want2;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; set data_set (keep=x);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; if _n_ &amp;gt; 1 then do i = 1 to _n_ - 1;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set data_set (rename=(x = lagged_x)) point=i;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; T = _n_ - i;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RT = (x - &amp;amp;mx) * (lagged_x - &amp;amp;mx) / (&amp;amp;cx - T);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;proc means data=want2 sum;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; var RT;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;produces only 123 rows? Shouldn't it be R(1) to R(200)?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Apr 2012 17:56:36 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98772#M27793</guid>
      <dc:creator>skipper</dc:creator>
      <dc:date>2012-04-30T17:56:36Z</dc:date>
    </item>
    <item>
      <title>Re: working with large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98773#M27794</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;With 200 records, wouldn't you expect 199 values for T?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But this PROC MEANS should generate only 1 row, with the sum of RT across all values of T.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Apr 2012 18:01:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98773#M27794</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2012-04-30T18:01:27Z</dc:date>
    </item>
    <item>
      <title>Re: working with large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98774#M27795</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;seems to be working, need to run some more checks&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Apr 2012 18:04:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98774#M27795</guid>
      <dc:creator>skipper</dc:creator>
      <dc:date>2012-04-30T18:04:28Z</dc:date>
    </item>
    <item>
      <title>Re: working with large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98775#M27796</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;So this code&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data data_set (drop = i);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; do i = 1 to 20000;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; X = rand('NORMAL',0,1);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;proc sql;&lt;/P&gt;&lt;P&gt;select mean(x) into :mx from data_set;&lt;/P&gt;&lt;P&gt;select count(x) into :cx from data_set;&lt;/P&gt;&lt;P&gt;quit;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want (keep=T RT) / view=want;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; set data_set (keep=x);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; if _n_ &amp;gt; 1 then do i = 1 to _n_ - 1;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set data_set (rename=(x = lagged_x)) point=i;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; T = _n_ - i;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RT = (x - &amp;amp;mx) * (lagged_x - &amp;amp;mx) / (&amp;amp;cx - T);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc means data=want sum;&lt;/P&gt;&lt;P&gt;OUTPUT OUT=want3;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; var RT;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; class T;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want4;&lt;/P&gt;&lt;P&gt;set want3;&lt;/P&gt;&lt;P&gt;where _STAT_="MEAN" and T &amp;gt; 0;&lt;/P&gt;&lt;P&gt;keep T RT;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;gives me what I want, but it took 6 minutes to compute, and that's only for 20,000 observations, which is not large enough; I need at least 2 million.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Apr 2012 18:21:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98775#M27796</guid>
      <dc:creator>skipper</dc:creator>
      <dc:date>2012-04-30T18:21:32Z</dc:date>
    </item>
    <item>
      <title>Re: working with large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98776#M27797</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;It will take a while no matter how you do it.&amp;nbsp; Remember how 200 observations morphed into 19,900?&amp;nbsp; You are getting similar expansion for the 20,000 records.&amp;nbsp; The LOG indicates how many observations are being read in by PROC MEANS.&amp;nbsp; As the number of observations increases, PROC MEANS will eventually fail because there will be too many values of T.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Note that you should modify your final PROC MEANS to eliminate the need for a subsequent DATA step:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;proc means data=want NWAY;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; var RT;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; class T;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; output out=want3 (keep=T RT) sum=;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But that is not the time-consuming part.&amp;nbsp; The number of calculations increases roughly proportional to the square of the number of observations.&amp;nbsp; Like I said, you will end up in the trillions by the time you are done.&amp;nbsp; I don't know if sorting that many observations would be possible ... if so, it would allow PROC MEANS to switch from a CLASS to a BY statement, which would allow PROC MEANS to run.&amp;nbsp; How long it would take is anybody's guess.&amp;nbsp; But sorting the trillions of records would certainly take longer than running PROC MEANS.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Apr 2012 18:35:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98776#M27797</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2012-04-30T18:35:54Z</dc:date>
    </item>
    <item>
      <title>Re: working with large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98777#M27798</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;This program ran without error for me:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data _null_;&lt;/P&gt;&lt;P&gt;array t {5000000};&lt;/P&gt;&lt;P&gt;x=1;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It took about 2 minutes, but it ran.&amp;nbsp; That gives you an alternative way to sum up the pieces from the view named WANT:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data final;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; array t_totals {4999999};&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; set want end=done;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; t_totals{t} + rt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; if done then do T=1 to 4999999;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RT = t_totals{T};&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; keep T RT;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This will still crank for a while since it has to process trillions of records.&amp;nbsp; But if the top DATA step works, the bottom one will work as well.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Good luck!&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Apr 2012 18:52:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98777#M27798</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2012-04-30T18:52:52Z</dc:date>
    </item>
    <item>
      <title>Re: working with large data sets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98778#M27799</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;OK, final afterthought.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;It should be possible to combine all the processing into a single DATA step (if the array will fit in memory).&amp;nbsp; Defining a temporary array will speed things up:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data want;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; array t_totals {4999999} _temporary_;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; set have (keep=x) end=done;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; if _n_ &amp;gt; 1 then do i = 1 to _n_ - 1;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; set have (keep=x rename=(x = lagged_x)) point=i;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; T = _n_ - i;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; t_totals{T} + (x - &amp;amp;mx) * (lagged_x - &amp;amp;mx) / (&amp;amp;cx - T);&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; if done;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; do T = 1 to 4999999;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; RT = t_totals{T};&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; output;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; end;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp; keep T RT;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;OK, I'm pretty sure I'm done here and can't speed it up beyond that.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Mon, 30 Apr 2012 19:13:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/working-with-large-data-sets/m-p/98778#M27799</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2012-04-30T19:13:44Z</dc:date>
    </item>
  </channel>
</rss>

