<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683481#M207030</link>
    <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/345149"&gt;@Array_Mover&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I agree for the small sample data set, but across 100 million rows of transaction data, it is just not reasonable.&amp;nbsp; While researching, I discovered hash tables and thought that would be the perfect solution for lighting fast lookups and calculations at such a scale.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Not necessarily. You don't need all the rows at once for this logic to work. The minimum you need is all the rows for a given account at once.&lt;/P&gt;
&lt;P&gt;So you could process the table say one hundredth of the table (say all accounts between 1 and 1000000, 1000001 and 2000000, etc) at a time and still get all the results you need.&lt;/P&gt;
&lt;P&gt;If you have 20 transactions per account on average, that's a 20-million row table, which the SQL parser might well decide to load into a hash table. Or not.&lt;/P&gt;
&lt;P&gt;In any case, not complex programming, no &lt;EM&gt;interpreted&lt;/EM&gt; vs &lt;EM&gt;compiled&lt;/EM&gt; inefficiencies, and since it seems you are going to load partial hash tables, you might be better off loading partial tables into SQL instead.&lt;/P&gt;</description>
    <pubDate>Sun, 13 Sep 2020 02:13:40 GMT</pubDate>
    <dc:creator>ChrisNZ</dc:creator>
    <dc:date>2020-09-13T02:13:40Z</dc:date>
    <item>
      <title>Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683320#M206948</link>
      <description>&lt;P&gt;I will try and describe my problem in pseudo-code.&amp;nbsp; I am a self-taught programmer, so I apologize in advance for any gaps in my vocabulary and knowledge.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have transaction data across accounts down to the second.&amp;nbsp; The problem with every moving average answer I have looked up is, the data is always very structured and designed in a way to make the calculation relatively easy.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The transaction data is difficult, because it can vary wildly depending on the account.&amp;nbsp; An account might have 10 transactions in a 24 hour period, or 5 transactions over 10 days.&amp;nbsp; I need to be able to check and see if the previous transactions were within 24 hours, and if not, do not count them.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have created a fake dataset with dates that are commensurate with the problems I am facing.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data sample;
    input customerid : 7.
	transactionid: $13.
        date : datetime26.
        value : 3.
    ;
    format date datetime26.;

datalines;
2133780 2133780_00355 15JUN2020:12:47:17.000000   53
2133780 2133780_00687 15JUN2020:12:48:35.000000   72
2133780 2133780_00181 15JUN2020:12:59:44.000000   88
2133780 2133780_00273 15JUN2020:16:43:46.000000   96
2133780 2133780_00803 16JUN2020:22:30:15.000000   25
2133780 2133780_00411 17JUN2020:07:22:18.000000   75
2133780 2133780_00191 17JUN2020:23:16:39.000000   59
2133780 2133780_00461 14JUL2020:09:35:32.000000   22
2133780 2133780_00532 21JUL2020:10:40:54.000000   46
2133780 2133780_00609 28JUL2020:09:58:30.000000   36
2133780 2133780_00504 28JUL2020:10:12:55.000000   87
2133780 2133780_00488 31JUL2020:15:30:44.000000   36
2689999 2689999_00284 02JUN2020:09:03:55.000000   44
2689999 2689999_00734 06JUN2020:11:20:23.000000   95
2689999 2689999_00061 06JUN2020:13:40:22.000000   29
2689999 2689999_00540 07JUN2020:14:00:19.000000   14
2689999 2689999_00616 08JUN2020:13:36:24.000000   14
2689999 2689999_00002 08JUN2020:13:59:32.000000   82
2689999 2689999_00966 04JUL2020:11:29:49.000000   21
2689999 2689999_00859 10JUL2020:14:46:37.000000   34
2689999 2689999_00816 29JUL2020:13:56:45.000000   95
2689999 2689999_00045 29JUL2020:15:12:19.000000   42
2689999 2689999_00481 02AUG2020:11:02:21.000000   38
2689999 2689999_00227 02AUG2020:11:03:41.000000   42
2689999 2689999_00354 02AUG2020:12:26:34.000000   42
2689999 2689999_00794 03AUG2020:09:10:43.000000   73
2689999 2689999_00857 20AUG2020:15:03:19.000000   16

;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;So, for each account, I need to:&lt;/P&gt;&lt;P&gt;1)Loop through every transaction&lt;/P&gt;&lt;P&gt;2)On each transaction, review all previous transactions and compare the transaction date to the previous date and calculate how long ago it occurred&lt;/P&gt;&lt;P&gt;3)Count those transactions that qualify&lt;/P&gt;&lt;P&gt;4)Sum those transaction amounts that qualify&lt;/P&gt;&lt;P&gt;5)Divide to find average&lt;/P&gt;&lt;P&gt;6)Repeat&lt;/P&gt;&lt;P&gt;7)Reset when you get to a new account&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I believe the Hash Object will be the best route, because there could be tens of millions of rows.&amp;nbsp; I have begun reading 'Data Management Solutions Using SAS Hash Table Operations' but the concept is so foreign to me, I am having trouble grasping it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is what I have tried:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data _null_;

dcl hash transactions(dataset:"sample",multidata:"Y",ordered:"A");
 transactions.defineKey("customerid");
 transactions.defineData("Date","value");
 transactions.defineDone();
dcl hash accounts(dataset:"sample(rename=(date=base_date))",multidata:"Y",ordered:"A");
 accounts.defineKey("customerid");
 accounts.defineData("customerid","base_date");
 accounts.defineDone();
/* define the result hash object tables */
 dcl hash h_pointer;
 dcl hash byAccount(ordered:"A");
 byAccount.defineKey("customerid");
 byAccount.defineData("customerid","Date","value","_average");
 byAccount.defineDone();

if 0 then set sample;
format base_date datetime26.;

 lr = 0;
 do until(lr);
    set sample end = lr;
	call missing(customerid,date,base_date,value);
	accounts.find();
	transactions_rc = transactions.find();
	do while(transactions_rc=0);
		time_diff = intck('second',base_date,date);
		put time_diff;
		if (-86400 le intck('second',base_date,date) le 86400) then leave;
		transactions_rc = transactions.find_next();
	end;
	h_pointer = byAccount;
	link calculator;
end;

stop;
calculator:
	call missing(date,value,_average);
	rc = h_pointer.find();
	counts 			+1;
	h_pointer.replace();
return;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Right now, it is just trying to count.&amp;nbsp; Any help would be appreciated!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/21262"&gt;@hashman&lt;/a&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13569"&gt;@DonH&lt;/a&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 11 Sep 2020 19:22:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683320#M206948</guid>
      <dc:creator>Array_Mover</dc:creator>
      <dc:date>2020-09-11T19:22:45Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683402#M206989</link>
      <description>&lt;P&gt;I believe the has object is a good fit. You might want the check out the section on stacks - a separate stack for each day - output (as appropriate) and clearing as you complete each day.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I do have a couple of questions about your requirements:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Arial',sans-serif;"&gt;3)Count those transactions that qualify&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Arial',sans-serif;"&gt;4)Sum those transaction amounts that qualify&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Arial',sans-serif;"&gt;5)Divide to find average&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Arial',sans-serif;"&gt;I am not sure I know what you mean by "transactions that qualify" as well as what to do with the results.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 12 Sep 2020 03:23:20 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683402#M206989</guid>
      <dc:creator>DonH</dc:creator>
      <dc:date>2020-09-12T03:23:20Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683403#M206990</link>
      <description>&lt;P&gt;SQL would be a lot more suitable.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql;
  create table WANT as
  select a.*, mean(b.VALUE)
  from HAVE a left join HAVE b
  on a.CUSTOMERID=b.CUSTOMERID and b.DATE between a.DATE and a.DATE-3600*24
  group by a.CUSTOMERID, a.DATE, a.VALUE, a.TRANSACTIONID
  order by a.CUSTOMERID, a.DATE, a.VALUE;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 12 Sep 2020 03:26:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683403#M206990</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-09-12T03:26:02Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683433#M207007</link>
      <description>&lt;PRE&gt;data sample;
    input customerid : 7.
	transactionid: $13.
        date : datetime26.
        value : 3.
    ;
    format date datetime26.;
datalines;
2133780 2133780_00355 15JUN2020:12:47:17.000000   53
2133780 2133780_00687 15JUN2020:12:48:35.000000   72
2133780 2133780_00181 15JUN2020:12:59:44.000000   88
2133780 2133780_00273 15JUN2020:16:43:46.000000   96
2133780 2133780_00803 16JUN2020:22:30:15.000000   25
2133780 2133780_00411 17JUN2020:07:22:18.000000   75
2133780 2133780_00191 17JUN2020:23:16:39.000000   59
2133780 2133780_00461 14JUL2020:09:35:32.000000   22
2133780 2133780_00532 21JUL2020:10:40:54.000000   46
2133780 2133780_00609 28JUL2020:09:58:30.000000   36
2133780 2133780_00504 28JUL2020:10:12:55.000000   87
2133780 2133780_00488 31JUL2020:15:30:44.000000   36
2689999 2689999_00284 02JUN2020:09:03:55.000000   44
2689999 2689999_00734 06JUN2020:11:20:23.000000   95
2689999 2689999_00061 06JUN2020:13:40:22.000000   29
2689999 2689999_00540 07JUN2020:14:00:19.000000   14
2689999 2689999_00616 08JUN2020:13:36:24.000000   14
2689999 2689999_00002 08JUN2020:13:59:32.000000   82
2689999 2689999_00966 04JUL2020:11:29:49.000000   21
2689999 2689999_00859 10JUL2020:14:46:37.000000   34
2689999 2689999_00816 29JUL2020:13:56:45.000000   95
2689999 2689999_00045 29JUL2020:15:12:19.000000   42
2689999 2689999_00481 02AUG2020:11:02:21.000000   38
2689999 2689999_00227 02AUG2020:11:03:41.000000   42
2689999 2689999_00354 02AUG2020:12:26:34.000000   42
2689999 2689999_00794 03AUG2020:09:10:43.000000   73
2689999 2689999_00857 20AUG2020:15:03:19.000000   16
;
run;
data sample;
 set sample;
 date=int(date);
run;
/*Assuming there are not duplicated date within a customerid */
data want;
 if _n_=1 then do;
   if 0 then set sample;
   declare hash h();
   h.definekey('date');
   h.definedata('value');
   h.definedone();
 end;

h.clear();
do until(last.customerid);
 set sample;
 by customerid;
 h.add();
end;

array x{99999} _temporary_;
do until(last.customerid);
 set sample;
 by customerid;
 i=0;call missing(of x{*});
 do temp=intnx('hour',date,-24,'s') to date;
   	if h.find(key:temp)=0 then do;i+1;x{i}=value;end;
 end;
 rolling_mean=mean(of x{*});
 output;
end;

drop i temp;
run;&lt;/PRE&gt;</description>
      <pubDate>Sat, 12 Sep 2020 12:35:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683433#M207007</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2020-09-12T12:35:25Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683450#M207014</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;P&gt;&lt;SPAN style="font-family: Arial, sans-serif; font-size: 10pt;"&gt;3)Count those transactions that qualify&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Arial',sans-serif;"&gt;4)Sum those transaction amounts that qualify&lt;/SPAN&gt;&lt;/P&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Arial',sans-serif;"&gt;What I mean, if you look at the transactions in the sample data, is that as time passes, older transactions fall out of the 24 hour window.&amp;nbsp; That is what I meant when I said they would no longer "qualify". That they wouldn't be considered for the current line's 24 hour calculation.&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="font-size: 10.0pt; font-family: 'Arial',sans-serif;"&gt;I have done the calculations by hand to see what I want the result table to look like.&amp;nbsp; I hope it helps:&lt;/SPAN&gt;&lt;/P&gt;&lt;TABLE&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD&gt;Row&lt;/TD&gt;&lt;TD&gt;customerid&lt;/TD&gt;&lt;TD&gt;transactionid&lt;/TD&gt;&lt;TD&gt;date&lt;/TD&gt;&lt;TD&gt;value&lt;/TD&gt;&lt;TD&gt;count&lt;/TD&gt;&lt;TD&gt;running_ave&lt;/TD&gt;&lt;TD&gt;Notes&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;2133780&lt;/TD&gt;&lt;TD&gt;2133780_00355&lt;/TD&gt;&lt;TD&gt;15Jun2020 12:47:17&lt;/TD&gt;&lt;TD&gt;53&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;53.0&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;2133780&lt;/TD&gt;&lt;TD&gt;2133780_00687&lt;/TD&gt;&lt;TD&gt;15Jun2020 12:48:35&lt;/TD&gt;&lt;TD&gt;72&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;62.5&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;2133780&lt;/TD&gt;&lt;TD&gt;2133780_00181&lt;/TD&gt;&lt;TD&gt;15Jun2020 12:59:44&lt;/TD&gt;&lt;TD&gt;88&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;71.0&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;2133780&lt;/TD&gt;&lt;TD&gt;2133780_00273&lt;/TD&gt;&lt;TD&gt;15Jun2020 16:43:46&lt;/TD&gt;&lt;TD&gt;96&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;77.3&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;5&lt;/TD&gt;&lt;TD&gt;2133780&lt;/TD&gt;&lt;TD&gt;2133780_00803&lt;/TD&gt;&lt;TD&gt;16Jun2020 22:30:15&lt;/TD&gt;&lt;TD&gt;25&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;25.0&lt;/TD&gt;&lt;TD&gt;More than 24 hours passed since the last transaction, a full reset&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;6&lt;/TD&gt;&lt;TD&gt;2133780&lt;/TD&gt;&lt;TD&gt;2133780_00411&lt;/TD&gt;&lt;TD&gt;17Jun2020 7:22:18&lt;/TD&gt;&lt;TD&gt;75&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;50.0&lt;/TD&gt;&lt;TD&gt;Even though a "day" has passed, it is June 17th now, the previous transaction is still within a 24 hour window&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;7&lt;/TD&gt;&lt;TD&gt;2133780&lt;/TD&gt;&lt;TD&gt;2133780_00191&lt;/TD&gt;&lt;TD&gt;17Jun2020 23:16:39&lt;/TD&gt;&lt;TD&gt;59&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;67.0&lt;/TD&gt;&lt;TD&gt;The transaction on row 5 now falls away, but the previous transaction on row 6 is within a 24 hour window&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;8&lt;/TD&gt;&lt;TD&gt;2133780&lt;/TD&gt;&lt;TD&gt;2133780_00461&lt;/TD&gt;&lt;TD&gt;14Jul2020 9:35:32&lt;/TD&gt;&lt;TD&gt;22&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;22.0&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;9&lt;/TD&gt;&lt;TD&gt;2133780&lt;/TD&gt;&lt;TD&gt;2133780_00532&lt;/TD&gt;&lt;TD&gt;21Jul2020 10:40:54&lt;/TD&gt;&lt;TD&gt;46&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;46.0&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;10&lt;/TD&gt;&lt;TD&gt;2133780&lt;/TD&gt;&lt;TD&gt;2133780_00609&lt;/TD&gt;&lt;TD&gt;28Jul2020 9:58:30&lt;/TD&gt;&lt;TD&gt;36&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;36.0&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;11&lt;/TD&gt;&lt;TD&gt;2133780&lt;/TD&gt;&lt;TD&gt;2133780_00504&lt;/TD&gt;&lt;TD&gt;28Jul2020 10:12:55&lt;/TD&gt;&lt;TD&gt;87&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;61.5&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;12&lt;/TD&gt;&lt;TD&gt;2133780&lt;/TD&gt;&lt;TD&gt;2133780_00488&lt;/TD&gt;&lt;TD&gt;31Jul2020 15:30:44&lt;/TD&gt;&lt;TD&gt;36&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;36.0&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;13&lt;/TD&gt;&lt;TD&gt;2689999&lt;/TD&gt;&lt;TD&gt;2689999_00284&lt;/TD&gt;&lt;TD&gt;02Jun2020 9:03:55&lt;/TD&gt;&lt;TD&gt;44&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;44.0&lt;/TD&gt;&lt;TD&gt;A new CustomerId&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;14&lt;/TD&gt;&lt;TD&gt;2689999&lt;/TD&gt;&lt;TD&gt;2689999_00734&lt;/TD&gt;&lt;TD&gt;06Jun2020 11:20:23&lt;/TD&gt;&lt;TD&gt;95&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;95.0&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;15&lt;/TD&gt;&lt;TD&gt;2689999&lt;/TD&gt;&lt;TD&gt;2689999_00061&lt;/TD&gt;&lt;TD&gt;06Jun2020 13:40:22&lt;/TD&gt;&lt;TD&gt;29&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;62.0&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;16&lt;/TD&gt;&lt;TD&gt;2689999&lt;/TD&gt;&lt;TD&gt;2689999_00540&lt;/TD&gt;&lt;TD&gt;07Jun2020 14:00:19&lt;/TD&gt;&lt;TD&gt;14&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;14.0&lt;/TD&gt;&lt;TD&gt;This is just outside of the 24 hour window of the previous transaction&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;17&lt;/TD&gt;&lt;TD&gt;2689999&lt;/TD&gt;&lt;TD&gt;2689999_00616&lt;/TD&gt;&lt;TD&gt;08Jun2020 13:36:24&lt;/TD&gt;&lt;TD&gt;14&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;14.0&lt;/TD&gt;&lt;TD&gt;This is just inside of the 24 hour window of the previous transaction&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;18&lt;/TD&gt;&lt;TD&gt;2689999&lt;/TD&gt;&lt;TD&gt;2689999_00002&lt;/TD&gt;&lt;TD&gt;08Jun2020 13:59:32&lt;/TD&gt;&lt;TD&gt;82&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;36.7&lt;/TD&gt;&lt;TD&gt;This is around 30 seconds from being outside the 24 hour window of the previous 2 transactions.&amp;nbsp; This is how granular I want it to get&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;19&lt;/TD&gt;&lt;TD&gt;2689999&lt;/TD&gt;&lt;TD&gt;2689999_00966&lt;/TD&gt;&lt;TD&gt;04Jul2020 11:29:49&lt;/TD&gt;&lt;TD&gt;21&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;21.0&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;20&lt;/TD&gt;&lt;TD&gt;2689999&lt;/TD&gt;&lt;TD&gt;2689999_00859&lt;/TD&gt;&lt;TD&gt;10Jul2020 14:46:37&lt;/TD&gt;&lt;TD&gt;34&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;34.0&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;21&lt;/TD&gt;&lt;TD&gt;2689999&lt;/TD&gt;&lt;TD&gt;2689999_00816&lt;/TD&gt;&lt;TD&gt;29Jul2020 13:56:45&lt;/TD&gt;&lt;TD&gt;95&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;95.0&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;22&lt;/TD&gt;&lt;TD&gt;2689999&lt;/TD&gt;&lt;TD&gt;2689999_00045&lt;/TD&gt;&lt;TD&gt;29Jul2020 15:12:19&lt;/TD&gt;&lt;TD&gt;42&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;68.5&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;23&lt;/TD&gt;&lt;TD&gt;2689999&lt;/TD&gt;&lt;TD&gt;2689999_00481&lt;/TD&gt;&lt;TD&gt;02Aug2020 11:02:21&lt;/TD&gt;&lt;TD&gt;38&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;38.0&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;24&lt;/TD&gt;&lt;TD&gt;2689999&lt;/TD&gt;&lt;TD&gt;2689999_00227&lt;/TD&gt;&lt;TD&gt;02Aug2020 11:03:41&lt;/TD&gt;&lt;TD&gt;42&lt;/TD&gt;&lt;TD&gt;2&lt;/TD&gt;&lt;TD&gt;40.0&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;25&lt;/TD&gt;&lt;TD&gt;2689999&lt;/TD&gt;&lt;TD&gt;2689999_00354&lt;/TD&gt;&lt;TD&gt;02Aug2020 12:26:34&lt;/TD&gt;&lt;TD&gt;42&lt;/TD&gt;&lt;TD&gt;3&lt;/TD&gt;&lt;TD&gt;40.7&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;26&lt;/TD&gt;&lt;TD&gt;2689999&lt;/TD&gt;&lt;TD&gt;2689999_00794&lt;/TD&gt;&lt;TD&gt;03Aug2020 9:10:43&lt;/TD&gt;&lt;TD&gt;73&lt;/TD&gt;&lt;TD&gt;4&lt;/TD&gt;&lt;TD&gt;48.8&lt;/TD&gt;&lt;TD&gt;A new day, but still within 24 hours of the previous 3 transactions&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD&gt;27&lt;/TD&gt;&lt;TD&gt;2689999&lt;/TD&gt;&lt;TD&gt;2689999_00857&lt;/TD&gt;&lt;TD&gt;20Aug2020 15:03:19&lt;/TD&gt;&lt;TD&gt;16&lt;/TD&gt;&lt;TD&gt;1&lt;/TD&gt;&lt;TD&gt;16.0&lt;/TD&gt;&lt;TD&gt;&amp;nbsp;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Sat, 12 Sep 2020 17:12:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683450#M207014</guid>
      <dc:creator>Array_Mover</dc:creator>
      <dc:date>2020-09-12T17:12:09Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683451#M207015</link>
      <description>&lt;P&gt;I agree for the small sample data set, but across 100 million rows of transaction data, it is just not reasonable.&amp;nbsp; While researching, I discovered hash tables and thought that would be the perfect solution for lighting fast lookups and calculations at such a scale.&lt;/P&gt;</description>
      <pubDate>Sat, 12 Sep 2020 17:17:29 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683451#M207015</guid>
      <dc:creator>Array_Mover</dc:creator>
      <dc:date>2020-09-12T17:17:29Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683453#M207016</link>
      <description>&lt;P&gt;Wow, this works flawlessly on the table I provided.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;There could be duplicates in the date field, so I provided a 'transactionid' to show unique transactions. Fraudsters can really hammer merchants with quick transactions and if some transactions are processed in batches, the dates might match.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I can sort of wrap my head around your code.&amp;nbsp; It is leveraging the hash table to do a lookup.&amp;nbsp; If there were a duplicate date, would it just stop at the first instance of that date occurring?&lt;/P&gt;</description>
      <pubDate>Sat, 12 Sep 2020 17:39:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683453#M207016</guid>
      <dc:creator>Array_Mover</dc:creator>
      <dc:date>2020-09-12T17:39:27Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683456#M207017</link>
      <description>&lt;P&gt;If the data were sorted by timestamp/customerid, then I would go to the hash approach - in fact you would need a hash-of-hashes to enable tracking of a varying number of qualifying records over a varying number of customerid's.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But this data is sorted by customerid/timestamp&amp;nbsp; (where "date" is the misleading name for timestamp).&amp;nbsp; So let's assume it is impossible for there to be more than 400 transactions per customerid.&amp;nbsp; Then:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data sample;
    input customerid : 7.
	transactionid: $13.
        date : datetime26.
        value : 3.
    ;
    format date datetime26.;

datalines;
2133780 2133780_00355 15JUN2020:12:47:17.000000   53
2133780 2133780_00687 15JUN2020:12:48:35.000000   72
2133780 2133780_00181 15JUN2020:12:59:44.000000   88
2133780 2133780_00273 15JUN2020:16:43:46.000000   96
2133780 2133780_00803 16JUN2020:22:30:15.000000   25
2133780 2133780_00411 17JUN2020:07:22:18.000000   75
2133780 2133780_00191 17JUN2020:23:16:39.000000   59
2133780 2133780_00461 14JUL2020:09:35:32.000000   22
2133780 2133780_00532 21JUL2020:10:40:54.000000   46
2133780 2133780_00609 28JUL2020:09:58:30.000000   36
2133780 2133780_00504 28JUL2020:10:12:55.000000   87
2133780 2133780_00488 31JUL2020:15:30:44.000000   36
2689999 2689999_00284 02JUN2020:09:03:55.000000   44
2689999 2689999_00734 06JUN2020:11:20:23.000000   95
2689999 2689999_00061 06JUN2020:13:40:22.000000   29
2689999 2689999_00540 07JUN2020:14:00:19.000000   14
2689999 2689999_00616 08JUN2020:13:36:24.000000   14
2689999 2689999_00002 08JUN2020:13:59:32.000000   82
2689999 2689999_00966 04JUL2020:11:29:49.000000   21
2689999 2689999_00859 10JUL2020:14:46:37.000000   34
2689999 2689999_00816 29JUL2020:13:56:45.000000   95
2689999 2689999_00045 29JUL2020:15:12:19.000000   42
2689999 2689999_00481 02AUG2020:11:02:21.000000   38
2689999 2689999_00227 02AUG2020:11:03:41.000000   42
2689999 2689999_00354 02AUG2020:12:26:34.000000   42
2689999 2689999_00794 03AUG2020:09:10:43.000000   73
2689999 2689999_00857 20AUG2020:15:03:19.000000   16

;
run;

%let interval=24:00:00;

data rolling_recent_counts (drop= j _:);
  set sample;
  by customerid;
  array _dats{400} _temporary_;
  array _vals{400} _temporary_;
  if first.customerid then call missing(of _n,_value_sum,_jlast);

  _n+1;
  _value_sum+value;

  _dats{_n}=date;
  _vals{_n}=value;

  _jlast+0;  /*Lazy way to retain, and reset missing at first.customerid above to zero */
  do j=_jlast+1 by 1 while (_dats{j} &amp;lt; date - "&amp;amp;interval"t); 
    _value_sum = _value_sum - _vals{j};
	call missing(_dats{j},_vals{j});
	_jlast=j;
  end;
  n_values=_n - _jlast;
  avg_value=_value_sum/n_values;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The variable _N is the record number for the current customerid, initialized to 1 for each customer.&amp;nbsp; And _JLAST is the record number for the most recent &lt;EM&gt;&lt;STRONG&gt;non-qualifying&lt;/STRONG&gt;&lt;/EM&gt; record, so it is initialized to zero for each customer.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If a customer can have more than 400 records, then just change the size of the _VALS and _DATS arrays to accomodate the largest possible record count.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Edit comment:&amp;nbsp; I took out an extraneous statement with a PUT statement I was using to track the program logic.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also note you can set the interval however you want in the %let statement.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 12 Sep 2020 18:12:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683456#M207017</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2020-09-12T18:12:19Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683457#M207018</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;&lt;P&gt;If the data were sorted by timestamp/customerid, then I would go to the hash approach - in fact you would need a hash-of-hashes to enable tracking of a varying number of qualifying records over a varying number of customerid's.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;&lt;P&gt;Actually the transaction data&amp;nbsp;&lt;U&gt;is&lt;/U&gt; sorted by timestamp/customerid in the database where I actually need to do the work.&amp;nbsp; I thought the sample data sorted by timestamp would be the best way to illustrate my problem.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I was introduced to the&amp;nbsp;concept on Hash of Hashes by watching a SAS training video on youtube Paul Dorfman and Don Henderson presented. I then checked the book out from a university library.&amp;nbsp; I have no formal training in software engineering or programming, so I find the concepts difficult to grasp. I feel like I have a very basic understanding, but am having trouble applying the concepts in the book to my own real world scenario.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Would you mind tackling the problem as if the data were sorted by&amp;nbsp;&lt;SPAN&gt;timestamp/customerid?&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 12 Sep 2020 18:43:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683457#M207018</guid>
      <dc:creator>Array_Mover</dc:creator>
      <dc:date>2020-09-12T18:43:51Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683458#M207019</link>
      <description>&lt;P&gt;Also, someone else in the comments below mentioned how they would use hash of hashes if the date were sorted by&amp;nbsp;&lt;SPAN&gt;timestamp/customerid.&amp;nbsp; I wanted to say, in the actual transaction data I am using, the data is sorted in this manner.&amp;nbsp; I chose to present the sample data sorted by customerid/timestamp f&lt;/SPAN&gt;&lt;SPAN&gt;or the purpose of illustrating my problem.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Sat, 12 Sep 2020 18:46:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683458#M207019</guid>
      <dc:creator>Array_Mover</dc:creator>
      <dc:date>2020-09-12T18:46:54Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683464#M207021</link>
      <description>&lt;P&gt;Mark makes a good point about this being a problem that can be done with the Hash of Hash approach. However, that approach requires that you can fit all the transactions for all the customers into memory. If you can't, you will have to do some sort of looping. Given that you have to implement looping thru customers (I'm assuming that each customer is processed independently), the benefit of the Hash of Hash approach (a separate hash object for each customer) may not be worth it (unless you also have lots of customers.&lt;BR /&gt;&lt;BR /&gt;There are multiple ways to do this looping. Before I suggest anything can you provide any details about how many customers you could have in your input file and a rough guesstimate as to the maximum number of transactions a customer may ?&lt;/P&gt;</description>
      <pubDate>Sat, 12 Sep 2020 19:48:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683464#M207021</guid>
      <dc:creator>DonH</dc:creator>
      <dc:date>2020-09-12T19:48:39Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683473#M207023</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13569"&gt;@DonH&lt;/a&gt; :&amp;nbsp; I agree.&amp;nbsp; The only reason I would recommend hash-of-hashes would be if the data were sorted by timestamp/customerid.&amp;nbsp; Otherwise it's wasteful.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, even in that scenario the OP wouldn't need to keep &lt;EM&gt;&lt;STRONG&gt;all&lt;/STRONG&gt;&lt;/EM&gt; the records for all the customers in the hash objects - just the last 24 hours for each customer.&amp;nbsp; Of course, that memory savings would be at the expense of adding code to remove data items from a hash as they become stale.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The OP's task is logically equivalent to maintaining a portfolio of stocks with irregularly implemented BUYs and SELLs of one or another constituent of the portfolio, which is the context in which I wrote &lt;A href="https://www.lexjansen.com/mwsug/2018/SB/MWSUG-2018-SB-60.pdf" target="_self"&gt;From Stocks to Flows: Using SAS® Hash Objects for FIFO, LIFO, and other FO’s&lt;/A&gt;.&amp;nbsp; In particular, the OP's task is like a FIFO treatment of inventory.&lt;/P&gt;</description>
      <pubDate>Sat, 12 Sep 2020 22:50:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683473#M207023</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2020-09-12T22:50:23Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683481#M207030</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/345149"&gt;@Array_Mover&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I agree for the small sample data set, but across 100 million rows of transaction data, it is just not reasonable.&amp;nbsp; While researching, I discovered hash tables and thought that would be the perfect solution for lighting fast lookups and calculations at such a scale.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;Not necessarily. You don't need all the rows at once for this logic to work. The minimum you need is all the rows for a given account at once.&lt;/P&gt;
&lt;P&gt;So you could process the table say one hundredth of the table (say all accounts between 1 and 1000000, 1000001 and 2000000, etc) at a time and still get all the results you need.&lt;/P&gt;
&lt;P&gt;If you have 20 transactions per account on average, that's a 20-million row table, which the SQL parser might well decide to load into a hash table. Or not.&lt;/P&gt;
&lt;P&gt;In any case, not complex programming, no &lt;EM&gt;interpreted&lt;/EM&gt; vs &lt;EM&gt;compiled&lt;/EM&gt; inefficiencies, and since it seems you are going to load partial hash tables, you might be better off loading partial tables into SQL instead.&lt;/P&gt;</description>
      <pubDate>Sun, 13 Sep 2020 02:13:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683481#M207030</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-09-13T02:13:40Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683487#M207034</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/13569"&gt;@DonH&lt;/a&gt;,&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;,&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/16961"&gt;@ChrisNZ&lt;/a&gt;,&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/18408"&gt;@Ksharp&lt;/a&gt;:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I greatly appreciate all of you chiming in. Would love to myself ... but haven't time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Kind regards&lt;/P&gt;
&lt;P&gt;Paul D.&amp;nbsp; &amp;nbsp;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 13 Sep 2020 04:09:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683487#M207034</guid>
      <dc:creator>hashman</dc:creator>
      <dc:date>2020-09-13T04:09:40Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683490#M207036</link>
      <description>&lt;P&gt;Here's a hash-of-hashes solution:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data sample;
  input customerid : 7.  transactionid: $13.
        date : datetime26.  value : 3. ;
    format date datetime26.;
datalines;
2133780 2133780_00355 15JUN2020:12:47:17.000000   53
2133780 2133780_00687 15JUN2020:12:48:35.000000   72
2133780 2133780_00181 15JUN2020:12:59:44.000000   88
2133780 2133780_00273 15JUN2020:16:43:46.000000   96
2133780 2133780_00803 16JUN2020:22:30:15.000000   25
2133780 2133780_00411 17JUN2020:07:22:18.000000   75
2133780 2133780_00191 17JUN2020:23:16:39.000000   59
2133780 2133780_00461 14JUL2020:09:35:32.000000   22
2133780 2133780_00532 21JUL2020:10:40:54.000000   46
2133780 2133780_00609 28JUL2020:09:58:30.000000   36
2133780 2133780_00504 28JUL2020:10:12:55.000000   87
2133780 2133780_00488 31JUL2020:15:30:44.000000   36
2689999 2689999_00284 02JUN2020:09:03:55.000000   44
2689999 2689999_00734 06JUN2020:11:20:23.000000   95
2689999 2689999_00061 06JUN2020:13:40:22.000000   29
2689999 2689999_00540 07JUN2020:14:00:19.000000   14
2689999 2689999_00616 08JUN2020:13:36:24.000000   14
2689999 2689999_00002 08JUN2020:13:59:32.000000   82
2689999 2689999_00966 04JUL2020:11:29:49.000000   21
2689999 2689999_00859 10JUL2020:14:46:37.000000   34
2689999 2689999_00816 29JUL2020:13:56:45.000000   95
2689999 2689999_00045 29JUL2020:15:12:19.000000   42
2689999 2689999_00481 02AUG2020:11:02:21.000000   38
2689999 2689999_00227 02AUG2020:11:03:41.000000   42
2689999 2689999_00354 02AUG2020:12:26:34.000000   42
2689999 2689999_00794 03AUG2020:09:10:43.000000   73
2689999 2689999_00857 20AUG2020:15:03:19.000000   16
run;

proc sort;
  by date customerid;
run;


data want (drop=_:);
  set sample;

  if _n_=1 then do;
    declare hash hoh ();
      hoh.definekey('customerid');
      hoh.definedata('customerid','h','ih','_value_sum','nvalues');
      hoh.definedone();
    declare hash h;
    declare hiter ih;
  end;

  /* New customerid?  Initialize a new hash &amp;amp; iterator. */
  if hoh.find()^=0 then do;
    h=_new_ hash(ordered:'a',multidata:'Y');  /*Added the MULTIDATA option*/
      h.definekey('date');
      h.definedata('date','value');
      h.definedone();
    ih=_new_ hiter('h');
    _value_sum=0;
    nvalues=0;
  end;
  h.add();

  _value_sum = _value_sum + value;
  nvalues=nvalues+1;

  _stale_timestamp = date - '24:00:00't;
  format _stale_timestamp datetime20.;

  array _drop_dates {0:100} ;
 
  /* Adjust _value_sum and nvalues for all stale timestamps */
  /* Also capture their dates to guide stale dataitems deletions*/
  do _rc=ih.first() by 0 while(date&amp;lt;_stale_timestamp);
    _value_sum = _value_sum-value;
    nvalues = nvalues -1;
    _ndd=n(of _drop_dates{*});
	if _drop_dates{_ndd}^=date then _drop_dates{_ndd+1}=date;
	_rc=ih.next();
  end;

  /* Replace updated _value_sum and nvalues in hoh */
  hoh.replace();

  /* Now remove any stale timestamps from h */
  _ndd=n(of _drop_dates{*});
  if _ndd&amp;gt;0 then do _d=1 to _ndd;
    h.remove(key:_drop_dates{_d});
    _drop_dates{_d}=.;
  end;

  average=_value_sum/nvalues;
  ih.last();  /*Retrieve date and value of most recent record*/
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Using the multidata:"Y" option alleviates the need to use transactionid as a hash key for storage purposes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;If there are duplicate dates they all appear in the hash object, but each date only appears once in the _dropdates array.&amp;nbsp; That's not a problem because the remove method will remove all duplicates for the specific key.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also this assumes that there are no more than 100 records in any 24 hour period for a given customerid.&amp;nbsp; If there can be more then modifiy the upper bound of _dropdates.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;EDITTED NOTE:&amp;nbsp; I had originally forgotten to insert the multidata:'Y' option in the "h=_new_ hash ..." statement - even though I explicitly mentioned it in my comments.&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;
&lt;P&gt;&lt;EM&gt;&lt;STRONG&gt;It's there now, which will eliminate the ERROR messages for "duplicate keys" and (more importantly) properly account for duplicate date's in the running 24-hr average calculations.&lt;/STRONG&gt;&lt;/EM&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 14 Sep 2020 17:52:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683490#M207036</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2020-09-14T17:52:23Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683498#M207043</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/31461"&gt;@mkeintz&lt;/a&gt;:&lt;/P&gt;
&lt;P&gt;What a beautiful piece of hash code, Mark. Thanks 1E+6!&lt;/P&gt;
&lt;P&gt;Kind regards&lt;/P&gt;
&lt;P&gt;Paul D.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 13 Sep 2020 07:33:58 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683498#M207043</guid>
      <dc:creator>hashman</dc:creator>
      <dc:date>2020-09-13T07:33:58Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683505#M207046</link>
      <description>.  If there were a duplicate date, Just sum value as&lt;BR /&gt;&lt;BR /&gt;data sample;&lt;BR /&gt; set sample;&lt;BR /&gt; date=int(date);&lt;BR /&gt;run;&lt;BR /&gt;proc sql;&lt;BR /&gt;create table temp as&lt;BR /&gt;select customerid,date,sum(value) as value&lt;BR /&gt; from sample&lt;BR /&gt;  group by  customerid,date;&lt;BR /&gt;quit;&lt;BR /&gt;&lt;BR /&gt;data sample;&lt;BR /&gt; set temp;&lt;BR /&gt;run;</description>
      <pubDate>Sun, 13 Sep 2020 10:21:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683505#M207046</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2020-09-13T10:21:23Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683534#M207057</link>
      <description>&lt;P&gt;Thank you for responding.&amp;nbsp; I queried the transactions table and found, for the month of August, there are over 2 million distinct accounts with over 60 million transactions.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If that is too much, I would be more than happy doing a subset, a weeks worth, or only certain types of accounts. I work in financial services and we have a fraud mitigation tool that uses these moving averages (24 hours, 3 days, 7 days, 30 days) but the data is 'live' and constantly updates as customers transact, so it isn't available to us to use for ex post insights into how well rules are performing.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;That being said, I would like to replicate this functionality in SAS so I can run simulations of rules over past data to see the rule performance based on different adjustments or scenarios.&lt;/P&gt;</description>
      <pubDate>Sun, 13 Sep 2020 20:54:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683534#M207057</guid>
      <dc:creator>Array_Mover</dc:creator>
      <dc:date>2020-09-13T20:54:12Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683535#M207058</link>
      <description>&lt;P&gt;Thank you for taking the time to help my understanding of the hash object. I will attempt to apply this solution to my data and check back in.&lt;/P&gt;</description>
      <pubDate>Sun, 13 Sep 2020 21:03:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683535#M207058</guid>
      <dc:creator>Array_Mover</dc:creator>
      <dc:date>2020-09-13T21:03:03Z</dc:date>
    </item>
    <item>
      <title>Re: Find a 24 hour moving average of transactions by account - Attempting with Hash Tables</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683570#M207071</link>
      <description>&lt;P&gt;Here I start with 2 million accounts and 200 million observations.&lt;/P&gt;
&lt;P&gt;Splitting the table into 20 takes 30 seconds, and each of the 20 SORT+SQL steps takes 1.2 minutes, so the whole thing takes about 25 minutes on my machine (140 SAS sessions open atm).&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data W.HAVE;
  VALUE=1;
  do DATE='01jan2020:0:0'dt to '31aug2020:23:59:59'dt by '06:00:00't;
    do ACCT =1 to 2e6;
      if ranuni(1) &amp;gt; .9 then output;
    end;
  end;
run;
       
data _null_;
  call execute('data ');
  do I=1 to 20;
   call execute (cat('W.TMP',I));
  end; 
  do I=1 to 20;
    call execute(ifc(I=1,';set W.HAVE;','else '));
    call execute (cat('if ',I-1,'e5 &amp;lt;= ACCT &amp;lt; ',I,'e5 then output W.TMP',I,';')); 
  end; 
run;    
 
data _null_;
  do I=1 to 20;
   call execute (cat('proc sort data=W.TMP',I,'; by ACCT DATE; run;                '));
   call execute ('proc sql; create table W.OUT as select a.*, mean(b.VALUE) as AVG ');
   call execute (cat('from W.TMP', I, ' a left join W.TMP', I, ' b                 '));
   call execute ('on a.ACCT=b.ACCT and b.DATE between a.DATE and a.DATE-3600*24    ');
   call execute ('group by a.ACCT, a.DATE, a.VALUE; quit;                          ');
   call execute ('proc append base=W.WANT data=W.OUT; run;                         ');
 end;
run;  &lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Note that library W is an SPDE library with parameters &lt;FONT face="courier new,courier"&gt;compress=binary partsize=1T&lt;/FONT&gt;, in order to reduce IO. There is never a good reason not to use SPDE to process large tables.&lt;/P&gt;</description>
      <pubDate>Mon, 14 Sep 2020 06:47:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Find-a-24-hour-moving-average-of-transactions-by-account/m-p/683570#M207071</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-09-14T06:47:06Z</dc:date>
    </item>
  </channel>
</rss>

