I will try to describe my problem in pseudo-code. I am a self-taught programmer, so I apologize in advance for any gaps in my vocabulary and knowledge. I have transaction data across accounts, timestamped down to the second. The problem with every moving-average answer I have looked up is that the data is always structured in a way that makes the calculation relatively easy. Transaction data is difficult because it can vary wildly by account: one account might have 10 transactions in a 24-hour period, another 5 transactions over 10 days. For each transaction, I need to check whether the previous transactions occurred within the prior 24 hours and, if not, exclude them. I have created a fake dataset with dates that reflect the problem I am facing:

data sample;
input customerid : 7.
transactionid: $13.
date : datetime26.
value : 3.
;
format date datetime26.;
datalines;
2133780 2133780_00355 15JUN2020:12:47:17.000000 53
2133780 2133780_00687 15JUN2020:12:48:35.000000 72
2133780 2133780_00181 15JUN2020:12:59:44.000000 88
2133780 2133780_00273 15JUN2020:16:43:46.000000 96
2133780 2133780_00803 16JUN2020:22:30:15.000000 25
2133780 2133780_00411 17JUN2020:07:22:18.000000 75
2133780 2133780_00191 17JUN2020:23:16:39.000000 59
2133780 2133780_00461 14JUL2020:09:35:32.000000 22
2133780 2133780_00532 21JUL2020:10:40:54.000000 46
2133780 2133780_00609 28JUL2020:09:58:30.000000 36
2133780 2133780_00504 28JUL2020:10:12:55.000000 87
2133780 2133780_00488 31JUL2020:15:30:44.000000 36
2689999 2689999_00284 02JUN2020:09:03:55.000000 44
2689999 2689999_00734 06JUN2020:11:20:23.000000 95
2689999 2689999_00061 06JUN2020:13:40:22.000000 29
2689999 2689999_00540 07JUN2020:14:00:19.000000 14
2689999 2689999_00616 08JUN2020:13:36:24.000000 14
2689999 2689999_00002 08JUN2020:13:59:32.000000 82
2689999 2689999_00966 04JUL2020:11:29:49.000000 21
2689999 2689999_00859 10JUL2020:14:46:37.000000 34
2689999 2689999_00816 29JUL2020:13:56:45.000000 95
2689999 2689999_00045 29JUL2020:15:12:19.000000 42
2689999 2689999_00481 02AUG2020:11:02:21.000000 38
2689999 2689999_00227 02AUG2020:11:03:41.000000 42
2689999 2689999_00354 02AUG2020:12:26:34.000000 42
2689999 2689999_00794 03AUG2020:09:10:43.000000 73
2689999 2689999_00857 20AUG2020:15:03:19.000000 16
;
run;

So, for each account, I need to:

1) Loop through every transaction.
2) On each transaction, review all previous transactions, comparing each previous date to the current transaction date to determine how long ago it occurred.
3) Count the transactions that qualify (within the prior 24 hours).
4) Sum the values of the qualifying transactions.
5) Divide the sum by the count to get the average.
6) Repeat for the next transaction.
7) Reset when a new account starts.

I believe the hash object will be the best route, because there could be tens of millions of rows. I have begun reading 'Data Management Solutions Using SAS Hash Table Operations', but the concept is so foreign to me that I am having trouble grasping it. Here is what I have tried:

data want;
  /* Host variables for the hash entries must exist at compile time */
  if 0 then set sample;
  format t_date datetime26.;

  /* Load every transaction into one hash keyed by account.
     multidata:"Y" keeps all rows per customer; the renames stop the
     hash lookups from overwriting the current row's date and value. */
  dcl hash transactions(dataset:"sample(rename=(date=t_date value=t_value))",
                        multidata:"Y", ordered:"A");
  transactions.defineKey("customerid");
  transactions.defineData("t_date","t_value");
  transactions.defineDone();
  call missing(t_date, t_value);

  do until(lr);
    set sample end=lr;
    count = 0;
    /* Walk every stored transaction for the current customer */
    rc = transactions.find();
    do while(rc = 0);
      /* Keep transactions from 0 to 86400 seconds (24 hours) before
         the current one; note this also counts the current row itself */
      if 0 le intck('second', t_date, date) le 86400 then count + 1;
      rc = transactions.find_next();
    end;
    output;
  end;
  stop;
run;

Right now, it is just trying to count. Any help would be appreciated! @hashman @DonH
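To pin down exactly what I mean by steps 1-7, here is a minimal sketch of the same rolling 24-hour logic in plain Python (not SAS). The function name and tuple layout are my own invention, and I include the current transaction in its own window, so every window has at least one row:

```python
from datetime import datetime, timedelta

def rolling_24h_averages(transactions):
    """For each transaction, count and average that account's
    transactions in the 24 hours up to and including it.
    `transactions` is a list of (customerid, timestamp, value) tuples."""
    window = timedelta(hours=24)
    results = []
    for custid, ts, _ in transactions:
        # Gather this account's transactions within the prior 24 hours,
        # including the current transaction itself (difference of zero)
        qualifying = [v for c, t, v in transactions
                      if c == custid and timedelta(0) <= ts - t <= window]
        results.append((custid, ts, len(qualifying),
                        sum(qualifying) / len(qualifying)))
    return results

# A few rows taken from the sample data above
sample = [
    (2133780, datetime(2020, 6, 15, 12, 47, 17), 53),
    (2133780, datetime(2020, 6, 15, 12, 48, 35), 72),
    (2133780, datetime(2020, 6, 15, 12, 59, 44), 88),
    (2133780, datetime(2020, 6, 16, 22, 30, 15), 25),  # > 24 h after the others
]

for row in rolling_24h_averages(sample):
    print(row)
# The last row falls outside the 24-hour window of the first three,
# so its count drops back to 1 and its average is just its own value.
```

Note this sketch rescans every transaction for every row, which is the same O(n-squared)-per-account work the hash FIND/FIND_NEXT loop does; with tens of millions of rows, sorting by account and date and sliding a window would avoid that, but the brute-force version is the easiest statement of what should qualify.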