06-08-2016 09:54 AM
I have a large amount of transaction data and I'm wanting to pull transactions where a count of previous transactions have taken place prior to a given transaction in a specified amount of time. For example, I might want to see all transactions where there were two transactions that took place at the same merchant in the previous hour on the same card.
I have account numbers and the datetime of the authorizations to work with along with merchant name, merchant category, and a bunch of variables to specify the activity I'm looking for. I have other systems I can use to pull such activity but it's slow and clunky. I've been playing around with different code but can't seem to come up with something that works.
Any suggestions on how I can accomplish this? No need to provide code examples if it's easier to just explain in words how I can do this.
If further explanation of what I'm trying to accomplish is needed just let me know and I'll be happy to provide.
06-08-2016 10:28 AM
Sounds like you have two issues:
06-08-2016 10:50 AM
Coding the logic is the issue.
I've tried using the lag function and intck. I've thought about using SQL and joining the table back on itself and doing intck but this doesn't seem efficient at all considering I'm working with tens of thousands of rows of data.
06-08-2016 11:55 AM
I can get the time between transactions but run into trouble when wanting to actually pull the transactions where there were a set number of previous transactions in a given time.
06-08-2016 02:04 PM
When you say "a given time" to you mean an explicit time such as 1:00:00 to 1:59:59 or an interval such as 60 minutes?
The later is going to have issues because you could well have overlapping intervals.
See this example code:
data example; informat acct $5. transdt datetime.; format transdt datetime.; input acct transdt; datalines; 12345 01JAN2016:10:05:00 12345 01JAN2016:10:35:00 12345 01JAN2016:11:00:00 12345 01JAN2016:11:25:00 12345 01JAN2016:11:55:00 ; RUN; data junk; set example; int1 = intck('minute',lag1(transdt),transdt); int2 = intck('minute',lag2(transdt),transdt); int3 = intck('minute',lag3(transdt),transdt); int4 = intck('minute',lag4(transdt),transdt); run;
Look at the intervals (the int variables). The 4th record is within an hour of the 2nd record and the 3rd record is within an hour of the 1st. The 4th is within an hour of 3rd but not 1st. So which interval does the 3rd record fit? And the longer your interval the more likely you have some overlapping intervals. If you want a single 'interval' per record you will have to pick some rules. Or calculate a bunch of variables (array anyone) of Interval_to_previous_transaction and which_previous_transaction and possibly what ever you are doing with the values of the transactions).
Or possibly this might be an IML or ETS but not sure what would fit.
06-08-2016 04:33 PM
I've been involved with similar exercises like this in the past and I suspect you may be wanting to analyse credit card transactions for suspicious or fraudulent behaviour.
The best approach with this might be to just preprocess the transactions and total up the number of transactions by merchant and card number for each quarter hour slot (24 times 4 per day). Then go through the data again (with the quarter hour data joined on to it), take the quarter hour slot the transaction belongs to plus the three preceding ones to work out how many transactions there have been in the last hour.
06-09-2016 08:51 AM
Having trouble wrapping my head around this idea.
We have a couple people in other areas here I'm going to reach out to see if they can walk me through what I'm trying to do.
Thanks for the suggestions.
My SAS skills are pretty limited at this point. I've mostly relied on SQL in the past.