Hi all,
I have a large amount of transaction data and I'm wanting to pull transactions where a count of previous transactions have taken place prior to a given transaction in a specified amount of time. For example, I might want to see all transactions where there were two transactions that took place at the same merchant in the previous hour on the same card.
I have account numbers and the datetime of the authorizations to work with along with merchant name, merchant category, and a bunch of variables to specify the activity I'm looking for. I have other systems I can use to pull such activity but it's slow and clunky. I've been playing around with different code but can't seem to come up with something that works.
Any suggestions on how I can accomplish this? No need to provide code examples if it's easier to just explain in words how I can do this.
If further explanation of what I'm trying to accomplish is needed just let me know and I'll be happy to provide.
Thanks!
Sounds like you have two issues:
Coding the logic is the issue.
I've tried using the lag function and intck. I've thought about using SQL and joining the table back on itself and doing intck but this doesn't seem efficient at all considering I'm working with tens of thousands of rows of data.
I can get the time between transactions but run into trouble when wanting to actually pull the transactions where there were a set number of previous transactions in a given time.
When you say "a given time" to you mean an explicit time such as 1:00:00 to 1:59:59 or an interval such as 60 minutes?
The later is going to have issues because you could well have overlapping intervals.
See this example code:
data example;
informat acct $5. transdt datetime.;
format transdt datetime.;
input acct transdt;
datalines;
12345 01JAN2016:10:05:00
12345 01JAN2016:10:35:00
12345 01JAN2016:11:00:00
12345 01JAN2016:11:25:00
12345 01JAN2016:11:55:00
;
RUN;
data junk;
set example;
int1 = intck('minute',lag1(transdt),transdt);
int2 = intck('minute',lag2(transdt),transdt);
int3 = intck('minute',lag3(transdt),transdt);
int4 = intck('minute',lag4(transdt),transdt);
run;
Look at the intervals (the int variables). The 4th record is within an hour of the 2nd record and the 3rd record is within an hour of the 1st. The 4th is within an hour of 3rd but not 1st. So which interval does the 3rd record fit? And the longer your interval the more likely you have some overlapping intervals. If you want a single 'interval' per record you will have to pick some rules. Or calculate a bunch of variables (array anyone) of Interval_to_previous_transaction and which_previous_transaction and possibly what ever you are doing with the values of the transactions).
Or possibly this might be an IML or ETS but not sure what would fit.
I'm referring to an interval such as 60 minutes. Not an explicit time.
Thanks,
Josh
And your rule(s) for handling overlapping intervals are what?
I've been involved with similar exercises like this in the past and I suspect you may be wanting to analyse credit card transactions for suspicious or fraudulent behaviour.
The best approach with this might be to just preprocess the transactions and total up the number of transactions by merchant and card number for each quarter hour slot (24 times 4 per day). Then go through the data again (with the quarter hour data joined on to it), take the quarter hour slot the transaction belongs to plus the three preceding ones to work out how many transactions there have been in the last hour.
Having trouble wrapping my head around this idea.
We have a couple people in other areas here I'm going to reach out to see if they can walk me through what I'm trying to do.
Thanks for the suggestions.
My SAS skills are pretty limited at this point. I've mostly relied on SQL in the past.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.