BookmarkSubscribeRSS Feed
bluenotebooks
Calcite | Level 5

Hi all,

 

I have a large amount of transaction data and I'm wanting to pull transactions where a count of previous transactions have taken place prior to a given transaction in a specified amount of time.  For example, I might want to see all transactions where there were two transactions that took place at the same merchant in the previous hour on the same card.  

 

I have account numbers and the datetime of the authorizations to work with along with merchant name, merchant category, and a bunch of variables to specify the activity I'm looking for.  I have other systems I can use to pull such activity but it's slow and clunky.  I've been playing around with different code but can't seem to come up with something that works.  

 

Any suggestions on how I can accomplish this?  No need to provide code examples if it's easier to just explain in words how I can do this.

 

If further explanation of what I'm trying to accomplish is needed just let me know and I'll be happy to provide.

 

Thanks!

9 REPLIES 9
LinusH
Tourmaline | Level 20

Sounds like you have two issues:

  1. How to code the logic. Please provide what you tried so far. Extract a reasonable amount of transactions to you local SAS environment, so speed up program development. Or create your own sample data.
  2. The speed to interact with the source system is too slow. This is probably more of an architectural issue. If you have the resources, you should extract all data in bulk regularly, in a data warehouse like initiative, then you should be able to apply your logic on high speed SAS data sets.
Data never sleeps
bluenotebooks
Calcite | Level 5

Coding the logic is the issue.

 

I've tried using the lag function and intck.  I've thought about using SQL and joining the table back on itself and doing intck but this doesn't seem efficient at all considering I'm working with tens of thousands of rows of data.  

LinusH
Tourmaline | Level 20
Lag or Retain should do the trick. What didn't work?
Join does not sound like an ideal solution.
Data never sleeps
bluenotebooks
Calcite | Level 5

I can get the time between transactions but run into trouble when wanting to actually pull the transactions where there were a set number of previous transactions in a given time.

ballardw
Super User

When you say "a given time" to you mean an explicit time such as 1:00:00 to 1:59:59 or an interval such as 60 minutes?

The later is going to have issues because you could well have overlapping intervals.

 

See this example code:

data example;
   informat acct $5. transdt datetime.;
   format transdt datetime.;
   input acct transdt;
datalines;
12345 01JAN2016:10:05:00
12345 01JAN2016:10:35:00
12345 01JAN2016:11:00:00
12345 01JAN2016:11:25:00
12345 01JAN2016:11:55:00
;
RUN;

data junk;
   set example;
   int1 = intck('minute',lag1(transdt),transdt);
   int2 = intck('minute',lag2(transdt),transdt);
   int3 = intck('minute',lag3(transdt),transdt);
   int4 = intck('minute',lag4(transdt),transdt);
run;

Look at the intervals (the int variables). The 4th record is within an hour of the 2nd record and the 3rd record is within an hour of the 1st. The 4th is within an hour of 3rd but not 1st. So which interval does the 3rd record fit? And the longer your interval the more likely you have some overlapping intervals. If you want a single 'interval' per record you will have to pick some rules. Or calculate a bunch of variables (array anyone) of Interval_to_previous_transaction and which_previous_transaction and possibly what ever you are doing with the values of the transactions).

 

 

Or possibly this might be an IML or ETS but not sure what would fit.

bluenotebooks
Calcite | Level 5

I'm referring to an interval such as 60 minutes.  Not an explicit time.  

 

Thanks,

 

Josh

ballardw
Super User

And your rule(s) for handling overlapping intervals are what?

SASKiwi
PROC Star

I've been involved with similar exercises like this in the past and I suspect you may be wanting to analyse credit card transactions for suspicious or fraudulent behaviour.

 

The best approach with this might be to just preprocess the transactions and total up the number of transactions by merchant and card number for each quarter hour slot (24 times 4 per day). Then go through the data again (with the quarter hour data joined on to it), take the quarter hour slot the transaction belongs to plus the three preceding ones to work out how many transactions there have been in the last hour. 

 

bluenotebooks
Calcite | Level 5

Having trouble wrapping my head around this idea.  

 

We have a couple people in other areas here I'm going to reach out to see if they can walk me through what I'm trying to do.

 

Thanks for the suggestions.

 

My SAS skills are pretty limited at this point.  I've mostly relied on SQL in the past.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 2111 views
  • 0 likes
  • 4 in conversation