About mgao6767

mgao6767 · ‎09-12-2016

Yes sure. I found this discussion days ago when I was dealing with exactly the same problem. Don't know whether I should start a new one because I'm only improving on execution time.

mgao6767 · ‎09-12-2016

Hi I've encountered the same problem recently and I've found several ways in doing that. They are all effective but the most efficient one I found is as follow. It generates correct results for billions of transactions within minutes. The PROC SQL is effective but it takes enormous time to execute should you have a huge dataset. So I figure out a way to do it only using DATA step. The problem is to find the nearest midpoint after 5min for every transaction. Suppose you have the following dataset: Data transactions; input PRICE MIDPT TIME : anydttme.; format time time.; cards; 18.2 18.43 10:00:53 18.41 18.405 10:07:05 18.38 18.375 10:10:14 18.42 18.395 10:13:53 ; run; The code below will generate a dataset named FinalMatchedResult where the nearest MIDPT after 5min for each transaction is added as a new variable MIDPT_5. /* The transactions are sorted according to TIME. Add an ID to each transaction. */ DATA WORK.transactions; set WORK.transactions; ID + 1; RUN; /* Create a new dataset where each transaction is delayed by 5min (300s). Keep record of the ID in the original dataset. */ DATA WORK.transactions2; set WORK.transactions; TIME = TIME + 300; rename ID = OriginalID; RUN; /* Stack the two datasets together and sort them by TIME, ID and OriginalID. After these two steps, you will have a dataset where the nearst MIDPT after 5min for each transaction is simply the first obs that has non-missing OriginalID. This finding is especially important, and you should notice such result in the output WORK.transaction3 sorted. */ DATA WORK.transactions3; set WORK.transactions WORK.transactions2; RUN; PROC sort data= WORK.transactions3; by TIME ID OriginalID; RUN; /* Based on previous result, delete all irrelavent transactions. */ DATA WORK.transactions3_matched (drop=lastObs); set transactions3; lastObs = lag(OriginalID); if OriginalID =. and lastObs =. then delete; RUN; /* One MIDPT maybe matched to several past transactions 5min ago. Luckily these transactions' positions in the dataset are known, start to _N_ -1, where _N_ is the row number of current MIDPT. The matched results are saved in WORK.transactions3_matched. */ DATA WORK.matched_result; set WORK.transactions3_matched; retain start 1; if OriginalID =. then do; MIDPT_5 = MIDPT; do i = start to _N_ - 1; set transactions3_matched point= i; output; end; start = _N_ + 1; end; RUN; /* WORK.FinalMatchedResult saves all nearest MIDPT 5min later for every transaction. You could double check the dataset to make sure you have the result right. */ DATA WORK.FinalMatchedResult(drop=ID OriginalID start); set WORK.matched_result; TIME = TIME - 300; RUN; The output dataset looks like this: PRICE MIDPT TIME MIDPT_5 18.2 18.43 10:00:53 18.405 18.41 18.405 10:07:05 18.395 Hope it helps! Adrian

Online Status	Offline
Date Last Visited	‎09-15-2016 11:13 PM

Re: How to get a value that is 5 minutes in the future?

Re: How to get a value that is 5 minutes in the future?

Re: How to get a value that is 5 minutes in the future?

Re: How to get a value that is 5 minutes in the future?