DATA Step, Macro, Functions and more

Working out running totals

Super Contributor
Posts: 261

Working out running totals




I have the following dataset and I want to create a formula that calculates the max profit from a certain amount of capital. The dataset below shows horserace data over a few hours. The field scrapetime is the time the latest data was taken about the race. The outcome is the horse being backed, the return percentage is the expected return on capital, the Time_to_event to the numbers of hours to the event starts, the max stake is the Max_Stake is the maximum amount that can be placed on the horse.   


I have two variables: timeto and capital which I have set as follows:


%let timeto = 2; /* max of two hours to event*/

%let capital = 1000; /* max capital of €1000 */


I want the algorithm to do the following


sort by scrapetime /* which is fine

apply timeto restriction i.e. only apply algorithm to rows where the timeto is less than 2 hours /* which is fine


So in the first event below the timeto restriction is met so the max_stake of 52 is placed and the capital_remaining = 1000-52 = 948


Here is the difficult part, i want the algorithm to work through each line but retain information from the previous lines.


So, for example, the second event is after the first event has taken place and the first bet lost so my remaining capital is still 948 (if it has one it would be 948+416= 1364)


So for each row it needs to determine which of the previous races have occurred between this event and the last and adjust the capital accordingly. I'm not sure how to do this. But the description of what I want at each row is


1. Have any previous events finished since the last row

2. If so what is the capital asjustment (this is just +profit if win and no adjustment if lose)


and to keep doing this until it gets to the last row




Eventscrapetimeevent_timeOutcomeReturn_PercTime_To_EventMax_StakeOutcomePotential Profit
Listowel15SEP16:16:252016/9/15 - 16:40Jack The Wire0.020.352Lose416
Ayr15SEP16:16:512016/9/15 - 17:20Henley0.010.518Win225
Crayford15SEP16:16:522016/9/15 - 16:57Lordsbury Dolly0.020.115Lose44
Listowel15SEP16:16:572016/9/15 - 17:45Dos Picas0.010.813Lose26
Ayr15SEP16:16:572016/9/15 - 17:20Catwilldo0.040.4629Lose4180
Pontefract15SEP16:17:042016/9/15 - 17:10Queen Of The Stars0.010.11188Win567
Pontefract15SEP16:17:342016/9/15 - 17:40Irish Optimism0.010.1562Win821
Chelmsford City15SEP16:17:342016/9/15 - 17:50Prince Of Cool0.010.328Lose112
Chelmsford City15SEP16:17:382016/9/15 - 17:50Prince Of Cool0.010.2485Lose1892
Chelmsford City15SEP16:17:472016/9/15 - 18:20Free Bounty0.020.6422Lose836
Chelmsford City15SEP16:17:492016/9/15 - 18:20Free Bounty0.020.5432Lose854
Chelmsford City15SEP16:17:512016/9/15 - 18:20Free Bounty0.020.5334Lose661
Chelmsford City15SEP16:17:532016/9/15 - 19:50Paper Faces0.022806Lose2338
Pontefract 15SEP16:17:592016/9/15 - 18:10Bay Mirage0.010.2129Win1071
Pontefract 15SEP16:18:002016/9/15 - 18:10Mon Brav0.020.2119Lose1258
Belmont Park15SEP16:18:252016/9/15 - 18:30Majestic Jessica0.040.137Win95
Hove15SEP16:18:462016/9/15 - 19:01Kilmurry Luke0.020.332Lose57
Belmont Park15SEP16:18:482016/9/15 - 19:01Dunk A Din0.060.226Win192
Belmont Park15SEP16:18:502016/9/15 - 19:01Dunk A Din0.060.218Lose133
Newcastle15SEP16:19:062016/9/15 - 19:11Bramble Diamond0.030.174Win137
Delaware Park15SEP16:19:292016/9/15 - 19:45Tizthefastlaine0.010.390Lose180
Delaware Park15SEP16:19:312016/9/15 - 19:45Tizthefastlaine0.020.2104Win208
Delaware Park15SEP16:19:332016/9/15 - 19:45Tizthefastlaine0.020.2107Lose212
Henlow15SEP16:19:412016/9/15 - 19:47Act Of War0.020.113Win46
Super User
Super User
Posts: 9,599

Re: Working out running totals



Its a good idea to post test data in the form of a datastep, you can find code to do this here:


Your question seems to reside on carrying data over to subsequent rows, you can do this with retain <vars>; statement.  Set them all to missing or zero in the first instance of the group, then check your if <> logic at each observation and update the retained variables as necessary, thus you have the data to impute other variables.

Super Contributor
Posts: 261

Re: Working out running totals

[ Edited ]



I've been working on this for a while now (using Lag function) but i'm not getting anywhere.


Bascially, the dataset is sorted by scrape time. They are all bets and assume that after the event the stake is returned to my account. So my capital to start is 10000. In the first row my capital is 10000-stake. In the second row, I need to look back to the first row. Using lag If the first event is over then my stake is returned. 


Here is my code:


infile datalines dsd truncover;
input stake:32. scrapetime: DATETIME14. eventtime: DATETIME14.;


data _NULL_;
if 0 then set dataset nobs=n;
call symputx('nrows',n);


%let initialcapital=10000;


%macro test;

data test;
set dataset;

if _n_ = 1 then do; running_capital = &initialcapital - stake; end;

else do;


%do look_back=1 %to &nrows;

if lag&look_back.(eventtime) < scrapetime then running_capital = running_capital+lag&look_back.(stake);


running_capital = running_capital - stake;



Super User
Super User
Posts: 9,599

Re: Working out running totals


Sorry, I don;t have time to look at this now, off to 3 day meeting.  I would however suggest you re-assess your logic there.  Why have a datastep to get number of observations, then a macro to loop over this.  Look up the syntax of retain variables.  The logic is pretty straight forward (there are probably many examples on here already):

Sort data in the correct order

At each first group blank retained variables

Check logic items

Update retained values with current values


Should be one datastep.

Ask a Question
Discussion stats
  • 3 replies
  • 2 in conversation