Quantopic
Obsidian | Level 7

Hi all,

 

I need to backtest a trading strategy, and I want to compute the profit and loss and the portfolio value given the signal and the amount invested.

 

I ran the following code to do that:

 

DATA BACKTESTING;
	SET BACKTESTING;	
		RETAIN ACCOUNT 500. STAKE 0. PL 0.;
	FORMAT ACCOUNT 10.2 STAKE PL 10.4;
		/* Bet on the Home Team */
		IF FLAG_RESULT EQ "OK" THEN 
                    STAKE = LAG(ACCOUNT) * KELLY AND 
                    PL = STAKE * (BOOKIE_H - 1) AND 
                    ACCOUNT = LAG(ACCOUNT) + PL;
		ELSE IF FLAG_RESULT NE "KO" THEN 
                    STAKE = LAG(ACCOUNT) * KELLY AND 
                    PL = (-1) * STAKE AND 
                    ACCOUNT = LAG(ACCOUNT) + PL;
RUN;

where ACCOUNT is the portfolio value and KELLY is the portfolio percentage invested in the position.

 

The result given this piece of script is not ok, since it does not give the correct values I computed using excel.

 

Can someone help me by suggesting a solution?

 

Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Example input data and the desired result go a long way towards letting us figure out the issues.

 

Provide your existing dataset, a sample of it, or, if your data is sensitive, something with the same variable names and types that shows the same behavior, in the form of a data step. The instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... show how to turn an existing SAS data set into data step code. You can paste that code into a forum code box using the {i} icon, or attach it as text, so that we see exactly what you have and can test code against it.

 

Also provide an example of the desired output for that example data.

 

The way you are using LAG is very likely why things only work for the first record. LAG inside a condition, like

ELSE IF _N_ GT 1 THEN DO;
	   BR = LAG(ACCOUNT);

is much more complex than you may think.

 

Please look at the results of this code:

data start;
   input x y;
datalines;
1 2
3 4
5 6
;
run;

data new;
   set start;
   if _n_=1 then br=500;
   else if _n_ gt 1 then do;
      br = lag(x);
      put _n_= br=;
   end;
run;

LAG maintains a separate queue for each instance of the function, and that queue is updated only when the call actually executes. So when LAG sits inside a condition, as in your code, the first time it is encountered there is no previous value and it returns missing. The values you are seeing for BR are therefore not in the sequence you want. You are also complicating things by lagging an apparently calculated variable, ACCOUNT, that does not exist in your base data BACKTESTING_H (at least the first time you ran the code).
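One way to see the difference, as a minimal sketch built on the same three rows as the example above: call LAG on every row and decide only afterwards whether to use the result. The names PREV_X and NEW_FIXED are made up for illustration.

```sas
/* Same sample rows as the example above */
data start;
   input x y;
datalines;
1 2
3 4
5 6
;
run;

data new_fixed;              /* hypothetical output name */
   set start;
   prev_x = lag(x);          /* executed on every row, so the queue stays in step */
   if _n_ = 1 then br = 500;
   else br = prev_x;         /* use the lagged value only where it is wanted */
   put _n_= br=;
run;
```

With these rows BR should come out as 500, 1 and 3, whereas the conditional version leaves BR missing on the second row. This only helps for a variable that already exists in the input. Lagging a value that is computed later in the same step, such as ACCOUNT, still stores whatever the variable held at the moment LAG was called.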

 

All bets may be off after the first time you run something like this:

DATA BACKTESTING_H;
	SET BACKTESTING_H;

as you have changed your previous data, possibly so much that running other code later to address the original logic may not work at all.
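A small sketch of the safer pattern, assuming a made-up stand-in for the input table: read BACKTESTING_H but write to a different dataset name, so the input can be rerun against as often as needed. BACKTESTING_H_OUT and STAKE_PCT are hypothetical names, and the calculation is only a placeholder.

```sas
/* Hypothetical stand-in for the original input table */
data backtesting_h;
   input flag_result $ kelly;
datalines;
OK 0.10
KO 0.10
;
run;

/* Write results to a new name so BACKTESTING_H stays as it was */
data backtesting_h_out;
   set backtesting_h;
   stake_pct = kelly * 100;   /* placeholder calculation, for illustration only */
run;
```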

 


4 REPLIES
art297
Opal | Level 21

Can you provide example have and want datasets? That would be helpful!

 

It won't work, as is, because (1) you are doing conditional lags and (2) you are trying to lag computed variables. Making better use of the retain statement would work a lot better.
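As a rough sketch of what a RETAIN-based version might look like: the sample rows below are invented, the variable names follow the original post, and BACKTEST_RESULT is a hypothetical output name. It treats every row that is not "OK" as a loss, which is an assumption about the intended logic.

```sas
/* Invented sample data; variable names follow the original post */
data backtesting;
   input flag_result $ kelly bookie_h;
datalines;
OK 0.10 2.00
KO 0.10 1.80
OK 0.05 3.00
;
run;

data backtest_result;            /* hypothetical name; input is left untouched */
   set backtesting;
   retain account 500;           /* carried across rows, starts at 500 */
   format account stake pl 10.2;
   stake = account * kelly;      /* ACCOUNT still holds the previous row's value here */
   if flag_result = "OK" then pl = stake * (bookie_h - 1);
   else pl = -stake;             /* assumption: anything other than OK is a loss */
   account = account + pl;       /* updated value is retained for the next row */
run;
```

With these three rows ACCOUNT should come out as 550.00, 495.00 and 544.50. No LAG is needed, because a retained variable keeps its value from one row to the next. This relies on ACCOUNT not already existing in the input dataset, since SET would otherwise overwrite the retained value on each row.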

 

Art, CEO, AnalystFinder.com

 

Reeza
Super User

@Quantopic wrote:

Hi all,

 

 

 

The result given this piece of script is not ok, since it does not give the correct values I computed using excel.

 

 


That tells us very little. So what you have is not working. How is it not working? What are the correct values, and what are you getting? Is there a better way, i.e. RETAIN, than your current implementation? Is Excel or SAS wrong? It's usually a logic error on one side, and it can be Excel as easily as SAS. Besides the LAG issue, which may be all you need, there's really nothing else we can add if all we know is that it's not 'ok'.

Quantopic
Obsidian | Level 7

Thanks for the replies.

 

I hope this explains the problem more clearly.

 

I want to backtest a strategy on the basis of a signal, represented by the variable FLAG_RESULT, which takes the value 'OK' when the strategy is profitable and 'KO' otherwise.

 

To do that, I have to compute the portfolio value at each time period, where the portfolio value is represented by the variable ACCOUNT. This value has to be equal to 500 in the first row; in every later row it has to be equal to the lagged portfolio value, at time t-1, plus the amount of money the strategy earned or lost at time t, represented by the variable PL.

 

Moreover:

  • BR is a variable set equal to 500, used to compute the remaining ones
  • STAKE is the portion of the portfolio value that the strategy suggests investing

 

The script runs, in the sense that there is no error in the log window, but all the variables have missing values from the second row onward, probably because, as @art297 suggested, I'm trying to lag computed variables.

 

Anyway, I have not found a solution yet.

 

 

By following what @art297 suggested:

 

 

Spoiler
It won't work, as is, because (1) you are doing conditional lags and (2) you are trying to lag computed variables.

 

 

I tried to do that without using the RETAIN statement, by running the following script:

 

DATA BACKTESTING_H;
	SET BACKTESTING_H;
	FORMAT BR ACCOUNT 10.2 STAKE PL 10.2;
	IF _N_ EQ 1 THEN DO;
		BR = 500;
		STAKE = KELLY * BR;
		PL = STAKE * (IMPLIED_PROBABILITY - 1);
		ACCOUNT = BR + PL;
	END;
	ELSE IF _N_ GT 1 THEN DO;
	   BR = LAG(ACCOUNT);
	   STAKE = BR * KELLY;
		IF  FLAG_RESULT EQ "OK" THEN DO;
		   PL = STAKE * (IMPLIED_PROBABILITY - 1); 
		   ACCOUNT = BR + PL; 
		END;
		ELSE IF FLAG_RESULT NE "KO" THEN DO;
		   PL = (-1) * STAKE;
		   ACCOUNT = BR + PL;
		END;
	END;
RUN;

The problem I found running this script is that it works only for the first row, while for _N_ greater than 1 it gives missing values as output.

 

Sorry for not asking the question clearly, and thanks for the help.

 

 

 

 

 

