Hi Everyone,
This is my first post so please point out any flaws in my approach to asking this question.
I have a dataset that includes foreign and domestic stock returns by day. I am trying to interpolate returns based on the foreign returns trading days in for two scenarios so I can run a smooth, consistent event study.
Let A = Foreign Returns and B = Domestic
1) First scenario. A has return data while B does not. For example, lets just say that A is trading on the July 3,4 and 5 of 2008 but B is only trading on July 3 and 5 of 2008. The returns for A on July 3, 4, and 4 are -0.013854, -0.059828, and 0.051853, respectfully. The Returns for B on July 3 and 5 are 0.022954964 and 0.0017698109 , respectfully, while there is a null value for Return B on July 4. What i want is to discount the value of .0017698109 back one day. So my formula would look something like this: (1+.0017698109)^(-1/2). This then gives me a value of 0.0008845142. I then want to take this value and place it as the return value for B on July 4 and July 5. So, replacing a null value with 0.0008845142 and the value 0.0017698109 with 0.0008845142.
2) Second scenario. A has no return data while B does. For example, lets just say that A is trading on the March 20 and 22 of 2008 and B is only trading on March 20, 21, and 22. The returns for A on March 20 and 22 are -0.028234 and 0.052298,respectfully, with a null value on March 21. The Returns for B on March 20, 21, and 22 are 0.0638884, -0.00155123567, -0.0281506334, respectfully. In this case, I am taking (1+-0.00155123567) * (1+ -0.0281506334) - 1 to yield a result of -0.02965820. This result will then replace the value of Return B on March 22 and the value on March 21 is then turned into a null value. So, -0.0281506334 is replaced with -0.02965820 and -0.00155123567 is replaced with a null value.
The trickiest part is, I want this program to run if there is a difference of more than just one day. So if there was two days in a row, then the formula for scenario 1 would be (1+final_day_returns)^(-1/(days that are null +1). You would accumulate scenario two as the multiple of 3 days then instead of two, etc..
This is very easy to do in excel, and I will attach the excel file with the tabs of how I did it. But i would like to automate the process with SAS. Any ideas or code that you could reference or give me would be very much appreciated.
To be frank, I am quite new to SAS and have usually worked in excel. I am guessing the best place to start with this would be to label a macro array of your forein returns and one of your domestic returns (if you have more than one series for each) and then work with "do if" statements but I could very much use some direction.
Thanks in advance!
This simple code seems to give the expected result
data have;
infile "&sasforum.\datasets\Returns.csv" firstobs=2 dsd truncover;
input DATE :mmddyy10. A B B_changed;
format date yymmdd10.;
run;
data spread;
do nb = 1 by 1 until(not missing(B));
set have;
end;
br = B;
do i = 1 to nb;
set have;
if nb > 1 then B_spread = (1+br)**(1/nb) - 1;
else B_spread = B;
output;
end;
drop nb i br;
run;
data bunch;
br = 1;
do nb = 1 by 1 until(not missing(A));
set spread;
br = br * (1+B_spread);
end;
do i = 1 to nb;
set have;
if i < nb then call missing(B_bunch);
else B_bunch = br - 1;
output;
end;
drop nb i br B_spread;
run;
Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.
You description is not interpolation but more of a "distribution". Interpolation would be if the vale on day 1 is 5 and on day 4 is 12 then the value for day 2 would be 7.3333 and day 3 of 9.666 where the change between to values is distributed across the intervals added to the first.
Are your values consecutive days (date values)? If not does the rule change depending on the type of gap?
Example if your dates are
01/08/2018 .123
01/09/2018 .
01/12/2018 .075
would the result be same for the missing value if the data were as follows?
01/08/2018 .123
01/11/2018 .
01/12/2018 .075
Thanks for the advice. I am having a bit of trouble using your referenced link since I am using SAS EG. I believe it should work I just don't understand what is going wrong. I will look into that further. As of right now, i am attaching a tab deliminated txt file and a csv. Both contain the same data and have four columns. Date, A=Foreign returns, B= Domestic, and B_changed = What B returns are supposed to change into.
As for your question, yes the result will be the same.
Best,
jrdykstr93
This simple code seems to give the expected result
data have;
infile "&sasforum.\datasets\Returns.csv" firstobs=2 dsd truncover;
input DATE :mmddyy10. A B B_changed;
format date yymmdd10.;
run;
data spread;
do nb = 1 by 1 until(not missing(B));
set have;
end;
br = B;
do i = 1 to nb;
set have;
if nb > 1 then B_spread = (1+br)**(1/nb) - 1;
else B_spread = B;
output;
end;
drop nb i br;
run;
data bunch;
br = 1;
do nb = 1 by 1 until(not missing(A));
set spread;
br = br * (1+B_spread);
end;
do i = 1 to nb;
set have;
if i < nb then call missing(B_bunch);
else B_bunch = br - 1;
output;
end;
drop nb i br B_spread;
run;
Thanks a lot. That works perfect.
I understand the code pretty well. But if you could take one second and explain to me how this part of the code below:
do nb = 1 by 1 until(not missing(A));
set spread;
br = br * (1+B_spread);
end;
Is yielding a correct result? I am running this in SAS and trying to follow by looking at data sets but it is not really making sense to me how br = br*(1_B_spread) is getting the result that we want. It seems that it should yield something else looking at the data set. Can you explain?
Thanks,
jdykstr93
In that loop, br starts at 1 and then accumulates the product of the B returns corresponding to a sequence of missing A returns and the first non-missing A return. In the following loop, all B returns are set to missing except the last one which is set to the accumulated B returns. Thus the first non-missing B return becomes the accumulated product of it's original value and the values of the removed returns.
hth
I would suggest looking at PROC TIMESERIES to fill in the blanks. There are several methods that may align with what you want.
You need SAS/ETS license to do this.
Here's a little example of how that could work:
https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52
@jrdykstr93 wrote:
Hi Everyone,
This is my first post so please point out any flaws in my approach to asking this question.
I have a dataset that includes foreign and domestic stock returns by day. I am trying to interpolate returns based on the foreign returns trading days in for two scenarios so I can run a smooth, consistent event study.
Let A = Foreign Returns and B = Domestic
1) First scenario. A has return data while B does not. For example, lets just say that A is trading on the July 3,4 and 5 of 2008 but B is only trading on July 3 and 5 of 2008. The returns for A on July 3, 4, and 4 are -0.013854, -0.059828, and 0.051853, respectfully. The Returns for B on July 3 and 5 are 0.022954964 and 0.0017698109 , respectfully, while there is a null value for Return B on July 4. What i want is to discount the value of .0017698109 back one day. So my formula would look something like this: (1+.0017698109)^(-1/2). This then gives me a value of 0.0008845142. I then want to take this value and place it as the return value for B on July 4 and July 5. So, replacing a null value with 0.0008845142 and the value 0.0017698109 with 0.0008845142.
2) Second scenario. A has no return data while B does. For example, lets just say that A is trading on the March 20 and 22 of 2008 and B is only trading on March 20, 21, and 22. The returns for A on March 20 and 22 are -0.028234 and 0.052298,respectfully, with a null value on March 21. The Returns for B on March 20, 21, and 22 are 0.0638884, -0.00155123567, -0.0281506334, respectfully. In this case, I am taking (1+-0.00155123567) * (1+ -0.0281506334) - 1 to yield a result of -0.02965820. This result will then replace the value of Return B on March 22 and the value on March 21 is then turned into a null value. So, -0.0281506334 is replaced with -0.02965820 and -0.00155123567 is replaced with a null value.
The trickiest part is, I want this program to run if there is a difference of more than just one day. So if there was two days in a row, then the formula for scenario 1 would be (1+final_day_returns)^(-1/(days that are null +1). You would accumulate scenario two as the multiple of 3 days then instead of two, etc..
This is very easy to do in excel, and I will attach the excel file with the tabs of how I did it. But i would like to automate the process with SAS. Any ideas or code that you could reference or give me would be very much appreciated.
To be frank, I am quite new to SAS and have usually worked in excel. I am guessing the best place to start with this would be to label a macro array of your forein returns and one of your domestic returns (if you have more than one series for each) and then work with "do if" statements but I could very much use some direction.
Thanks in advance!
Please post a CSV version of your example data. The colors in the Excel file are useful for understanding, we can look at them on the forum preview. But for downloading, a text file is much safer.
It isn't very difficult to do this with datasteps. But you must provide some downloadable data for us.
Very good point. Here is a csv and a tab delimited txt file for your use.
Please let me know if these are in the correct format or if you desire another. I am having a bit of trouble converting my data set to a data step right now but will look into that further.
Best,
jrdykstr93
Although my first thought on this question was to run PROC EXPAND (in the SAS/ETS product), in the end I don't think it will be suitable. Let's say you have 2 consecutive missing values for A_RETURNS. Yes, PROC EXPAND will replace those missing values, but it will try to smooth those values based on some combination of prior and/or subsequent non-missing values. But that's not really what you want. Instead, in the case of 2 missing values, you want to treat the following non-missing value as if it is a 3-day return, and partial it out into 3 daily returns that would compound to the observed 3-day return. In other words, you not only want to replace the 2 missing returns, but also change the first non-missing return. And, in the case of all other non-missing returns, you want to leave them untouched. I don't see how to make PROC EXPAND do that.
In a data step, the best way is probably.
Do the same for B_returns:
data want1;
set have;
if A_returns^=lag(A_returns) and A_returns=. then do;
do ptrA =_n_ by 1 until (adjusted_retA^=. or _error_=1);
set have (keep=A_returns rename=(A_returns=adjusted_retA)) point=ptrA;
end;
adjusted_retA=exp(log(adjusted_retA+1)/(ptrA+1-_N_))-1;
end;
if ptrA<_N_ then adjusted_retA=A_returns;
if B_returns^=lag(B_returns) and B_returns=. then do;
do ptrB =_n_ by 1 until (adjusted_retB^=. or _error_=1);
set have (keep=B_returns rename=(B_returns=adjusted_retB)) point=ptrB;
end;
adjusted_retB=exp(log(adjusted_retB+1)/(ptrB+1-_N_))-1;
end;
if ptrB<_N_ then adjusted_retB=B_returns;
run;
Note that the SET statements inside the do loops each read in only one variable (A_returns or B_returns), and they are renamed to the new variables: adjusted_retA or adjusted_retB. The until condition has an "or _error_=1" to provide a way to test is the SET statement is pointing beyond the end of dataset HAVE.
This worked very well. It calculated scenario 1 returns perfectly. I think that my initial question was a bit unclear though. A returns are never changed. Just B's. The difference is one is being accumulated and one is being discounted based on a null value in column B or in Column A. But this is doing what I want. I think I will just have to change the code underneath "if A_returns.... then do; to reflect what I want in column B and it should work I think. If I am able to do that, I will mark this answer as the correct solution. May take me a few days though. Thanks for all your help.
Many Thanks,
jrdykstr93
I am still having a bit of trouble. I need to select the return right after the null value as well in the scenario when A_return is .
For example--
Date A_return B_return B_final
5/01/2008 . .4 .
5/02/2008 .04 -.1 .26
B_final return on 5/02/2008 is equal to [(1 + .4) * (1+(-1))] - 1. Your code for this scenario only goes up to the null values of A and will not incorporate the -.1 return of B (given that I changed your code a bit to select B values when A is null. Makes this scenario a bit trickier than the other. I am still going to look into this myself and see if I can figure something out based off your sample. I just wanted to be clear on what is happening with the data.
Best,
jrdykstr93
My revised understanding of your question, is that you want:
Then the technique is to
data have;
input recnum A_returns B_returns;
datalines;
01 0.011 0.0101
02 . 0.0102
03 . 0.0103
04 .054 0.0104
05 -.015 0.0105
06 -.016 0.0106
07 -.017 0.0107
08 . 0.0108
09 . 0.0109
10 . 0.0110
11 .021 0.0111
12 .022 0.0112
13 .023 .
14 .024 0.0114
15 .025 0.0115
16 .026 0.0116
run;
data want1;
set have;
retain _retB;
if A_returns^=lag(A_returns) and A_returns=. then do;
do ptrA =_n_ by 1 until (adjusted_retA^=. or _error_=1);
set have (keep=A_returns rename=(A_returns=adjusted_retA)) point=ptrA;
_retB=sum(1,_retB)*(1+B_returns)-1;
end;
adjusted_retA=exp(log(adjusted_retA+1)/(ptrA+1-_N_))-1;
end;
if ptrA<_N_ then adjusted_retA=A_returns;
if ptrA=_N_ then adjusted_retB=_retB;
else if A_returns^=. then adjusted_retB=B_returns;
drop _retB;
run;
Note: if you want to better understand the use of comparing ptrA to _N_ (i.e. "if ptrA<_N_" and "if ptrA=_N_") then, just before the RUN statement, issue a
POINTER_A=ptrA;
statement. You have to do that, because using ptrA in the "point=" argument of SET makes it an "automatic" variable, which is not kept in the output data set. So just copy it to a user-generated variable, which you will see in the resulting data set.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.