BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jrdykstr93
Obsidian | Level 7

Hi Everyone,

 

This is my first post so please point out any flaws in my approach to asking this question.

 

I have a dataset that includes foreign and domestic stock returns by day. I am trying to interpolate returns based on the foreign returns trading days in for two scenarios so I can run a smooth, consistent event study.

 

Let A = Foreign Returns and B = Domestic

 

1) First scenario. A has return data while B does not. For example, lets just say that A is trading on the July 3,4 and 5 of 2008 but B is only trading on July 3 and 5 of 2008. The returns for A on July 3, 4, and 4 are -0.013854, -0.059828, and 0.051853, respectfully. The Returns for B on July 3 and 5 are 0.022954964 and 0.0017698109 , respectfully, while there is a null value for Return B on July 4. What i want is to discount the value of .0017698109 back one day. So my formula would look something like this: (1+.0017698109)^(-1/2). This then gives me a value of 0.0008845142. I then want to take this value and place it as the return value for B on July 4 and July 5. So, replacing a null value with 0.0008845142 and the value 0.0017698109 with 0.0008845142.

 

2) Second scenario. A has no return data while B does. For example, lets just say that A is trading on the March 20 and 22 of 2008 and B is only trading on March 20, 21, and 22. The returns for A on March 20 and 22 are -0.028234 and 0.052298,respectfully, with a null value on March 21. The Returns for B on March 20, 21, and 22 are 0.0638884, -0.00155123567, -0.0281506334, respectfully. In this case, I am taking (1+-0.00155123567) * (1+ -0.0281506334) - 1 to yield a result of -0.02965820. This result will then replace the value of Return B on March 22 and the value on March 21 is then turned into a null value. So, -0.0281506334 is replaced with -0.02965820 and -0.00155123567 is replaced with a null value.

 

The trickiest part is, I want this program to run if there is a difference of more than just one day. So if there was two days in a row, then the formula for scenario 1 would be (1+final_day_returns)^(-1/(days that are null +1). You would accumulate scenario two as the multiple of 3 days then instead of two, etc..

 

This is very easy to do in excel, and I will attach the excel file with the tabs of how I did it. But i would like to automate the process with SAS. Any ideas or code that you could reference or give me would be very much appreciated.

 

To be frank, I am quite new to SAS and have usually worked in excel. I am guessing the best place to start with this would be to label a macro array of your forein returns and one of your domestic returns (if you have more than one series for each) and then work with "do if" statements but I could very much use some direction.

 

Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

This simple code seems to give the expected result

 

data have;
infile "&sasforum.\datasets\Returns.csv" firstobs=2 dsd truncover;
input DATE :mmddyy10. A B B_changed;
format date yymmdd10.;
run;

data spread;
do nb = 1 by 1 until(not missing(B));
    set have;
    end;
br = B;
do i = 1 to nb;
    set have;
    if nb > 1 then B_spread = (1+br)**(1/nb) - 1;
    else B_spread = B;
    output;
    end;
drop nb i br;
run;

data bunch;
br = 1;
do nb = 1 by 1 until(not missing(A));
    set spread;
    br = br * (1+B_spread);
    end;
do i = 1 to nb;
    set have;
    if i < nb then call missing(B_bunch);
    else B_bunch =  br - 1;
    output;
    end;
drop nb i br B_spread;
run;


PG

View solution in original post

14 REPLIES 14
ballardw
Super User

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

 

You description is not interpolation but more of a "distribution". Interpolation would be if the vale on day 1 is 5  and on day 4 is 12 then the value for day 2 would be 7.3333 and day 3 of 9.666 where the change between to values is distributed across the intervals added to the first.

 

Are your values consecutive days (date values)? If not does the rule change depending on the type of gap?

Example if your dates are

01/08/2018   .123

01/09/2018    .

01/12/2018   .075

would the result be same for the missing value if the data were as follows?

01/08/2018   .123

01/11/2018    .

01/12/2018   .075

jrdykstr93
Obsidian | Level 7

Thanks for the advice.  I am having a bit of trouble using your referenced link since I am using SAS EG.  I believe it should work I just don't understand what is going wrong. I will look into that further.  As of right now, i am attaching a tab deliminated txt file and a csv.  Both contain the same data and have four columns.  Date, A=Foreign returns, B= Domestic, and B_changed = What B returns are supposed to change into.  

 

As for your question, yes the result will be the same. 

 

Best,

jrdykstr93

PGStats
Opal | Level 21

This simple code seems to give the expected result

 

data have;
infile "&sasforum.\datasets\Returns.csv" firstobs=2 dsd truncover;
input DATE :mmddyy10. A B B_changed;
format date yymmdd10.;
run;

data spread;
do nb = 1 by 1 until(not missing(B));
    set have;
    end;
br = B;
do i = 1 to nb;
    set have;
    if nb > 1 then B_spread = (1+br)**(1/nb) - 1;
    else B_spread = B;
    output;
    end;
drop nb i br;
run;

data bunch;
br = 1;
do nb = 1 by 1 until(not missing(A));
    set spread;
    br = br * (1+B_spread);
    end;
do i = 1 to nb;
    set have;
    if i < nb then call missing(B_bunch);
    else B_bunch =  br - 1;
    output;
    end;
drop nb i br B_spread;
run;


PG
jrdykstr93
Obsidian | Level 7

Thanks a lot.  That works perfect.

jrdykstr93
Obsidian | Level 7

I understand the code pretty well.  But if you could take one second and explain to me how this part of the code below:  

 

do nb = 1 by 1 until(not missing(A));
set spread;
br = br * (1+B_spread);
end;

 

 

Is yielding a correct result?  I am running this in SAS and trying to follow by looking at data sets but it is not really making sense to me how br = br*(1_B_spread) is getting the result that we want.  It seems that it should yield something else looking at the data set. Can you explain?

 

Thanks,

jdykstr93

PGStats
Opal | Level 21

In that loop, br starts at 1 and then accumulates the product of the B returns corresponding to a sequence of missing A returns and the first non-missing A return. In the following loop, all B returns are set to missing except the last one which is set to the accumulated B returns. Thus the first non-missing B return becomes the accumulated product of it's original value and the values of the removed returns. 

 

hth

PG
Reeza
Super User

I would suggest looking at PROC TIMESERIES to fill in the blanks. There are several methods that may align with what you want. 

 

You need SAS/ETS license to do this. 

 

Here's a little example of how that could work:

https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52

 

Spoiler

@jrdykstr93 wrote:

Hi Everyone,

 

This is my first post so please point out any flaws in my approach to asking this question.

 

I have a dataset that includes foreign and domestic stock returns by day. I am trying to interpolate returns based on the foreign returns trading days in for two scenarios so I can run a smooth, consistent event study.

 

Let A = Foreign Returns and B = Domestic

 

1) First scenario. A has return data while B does not. For example, lets just say that A is trading on the July 3,4 and 5 of 2008 but B is only trading on July 3 and 5 of 2008. The returns for A on July 3, 4, and 4 are -0.013854, -0.059828, and 0.051853, respectfully. The Returns for B on July 3 and 5 are 0.022954964 and 0.0017698109 , respectfully, while there is a null value for Return B on July 4. What i want is to discount the value of .0017698109 back one day. So my formula would look something like this: (1+.0017698109)^(-1/2). This then gives me a value of 0.0008845142. I then want to take this value and place it as the return value for B on July 4 and July 5. So, replacing a null value with 0.0008845142 and the value 0.0017698109 with 0.0008845142.

 

2) Second scenario. A has no return data while B does. For example, lets just say that A is trading on the March 20 and 22 of 2008 and B is only trading on March 20, 21, and 22. The returns for A on March 20 and 22 are -0.028234 and 0.052298,respectfully, with a null value on March 21. The Returns for B on March 20, 21, and 22 are 0.0638884, -0.00155123567, -0.0281506334, respectfully. In this case, I am taking (1+-0.00155123567) * (1+ -0.0281506334) - 1 to yield a result of -0.02965820. This result will then replace the value of Return B on March 22 and the value on March 21 is then turned into a null value. So, -0.0281506334 is replaced with -0.02965820 and -0.00155123567 is replaced with a null value.

 

The trickiest part is, I want this program to run if there is a difference of more than just one day. So if there was two days in a row, then the formula for scenario 1 would be (1+final_day_returns)^(-1/(days that are null +1). You would accumulate scenario two as the multiple of 3 days then instead of two, etc..

 

This is very easy to do in excel, and I will attach the excel file with the tabs of how I did it. But i would like to automate the process with SAS. Any ideas or code that you could reference or give me would be very much appreciated.

 

To be frank, I am quite new to SAS and have usually worked in excel. I am guessing the best place to start with this would be to label a macro array of your forein returns and one of your domestic returns (if you have more than one series for each) and then work with "do if" statements but I could very much use some direction.

 

Thanks in advance!


PGStats
Opal | Level 21

Please post a CSV version of your example data. The colors in the Excel file are useful for understanding, we can look at them on the forum preview. But for downloading, a text file is much safer.

PG
PGStats
Opal | Level 21

It isn't very difficult to do this with datasteps. But you must provide some downloadable data for us.

PG
jrdykstr93
Obsidian | Level 7

Very good point.  Here is a csv and a tab delimited txt file for your use. 

 

Please let me know if these are in the correct format or if you desire another.  I am having a bit of trouble converting my data set to a data step right now but will look into that further.

 

Best,

jrdykstr93

mkeintz
PROC Star

Although my first thought on this question was to run PROC EXPAND (in the SAS/ETS product), in the end I don't think it will be suitable.  Let's say you have 2 consecutive missing values for A_RETURNS.  Yes, PROC EXPAND will replace those missing values, but it will try to smooth those values based on some combination of prior and/or subsequent non-missing values.  But that's not really what you want.  Instead, in the case of 2 missing values, you want to treat the following non-missing value as if it is a 3-day return, and partial it out into 3 daily returns that would compound to the observed 3-day return.  In other words, you not only want to replace the 2 missing returns, but also change the first non-missing return.  And, in the case of all other non-missing returns, you want to leave them untouched.  I don't see how to make PROC EXPAND do that.

 

In a data step, the best way is probably.

  1. read a record (the first SET statement below)
  2. detect whether it is the first instance of a run of missing A_return  values.  Say it happens on observation _N_.  If it is then.
    1. In a separate SET statement in a loop, temporarily read ahead to (a) count the number of missing returns until you hit a non-missing, and to get that nonmissing return (call it ADJ_RETURNA).  Say you count NM missing values, so you will want to divide the return into NM+1 days.
    2. calculate the daily ADJ_RETURNA  from the multiday ADJ_RETURNA as exp(log(adj_returna+1)/(NM+1))-1
  3. For  observations _N_ through  PTRA (=_N_ + Nmiss )  keep this ADJ_RETURNA.  Otherwise (i.e. if _N_> PTRA), ADJ_RETURNA=A_return.

Do the same for B_returns:

 

data want1;
  set have;

  if A_returns^=lag(A_returns) and A_returns=. then do;
    do ptrA =_n_ by 1 until (adjusted_retA^=. or _error_=1);
      set have (keep=A_returns rename=(A_returns=adjusted_retA)) point=ptrA;
    end;
    adjusted_retA=exp(log(adjusted_retA+1)/(ptrA+1-_N_))-1;
  end;
  if ptrA<_N_ then adjusted_retA=A_returns;

  if B_returns^=lag(B_returns) and B_returns=. then do;
    do ptrB =_n_ by 1 until (adjusted_retB^=. or _error_=1);
      set have (keep=B_returns rename=(B_returns=adjusted_retB)) point=ptrB;
    end;
    adjusted_retB=exp(log(adjusted_retB+1)/(ptrB+1-_N_))-1;
  end;
  if ptrB<_N_ then adjusted_retB=B_returns;
run;

 

Note that the SET statements inside the do loops each read in only one variable (A_returns or B_returns), and they are renamed to the new variables: adjusted_retA or adjusted_retB.   The until condition has an "or _error_=1" to provide a way to test is the SET statement is pointing beyond the end of dataset HAVE.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
jrdykstr93
Obsidian | Level 7

This worked very well. It calculated scenario 1 returns perfectly.  I think that my initial question was a bit unclear though.  A returns are never changed.  Just B's.  The difference is one is being accumulated and one is being discounted based on a null value in column B or in Column A.  But this is doing what I want.  I think I will just have to change the code underneath "if A_returns.... then do; to reflect what I want in column B and it should work I think.  If I am able to do that, I will mark this answer as the correct solution.  May take me a few days though.  Thanks for all your help.

 

Many Thanks,

jrdykstr93

jrdykstr93
Obsidian | Level 7

I am still having a bit of trouble.  I need to select the return right after the null value as well in the scenario when A_return is .

 

For example-- 

Date             A_return      B_return      B_final

5/01/2008    .                   .4                 .

5/02/2008    .04               -.1                .26

 

B_final return on 5/02/2008 is equal to [(1 + .4) * (1+(-1))] - 1.  Your code for this scenario only goes up to the null values of A and will not incorporate the -.1 return of B (given that I changed your code a bit to select B values when A is null. Makes this scenario a bit trickier than the other.  I am still going to look into this myself and see if I can figure something out based off your sample.  I just wanted to be clear on what is happening with the data.

 

Best,

jrdykstr93

mkeintz
PROC Star

My revised understanding of your question, is that you want:

  1. Assign a sequence of identical "daily returns" for any sequence of missing A_returns and the first subsequent non-missing A_returns.  I.e. convert a single multiperiod return value into a set of identical sub-period returns generating the multiperiod result.

  2. And (this is the new understanding), you also want to create a sequence of B_returns whose missing value pattern matches that of the A_returns.   In other words, compound the B_returns over a period of missing A_returns, and assign the result to the record in which non-missing A_returns reappears.

 

Then the technique is to

  1. In the loop counting missing A_returns, also compound the B_returns.
  2. After that, assign the new value to the adjusted_retB for only the observation matching ptrA.  In all other cases with non-missing A_returns, just copy the original B_returns to the adjusted_retB.

 

data have;
  input recnum A_returns B_returns;
datalines;
01  0.011 0.0101
02  .     0.0102
03  .     0.0103
04  .054  0.0104
05 -.015  0.0105
06 -.016  0.0106
07 -.017  0.0107
08  .     0.0108
09  .     0.0109
10  .     0.0110
11 .021   0.0111
12 .022   0.0112
13 .023    .
14 .024   0.0114
15 .025   0.0115
16 .026   0.0116
run;

data want1;
  set have;

  retain _retB;
  if A_returns^=lag(A_returns) and A_returns=. then do;
    do ptrA =_n_ by 1 until (adjusted_retA^=. or _error_=1);
      set have (keep=A_returns rename=(A_returns=adjusted_retA)) point=ptrA;
      _retB=sum(1,_retB)*(1+B_returns)-1;
    end;
    adjusted_retA=exp(log(adjusted_retA+1)/(ptrA+1-_N_))-1;
  end;
  if ptrA<_N_ then adjusted_retA=A_returns;

  if ptrA=_N_ then adjusted_retB=_retB;
  else if A_returns^=. then adjusted_retB=B_returns;

  drop _retB;
run;

 

Note:  if you want to better understand the use of comparing ptrA to _N_ (i.e. "if ptrA<_N_" and "if ptrA=_N_") then, just before the RUN statement, issue a

    POINTER_A=ptrA;

statement.  You have to do that, because using ptrA in the "point=" argument of SET makes it an "automatic" variable, which is not kept in the output data set.  So just copy it to a user-generated variable, which you will see in the resulting data set.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 3180 views
  • 4 likes
  • 5 in conversation