BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mahler_ji
Obsidian | Level 7

Hello All,

I would like to create a variable that, for each day, creates a variable that represents the following 30 days of returns.  The data takes the following form:

Firm               Date                    Ret

A                    1/1/11                    .1

A                    1/2/11                    .01

A                    1/3/11                    .1

A                    1/4/11                    .2

.

.

.

.

B                    1/1/11                    .05

B                    1/2/11                    .1

.

.

.

So the idea is for the 1/1/11 observation for Firm A, the variable would return the cumulative return, starting that day, and going forward one day.  Just to make it simple, we could use addition, rather than a compounding return:  .1 + .01 + .1 + .2 and so on for thirty days.  And for the 1/2/11 observation for Firm A, it would be: .01 + .1 + .2 and so on for thirty more days.

And this would continue with all the firms that I have.

Any help would be greatly appreciated. I am going to be downloading a small file for you guys to play with if you need data to work out the problem.

Thanks!

John

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

Try this slight modification:

proc sql;

create table want as

select

  a.firm,

  a.date,

  sum(b.ret) as sum_ret,

  exp(sum(log(1+b.ret)))-1 as compo_ret ,

  count(b.ret) as n

from have as a

left join have as b

on a.firm=b.firm and b.date-a.date between 0 and 30

group by a.firm, a.date;

quit;

You could also add the clause having n > 30 at the end of the query to keep conplete periods only.

PG

PG

View solution in original post

16 REPLIES 16
dkb
Quartz | Level 8 dkb
Quartz | Level 8

Obvious question: What should happen to the later records for each firm, where there are less than 30 subsequent days of data?

mahler_ji
Obsidian | Level 7

Hey , thanks for your question.

It doesn't really matter, because I am going to trim the data to get rid of those observations.  So they can either return something wrong or return a "."  Either one will do!

Any help would be amazing!  I am having a hard time figuring this one out.

Thanks,

John

user24feb
Barite | Level 11

Should work, but requires a merge:

Data A (Keep=Firm Date R);
  Do Firm_Nr=1 To 3;
    Do Time=0 To 50;
      Firm=Byte(Firm_Nr+64);
   Date=IntNX('day',"01JAN2000"d,Time);
   R=Normal(1)*0.02+0.03;
      Output;
    End;
  End;
  Format Date Date9.;
Run;

%Let Periods=30;
Data B;
  Retain Total;
  Set A;
  Date=IntNX('day',Date,-&Periods.);
  By Firm;
  If First.Firm Then Do;
   Total=0;
Count=0;
  End;
  Count+1;
  Drop_Obs=Lag&Periods.(R);
  If Count gt &Periods Then Total=Sum(Total,R,-Drop_Obs);
  Else Total=Sum(Total,R);
Run;

Data C;
  Merge A (in=inA) B (Keep=Firm Date Total in=inB);
  By Firm Date;
  If inA;
Run;

Reeza
Super User

A SQL self join is a quick way, this will return an invalid number if you have less than 30 days of records. You could add a count to keep track of how many records and either eliminate them or set them to missing in the same query if you wanted.

(untested)

proc sql;

create table want as

select a.firm, a.date, sum(b.ret) as cum_ret

from have as a

left join have as b

on b.date-a.date between 0 and 30

order by a.firm, a.date;

quit;

mahler_ji
Obsidian | Level 7

Hey ,

Thank you, as always, for your help.  For some reason, the code that you wrote is "hanging," meaning that I submit it and then it just sits there, without giving me an error.  I don't really know what is going on, but I really like the idea behind your code, as it seems very intuitive to me.

Do you have any idea why this may be?

Thanks,

John

PGStats
Opal | Level 21

Try this slight modification:

proc sql;

create table want as

select

  a.firm,

  a.date,

  sum(b.ret) as sum_ret,

  exp(sum(log(1+b.ret)))-1 as compo_ret ,

  count(b.ret) as n

from have as a

left join have as b

on a.firm=b.firm and b.date-a.date between 0 and 30

group by a.firm, a.date;

quit;

You could also add the clause having n > 30 at the end of the query to keep conplete periods only.

PG

PG
mahler_ji
Obsidian | Level 7

Hey ,

Thanks for your help, I am going to to run this code right now. I thought that the issue might have had something to do with the fact that I had to "firm" requirement in the on statement.

Just checking, should there be two "bys" in the group by clause?  I have never seen that before!

Thanks,

John

PGStats
Opal | Level 21

No! that was a typo. I corrected it.

PG

PG
KachiM
Rhodochrosite | Level 12

John,

Let us have a small number of dates. There are 6 days for A and

4 for B. It can be made to work with 30 days or for any number of days.

Just change the macro variable, &days, to your context. Also, you need

to give the expected maximum number of dates. For instance, in the

following data set, Firm A has the maximum dates(6).

An Array is used to save the RET values for a Firm at a time to

build up the cumulative sum of RETs. To be efficient, the sums

beyond &days, are made in such a way that one subtraction and

one addition are made for every day thereafter.

As sums are cumulatively made and when the number of observations

forward are lesser than &days, partial sums are made, which you can take

or ignore. Once a Firm is finished, the next Firm is processed in

a DoW-loop.

data have;

informat Date mmddyy8.;

format Date date10.;

input Firm $ Date Ret;

datalines;

A    1/1/11   .1

A    1/2/11   .01

A    1/3/11   .1

A    1/4/11   .2

A    1/5/11   .3

A    1/6/11   .2

B    1/1/11   .05

B    1/2/11   .1

B    1/3/11   .2

B    1/4/11   .1

;

run;

proc sort data = have;

by Firm ;

run;

%let days = 3;

%let maxdates = 6;

data need;

array k[&maxdates] _temporary_;

   do ind = 1 by 1 until(last.Firm);

      set have;

      by Firm;

      k[ind] = Ret;

   end;

   sum_Ret = 0;

   * Find first sum for &days ;

   do ind = 1 to &days;

      sum_Ret + k[ind];

   end;

   do ind = 1 by 1 until(last.Firm);

      set have;

      by Firm;

      if ind = 1 then output;

   * Find subsequent sums by adjusting SUM_RET (removing and adding RET);

      else do;

         sum_Ret +- k[ind - 1];

         if ind <= (dim(k) - &days + 1) then sum_Ret ++ k[ind + &days - 1];

         output;

      end;

   end;

   call missing(of k

  • );
  • drop ind;

    run;

    Muthia Kachirayan

    mahler_ji
    Obsidian | Level 7

    Hey and ,

    I have been playing with the code that you each posted and have still not been able to make it work.  I have gone back and attached an exerpt of the dataset that I am working with.  The entire dataset is large, something like 700MB and has 11mm observations.

    When I run the following code (obviously calling my own libraries, etc) the code just sort of sits there and does nothing.

    proc sql;

      create table mylib.twt2_2 as

      select t1.CUSIP,

      t1.DATE,

      sum(t2.RET) as cum_ret,

      exp(sum(log(1+t2.RET)))-1 as compo_ret,

      count(t2.RET) as n

      from mylib.twt2_1 as t1

      left join mylib.twt2_1 as t2

      on t2.date-t1.date between 0 and 30 & (t1.CUSIP = t2.CUSIP)

      group by t1.CUSIP, t1.date;

    quit;

    Could you guys have a look at the dataset that I have attached and see what may be going wrong?

    Thanks!!

    KachiM
    Rhodochrosite | Level 12

    I do not see your data.

    By the way did you try the ARRAY approach I posted. If you have problems to understand/use the ARRAY approach, I can help you as much as I can.

    Reeza
    Super User

    What version of SAS are you using?

    KachiM
    Rhodochrosite | Level 12

    I am not sure what is your BYVar in TEST1. I am assuming it is STOCK_SYMBOL. If it is not you may replace it by your choice. Here is the Array way. There are 2265 rows with 'A' which is more than the next symbol 'ABI'.

    I have not checked for the gaps in the dates. It is better you pre-process the data set and fill the gaps with suitable value for RET before running this program.

    %let days = 30;

    %let maxdates = 2265;

    data need;

    array k[&maxdates] _temporary_;

       do ind = 1 by 1 until(last.stock_symbol);

          set test1;

          by stock_symbol;

          k[ind] = Ret;

       end;

       sum_Ret = 0;

       * Find first sum for &days ;

       do ind = 1 to &days;

          sum_Ret + k[ind];

       end;

       do ind = 1 by 1 until(last.stock_symbol);

          set test1;

          by stock_symbol;

          if ind = 1 then output;

       * Find subsequent sums by adjusting SUM_RET (removing and adding RET);

          else do;

             sum_Ret +- k[ind - 1];

             if ind <= (dim(k) - &days + 1) then sum_Ret ++ k[ind + &days - 1];

             output;

          end;

       end;

       call missing(of k

  • );
  • drop ind;

    run;

    proc print data = need;

    run;

    Hope this works for you.

    PGStats
    Opal | Level 21

    I notice that your dataset only includes weekdays. So a 30 day interval will include less than 30 datapoints. It is not clear how you want to handle that. 

    Otherwise, the only thing being wrong is your dataset is its size! 

    If you have access to SAS/ETS, you should try proc expand to get what you want:

    /* Simple sum of returns */

    proc expand data=sasforum.test1 out=test1;

    by permno notsorted;

    id date;

    convert ret=ret_30 / transformout=(reverse movsum 30 reverse);

    run;

    /* Compounded product of returns */

    proc expand data=sasforum.test1 out=test2;

    by permno notsorted;

    id date;

    convert ret=compo_ret_30 / transformout=(+1 log reverse movsum 30 reverse exp -1);

    run;

    it should be a lot faster than SQL.

    (Tested with your example dataset)

    PG

    PG

    sas-innovate-2024.png

    Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

    Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

     

    Register now!

    What is Bayesian Analysis?

    Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

    Find more tutorials on the SAS Users YouTube channel.

    Click image to register for webinarClick image to register for webinar

    Classroom Training Available!

    Select SAS Training centers are offering in-person courses. View upcoming courses for:

    View all other training opportunities.

    Discussion stats
    • 16 replies
    • 3261 views
    • 0 likes
    • 7 in conversation