BookmarkSubscribeRSS Feed
SeanZ
Obsidian | Level 7

I have a data set "have". In the data set, there are three variables, permno(firm), mon(month) and ret(return). What I want to calculate is that for each firm each month, I would like to calculate a new variable (sumret3), which is the sum of the past three returns. I wrote the following codes.

proc sql;

  create table want as

  select distinct a.*, sum(b.ret) as sumret3

  from have as a left join have as b

  on a.permno=b.permno and (a.mon-3<=b.mon<a.mon)

  group by a.permno, a.mon;

  /*having count(*) =3;*/

quit;

Now the problem I have is that for each firm, some months may not have full three months' returns. Using the code above will give a new value for sumret3 even there is only one past return (Now the sum will be only one value). If there is only two previous returns for that permno, it will give the sum of two returns. I only want to get values for sumret3 if there are three previous returns present. If less than three returns, then give sumret3 a missing value.

I used a contraint  /*having count(*) =3;*/. This works, but this will give less observations. For example, the final data set "want" should have 1 million observations. Without using the contraint, "want" has 1 million observations. With the constraint, "want" only has 600,000 observations. I believe this is due to using distinct option in the third line, which will only keep the first observation with missing data.

How to deal with this problem? Also, is there any other efficient methods to implement my idea?

Thanks.

6 REPLIES 6
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Can you not just left join the data per above back to the original data to get full observations?

proc sql;

  create table want as

     select     ORIGINAL_DATA.*,

                    ADD_DATA.*

     from        INITIAL_DATA ORGINAL DATA

     left join    (select distinct a.*, sum(b.ret) as sumret3

                      from have as a left join have as b

                      on a.permno=b.permno and (a.mon-3<=b.mon<a.mon)

                      group by a.permno, a.mon;

                      having count(*) =3)

     on          ...

quit;

It may not be the best way to do it though.  You could also try 3 left joins to the main table:

proc sql;

     create table WANT as

     select     BASE.PERMNO,

                    BASE.MON,

                    A.RET + B.RET + C.RET as SUMRET

     from       ORIGINAL_DATA BASE

     left join   ORIGINAL_DATA A

     on           BASE.PERMNO=A.PERMNO

     and         A.MON=BASE.MON-1

     left join   ORIGINAL_DATA B

     on           BASE.PERMNO=A.PERMNO

     and         A.MON=BASE.MON-2

...

quit;

Or maybe even datastep:

proc transpose data=original_data out=inter prefix=mon;

  by permno;

  var ret;

  id mon;   /* Assuming month is numeric */

run;

proc sql;

  select  count(*)-1

  into    :NUM_OBS

  from    SASHELP.VCOLUMNS

  where   LIBNAME="WORK"

    and   MEMNAME="INTER";

quit;

data want;

  set inter;

  array mon{&NUM_OBS.};

  array sumrets{&NUM_OBS.-2};

  do i=3 to &NUM_OBS.;

    sumrets{i-}=sum(mon{i-2},mon{i-1},mon{i});

  end;

run;

Many ways to approach the problem, just a matter of testing them and seeing which is quicker/fits into your coding easier.

LinusH
Tourmaline | Level 20

You probably need to pre.process your data, to find the latest 3 returns.This could include order your data by month, counting observations (months) and select max 3 observations. Then go to the SQL summary step.

Data never sleeps
jakarman
Barite | Level 11

Review what you want to show with those figures.

Absolute returns on quarterly results? Then what you have noticed is a remark for information consumers.  

A relative performance comparable to others?  Use a mean. Decide on complete/incomplete month's quarters as a note.
A relative growth evolving in time to overall results. impute the overall mean as missing. You can manipulate the figures and pictures to achieve things as being liked by the targeted goals.
 

---->-- ja karman --<-----
Ksharp
Super User

You'd better post some sample data . No data to test.

proc sql;

  create table want as

   select distinct a.*,(select  sum(ret) from have where  a.permno=permno and (a.mon-3<=mon<a.mon) ) as sumret3

     from have as a  ;

quit;

Xia Keshan

SeanZ
Obsidian | Level 7

Thanks for the help. Here is some data

permno   date           ret           mon

1556019251231.23112
1556019260130-0.00900923113
1556019260227-0.05454523114
1556019260331-0.29807723115
15560192604300.06849323116
15560192605280.00000023117
1556019260630-0.08974423118
1556019260731-0.07042323119
15560192608310.07575823120
1556019260930-0.07042323121
1556019261030-0.07575823122
15560192611300.11475423123
15560192612310.00000023124
15560192701310.05882423125
1556019270228-0.02777823126
15560192703310.01428623127
1556019270430-0.14084523128
15560192705310.29508223129
1556019270630-0.01265823130
1556019270730-0.02564123131

I only use 20 observations for one firm(permno) as an example. Here I standardized mon as a number, it will not affect the use of it. Since I want to calculate the sum of ret ONLY IF three observations are available, sum(ret) should have values start from the 4th column. You code didn't show the desired results. Moreover, I am not sure why your code took incredibly long time to execute. Thanks for the help.

Ksharp
Super User

" I am not sure why your code took incredibly long time to execute."

That is because I am using sub-query of SQL.

OK. Since you posted some data to make the question more clear.

Assuming these month are consecutive . i.e. no missing month between two row .

data have;
input permno   date           ret           mon     ;
cards;
15560     19251231     .     23112
15560     19260130     -0.009009     23113
15560     19260227     -0.054545     23114
15560     19260331     -0.298077     23115
15560     19260430     0.068493     23116
15560     19260528     0.000000     23117
15560     19260630     -0.089744     23118
15560     19260731     -0.070423     23119
15560     19260831     0.075758     23120
15560     19260930     -0.070423     23121
15560     19261030     -0.075758     23122
15560     19261130     0.114754     23123
15560     19261231     0.000000     23124
15560     19270131     0.058824     23125
15560     19270228     -0.027778     23126
15560     19270331     0.014286     23127
15560     19270430     -0.140845     23128
15560     19270531     0.295082     23129
15560     19270630     -0.012658     23130
15560     19270730     -0.025641     23131
15561     19270430     -0.140845     23128
15561     19270531     0.295082     23129
15561     19270630     -0.012658     23130
15561     19270730     -0.025641     23131
;
run;
proc sort data=have ;by  permno mon;run;
data want; 
 set have;
 by permno;
 if first.permno then n=0;
 n+1;
 sumret3=sum(lag(ret),lag2(ret),lag3(ret));
 if n lt 4 then  sumret3=.;
run;



Xia Keshan

Message was edited by: xia keshan

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 840 views
  • 6 likes
  • 5 in conversation