hello everyone
I am new to sas and my question is a bit basic.
I want to extract for each fund_id in each month , the fama french alpha using 60 month previous observations using this model
mretf = alpha +b1 MKTRF+b2 SMB + b3 HML+ b4 UMD; my sample period is 2001-2016 and the starting period of my study is 01/01/2005 .
I have tried the following code and I think it works well but my question is about the index ( I=mod(N-1,60)+1). Since the table rf4 created by sas,skip the first observation at each new id. I have posted here for one fund an example of the data.
For 31 jan 2005 , the table rf4 created by sas skip the first obs( 31 jan 2000) and begins by 29 fev 2000 until 31 jan 2005.
Your help is much appreciated. Thanks
CALDT | FACTSET_FUND_ID | mretf | SMB | HML | MKTRF | UMD |
31-Jan-00 | A | 0.02107 | 0.0501 | -0.0045 | -0.0474 | 0.0188 |
29-Feb-00 | A | 0.25627 | 0.2208 | -0.1049 | 0.0245 | 0.1838 |
31-Mar-00 | A | -0.08662 | -0.1717 | 0.0786 | 0.052 | -0.068 |
28-Apr-00 | A | -0.11902 | -0.0777 | 0.0857 | -0.064 | -0.0852 |
31-May-00 | A | -0.09957 | -0.0494 | 0.0239 | -0.0442 | -0.0906 |
30-Jun-00 | A | 0.085008 | 0.1395 | -0.1003 | 0.0464 | 0.165 |
31-Jul-00 | A | -0.02268 | -0.0278 | 0.0815 | -0.0251 | -0.0012 |
31-Aug-00 | A | 0.085901 | -0.0111 | -0.0069 | 0.0703 | 0.057 |
29-Sep-00 | A | -0.09272 | -0.014 | 0.0623 | -0.0545 | 0.0215 |
31-Oct-00 | A | -0.07226 | -0.0377 | 0.0555 | -0.0276 | -0.0463 |
30-Nov-00 | A | -0.16127 | -0.0277 | 0.1129 | -0.1072 | -0.0244 |
29-Dec-00 | A | 0.029903 | 0.0096 | 0.073 | 0.0119 | 0.0673 |
31-Jan-01 | A | 0.066976 | 0.0656 | -0.0488 | 0.0313 | -0.2501 |
28-Feb-01 | A | -0.15926 | -0.0072 | 0.1291 | -0.1005 | 0.1248 |
29-Mar-01 | A | -0.10098 | 0.0039 | 0.0649 | -0.0726 | 0.0838 |
30-Apr-01 | A | 0.085569 | 0.005 | -0.0469 | 0.0794 | -0.0812 |
31-May-01 | A | 0.002938 | 0.026 | 0.0313 | 0.0072 | 0.0217 |
29-Jun-01 | A | -0.02129 | 0.0603 | -0.0109 | -0.0194 | 0.0034 |
31-Jul-01 | A | -0.04485 | -0.0435 | 0.0561 | -0.0213 | 0.055 |
31-Aug-01 | A | -0.05641 | 0.025 | 0.0256 | -0.0646 | 0.0554 |
28-Sep-01 | A | -0.12993 | -0.0611 | 0.0156 | -0.0925 | 0.1155 |
31-Oct-01 | A | 0.047347 | 0.0756 | -0.0802 | 0.0246 | -0.0838 |
30-Nov-01 | A | 0.032566 | -0.004 | 0.02 | 0.0754 | -0.0857 |
31-Dec-01 | A | 0.018145 | 0.0456 | 0.0108 | 0.0161 | -0.0003 |
31-Jan-02 | A | -0.01629 | 0.0119 | 0.0335 | -0.0144 | 0.0376 |
28-Feb-02 | A | -0.01538 | -0.0114 | 0.0251 | -0.0229 | 0.0679 |
28-Mar-02 | A | 0.045198 | 0.0426 | 0.0111 | 0.0424 | -0.0164 |
30-Apr-02 | A | -0.00575 | 0.0594 | 0.0392 | -0.052 | 0.0798 |
31-May-02 | A | -0.0054 | -0.0322 | 0.017 | -0.0138 | 0.0304 |
28-Jun-02 | A | -0.04528 | 0.0426 | 0.0012 | -0.0721 | 0.0614 |
31-Jul-02 | A | -0.09043 | -0.0537 | -0.0352 | -0.0818 | 0.0338 |
30-Aug-02 | A | -0.00355 | -0.0243 | 0.0253 | 0.005 | 0.0175 |
30-Sep-02 | A | -0.06487 | 0.0255 | 0.0126 | -0.1035 | 0.0915 |
31-Oct-02 | A | 0.024372 | -0.0286 | -0.054 | 0.0784 | -0.0546 |
29-Nov-02 | A | 0.032228 | 0.0295 | -0.0095 | 0.0596 | -0.1628 |
31-Dec-02 | A | -0.0285 | 0 | 0.0221 | -0.0576 | 0.0962 |
31-Jan-03 | A | -0.02095 | 0.0144 | -0.0083 | -0.0257 | 0.0158 |
28-Feb-03 | A | -0.01952 | -0.0035 | -0.0147 | -0.0188 | 0.0125 |
31-Mar-03 | A | 0.030325 | 0.01 | -0.0193 | 0.0109 | 0.0152 |
30-Apr-03 | A | 0.063457 | 0.0061 | 0.0111 | 0.0822 | -0.0942 |
30-May-03 | A | 0.070241 | 0.0484 | -0.001 | 0.0605 | -0.1076 |
30-Jun-03 | A | 0.021306 | 0.016 | 0.0057 | 0.0142 | -0.0102 |
31-Jul-03 | A | 0.031667 | 0.0534 | -0.0102 | 0.0235 | -0.0029 |
29-Aug-03 | A | 0.036795 | 0.0265 | 0.021 | 0.0234 | -0.0054 |
30-Sep-03 | A | 0.010955 | 0.0081 | 0.0001 | -0.0124 | -0.0019 |
31-Oct-03 | A | 0.06943 | 0.0281 | 0.0193 | 0.0608 | 0.0375 |
28-Nov-03 | A | 0.023784 | 0.0216 | 0.0196 | 0.0135 | 0.0163 |
31-Dec-03 | A | 0.051092 | -0.0305 | 0.0235 | 0.0429 | -0.0569 |
30-Jan-04 | A | 0.037932 | 0.0274 | 0.0196 | 0.0215 | 0.0259 |
27-Feb-04 | A | 0.032609 | -0.0143 | 0.0048 | 0.014 | -0.011 |
31-Mar-04 | A | -0.00011 | 0.0177 | 0.0025 | -0.0132 | 0.002 |
30-Apr-04 | A | -0.02934 | -0.0212 | -0.0262 | -0.0183 | -0.0536 |
31-May-04 | A | 0.013562 | -0.0019 | -0.0035 | 0.0117 | 0.0165 |
30-Jun-04 | A | 0.016311 | 0.0223 | 0.0135 | 0.0186 | 0.0207 |
30-Jul-04 | A | -0.0625 | -0.038 | 0.0409 | -0.0406 | -0.023 |
31-Aug-04 | A | 0.011417 | -0.0159 | 0.0096 | 0.0008 | -0.015 |
30-Sep-04 | A | 0.04196 | 0.0305 | -0.0024 | 0.016 | 0.0526 |
29-Oct-04 | A | 0.041746 | 0.0033 | -0.0056 | 0.0143 | -0.0149 |
30-Nov-04 | A | 0.070328 | 0.039 | 0.0181 | 0.0454 | 0.0322 |
31-Dec-04 | A | 0.050927 | 0.0016 | -0.0004 | 0.0343 | -0.0283 |
31-Jan-05 | A | -0.01264 | -0.0153 | 0.0195 | -0.0276 | 0.0312 |
data rf4 / view= rf4;
array _X1 {60} _temporary_ ;
array _X2 {60} _temporary_ ;
array _X3 {60} _temporary_ ;
array _X4 {60} _temporary_ ;
array _Y {60} _temporary_ ;
set rf3m;
by FUND_ID;
retain N 0;
N = ifn(first.FUND_ID,1,N+1);
I=mod(N-1,60)+1;
_X1{I}=MKTRF;
_X2{I}=SMB;
_X3{I}=HML;
_X4{I}=UMD;
_Y{I}=mretf;
if N>=60 then do I= 1 to 60;
MKTRF=_X1{I};
SMB=_X2{I};
HML=_X3{I};
UMD=_X4{I};
mretf=_Y{I};
output;
end;
run;
proc reg data=rf4 outest=resultsalpha noprint;
by FUND_ID caldt;
model mretf = MKTRF SMB HML UMD;
run;
Hello,
Since I don't really see a question, I can not give you an answer but only a general advice.
When you want to figure out what is going on when your program is executed, the best
is to work with a simplified example. For instance, you can use only one explanatory variable and
a rolling window of 4 observations.
data have;
input id y x;
cards;
1 12 5
1 23 8
1 10 2
1 15 7
1 31 16
1 17 9
1 22 12
;
run;
data prepare;
array _X {4} _temporary_ ;
array _Y {4} _temporary_ ;
set have;
by id;
retain N 0;
N = ifn(first.id,1,N+1);
I=mod(N-1,4)+1;
_X{I}=x;
_Y{I}=y;
if N>=4 then do I= 1 to 4;
x=_X{I};
y=_Y{I};
output;
end;
run;
The output dataset we obtain is :
Obs | id | y | x | N | I |
---|---|---|---|---|---|
1 | 1 | 12 | 5 | 4 | 1 |
2 | 1 | 23 | 8 | 4 | 2 |
3 | 1 | 10 | 2 | 4 | 3 |
4 | 1 | 15 | 7 | 4 | 4 |
5 | 1 | 31 | 16 | 5 | 1 |
6 | 1 | 23 | 8 | 5 | 2 |
7 | 1 | 10 | 2 | 5 | 3 |
8 | 1 | 15 | 7 | 5 | 4 |
9 | 1 | 31 | 16 | 6 | 1 |
10 | 1 | 17 | 9 | 6 | 2 |
11 | 1 | 10 | 2 | 6 | 3 |
12 | 1 | 15 | 7 | 6 | 4 |
13 | 1 | 31 | 16 | 7 | 1 |
14 | 1 | 17 | 9 | 7 | 2 |
15 | 1 | 22 | 12 | 7 | 3 |
16 | 1 | 15 | 7 | 7 | 4 |
If you look at obervations 5 to 8, you see that correspond to observations 2 to 5 of the orginal dataset
(though not in the same order, the 5th observation appearing first).
Obs 9 to 12 correspond to obs 3 to 6 of the original dataset and so on.
The program thus seems to do what you want, that is preparing the data for a rolling regression
with proc reg.
The fact that you received no answer yet may indicate that your question is unclear.
Please reformulate your question as I am sure that many contributors on this forum
would be able to answer.
Hello
Thank you for your reply.
Maybe you are right I should use a simplified example .
I have posted the sas code. My goal is to extract the beta of x . My question is that if I have missing values for my X and I want to run the regression only if I have 4 previous non missing values of X how can I do that ??
Since with these codes I obtain betas for date_code 04 ,05,06,07 , but I want only the beta of date_code 07 since it have full 4 previous observation.
Thank you in advance for you help
data have1;
input id $date_code y x;
cards;
1 01 12 .
1 02 23 .
1 03 10 .
1 04 15 7
1 05 31 16
1 06 17 9
1 07 22 12
;
run;
data prepare1;
array _X {4} _temporary_ ;
array _Y {4} _temporary_ ;
set have1;
by id;
retain N 0;
N = ifn(first.id,1,N+1);
I=mod(N-1,4)+1;
_X{I}=x;
_Y{I}=y;
if N>=4 then do I= 1 to 4;
x=_X{I};
y=_Y{I};
output;
end;
run;
proc reg data=prepare1 outest=result noprint;
by id date_code ;
model y = x;
run;
quit;
Hello,
data have1;
input id $date_code y x;
cards;
1 01 12 .
1 02 23 .
1 03 10 .
1 04 15 7
1 05 31 16
1 06 17 9
1 07 22 12
1 08 12 21
1 09 23 .
1 10 15 7
1 11 31 16
1 12 17 9
1 13 22 12
1 14 12 21
1 15 12 5
1 16 23 .
1 17 8 2
1 18 12 14
1 19 10 5
1 20 16 .
;
run;
/* 1/ We count the number of consecutive complete observations */
data have2 (where=(x ne .));
set have1;
by id;
retain sample size;
if first.id then sample=0;
x1=lag(x);
/* new sample if current obs is OK while previous is not */
if x ne . and x1=. then do;
sample=sample+1;
size=0;
end;
/* We increase the size for each complete observation */
if x ne . then size=size+1;
run;
/* 2/ We keep only samples with size greater or equal 4 */
proc sql noprint;
CREATE TABLE have3 AS
SELECT * FROM have2
GROUP BY ID, sample
HAVING max(size) ge 4
ORDER BY ID, sample, date_code
;
quit;
/* 3/ We prepare the data for rolling regressions */
data prepare1;
array _X {4} _temporary_ ;
array _Y {4} _temporary_ ;
set have3;
by id sample;
retain N 0;
N = ifn(first.sample,1,N+1);
I=mod(N-1,4)+1;
_X{I}=x;
_Y{I}=y;
if N>=4 then do I= 1 to 4;
x=_X{I};
y=_Y{I};
output;
end;
run;
/* 4/ We run the regressions */
proc reg data=prepare1 outest=result noprint;
by id date_code ;
model y = x;
run;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.