Regarding setting sigma to missing when trdays_3months<5: I wouldn't move that rule into the calculation of trdays_3months. I'd keep it where you apparently have it now, in the calculation of sigma. BUT ... you can do both in the same step, as in this code:
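data want (drop=m);
  set dsf;
  by permno;
  retain trdays_3months 0 m 0;
  /* To hold three separate sums of monthly squared returns */
  array mretsq{0:2} _temporary_ (3*0);
  /* First record of a new month, or of a new PERMNO */
  if dif(month(date))^=0 or first.permno=1 then do;
    /* DIF3 executes only here, so it looks back three month-starts:  */
    /* number of records in the prior three months, same PERMNO only */
    trdays_3months=ifn(dif3(permno)=0,dif3(_n_),.);
    retsquare_3months=sum(of mretsq{*});
    /* Annualize (252 trading days); N is the trading-day count */
    if trdays_3months>=5 then sigma=sqrt( (252/(trdays_3months-1))*retsquare_3months );
    /* Point to next element of array mretsq, and zero it out */
    m=mod(m+1,3);
    mretsq{m}=0;
  end;
  /* Add the current record's squared return to this month's sum */
  mretsq{m}+ret**2;
run;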
This code declares a 3-element temporary array MRETSQ to hold the sum of squared returns for each of the last three months. Because it is a temporary array, (1) no new variables are written to the output data set, and (2) its values are automatically retained from record to record. The variable M identifies which element of the array (element 0, 1, or 2) the current record is added to. But the actual adding of the current record to the monthly total takes place AFTER the code that detects a change in month. That's because you want to generate sigma for the prior three months without contamination from the current record (i.e., the first record of the new month).
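A minimal sketch to watch this on made-up data: DSF_TOY and DEMO are hypothetical names, the returns are invented, and SIGMA is left out so the two intermediate values can be checked by hand. Otherwise it follows the posted step. Because DIF3 sits inside the DO group, its queue is updated only on month-start records, so DIF3(_N_) reaches back three month-starts and gives the number of records in the prior three months. IFN evaluates both of its DIF3 calls every time, which keeps the two queues in step.

```sas
/* Hypothetical toy data: one PERMNO, daily returns over four months */
data dsf_toy;
  input permno date :yymmdd10. ret;
  format date yymmdd10.;
  datalines;
1 2020-01-02 0.01
1 2020-01-03 0.02
1 2020-02-03 0.03
1 2020-03-02 0.01
1 2020-03-03 0.02
1 2020-04-01 0.05
;
run;

data demo (drop=m);
  set dsf_toy;
  by permno;
  retain trdays_3months 0 m 0;
  array mretsq{0:2} _temporary_ (3*0);
  if dif(month(date)) ^= 0 or first.permno then do;
    /* DIF3 runs only on month-start rows, so it looks back three month-starts */
    trdays_3months = ifn(dif3(permno) = 0, dif3(_n_), .);
    /* Sum of the three monthly slots, taken BEFORE the current row is added */
    retsquare_3months = sum(of mretsq{*});
    m = mod(m + 1, 3);
    mretsq{m} = 0;
  end;
  /* Accumulate after the month-change block, so the current row is excluded */
  mretsq{m} + ret**2;
run;
```

On the April 1 record, TRDAYS_3MONTHS should be 5 (two January, one February and two March records) and RETSQUARE_3MONTHS should be 0.0019 up to floating-point rounding; the April return itself is not included. On the March 2 record TRDAYS_3MONTHS is still missing, because only two earlier month-starts have been seen.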
So the IF statement keeps the same test, but now has a THEN DO group instead of a single THEN action. The DO group calculates two new variables, retsquare_3months and sigma, and it also rotates M to the next MRETSQ element using the MOD function.
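Written out, the DO group does the following, taking N as the number of trading-day records in the prior three months (trdays_3months) and r_t as the daily return RET. The sum identity holds once three full months of the same PERMNO have been seen; note the formula uses raw squared returns, without subtracting a mean.

```latex
m \leftarrow (m + 1) \bmod 3 \qquad (1 \to 2 \to 0 \to 1 \to \cdots)

\text{retsquare\_3months} = \sum_{k=0}^{2} \text{mretsq}_k = \sum_{t \in \text{prior 3 months}} r_t^{2},
\qquad
\sigma = \sqrt{\frac{252}{N-1}\,\text{retsquare\_3months}} \quad \text{only if } N \ge 5
```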