I am not asking about the data release behavior of the firms. I am asking about how the firm data is structured by Compustat in your data set. After all, you could have one of these two situations:
Which of these represents your cases of missing quarters within a 5-year time span? The operational treatment of your data would differ depending on your answer.
OK, so an entire quarterly record can be skipped. In that case, I'd suggest the following:
data v_have_template / view=v_have_template;
set have (keep=gvkey fyearq);
by gvkey fyearq;
if first.fyearq then do fqtr=1 to 4;
output;
end;
run;
data want;
merge have v_have_template;
by gvkey fyearq fqtr;
array _roa{0:19} _temporary_; /* Rolling 20 qtrs of data */
if first.gvkey then call missing(of _roa{*});
_roa{mod(4*fyearq+fqtr,20)}=roaq; /* Populate array */
if fqtr=4; /* Process only Q4 records */
n_roaq=n(of _roa{*}); /* Count non-missing roaq values */
if n_roaq >=16 then std_roaq=std(of _roa{*}); /* Edited: changed 4 to 16 */
run;
Because there can be holes in HAVE (entire records missing for a given gvkey/fyearq/fqtr), this program first creates V_HAVE_TEMPLATE, which is nothing more than a dummy record for all four quarters of every fyearq that appears in HAVE for a given gvkey. That is, if a given fyearq/fqtr is missing in HAVE, it is nevertheless present in V_HAVE_TEMPLATE, provided at least one quarter of that fiscal year is in HAVE.
The second data step merges HAVE with V_HAVE_TEMPLATE. Any case in which a gvkey/fyearq/fqtr is present in V_HAVE_TEMPLATE but absent in HAVE will result in a record with valid gvkey and date values, but missing values for roaq, cfq, etc.
Note that V_HAVE_TEMPLATE is a data set VIEW, not a data set FILE. That is, it is not executed until V_HAVE_TEMPLATE is referenced later in the program. As a result, its observations are streamed to the calling step rather than written to disk. The resulting data are the same, but the process is more efficient because disk input/output is reduced.
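If you want to see which quarters were actually filled in by the template, the merge described above can be sketched with an IN= flag. This is only an illustration: the data set name FILLED_QUARTERS and the flag name IN_HAVE are names I made up, and it assumes HAVE is sorted by gvkey fyearq fqtr, as the merge already requires.

```sas
/* Sketch: list the gvkey/fyearq/fqtr combinations that exist only in the template */
data filled_quarters;                        /* hypothetical output data set name */
  merge have (in=in_have) v_have_template;   /* in_have=1 when HAVE contributes a record */
  by gvkey fyearq fqtr;
  if not in_have;                            /* keep only quarters absent from HAVE */
  keep gvkey fyearq fqtr;
run;
```

Each record in FILLED_QUARTERS is a quarter whose roaq will be missing in the rolling array, so it counts against the 16-quarter minimum.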
Also in the second data step:
If you want to do the same for CFQ, just define another array
array _cfq{0:19} _temporary_;
and process it analogously to _roa.
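For what it's worth, here is a sketch of what the second data step might look like with both arrays. The output variable names n_cfq and std_cfq are my own assumptions, chosen to mirror n_roaq and std_roaq, and I have assumed the same 16-quarter minimum applies to CFQ.

```sas
data want;
  merge have v_have_template;
  by gvkey fyearq fqtr;
  array _roa{0:19} _temporary_;              /* Rolling 20 qtrs of roaq */
  array _cfq{0:19} _temporary_;              /* Rolling 20 qtrs of cfq  */
  if first.gvkey then do;                    /* Reset both arrays for each new firm */
    call missing(of _roa{*});
    call missing(of _cfq{*});
  end;
  _roa{mod(4*fyearq+fqtr,20)}=roaq;          /* Same slot index for both arrays */
  _cfq{mod(4*fyearq+fqtr,20)}=cfq;
  if fqtr=4;                                 /* Process only Q4 records */
  n_roaq=n(of _roa{*});
  n_cfq=n(of _cfq{*});                       /* n_cfq, std_cfq: assumed names */
  if n_roaq>=16 then std_roaq=std(of _roa{*});
  if n_cfq>=16 then std_cfq=std(of _cfq{*});
run;
```

The two minimum-count tests are independent, so a firm-year can get a std_roaq but a missing std_cfq, or vice versa.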
I am using SAS 9.4 TS1M3 and do not get that error, so I assume you are using an older version.
You can change the temporary array to a normal array of variables, and retain those variables:
data v_have_template / view=v_have_template;
set have (keep=gvkey fyearq);
by gvkey fyearq;
if first.fyearq then do fqtr=1 to 4;
output;
end;
run;
data want (drop=_:);
merge have v_have_template;
by gvkey fyearq fqtr;
array _roa{0:19} _roa0-_roa19; /* Rolling 20 qtrs of data */
retain _roa: ;
if first.gvkey then call missing(of _roa{*});
_roa{mod(4*fyearq+fqtr,20)}=roaq; /* Populate array */
if fqtr=4; /* Process only Q4 records */
n_roaq=n(of _roa{*}); /* Count non-missing roaq values */
if n_roaq >=16 then std_roaq=std(of _roa{*}); /* Edited: changed 4 to 16 */
run;
Thank you very much! It is helpful. Cheers, Thierry