mkeintz
PROC Star

I am not asking about the data release behavior of the firms. I am asking how Compustat structures the firm data in your data set. After all, you could have either of these two situations:

  1. A record for FYEAR 2002, Q2 followed by a record for FYEAR 2002, Q4. That's a missing quarterly record.
  2. A record for FYEAR 2002, Q2 followed by a record for FYEAR 2002, Q3, but with roaq set to missing.

Which of these represents your cases of missing quarters within a 5-year time span? The operational treatment of your data would differ depending on your answer.
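One way to tell the two situations apart is to compare consecutive quarter indices within each gvkey. The sketch below is a minimal illustration, assuming the data are sorted by gvkey fyearq fqtr and that fqtr runs 1 through 4. The data set HAVE_DEMO, its values, and the variables qtr_index, prev_index, skipped_qtrs and missing_roaq are all made up for illustration.

```sas
/* Made-up demo data: gvkey 1001 skips 2002 Q3 entirely (situation 1);  */
/* gvkey 1002 has a 2002 Q3 record but roaq is missing (situation 2).   */
data have_demo;
  input gvkey fyearq fqtr roaq;
  datalines;
1001 2002 1 0.010
1001 2002 2 0.012
1001 2002 4 0.015
1002 2002 1 0.020
1002 2002 2 0.021
1002 2002 3 .
1002 2002 4 0.019
;
run;

data gap_check;
  set have_demo;                          /* assumed sorted by gvkey fyearq fqtr */
  by gvkey;
  qtr_index  = 4*fyearq + fqtr;           /* consecutive integers for consecutive quarters */
  prev_index = lag(qtr_index);            /* LAG is called on every record */
  if first.gvkey then prev_index = .;     /* never compare across firms */
  skipped_qtrs = max(0, qtr_index - prev_index - 1); /* situation 1: whole records absent */
  missing_roaq = missing(roaq);                      /* situation 2: record present, roaq missing */
run;

proc means data=gap_check sum;
  class gvkey;
  var skipped_qtrs missing_roaq;
run;
```

On real data, replace HAVE_DEMO with HAVE. A nonzero sum of skipped_qtrs points to situation 1 and a nonzero sum of missing_roaq points to situation 2; both can occur in the same data set.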

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Thierrynguyen
Calcite | Level 5
Thanks. I calculated roaq myself, so if the record for Q3 is missing, then there is no roaq for Q3. By contrast, if there is a record for Q3 but some of the variables used to calculate roaq are missing, then roaq has a missing value. Basically, if a record is missing, then I don't have roaq.
mkeintz
PROC Star

OK, so an entire quarterly record can be skipped. In that case, I'd suggest the following:

data v_have_template / view=v_have_template;
  set have (keep=gvkey fyearq);  /* HAVE must be sorted by gvkey fyearq fqtr */
  by gvkey fyearq;
  if first.fyearq then do fqtr=1 to 4;  /* One dummy record per quarter */
    output;
  end;
run;

data want;
  merge have v_have_template;  /* Quarters absent from HAVE get missing roaq */
  by gvkey fyearq fqtr;

  array _roa{0:19} _temporary_;  /* Rolling 20 qtrs of data */

  if first.gvkey then call missing(of _roa{*});  /* Reset for each firm */

  _roa{mod(4*fyearq+fqtr,20)}=roaq; /* Populate array */

  if fqtr=4;   /* Process only Q4 records */

  n_roaq=n(of _roa{*});  /* Count non-missing roaq values */
  if n_roaq >=16 then std_roaq=std(of _roa{*});  /* Edited: changed 4 to 16 */
run;

Because there can be holes in HAVE (entire records missing for a given gvkey/fyearq/fqtr), this program creates the data set V_HAVE_TEMPLATE, which is nothing more than a dummy record for every quarter (fqtr 1 through 4) of every fiscal year that appears for that gvkey in HAVE. I.e., if a given fyearq/fqtr is missing in HAVE, it is nevertheless present in V_HAVE_TEMPLATE, as long as the firm has at least one record in that fiscal year.

The second data step merges HAVE with V_HAVE_TEMPLATE. Any gvkey/fyearq/fqtr that is present in V_HAVE_TEMPLATE but absent from HAVE results in a record with valid gvkey and date values, but missing values for roaq, cfq, etc.
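To see what the merge produces, here is a minimal sketch on a made-up single-firm data set in which fiscal 2002 Q3 has no record at all. HAVE_DEMO, V_DEMO_TEMPLATE and DEMO_FILLED are hypothetical names; the template step is the same logic as V_HAVE_TEMPLATE.

```sas
/* Made-up demo data: gvkey 1001 has no record for fiscal 2002 Q3 */
data have_demo;
  input gvkey fyearq fqtr roaq;
  datalines;
1001 2002 1 0.010
1001 2002 2 0.012
1001 2002 4 0.015
;
run;

/* One dummy record per quarter of each fiscal year present in HAVE_DEMO */
data v_demo_template / view=v_demo_template;
  set have_demo (keep=gvkey fyearq);
  by gvkey fyearq;
  if first.fyearq then do fqtr=1 to 4;
    output;
  end;
run;

/* The missing quarter comes back as a record with roaq set to missing */
data demo_filled;
  merge have_demo v_demo_template;
  by gvkey fyearq fqtr;
run;

proc print data=demo_filled;
run;
```

DEMO_FILLED should contain four records for 2002, with roaq missing for Q3. Because the template is built only from fiscal years that already have at least one record, a fiscal year with no records at all is not filled in; in that case the four array slots for that year would keep the values stored 20 quarters earlier.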

Note that V_HAVE_TEMPLATE is a data set VIEW, not a data set FILE. That is, it is not executed until it is referenced later in the program (here, by the MERGE statement). As a result, its observations are streamed to the calling step rather than written to disk. The results are the same, but it is more efficient because it reduces disk input/output.

Also in the second data step:

  1. The array _roa, with 20 elements indexed 0 through 19, is a _temporary_ array, meaning its values are retained from observation to observation, but they are not automatically output to data set WANT.

  2. The expression 4*fyearq+fqtr is a positive integer, and it takes consecutive values for consecutive records coming from V_HAVE_TEMPLATE. The mod(...,20) function gives the remainder of that expression after division by 20, i.e. a value from 0 through 19, and the corresponding element of array _roa is assigned the current roaq value. Each new value replaces the value from exactly 20 quarters earlier, so the array always holds the most recent 20 quarters.

  3. Then it's just a matter of checking whether array _roa has at least 16 non-missing values, and computing the standard deviation if it does.
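The slot arithmetic in item 2 can be checked directly. This sketch lists the array slot used by each quarter of a few fiscal years; SLOT_DEMO, qtr_index and slot are illustrative names only. A quarter and the quarter exactly 20 quarters (five fiscal years) later land in the same slot, which is why each new value overwrites the oldest one.

```sas
/* Hypothetical demo: which slot of a 20-element array each quarter uses */
data slot_demo;
  do fyearq = 2000 to 2005;
    do fqtr = 1 to 4;
      qtr_index = 4*fyearq + fqtr;     /* consecutive integers across quarters */
      slot      = mod(qtr_index, 20);  /* always 0 through 19 */
      output;
    end;
  end;
run;

proc print data=slot_demo;
  where fyearq in (2000, 2005);        /* quarters exactly 20 apart share a slot */
run;
```

For example, 4*2000+1 = 8001 and 4*2005+1 = 8021 both leave remainder 1 when divided by 20, so 2005 Q1 overwrites the slot that held 2000 Q1.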

If you want to do the same for CFQ, just define another array
    array _cfq{0:19} _temporary_;

and process it analogously to _roa.
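A minimal sketch of the two-array version, assuming HAVE contains a numeric cfq alongside roaq. It runs on made-up data covering fiscal 2000 through 2004 for one firm, with 2002 Q3 deliberately absent; HAVE_DEMO, V_DEMO_TEMPLATE, WANT_DEMO and the helper _slot are hypothetical names. Like the code above, it uses _temporary_ arrays, so releases that reject `of _roa{*}` for temporary arrays would need the variable-based array plus RETAIN form instead.

```sas
/* Made-up demo data: one firm, fiscal 2000-2004, 2002 Q3 absent */
data have_demo;
  gvkey = 1001;
  do fyearq = 2000 to 2004;
    do fqtr = 1 to 4;
      roaq = 0.01 * fqtr;   /* arbitrary illustrative values */
      cfq  = 0.02 * fqtr;
      if not (fyearq = 2002 and fqtr = 3) then output;
    end;
  end;
run;

data v_demo_template / view=v_demo_template;
  set have_demo (keep=gvkey fyearq);
  by gvkey fyearq;
  if first.fyearq then do fqtr=1 to 4;
    output;
  end;
run;

data want_demo;
  merge have_demo v_demo_template;
  by gvkey fyearq fqtr;

  array _roa{0:19} _temporary_;   /* rolling 20 quarters of roaq */
  array _cfq{0:19} _temporary_;   /* rolling 20 quarters of cfq  */

  if first.gvkey then do;         /* reset both arrays for each firm */
    call missing(of _roa{*});
    call missing(of _cfq{*});
  end;

  _slot = mod(4*fyearq + fqtr, 20);  /* same slot for both arrays */
  _roa{_slot} = roaq;
  _cfq{_slot} = cfq;

  if fqtr = 4;                    /* evaluate at fiscal year end only */

  n_roaq = n(of _roa{*});
  n_cfq  = n(of _cfq{*});
  if n_roaq >= 16 then std_roaq = std(of _roa{*});
  if n_cfq  >= 16 then std_cfq  = std(of _cfq{*});
  drop _slot;
run;
```

At fiscal 2003 Q4 only 15 of the 16 quarters so far are present, so std_roaq and std_cfq stay missing; at fiscal 2004 Q4, 19 of 20 quarters are present and both standard deviations are computed.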

--------------------------
Thierrynguyen
Calcite | Level 5
Hi, thank you very much for the code and your detailed explanation. I tried to run the code, but got an error message:

53144 data std;
53145 merge fundaq v_have_template;
53146 by gvkey fyearq fqtr;
53147 array _roa{0:19} _temporary_; /* Rolling 20 qtrs of data */
53148 if first.gvkey then call missing(of _roa{*});
ERROR: The ARRAYNAME[*] specification requires a variable based array.
53149 _roa{mod(4*fyearq+fqtr,20)}=roaq; /* Populate array */
53150 if fqtr=4; /* Process only Q4 records */
53151 n_roaq=n(of _roa{*}); /* Count non-missing roaq values */
ERROR: The ARRAYNAME[*] specification requires a variable based array.
53152 if n_roaq >=16 then std_roaq=std(of _roa{*}); /* Editted: changed 4
---
71
53152! to 16 */
ERROR: The ARRAYNAME[*] specification requires a variable based array.
ERROR 71-185: The STD function call does not have enough arguments.

Do you have any idea? Thanks, Thierry
mkeintz
PROC Star

I am using SAS 9.4 TS1M3 and do not get that error, so I assume you are using an older version.

You can change the temporary array to a normal array of variables, and retain those variables:

data v_have_template / view=v_have_template;
  set have (keep=gvkey fyearq);  /* HAVE must be sorted by gvkey fyearq fqtr */
  by gvkey fyearq;
  if first.fyearq then do fqtr=1 to 4;  /* One dummy record per quarter */
    output;
  end;
run;

data want (drop=_:);  /* Drop helper variables whose names start with _ */
  merge have v_have_template;
  by gvkey fyearq fqtr;

  array _roa{0:19} _roa0-_roa19;  /* Rolling 20 qtrs of data, in regular variables */
  retain _roa: ;                  /* Keep their values across observations */
  if first.gvkey then call missing(of _roa{*});  /* Reset for each firm */

  _roa{mod(4*fyearq+fqtr,20)}=roaq; /* Populate array */

  if fqtr=4;   /* Process only Q4 records */

  n_roaq=n(of _roa{*});  /* Count non-missing roaq values */
  if n_roaq >=16 then std_roaq=std(of _roa{*});  /* Edited: changed 4 to 16 */
run;
--------------------------
Thierrynguyen
Calcite | Level 5

Thank you very much! It is helpful. Cheers, Thierry
