BookmarkSubscribeRSS Feed
lucas_balaminut
Calcite | Level 5

Hello! It is my first question on this forum, thanks for the patience and help.

 

I have a panel data for stock returns with the following variables: firm, year, day, return, and number of trades. I'm trying to remove observations based on on the number of trades variable. I want to check if firms have at least 30 days of activity within an year (at least 30 non-blank and non-zero number of trades data points for each firm-year combination). If the firm-year does not pass the check, meaning it does not have at least 30 non-blank non-zero number of trades, I want to remove all the observations for that entire firm-year.

 

I have been trying to write a DO loop for this, but to be honest I'm not very experienced and I could not accomplish this task.

 

Extra context: I'm using Compustat (compd.funda) merged with CRSP (crspa.dsf) through a link table (crsp.ccmxpf_linktable) to attempt to recreate a measure from Chen, Goldstein and Jiang, 2007 called R^2. This measure is from a regression of firm returns on market returns and industry returns. The authors filter the data by removing firm-year observations with less than 30 days of trading activities in a year, and I'm struggling to replicate this measure.

 

Thanks,

 

Lucas Balaminut

 

3 REPLIES 3
Astounding
PROC Star

If your data set is already sorted, you could skip the PROC SORT:

 

proc sort data=have;
   by firm year day;
run;

data want;
   n = 0;
   do until (last.year);
      set have;
      by firm year;
      if number_of_trades > 0 then n + 1;
   end;
   do until (last.year);
      set have;
      by firm year;
      if n >= 30 then output;
   end;
   drop n;
run;

The top loop counts the number of "greater than zero" observations for the current FIRM / YEAR.

 

The bottom loop re-reads the same observations, and outputs if the top loop found enough observations.

lucas_balaminut
Calcite | Level 5

Thanks so much for the help! 

 

But somehow that is still not working. Before running the do loops, I have observations at the daily level. After running the code, I'm left with observations at the year level. Somehow the do loops are deleting all the observations that are not the very last day of a firm-year, even for firms that do have more than 30 days with number_of_trades > 0. I need the daily observations to be kept because I will run a regression on daily returns later.

 

So, to recap, what I have are variables for firm, year, and number of trades. I need to somehow identify which firm-years have less than 30 days meeting the criteria of number_of_trades > 0, then remove the firm-years not meeting this criteria, but keep all the daily observations for the firm-years that do meet the criteria.

 

Do you think you can help?

 

Thanks again!

 

Lucas B.

Astounding
PROC Star

The logic looks correct.  I would recommend:

 

  • Confirm that your original data set does not contain a variable named N
  • Re-run the code, and post the log

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1285 views
  • 0 likes
  • 2 in conversation