04-24-2012 10:25 AM
As a SAS layman, you will have a lot to learn to answer even the basic questions that you are asking. I will get you started, but can't possibly present all the tools you will need.
To begin, I have to assume you already have a SAS data set, although you don't name the data set or any of the variables within it. Here are some tools that will serve you well in general, and will start the ball rolling.
First, create a SAS data set holding a count of how many observations exist for each FIRM:
proc freq data=full_dataset;
   tables firm / noprint out=firm_counts (keep=firm count);   /* one row per FIRM, with COUNT */
run;
Next, merge this back onto the original data, and subset the observations:
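proc sort data=full_dataset; by firm; run;   /* MERGE with BY needs sorted input */
data firms_40plus;                           /* output name is a placeholder */
   merge full_dataset firm_counts;
   by firm;
   if count >= 40;                           /* keep firms with at least 40 observations */
run;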
This will generate another SAS data set holding just the FIRMs with at least 40 observations. That seems to be part of what you asked for. There is a lot more to it, as noted below.
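If you would rather do the count and the subset in one step, PROC SQL can do both. This is only a sketch of an alternative, not a replacement for learning the tools above. It assumes the same FULL_DATASET and FIRM names, and the output name FIRMS_40PLUS_SQL is made up for illustration.

```sas
/* Keep every observation for firms that have at least 40 observations. */
/* SAS remerges the per-firm count back onto the detail rows and writes */
/* a note about remerging to the log; that note is expected here.       */
proc sql;
   create table firms_40plus_sql as
   select *
   from full_dataset
   group by firm
   having count(*) >= 40;
quit;
```

No sort is needed beforehand with this approach, but the order of the output observations is not guaranteed.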
If you have data from 1970 through 2010, you might have 41 observations rather than 40 for some firms.
Just because there are 40 observations for a firm, it doesn't mean you have 40 consecutive years. There could be 40 observations all for the same firm for the same year.
Just because you have 40 observations, it doesn't mean they all contain valid data. There could be blanks in many of the variables.
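The checks above can be sketched roughly as follows. I am assuming a numeric variable named YEAR exists in your data, which you never said, so substitute your own name. The data set names FIRM_YEARS and FIRM_FLAGS are placeholders too.

```sas
/* Per-firm summary: observation count, distinct years, first and last year. */
/* YEAR is an assumed variable name.                                         */
proc sql;
   create table firm_years as
   select firm,
          count(*)             as n_obs,
          count(distinct year) as n_years,
          min(year)            as first_year,
          max(year)            as last_year
   from full_dataset
   group by firm;
quit;

/* Flag firms with repeated years, and firms whose years have no gaps. */
data firm_flags;
   set firm_years;
   has_duplicates = (n_obs > n_years);
   is_consecutive = (n_years = last_year - first_year + 1);
run;

/* Count nonmissing (N) and missing (NMISS) values per firm.          */
/* With no VAR statement, PROC MEANS uses all numeric variables.      */
proc means data=full_dataset n nmiss;
   class firm;
run;
```

A firm with N_OBS of 41 and IS_CONSECUTIVE equal to 1 would be the 1970 through 2010 case. PROC MEANS only covers numeric variables, so character variables would need a separate check.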
It will take a lot of work on your part to be able to address all of these questions. I'm hoping this will get you started.