I've been stumped for days!! I'm a beginner, so here goes...
I have a set of data on stocks. One variable is the stock ticker symbol and another is fiscal year. I've downloaded 21 years worth of data (some 400,000 observations). There are some stocks who have 1 fiscal year listed, some two, some all 21.
I want to sort out the stocks that are listed for all 21 years from the others.
Any ideas? I appreciate any help anyone could offer.
There are several things you could do ...from the totally manual to the totally automated.
Some "diagnostic" tools are easily available to you. PROC FREQ, for example. Let's say I run this code for SASHELP.CLASS -- the most famous of SAS datasets.
proc freq data=sashelp.class;
tables sex age;
Then I can see that there are 9 female and 10 male students/observations in the file. The age list shows me that I have 5 12 year olds and 1 16 year old.
You could begin your data exploration with a simple PROC FREQ. Let's say your ticket symbol variable was called TICKER and your fiscal year variable was called YEAR, then a simple PROC FREQ like this:
proc freq data=stock_data order=freq;
tables ticker year;
When you look at the PROC FREQ output, since I added the ORDER=FREQ option, you'll see a list of TICKER symbols and the frequency count will be the number of observations for that particular symbol. The ticker symbols with the highest number of rows will be listed first.
The easiest manual way to get only the TICKER with 21 rows into a separate file would be to code a PROC SORT using information from the PROC FREQ. Let's say, that you find KRMT, MSPG, ELMO, BBRD and OSCR symbols all have 21 observations (assuming that you have 1 obs per year, this means that they have 21 years of data). Your PROC SORT would look like this:
proc sort data=stock_data out=only21;
by TICKER YEAR;
where TICKER in ('KRMT', 'MSPG', 'ELMO', 'BBRD', 'OSCR');
This PROC SORT does 3 things. It uses your original file, STOCK_DATA as input to the SORT, sorts it by TICKER and YEAR, AND it creates an output data set, ONLY21 -- which is a subset of data rows based on the WHERE Statement. With the IN operator, the values in the parentheses are treated like they're a bunch of ORs:
WHERE TICKER = 'KRMT' or TICKER='ELMO';
Now, that whole process could be automated with an SQL query or with SAS macro variables. But I don't want to overload you with details on PROC SQL or more advanced stuff. For a beginner, the above manual solution should get you started and get you a subset of data that you can start to work with.