BookmarkSubscribeRSS Feed
deleted_user
Not applicable
I've been stumped for days!! I'm a beginner, so here goes...

I have a set of data on stocks. One variable is the stock ticker symbol and another is fiscal year. I've downloaded 21 years worth of data (some 400,000 observations). There are some stocks who have 1 fiscal year listed, some two, some all 21.

I want to sort out the stocks that are listed for all 21 years from the others.

Any ideas? I appreciate any help anyone could offer.
2 REPLIES 2
Cynthia_sas
SAS Super FREQ
Hi:
There are several things you could do ...from the totally manual to the totally automated.

Some "diagnostic" tools are easily available to you. PROC FREQ, for example. Let's say I run this code for SASHELP.CLASS -- the most famous of SAS datasets.
[pre]
proc freq data=sashelp.class;
tables sex age;
run;
[/pre]

Then I can see that there are 9 female and 10 male students/observations in the file. The age list shows me that I have 5 12 year olds and 1 16 year old.

You could begin your data exploration with a simple PROC FREQ. Let's say your ticket symbol variable was called TICKER and your fiscal year variable was called YEAR, then a simple PROC FREQ like this:
[pre]
proc freq data=stock_data order=freq;
tables ticker year;
run;
[/pre]

When you look at the PROC FREQ output, since I added the ORDER=FREQ option, you'll see a list of TICKER symbols and the frequency count will be the number of observations for that particular symbol. The ticker symbols with the highest number of rows will be listed first.

The easiest manual way to get only the TICKER with 21 rows into a separate file would be to code a PROC SORT using information from the PROC FREQ. Let's say, that you find KRMT, MSPG, ELMO, BBRD and OSCR symbols all have 21 observations (assuming that you have 1 obs per year, this means that they have 21 years of data). Your PROC SORT would look like this:
[pre]
proc sort data=stock_data out=only21;
by TICKER YEAR;
where TICKER in ('KRMT', 'MSPG', 'ELMO', 'BBRD', 'OSCR');
run;
[/pre]

This PROC SORT does 3 things. It uses your original file, STOCK_DATA as input to the SORT, sorts it by TICKER and YEAR, AND it creates an output data set, ONLY21 -- which is a subset of data rows based on the WHERE Statement. With the IN operator, the values in the parentheses are treated like they're a bunch of ORs:
[pre]
WHERE TICKER = 'KRMT' or TICKER='ELMO';
[/pre]

Now, that whole process could be automated with an SQL query or with SAS macro variables. But I don't want to overload you with details on PROC SQL or more advanced stuff. For a beginner, the above manual solution should get you started and get you a subset of data that you can start to work with.

cynthia
1162
Calcite | Level 5
If you do want to try a SQL query (I find SQL is easier than people often think) this could work.

This is based on a SAS dataset called work.complete_data that has the columns Symbol, Year, and Etc (could be price or whatever).

[pre]
proc sql;
create table work.complete_data as
select Symbol, Year, Etc
from work.stock_data
group by symbol
having count(Price) = 21;
quit;
[/pre]

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 1217 views
  • 0 likes
  • 3 in conversation