04-23-2016 11:09 AM
04-23-2016 11:29 AM
Hello Dear Cynthia,
I am a beginner and undergraduate student who is trying to replicate a finance paper individually. First obstacle was i needed to learn sas by myself, and still learning. as far as i thought, cusip is an unique identifier which represents a firm or company.
this is a sample of my data set: from here, i want to get number of firms by cusip for each year. how may firms in a year. of course, i was not doing anything and i had tried to use count function;
select count(distinct cusip) as count_cusip, count(distinct names date) as count_names date from tmp1.msenames;
i have attached a sample data set.
thanks for you reply and feedback,
04-23-2016 11:50 AM
If you're just getting started, you may want to consider SAS UE. It has tasks and code snippets that make it easier to figure out how to do something. You're also trying to learn SAS and SQL at the same time which is just a touch more difficult, but still a good idea.
Regardless of your level, as Cynthia has mentioned, learning how to ask a good question, in programming and generally is an important skill.
Your code generates errors, did you try and resolve those? I don't download excel files, so I'm replicating your example using the SASHELP.STOCKS dataset, which is a SAS sample dataset available to everyone.
56 proc sql; 57 select count(distinct stock) as count_stock, 58 count(distinct stock date) as count_names date from sashelp.stocks; ____ 22 76 ERROR 22-322: Syntax error, expecting one of the following: !, !!, &, *, **, +, -, '.', /, <, <=, <>, =, >, >=, ?, AND, CONTAINS, EQ, EQT, GE, GET, GT, GTT, LE, LET, LIKE, LT, LTT, NE, NET, OR, ^=, |, ||, ~=. ERROR 76-322: Syntax error, statement will be ignored.
The error here is telling you that you have too many arguments in your count() function. You can only have one variable listed.
Starting small, here's how to count each distinct stock in the file:
proc sql; create table count_stocks as select count(distinct stock) as count_stock from sashelp.stocks; quit;
Then if you want stocks by date, you can add date to the SELECT and a GROUP statement, since you're doing analysis by groups.
proc sql; create table count_stocks_by_date as select date, count(distinct stock) as count_stock from sashelp.stocks group by date; quit;
04-23-2016 12:16 PM
Thanks for you advice, i will keep in mind. I tried to implement your simple code in my sas. but it does not work. what is the problem? too many arguments?
create table count_stocks_by_date as
select date, count(distinct cusip number) as count_stock
group by effective date;
this is what i tried to do.
04-23-2016 12:39 PM
When programming, you have to use the same names for variables/columns that SAS recognizes. In you code, you referred to the same value as "date" and "effective_date" and used the phrase "cusip number" instead of "cusip". SAS should have given you grief in the log. Did you look at the log? SAS will tell you what problems it has with your code. Though, keep in mind that that can sometimes be a symptom, not the main cause, of the problem.
04-23-2016 12:41 PM
And, if you are going to use SASHELP.STOCKS, there are ways to count without using PROC SQL. For example:
proc freq data=sashelp.stocks; table stock / nocum ; table date*stock / list; format date year4.; run; proc tabulate data=sashelp.stocks; class date stock; table date, stock; format date year4.; run; proc report data=sashelp.stocks; column date stock; define date / group f=year4.; define stock / across; run; proc means data=sashelp.stocks mean n nonobs; class date stock; var high; format date year4.; run;
In the PROC MEANS example, the MEAN of the variable HIGH will be calculated, as an additional example of the type of processing you can get without using SQL.
04-23-2016 12:22 PM
Having looked at you dataset (just called "data" below), I think you might want to count the number of firms/cusip by year. If so, you need to get the year from the effect_date and group by that value named "eff_year" below.
select year(effect_date) as eff_year,
count(distinct cusip) as cusip_cnt
group by eff_year;