12-21-2016 06:20 AM
Hi all, I am trying to calculate monthly standard deviation of returns dataset. I have tried the following:
proc sort data=have;
by NCUSIP date year month;
create table want as
select *, std(RET) as Result
group by NCUSIP,date,year,month;
For some reason, this does not give me the result. Can someone kindly suggest what to do?
12-21-2016 06:33 AM
Sorry, what does "this does not give me the result" mean? Does it not give you a dataset, is the value missing, is the value not what you were expecting, are there errors/warnings etc. Post test data - in the form of a datastep, and what the output should like. Also post the log if there are error/warnings (as text in the post).
12-21-2016 06:37 AM
12-21-2016 06:46 AM - edited 12-21-2016 06:48 AM
Edit, scrub that. @Ksharp might have it. I had assumed you knew what you were doing with the *. select * means select all variables in the dataset - which will affect the summary statistic. It is never a good idea to use select *, its just lazy programming.
Sorry, again, I don't know "calculate monthly SD as I wanted" what you wanted? The std() function returns per the documentation?
Perhaps your looking for something different, but there are no examples given to advise. What does a proc means give you from the data:
proc means data=have; by ncusip date year month; var ret; output out=want std=std; run;
Note consistent casing and indentations!
12-21-2016 07:06 AM
If you want a single value (std) for each group . You only need include GROUP variable and that summary statistic STD. Or you could try RW9's proc means code, both yield the same result.
12-21-2016 10:39 AM
"Group by" means exactly that, treat the combination of those variables as a single group and do the summary statistic within that group. When you include Date, assuming is a single day, then you may as well forget Year and Month as the finest level of summary will occur at date level.
If you are desiring separate summaries for the Year, then month (likely not a good idea unless you really want Jan 2015 and 2016 combined) then you may want something different. I would tend to look towards proc summary/means with NCUSIP, Year and Date as Class variables but use a Format for date of MMYY for date (assuming it is an actual SAS date value). That by default would generate summaries for the variable across all records (_type_=0) NCUSIP across all the data, year across all the data (NCUSIPs) and MMYY across all the NCUSIP, and then combinations of them, the _type_ variable indicating which combinations. You could generate a CLASSDATA set to specify which combinations of interest you want.
It may be that you actually only want NCUSIP and Date with the appropriate format.
Need further help from the community? Please ask a new question.