Hello,
I am working with stock market price data. My columns are date, company_number (co_num), and price (prc).
I need to compute the standard deviation of price for each company across a one, two, three, four, and five year range.
I can do the single year using the following in Base SAS 9.4:
Proc sql;
Create table new as
Select co_num, year(date) as year, std(prc) as std_1_year
From old
Group by co_num, year(date);
Quit;
How do I do the additional years? Thank you for the assistance!
Not sure if this is what you need, but one way to start the conversation:
libname test "\\xxxxx\temp";
data _o_1;
set test.old;
year=year(date);
drop date;
run;
proc sort data=_o_1 out=_o_2 (keep=co_num year) nodupkey;
by co_num year;
run;
proc sql;
create table want as
select *, (select std(prc) from _o_1 where co_num=a.co_num and year=a.year) as ste_1_year
,(select std(prc) from _o_1 where co_num=a.co_num and a.year-1 <= year<=a.year) as ste_2_year
,(select std(prc) from _o_1 where co_num=a.co_num and a.year-2 <= year<=a.year) as ste_3_year
,(select std(prc) from _o_1 where co_num=a.co_num and a.year-3 <= year<=a.year) as ste_4_year
,(select std(prc) from _o_1 where co_num=a.co_num and a.year-4 <= year<=a.year) as ste_5_year
from _o_2 a
;
quit;
Do you have SAS ETS licences?
If so look into proc expand.
Reeza,
Thank you for the response. I indeed have access to the expand procedure. I am not sure how to use it in this case. I don't really want to collapse my existing time series. I want to be able to group by more than one year. The problem I encounter is that my multi-year periods overlap. For example 2001 is part of the two year sequence of 2000-2001 and 2001-2002.
Please provide some sample data. In your example how this reflects standard deviation based on single year only?
I added some sample data to the original post.
I also usually add a unique statement after the select to get just one line for company, year, and standard deviation.
Here's a way using a multi label format and proc means. More code and probably a way to simplify but for starting out it will probably work:
EDIT: Needed to change the loop boundaries from 1 to 5 to 0 to 4 to get proper year counts!
*Get all the years in your dataset;
proc sql;
create table years as
select distinct year(date) as year, max(calculated year) as max_year
from old;
quit;
*Create a format that does the grouping you want;
data year_fmt;
length hlo $2. label $25.;
set years;
fmtname='year_fmt';
hlo='SM';
start=year;
do i=0 to 4;
end=year+i;
label=catx("-", put(start, 4. -l), put(end, 4. -l));
if end<=max_year then output;
end;
drop i year;
run;
proc format cntlin=year_fmt;
run;
*Add year variable into data set;
data old2;
set old;
year=year(date);
run;
*calculate STD for all groups at once;
proc means data=old2 noprint;
class co_num year/mlf;
format year year_fmt.;
ways 2;
var prc;
output out=want std(prc)=std_want;
run;
Not sure if this is what you need, but one way to start the conversation:
libname test "\\xxxxx\temp";
data _o_1;
set test.old;
year=year(date);
drop date;
run;
proc sort data=_o_1 out=_o_2 (keep=co_num year) nodupkey;
by co_num year;
run;
proc sql;
create table want as
select *, (select std(prc) from _o_1 where co_num=a.co_num and year=a.year) as ste_1_year
,(select std(prc) from _o_1 where co_num=a.co_num and a.year-1 <= year<=a.year) as ste_2_year
,(select std(prc) from _o_1 where co_num=a.co_num and a.year-2 <= year<=a.year) as ste_3_year
,(select std(prc) from _o_1 where co_num=a.co_num and a.year-3 <= year<=a.year) as ste_4_year
,(select std(prc) from _o_1 where co_num=a.co_num and a.year-4 <= year<=a.year) as ste_5_year
from _o_2 a
;
quit;
Thank you, this does it. I had to add the following statement:
data want;
set want;
if ste_2_year=ste_1_year then ste_2_year=.;
if ste_3_year=ste_2_year then ste_3_year=.;
if ste_4_year=ste_3_year then ste_4_year=.;
if ste_5_year=ste_4_year then ste_5_year=.;
if ste_1_year=. then ste_2_year=.;
if ste_2_year=. then ste_3_year=.;
if ste_3_year=. then ste_4_year=.;
if ste_4_year=. then ste_5_year=.;
run;
This keeps data that doesn't have the required length of times from returning identical values for 1 year and 2 year sd's.
In that case, you can also add some logic and have it done in the step of Proc SQL:
proc sql;
create table want as
select *, (select std(prc) from _o_1 where co_num=a.co_num and year=a.year) as ste_1_year
,(select case when count (distinct year) >1 then std(prc) else . end from _o_1 where co_num=a.co_num and a.year-1 <= year<=a.year) as ste_2_year
,(select case when count (distinct year) >2 then std(prc) else . end from _o_1 where co_num=a.co_num and a.year-2 <= year<=a.year) as ste_3_year
,(select case when count (distinct year) >3 then std(prc) else . end from _o_1 where co_num=a.co_num and a.year-3 <= year<=a.year) as ste_4_year
,(select case when count (distinct year) >4 then std(prc) else . end from _o_1 where co_num=a.co_num and a.year-4 <= year<=a.year) as ste_5_year
from _o_2 a
;
quit;
Good Luck,
Haikuo
BTW: Your data step used for clean up is not robust enough. What if you do have identical Stds across multiple years?
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.