Solved: Re: Computing Standard Deviation Across Multiple Time Periods

cemccoy · Posted 12-17-2014 04:31 PM

Hello,

I am working with stock market price data. My columns are date, company_number (co_num), and price (prc).

I need to compute the standard deviation of price for each company across a one, two, three, four, and five year range.

I can do the single year using the following in Base SAS 9.4:

Proc sql;

Create table new as

Select co_num, year(date) as year, std(prc) as std_1_year

From old

Group by co_num, year(date);

Quit;

How do I do the additional years? Thank you for the assistance!

Haikuo · Posted 12-17-2014 05:33 PM

Not sure if this is what you need, but one way to start the conversation:

libname test "\\xxxxx\temp";

data _o_1;

set test.old;

year=year(date);

drop date;

run;

proc sort data=_o_1 out=_o_2 (keep=co_num year) nodupkey;

by co_num year;

run;

proc sql;

create table want as

select *, (select std(prc) from _o_1 where co_num=a.co_num and year=a.year) as ste_1_year

,(select std(prc) from _o_1 where co_num=a.co_num and a.year-1 <= year<=a.year) as ste_2_year

,(select std(prc) from _o_1 where co_num=a.co_num and a.year-2 <= year<=a.year) as ste_3_year

,(select std(prc) from _o_1 where co_num=a.co_num and a.year-3 <= year<=a.year) as ste_4_year

,(select std(prc) from _o_1 where co_num=a.co_num and a.year-4 <= year<=a.year) as ste_5_year

from _o_2 a

;

quit;

View solution in original post

Reeza · Posted 12-17-2014 04:34 PM

Do you have SAS ETS licences?

If so look into proc expand.

cemccoy · Posted 12-17-2014 04:48 PM

Reeza,

Thank you for the response. I indeed have access to the expand procedure. I am not sure how to use it in this case. I don't really want to collapse my existing time series. I want to be able to group by more than one year. The problem I encounter is that my multi-year periods overlap. For example 2001 is part of the two year sequence of 2000-2001 and 2001-2002.

stat_sas · Posted 12-17-2014 04:51 PM

Please provide some sample data. In your example how this reflects standard deviation based on single year only?

cemccoy · Posted 12-17-2014 05:01 PM

I added some sample data to the original post.

cemccoy · Posted 12-17-2014 05:03 PM

I also usually add a unique statement after the select to get just one line for company, year, and standard deviation.

Reeza · Posted 12-17-2014 05:51 PM

Here's a way using a multi label format and proc means. More code and probably a way to simplify but for starting out it will probably work:

EDIT: Needed to change the loop boundaries from 1 to 5 to 0 to 4 to get proper year counts!

*Get all the years in your dataset;

proc sql;

create table years as

select distinct year(date) as year, max(calculated year) as max_year

from old;

quit;

*Create a format that does the grouping you want;

data year_fmt;

length hlo $2. label $25.;

set years;

fmtname='year_fmt';

hlo='SM';

start=year;

do i=0 to 4;

end=year+i;

label=catx("-", put(start, 4. -l), put(end, 4. -l));

if end<=max_year then output;

end;

drop i year;

run;

proc format cntlin=year_fmt;

run;

*Add year variable into data set;

data old2;

set old;

year=year(date);

run;

*calculate STD for all groups at once;

proc means data=old2 noprint;

class co_num year/mlf;

format year year_fmt.;

ways 2;

var prc;

output out=want std(prc)=std_want;

run;

Haikuo · Posted 12-17-2014 05:33 PM

Not sure if this is what you need, but one way to start the conversation:

libname test "\\xxxxx\temp";

data _o_1;

set test.old;

year=year(date);

drop date;

run;

proc sort data=_o_1 out=_o_2 (keep=co_num year) nodupkey;

by co_num year;

run;

proc sql;

create table want as

select *, (select std(prc) from _o_1 where co_num=a.co_num and year=a.year) as ste_1_year

,(select std(prc) from _o_1 where co_num=a.co_num and a.year-1 <= year<=a.year) as ste_2_year

,(select std(prc) from _o_1 where co_num=a.co_num and a.year-2 <= year<=a.year) as ste_3_year

,(select std(prc) from _o_1 where co_num=a.co_num and a.year-3 <= year<=a.year) as ste_4_year

,(select std(prc) from _o_1 where co_num=a.co_num and a.year-4 <= year<=a.year) as ste_5_year

from _o_2 a

;

quit;

cemccoy · Posted 12-18-2014 11:12 AM

Thank you, this does it. I had to add the following statement:

data want;

set want;

if ste_2_year=ste_1_year then ste_2_year=.;

if ste_3_year=ste_2_year then ste_3_year=.;

if ste_4_year=ste_3_year then ste_4_year=.;

if ste_5_year=ste_4_year then ste_5_year=.;

if ste_1_year=. then ste_2_year=.;

if ste_2_year=. then ste_3_year=.;

if ste_3_year=. then ste_4_year=.;

if ste_4_year=. then ste_5_year=.;

run;

This keeps data that doesn't have the required length of times from returning identical values for 1 year and 2 year sd's.

Haikuo · Posted 12-18-2014 11:45 AM

In that case, you can also add some logic and have it done in the step of Proc SQL:

proc sql;

create table want as

select *, (select std(prc) from _o_1 where co_num=a.co_num and year=a.year) as ste_1_year

,(select case when count (distinct year) >1 then std(prc) else . end from _o_1 where co_num=a.co_num and a.year-1 <= year<=a.year) as ste_2_year

,(select case when count (distinct year) >2 then std(prc) else . end from _o_1 where co_num=a.co_num and a.year-2 <= year<=a.year) as ste_3_year

,(select case when count (distinct year) >3 then std(prc) else . end from _o_1 where co_num=a.co_num and a.year-3 <= year<=a.year) as ste_4_year

,(select case when count (distinct year) >4 then std(prc) else . end from _o_1 where co_num=a.co_num and a.year-4 <= year<=a.year) as ste_5_year

from _o_2 a

;

quit;

Good Luck,

Haikuo

BTW: Your data step used for clean up is not robust enough. What if you do have identical Stds across multiple years?

Registration is open

SAS Training: Just a Click Away