Help using Base SAS procedures

Computing Standard Deviation Across Multiple Time Periods

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Computing Standard Deviation Across Multiple Time Periods

Hello,

I am working with stock market price data. My columns are date, company_number (co_num), and price (prc).

I need to compute the standard deviation of price for each company across a one, two, three, four, and five year range.

I can do the single year using the following in Base SAS 9.4:

Proc sql;

Create table new as

Select co_num, year(date) as year, std(prc) as std_1_year

From old

Group by co_num, year(date);

Quit;

How do I do the additional years? Thank you for the assistance!

Attachment

Accepted Solutions
Solution
‎12-17-2014 05:33 PM
Respected Advisor
Posts: 3,124

Re: Computing Standard Deviation Across Multiple Time Periods

Not sure if this is what you need, but one way to start the conversation:

libname test "\\xxxxx\temp";

data _o_1;

     set test.old;

     year=year(date);

     drop date;

run;

proc sort data=_o_1 out=_o_2 (keep=co_num year) nodupkey;

     by co_num year;

run;

proc sql;

     create table want as

           select *, (select std(prc) from _o_1 where co_num=a.co_num and year=a.year) as ste_1_year

                ,(select std(prc) from _o_1 where co_num=a.co_num and a.year-1 <= year<=a.year) as ste_2_year

                ,(select std(prc) from _o_1 where co_num=a.co_num and a.year-2 <= year<=a.year) as ste_3_year

                ,(select std(prc) from _o_1 where co_num=a.co_num and a.year-3 <= year<=a.year) as ste_4_year

                ,(select std(prc) from _o_1 where co_num=a.co_num and a.year-4 <= year<=a.year) as ste_5_year

           from _o_2 a

     ;

quit;

View solution in original post


All Replies
Super User
Posts: 17,813

Re: Computing Standard Deviation Across Multiple Time Periods

Do you have SAS ETS licences?

If so look into proc expand.

Occasional Contributor
Posts: 7

Re: Computing Standard Deviation Across Multiple Time Periods

Reeza,

Thank you for the response. I indeed have access to the expand procedure. I am not sure how to use it in this case. I don't really want to collapse my existing time series. I want to be able to group by more than one year. The problem I encounter is that my multi-year periods overlap. For example 2001 is part of the two year sequence of 2000-2001 and 2001-2002.

Trusted Advisor
Posts: 1,204

Re: Computing Standard Deviation Across Multiple Time Periods

Please provide some sample data. In your example how this reflects standard deviation based on single year only?

Occasional Contributor
Posts: 7

Re: Computing Standard Deviation Across Multiple Time Periods

I added some sample data to the original post.

Occasional Contributor
Posts: 7

Re: Computing Standard Deviation Across Multiple Time Periods

I also usually add a unique statement after the select to get just one line for company, year, and standard deviation.

Super User
Posts: 17,813

Re: Computing Standard Deviation Across Multiple Time Periods

Here's a way using a multi label format and proc means. More code and probably a way to simplify but for starting out it will probably work:

EDIT: Needed to change the loop boundaries from 1 to 5 to 0 to 4 to get proper year counts!

*Get all the years in your dataset;

proc sql;

  create table years as

  select distinct year(date) as year, max(calculated year) as max_year

  from old;

quit;

*Create a format that does the grouping you want;

data year_fmt;

length hlo $2. label $25.;

  set years;

  fmtname='year_fmt';

  hlo='SM';

  start=year;

  do i=0 to 4;

  end=year+i;

  label=catx("-", put(start, 4. -l), put(end, 4. -l));

  if end<=max_year then output;

  end;

  drop i year;

run;

proc format cntlin=year_fmt;

run;

*Add year variable into data set;

data old2;

  set old;

  year=year(date);

run;

*calculate STD for all groups at once;

proc means data=old2 noprint;

class co_num year/mlf;

format year year_fmt.;

ways 2;

var prc;

output out=want std(prc)=std_want;

run;

Solution
‎12-17-2014 05:33 PM
Respected Advisor
Posts: 3,124

Re: Computing Standard Deviation Across Multiple Time Periods

Not sure if this is what you need, but one way to start the conversation:

libname test "\\xxxxx\temp";

data _o_1;

     set test.old;

     year=year(date);

     drop date;

run;

proc sort data=_o_1 out=_o_2 (keep=co_num year) nodupkey;

     by co_num year;

run;

proc sql;

     create table want as

           select *, (select std(prc) from _o_1 where co_num=a.co_num and year=a.year) as ste_1_year

                ,(select std(prc) from _o_1 where co_num=a.co_num and a.year-1 <= year<=a.year) as ste_2_year

                ,(select std(prc) from _o_1 where co_num=a.co_num and a.year-2 <= year<=a.year) as ste_3_year

                ,(select std(prc) from _o_1 where co_num=a.co_num and a.year-3 <= year<=a.year) as ste_4_year

                ,(select std(prc) from _o_1 where co_num=a.co_num and a.year-4 <= year<=a.year) as ste_5_year

           from _o_2 a

     ;

quit;

Occasional Contributor
Posts: 7

Re: Computing Standard Deviation Across Multiple Time Periods

Thank you, this does it. I had to add the following statement:

data want;

set want;

if ste_2_year=ste_1_year then ste_2_year=.;

if ste_3_year=ste_2_year then ste_3_year=.;

if ste_4_year=ste_3_year then ste_4_year=.;

if ste_5_year=ste_4_year then ste_5_year=.;

if ste_1_year=. then ste_2_year=.;

if ste_2_year=. then ste_3_year=.;

if ste_3_year=. then ste_4_year=.;

if ste_4_year=. then ste_5_year=.;

run;

This keeps data that doesn't have the required length of times from returning identical values for 1 year and 2 year sd's.

Respected Advisor
Posts: 3,124

Re: Computing Standard Deviation Across Multiple Time Periods

In that case, you can also add some logic and have it done in the step of Proc SQL:

proc sql;

  create table want as

  select *, (select std(prc) from _o_1 where co_num=a.co_num and year=a.year) as ste_1_year

  ,(select case when count (distinct year) >1 then std(prc) else . end from _o_1 where co_num=a.co_num and a.year-1 <= year<=a.year) as ste_2_year

  ,(select case when count (distinct year) >2 then std(prc) else . end from _o_1 where co_num=a.co_num and a.year-2 <= year<=a.year) as ste_3_year

  ,(select case when count (distinct year) >3 then std(prc) else . end from _o_1 where co_num=a.co_num and a.year-3 <= year<=a.year) as ste_4_year

  ,(select case when count (distinct year) >4 then std(prc) else . end from _o_1 where co_num=a.co_num and a.year-4 <= year<=a.year) as ste_5_year

  from _o_2 a

  ;

quit;

Good Luck,

Haikuo

BTW: Your data step used for clean up is not robust enough. What if you do have identical Stds across multiple years?

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 607 views
  • 1 like
  • 4 in conversation