Hello everyone,
I have dataset which has gvkey, year, and sale. I need to calculate standard deviation pf sale for each firm (identified by gvkey) between year of t to t-4.
For example standard deviation of sale for 1996 from 1996 to 1995, 1994, 1993, 1992.
I appreciate your help.
gvkey | year | sale |
1004 | 1992 | 524 |
1004 | 1993 | 765 |
1004 | 1994 | 442 |
1004 | 1995 | 510 |
1004 | 1996 | 324 |
…. | …. | … |
1019 | 1990 | 2900 |
1019 | 1991 | 3600 |
1019 | 1992 | 5624 |
1019 | 1993 | 4527 |
1019 | 1994 | 6570 |
1019 | 1995 | 4298 |
1019 | 1996 | 4657 |
1019 | 1997 | 5498 |
1019 | 1998 | 6790 |
…. | … | … |
Hello,
You need a moving standard deviation with size of the window equal to 5 (t-4, t-3, t-2, t-1, t).
You can easily do this with PROC EXPAND (SAS/ETS).
Here are some transformations that will be of interest to you :
Try for example this:
Take care! : I am not sure if this will calculate std on (t-4, t-3, t-2, t-1, t) or on (t-5, t-4, t-3, t-2, t-1).
Please check ... if it's not including the current line t (as you want) , I will tell you how to "shift" the window.
proc expand data=sashelp.pricedata out=abc;
by region line product;
convert sale = sale_MOVSTD / transformout=(MOVSTD 5);
run;
proc datasets library=work NoList Nodetails memtype=DATA;
modify abc;
label sale_MOVSTD = "sale moving std window size = 5";
run;
QUIT;
BR, Koen
You can actually assign the label during the PROC EXPAND, thereby avoiding the need for a subsequent PROC DATASETS.
The price? An unavoidable warning note, as in:
31 proc expand data=sashelp.pricedata out=abc;
32 by region line product;
33 convert sale = sale_MOVSTD / transformout=(MOVSTD 5);
34 label sale_MOVSTD = "sale moving std window size = 5";
WARNING: Variable SALE_MOVSTD not found in data set SASHELP.PRICEDATA.
35 run;
data want;
array p{0:4} _temporary_;
set have;
by gvkey year;
if first.gvkey then do;
call missing(of p{*});
num_years=0;
end;
num_years+1;
p{mod(_n_,5)} = sale;
*if you only want to calculate this once you have 5 years of data;
if num_years>=5 then sale_std = std(of p(*));
*if not conditional remove the IF portion;
run;
I would probably recommend PROC EXPAND as it will deal with missing data and years better, however, here's a rough data step approach. You will need to tailor to deal with missing years/data if necessary.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.