Hello everyone,
I have dataset which has gvkey, year, and sale. I need to calculate standard deviation pf sale for each firm (identified by gvkey) between year of t to t-4.
For example standard deviation of sale for 1996 from 1996 to 1995, 1994, 1993, 1992.
I appreciate your help.
gvkey | year | sale |
1004 | 1992 | 524 |
1004 | 1993 | 765 |
1004 | 1994 | 442 |
1004 | 1995 | 510 |
1004 | 1996 | 324 |
…. | …. | … |
1019 | 1990 | 2900 |
1019 | 1991 | 3600 |
1019 | 1992 | 5624 |
1019 | 1993 | 4527 |
1019 | 1994 | 6570 |
1019 | 1995 | 4298 |
1019 | 1996 | 4657 |
1019 | 1997 | 5498 |
1019 | 1998 | 6790 |
…. | … | … |
Hello,
You need a moving standard deviation with size of the window equal to 5 (t-4, t-3, t-2, t-1, t).
You can easily do this with PROC EXPAND (SAS/ETS).
Here are some transformations that will be of interest to you :
Try for example this:
Take care! : I am not sure if this will calculate std on (t-4, t-3, t-2, t-1, t) or on (t-5, t-4, t-3, t-2, t-1).
Please check ... if it's not including the current line t (as you want) , I will tell you how to "shift" the window.
proc expand data=sashelp.pricedata out=abc;
by region line product;
convert sale = sale_MOVSTD / transformout=(MOVSTD 5);
run;
proc datasets library=work NoList Nodetails memtype=DATA;
modify abc;
label sale_MOVSTD = "sale moving std window size = 5";
run;
QUIT;
BR, Koen
You can actually assign the label during the PROC EXPAND, thereby avoiding the need for a subsequent PROC DATASETS.
The price? An unavoidable warning note, as in:
31 proc expand data=sashelp.pricedata out=abc;
32 by region line product;
33 convert sale = sale_MOVSTD / transformout=(MOVSTD 5);
34 label sale_MOVSTD = "sale moving std window size = 5";
WARNING: Variable SALE_MOVSTD not found in data set SASHELP.PRICEDATA.
35 run;
data want;
array p{0:4} _temporary_;
set have;
by gvkey year;
if first.gvkey then do;
call missing(of p{*});
num_years=0;
end;
num_years+1;
p{mod(_n_,5)} = sale;
*if you only want to calculate this once you have 5 years of data;
if num_years>=5 then sale_std = std(of p(*));
*if not conditional remove the IF portion;
run;
I would probably recommend PROC EXPAND as it will deal with missing data and years better, however, here's a rough data step approach. You will need to tailor to deal with missing years/data if necessary.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.