Hello everyone,
I have dataset which has gvkey, year, and sale. I need to calculate standard deviation pf sale for each firm (identified by gvkey) between year of t to t-4.
For example standard deviation of sale for 1996 from 1996 to 1995, 1994, 1993, 1992.
I appreciate your help.
gvkey | year | sale |
1004 | 1992 | 524 |
1004 | 1993 | 765 |
1004 | 1994 | 442 |
1004 | 1995 | 510 |
1004 | 1996 | 324 |
…. | …. | … |
1019 | 1990 | 2900 |
1019 | 1991 | 3600 |
1019 | 1992 | 5624 |
1019 | 1993 | 4527 |
1019 | 1994 | 6570 |
1019 | 1995 | 4298 |
1019 | 1996 | 4657 |
1019 | 1997 | 5498 |
1019 | 1998 | 6790 |
…. | … | … |
Hello,
You need a moving standard deviation with size of the window equal to 5 (t-4, t-3, t-2, t-1, t).
You can easily do this with PROC EXPAND (SAS/ETS).
Here are some transformations that will be of interest to you :
Try for example this:
Take care! : I am not sure if this will calculate std on (t-4, t-3, t-2, t-1, t) or on (t-5, t-4, t-3, t-2, t-1).
Please check ... if it's not including the current line t (as you want) , I will tell you how to "shift" the window.
proc expand data=sashelp.pricedata out=abc;
by region line product;
convert sale = sale_MOVSTD / transformout=(MOVSTD 5);
run;
proc datasets library=work NoList Nodetails memtype=DATA;
modify abc;
label sale_MOVSTD = "sale moving std window size = 5";
run;
QUIT;
BR, Koen
You can actually assign the label during the PROC EXPAND, thereby avoiding the need for a subsequent PROC DATASETS.
The price? An unavoidable warning note, as in:
31 proc expand data=sashelp.pricedata out=abc;
32 by region line product;
33 convert sale = sale_MOVSTD / transformout=(MOVSTD 5);
34 label sale_MOVSTD = "sale moving std window size = 5";
WARNING: Variable SALE_MOVSTD not found in data set SASHELP.PRICEDATA.
35 run;
data want;
array p{0:4} _temporary_;
set have;
by gvkey year;
if first.gvkey then do;
call missing(of p{*});
num_years=0;
end;
num_years+1;
p{mod(_n_,5)} = sale;
*if you only want to calculate this once you have 5 years of data;
if num_years>=5 then sale_std = std(of p(*));
*if not conditional remove the IF portion;
run;
I would probably recommend PROC EXPAND as it will deal with missing data and years better, however, here's a rough data step approach. You will need to tailor to deal with missing years/data if necessary.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.