Hello!
Simpl question.
I want to obtain standard deviations of previous five-year sales. In other words, I need to calculate the std from t-1 to t-5.
As long as I know, the code below means the period between t and t-4. I have tried to find an exact code for this command but could not find but assume I may use other than Proc expand.
Could anyone help me with how to write the code for t-1 to t-n period?
Thank you so much!
PROC EXPAND DATA=comp1 OUT=sale;
id fyear;
CONVERT sale=stdsale / TRANSFORMOUT=(MOVSTD 5);
by gvkey;
RUN;
Sorry, the Std Function calculates the sample std and it seems you want the population std.
Just edited the program a little. See if this gives you the desired result. If so, I'll explain the logic 🙂
data have;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 200
1 2012 300
1 2013 400
1 2014 500
2 2010 100
2 2011 200
2 2012 300
2 2013 400
2 2014 500
;
data want(drop=obs i);
array x[0:3] _temporary_;
array y[0:3] _temporary_;
call stdize('replace', 'mult=', 0, of x[*], _N_);
do obs=1 by 1 until (last.gvkey);
do i = 0 to dim(x)-1;
y [i] = (x [i] - mean(of x[*])) ** 2;
end;
std = sqrt(divide(sum(of y[*]), 4));
set have;
by gvkey;
x[mod(obs, 4)] = sale;
output;
end;
run;
Result:
std gvkey fyear sale 0.000 1 2010 100 43.301 1 2011 200 82.916 1 2012 300 111.803 1 2013 400 111.803 1 2014 500 0.000 2 2010 100 43.301 2 2011 200 82.916 2 2012 300 111.803 2 2013 400 111.803 2 2014 500
How about
data comp1;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 100
1 2012 100
1 2013 100
1 2014 200
2 2010 100
2 2011 100
2 2012 100
2 2013 100
2 2014 200
;
PROC EXPAND DATA=comp1 OUT=sale;
id fyear;
CONVERT sale=stdsale / TRANSFORMOUT=(MOVSTD (1 1 1 1 0));
by gvkey;
RUN;
Thank you for your reply.
It seems that using
TRANSFORMOUT=(MOVSTD (1 1 1 1 0))
means x multiplied by(X) 0, x-1 X 1, and so on to x-4 X 1... which actually results in the period from t-1 to t-4 while the denominator is still 5. I just tried to test with the code below.
data work;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 100
1 2012 100
1 2013 200
1 2014 200
2 2010 100
2 2011 100
2 2012 100
2 2013 200
2 2014 200
;
What I obtained from it are just weird values.
The standard deviation of 21.65 looks out of nowhere. I can guess the variance there is sth like 468.7 so if I multiply 4 with this variance, then I get close to 1875, which is the nominator of the definition of STD, though I have no clue where this number comes from.
my calculations are... (As the most recent observations are not used)
mean= (100+100+100+200)/4 = 175
variance = {(100-175)^2 +(100-175)^2+(100-175)^2+(200-175)^2} / 4= 4375 (not like 1875)
Std = 66.1438
This looks not quite fit with what I seek now standard deviations from t-5 to t-1, which conceptually needs observations at the t-1 period as well.
Any help?
I think you're right. Is it a requirement to use Proc Expand?
Ok. I'm sure Proc Expand can handle this somehow, but I'm no expert.
Here is a data step approach that gives you what you want:
data have;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 200
1 2012 300
1 2013 400
1 2014 500
2 2010 100
2 2011 200
2 2012 300
2 2013 400
2 2014 500
;
data want(drop=obs);
array x[0:3] _temporary_;
call missing(of x[*]);
do obs=1 by 1 until (last.gvkey);
std = std(of x[*]);
set have;
by gvkey;
x[mod(obs, 4)]=sale;
output;
end;
run;
Sorry, the Std Function calculates the sample std and it seems you want the population std.
Just edited the program a little. See if this gives you the desired result. If so, I'll explain the logic 🙂
data have;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 200
1 2012 300
1 2013 400
1 2014 500
2 2010 100
2 2011 200
2 2012 300
2 2013 400
2 2014 500
;
data want(drop=obs i);
array x[0:3] _temporary_;
array y[0:3] _temporary_;
call stdize('replace', 'mult=', 0, of x[*], _N_);
do obs=1 by 1 until (last.gvkey);
do i = 0 to dim(x)-1;
y [i] = (x [i] - mean(of x[*])) ** 2;
end;
std = sqrt(divide(sum(of y[*]), 4));
set have;
by gvkey;
x[mod(obs, 4)] = sale;
output;
end;
run;
Result:
std gvkey fyear sale 0.000 1 2010 100 43.301 1 2011 200 82.916 1 2012 300 111.803 1 2013 400 111.803 1 2014 500 0.000 2 2010 100 43.301 2 2011 200 82.916 2 2012 300 111.803 2 2013 400 111.803 2 2014 500
Btw, here mean = (100 + 100 + 100 + 200) / 4 = 125 😉
Catch the best of SAS Innovate 2025 — anytime, anywhere. Stream powerful keynotes, real-world demos, and game-changing insights from the world’s leading data and AI minds.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.