Hello!
Simpl question.
I want to obtain standard deviations of previous five-year sales. In other words, I need to calculate the std from t-1 to t-5.
As long as I know, the code below means the period between t and t-4. I have tried to find an exact code for this command but could not find but assume I may use other than Proc expand.
Could anyone help me with how to write the code for t-1 to t-n period?
Thank you so much!
PROC EXPAND DATA=comp1 OUT=sale;
id fyear;
CONVERT sale=stdsale / TRANSFORMOUT=(MOVSTD 5);
by gvkey;
RUN;
Sorry, the Std Function calculates the sample std and it seems you want the population std.
Just edited the program a little. See if this gives you the desired result. If so, I'll explain the logic 🙂
data have;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 200
1 2012 300
1 2013 400
1 2014 500
2 2010 100
2 2011 200
2 2012 300
2 2013 400
2 2014 500
;
data want(drop=obs i);
    array x[0:3] _temporary_;
    array y[0:3] _temporary_;
    call stdize('replace', 'mult=', 0, of x[*], _N_);
    do obs=1 by 1 until (last.gvkey);    
        do i = 0 to dim(x)-1;
           y [i] = (x [i] - mean(of x[*])) ** 2;
        end;
        std = sqrt(divide(sum(of y[*]), 4));
        set have;
        by gvkey;
        x[mod(obs, 4)] = sale;
        output; 
    end;
run;
Result:
std gvkey fyear sale 0.000 1 2010 100 43.301 1 2011 200 82.916 1 2012 300 111.803 1 2013 400 111.803 1 2014 500 0.000 2 2010 100 43.301 2 2011 200 82.916 2 2012 300 111.803 2 2013 400 111.803 2 2014 500
How about
data comp1;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 100
1 2012 100
1 2013 100
1 2014 200
2 2010 100
2 2011 100
2 2012 100
2 2013 100
2 2014 200
;
PROC EXPAND DATA=comp1 OUT=sale;
id fyear;
CONVERT sale=stdsale / TRANSFORMOUT=(MOVSTD (1 1 1 1 0));
by gvkey;
RUN;Thank you for your reply.
It seems that using
TRANSFORMOUT=(MOVSTD (1 1 1 1 0))
means x multiplied by(X) 0, x-1 X 1, and so on to x-4 X 1... which actually results in the period from t-1 to t-4 while the denominator is still 5. I just tried to test with the code below.
data work;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 100
1 2012 100
1 2013 200
1 2014 200
2 2010 100
2 2011 100
2 2012 100
2 2013 200
2 2014 200
;
What I obtained from it are just weird values.
The standard deviation of 21.65 looks out of nowhere. I can guess the variance there is sth like 468.7 so if I multiply 4 with this variance, then I get close to 1875, which is the nominator of the definition of STD, though I have no clue where this number comes from.
my calculations are... (As the most recent observations are not used)
mean= (100+100+100+200)/4 = 175
variance = {(100-175)^2 +(100-175)^2+(100-175)^2+(200-175)^2} / 4= 4375 (not like 1875)
Std = 66.1438
This looks not quite fit with what I seek now standard deviations from t-5 to t-1, which conceptually needs observations at the t-1 period as well.
Any help?
I think you're right. Is it a requirement to use Proc Expand?
Ok. I'm sure Proc Expand can handle this somehow, but I'm no expert.
Here is a data step approach that gives you what you want:
data have;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 200
1 2012 300
1 2013 400
1 2014 500
2 2010 100
2 2011 200
2 2012 300
2 2013 400
2 2014 500
;
data want(drop=obs);
    array x[0:3] _temporary_;
    call missing(of x[*]);
    do obs=1 by 1 until (last.gvkey);    
        std = std(of x[*]);
        set have;
        by gvkey;
        x[mod(obs, 4)]=sale;
        output; 
    end;
   
run;Sorry, the Std Function calculates the sample std and it seems you want the population std.
Just edited the program a little. See if this gives you the desired result. If so, I'll explain the logic 🙂
data have;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 200
1 2012 300
1 2013 400
1 2014 500
2 2010 100
2 2011 200
2 2012 300
2 2013 400
2 2014 500
;
data want(drop=obs i);
    array x[0:3] _temporary_;
    array y[0:3] _temporary_;
    call stdize('replace', 'mult=', 0, of x[*], _N_);
    do obs=1 by 1 until (last.gvkey);    
        do i = 0 to dim(x)-1;
           y [i] = (x [i] - mean(of x[*])) ** 2;
        end;
        std = sqrt(divide(sum(of y[*]), 4));
        set have;
        by gvkey;
        x[mod(obs, 4)] = sale;
        output; 
    end;
run;
Result:
std gvkey fyear sale 0.000 1 2010 100 43.301 1 2011 200 82.916 1 2012 300 111.803 1 2013 400 111.803 1 2014 500 0.000 2 2010 100 43.301 2 2011 200 82.916 2 2012 300 111.803 2 2013 400 111.803 2 2014 500
Btw, here mean = (100 + 100 + 100 + 200) / 4 = 125 😉
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
