BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JKCho
Pyrite | Level 9

Hello!

 

Simpl question. 

I want to obtain standard deviations of previous five-year sales. In other words, I need to calculate the std from t-1 to t-5.

As long as I know, the code below means the period between t and t-4. I have tried to find an exact code for this command but could not find but assume I may use other than Proc expand.

 

Could anyone help me with how to write the code for t-1 to t-n period? 

 

Thank you so much!

 

PROC EXPAND DATA=comp1 OUT=sale;
id fyear;
CONVERT sale=stdsale / TRANSFORMOUT=(MOVSTD 5);
by gvkey;
RUN;

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Sorry, the Std Function calculates the sample std and it seems you want the population std.

 

Just edited the program a little. See if this gives you the desired result. If so, I'll explain the logic 🙂

 

data have;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 200
1 2012 300
1 2013 400
1 2014 500
2 2010 100
2 2011 200
2 2012 300
2 2013 400
2 2014 500
;

data want(drop=obs i);
    array x[0:3] _temporary_;
    array y[0:3] _temporary_;
    call stdize('replace', 'mult=', 0, of x[*], _N_);

    do obs=1 by 1 until (last.gvkey);    
        do i = 0 to dim(x)-1;
           y [i] = (x [i] - mean(of x[*])) ** 2;
        end;
        std = sqrt(divide(sum(of y[*]), 4));
        set have;
        by gvkey;
        x[mod(obs, 4)] = sale;
        output; 
    end;

run;

 

Result:

 

std      gvkey  fyear  sale 
0.000    1      2010   100 
43.301   1      2011   200 
82.916   1      2012   300 
111.803  1      2013   400 
111.803  1      2014   500 
0.000    2      2010   100 
43.301   2      2011   200 
82.916   2      2012   300 
111.803  2      2013   400 
111.803  2      2014   500 

 

View solution in original post

10 REPLIES 10
PeterClemmensen
Tourmaline | Level 20

How about

 

data comp1;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 100
1 2012 100
1 2013 100
1 2014 200
2 2010 100
2 2011 100
2 2012 100
2 2013 100
2 2014 200
;

PROC EXPAND DATA=comp1 OUT=sale;
id fyear;
CONVERT sale=stdsale / TRANSFORMOUT=(MOVSTD (1 1 1 1 0));
by gvkey;
RUN;
JKCho
Pyrite | Level 9

Thank you for your reply.

 

It seems that using 

 

TRANSFORMOUT=(MOVSTD (1 1 1 1 0))

 

means x multiplied by(X) 0, x-1 X 1, and so on to x-4 X 1...  which actually results in the period from t-1 to t-4 while the denominator is still 5. I just tried to test with the code below.

 

 

data work;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 100
1 2012 100
1 2013 200
1 2014 200
2 2010 100
2 2011 100
2 2012 100
2 2013 200
2 2014 200
;

 

 

What I obtained from it are just weird values.

11.PNG

The standard deviation of 21.65 looks out of nowhere. I can guess the variance there is sth like 468.7 so if I multiply 4 with this variance, then I get close to 1875, which is the nominator of the definition of STD, though I have no clue where this number comes from.

 

my calculations are... (As the most recent observations are not used)

mean= (100+100+100+200)/4 = 175

variance = {(100-175)^2 +(100-175)^2+(100-175)^2+(200-175)^2} / 4= 4375 (not like 1875)

Std = 66.1438

 

This looks not quite fit with what I seek now standard deviations from t-5 to t-1, which conceptually needs observations at the t-1 period as well.

 

Any help?

 

PeterClemmensen
Tourmaline | Level 20

I think you're right. Is it a requirement to use Proc Expand?

JKCho
Pyrite | Level 9
Not at all, I am just working on replicating a journal paper to use their codes in my research as well. A variable in this paper is defined as standards deviations of the previous five years' sale. I thought PROC EXPAND can handle but facing this difficulty...
PeterClemmensen
Tourmaline | Level 20

Ok. I'm sure Proc Expand can handle this somehow, but I'm no expert.

 

Here is a data step approach that gives you what you want:

 

data have;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 200
1 2012 300
1 2013 400
1 2014 500
2 2010 100
2 2011 200
2 2012 300
2 2013 400
2 2014 500
;

data want(drop=obs);
    array x[0:3] _temporary_;
    call missing(of x[*]);

    do obs=1 by 1 until (last.gvkey);    
        std = std(of x[*]);
        set have;
        by gvkey;
        x[mod(obs, 4)]=sale;
        output; 
    end;
   
run;
JKCho
Pyrite | Level 9
Thank you for your help.
but the output value by the above code does not provide correct standard deviations as well.
I could not understand "x[mod(obs, 4)]=sale;" part, and whether this code is forward-looking or backward is not sure.

I think I should use other than PROC EXPAND.

Thanks for your helps again 🙂
PeterClemmensen
Tourmaline | Level 20

Sorry, the Std Function calculates the sample std and it seems you want the population std.

 

Just edited the program a little. See if this gives you the desired result. If so, I'll explain the logic 🙂

 

data have;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 200
1 2012 300
1 2013 400
1 2014 500
2 2010 100
2 2011 200
2 2012 300
2 2013 400
2 2014 500
;

data want(drop=obs i);
    array x[0:3] _temporary_;
    array y[0:3] _temporary_;
    call stdize('replace', 'mult=', 0, of x[*], _N_);

    do obs=1 by 1 until (last.gvkey);    
        do i = 0 to dim(x)-1;
           y [i] = (x [i] - mean(of x[*])) ** 2;
        end;
        std = sqrt(divide(sum(of y[*]), 4));
        set have;
        by gvkey;
        x[mod(obs, 4)] = sale;
        output; 
    end;

run;

 

Result:

 

std      gvkey  fyear  sale 
0.000    1      2010   100 
43.301   1      2011   200 
82.916   1      2012   300 
111.803  1      2013   400 
111.803  1      2014   500 
0.000    2      2010   100 
43.301   2      2011   200 
82.916   2      2012   300 
111.803  2      2013   400 
111.803  2      2014   500 

 

JKCho
Pyrite | Level 9
Thank you for your ongoing efforts! I actually came up with easier programming but your code certainly works as well! Thank you so much!

Have a good day!
PeterClemmensen
Tourmaline | Level 20

Btw, here mean = (100 + 100 + 100 + 200) / 4 = 125 😉

JKCho
Pyrite | Level 9
lol... I made a huge but simple mistake! thank you for letting me know that 🙂
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2714 views
  • 1 like
  • 2 in conversation