BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JKCho
Pyrite | Level 9

Hello!

 

Simpl question. 

I want to obtain standard deviations of previous five-year sales. In other words, I need to calculate the std from t-1 to t-5.

As long as I know, the code below means the period between t and t-4. I have tried to find an exact code for this command but could not find but assume I may use other than Proc expand.

 

Could anyone help me with how to write the code for t-1 to t-n period? 

 

Thank you so much!

 

PROC EXPAND DATA=comp1 OUT=sale;
id fyear;
CONVERT sale=stdsale / TRANSFORMOUT=(MOVSTD 5);
by gvkey;
RUN;

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20

Sorry, the Std Function calculates the sample std and it seems you want the population std.

 

Just edited the program a little. See if this gives you the desired result. If so, I'll explain the logic 🙂

 

data have;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 200
1 2012 300
1 2013 400
1 2014 500
2 2010 100
2 2011 200
2 2012 300
2 2013 400
2 2014 500
;

data want(drop=obs i);
    array x[0:3] _temporary_;
    array y[0:3] _temporary_;
    call stdize('replace', 'mult=', 0, of x[*], _N_);

    do obs=1 by 1 until (last.gvkey);    
        do i = 0 to dim(x)-1;
           y [i] = (x [i] - mean(of x[*])) ** 2;
        end;
        std = sqrt(divide(sum(of y[*]), 4));
        set have;
        by gvkey;
        x[mod(obs, 4)] = sale;
        output; 
    end;

run;

 

Result:

 

std      gvkey  fyear  sale 
0.000    1      2010   100 
43.301   1      2011   200 
82.916   1      2012   300 
111.803  1      2013   400 
111.803  1      2014   500 
0.000    2      2010   100 
43.301   2      2011   200 
82.916   2      2012   300 
111.803  2      2013   400 
111.803  2      2014   500 

 

View solution in original post

10 REPLIES 10
PeterClemmensen
Tourmaline | Level 20

How about

 

data comp1;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 100
1 2012 100
1 2013 100
1 2014 200
2 2010 100
2 2011 100
2 2012 100
2 2013 100
2 2014 200
;

PROC EXPAND DATA=comp1 OUT=sale;
id fyear;
CONVERT sale=stdsale / TRANSFORMOUT=(MOVSTD (1 1 1 1 0));
by gvkey;
RUN;
JKCho
Pyrite | Level 9

Thank you for your reply.

 

It seems that using 

 

TRANSFORMOUT=(MOVSTD (1 1 1 1 0))

 

means x multiplied by(X) 0, x-1 X 1, and so on to x-4 X 1...  which actually results in the period from t-1 to t-4 while the denominator is still 5. I just tried to test with the code below.

 

 

data work;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 100
1 2012 100
1 2013 200
1 2014 200
2 2010 100
2 2011 100
2 2012 100
2 2013 200
2 2014 200
;

 

 

What I obtained from it are just weird values.

11.PNG

The standard deviation of 21.65 looks out of nowhere. I can guess the variance there is sth like 468.7 so if I multiply 4 with this variance, then I get close to 1875, which is the nominator of the definition of STD, though I have no clue where this number comes from.

 

my calculations are... (As the most recent observations are not used)

mean= (100+100+100+200)/4 = 175

variance = {(100-175)^2 +(100-175)^2+(100-175)^2+(200-175)^2} / 4= 4375 (not like 1875)

Std = 66.1438

 

This looks not quite fit with what I seek now standard deviations from t-5 to t-1, which conceptually needs observations at the t-1 period as well.

 

Any help?

 

PeterClemmensen
Tourmaline | Level 20

I think you're right. Is it a requirement to use Proc Expand?

JKCho
Pyrite | Level 9
Not at all, I am just working on replicating a journal paper to use their codes in my research as well. A variable in this paper is defined as standards deviations of the previous five years' sale. I thought PROC EXPAND can handle but facing this difficulty...
PeterClemmensen
Tourmaline | Level 20

Ok. I'm sure Proc Expand can handle this somehow, but I'm no expert.

 

Here is a data step approach that gives you what you want:

 

data have;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 200
1 2012 300
1 2013 400
1 2014 500
2 2010 100
2 2011 200
2 2012 300
2 2013 400
2 2014 500
;

data want(drop=obs);
    array x[0:3] _temporary_;
    call missing(of x[*]);

    do obs=1 by 1 until (last.gvkey);    
        std = std(of x[*]);
        set have;
        by gvkey;
        x[mod(obs, 4)]=sale;
        output; 
    end;
   
run;
JKCho
Pyrite | Level 9
Thank you for your help.
but the output value by the above code does not provide correct standard deviations as well.
I could not understand "x[mod(obs, 4)]=sale;" part, and whether this code is forward-looking or backward is not sure.

I think I should use other than PROC EXPAND.

Thanks for your helps again 🙂
PeterClemmensen
Tourmaline | Level 20

Sorry, the Std Function calculates the sample std and it seems you want the population std.

 

Just edited the program a little. See if this gives you the desired result. If so, I'll explain the logic 🙂

 

data have;
input gvkey fyear sale;
datalines;
1 2010 100
1 2011 200
1 2012 300
1 2013 400
1 2014 500
2 2010 100
2 2011 200
2 2012 300
2 2013 400
2 2014 500
;

data want(drop=obs i);
    array x[0:3] _temporary_;
    array y[0:3] _temporary_;
    call stdize('replace', 'mult=', 0, of x[*], _N_);

    do obs=1 by 1 until (last.gvkey);    
        do i = 0 to dim(x)-1;
           y [i] = (x [i] - mean(of x[*])) ** 2;
        end;
        std = sqrt(divide(sum(of y[*]), 4));
        set have;
        by gvkey;
        x[mod(obs, 4)] = sale;
        output; 
    end;

run;

 

Result:

 

std      gvkey  fyear  sale 
0.000    1      2010   100 
43.301   1      2011   200 
82.916   1      2012   300 
111.803  1      2013   400 
111.803  1      2014   500 
0.000    2      2010   100 
43.301   2      2011   200 
82.916   2      2012   300 
111.803  2      2013   400 
111.803  2      2014   500 

 

JKCho
Pyrite | Level 9
Thank you for your ongoing efforts! I actually came up with easier programming but your code certainly works as well! Thank you so much!

Have a good day!
PeterClemmensen
Tourmaline | Level 20

Btw, here mean = (100 + 100 + 100 + 200) / 4 = 125 😉

JKCho
Pyrite | Level 9
lol... I made a huge but simple mistake! thank you for letting me know that 🙂

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1180 views
  • 1 like
  • 2 in conversation