DATA Step, Macro, Functions and more

rolling standard deviation set the starting point

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 12
Accepted Solution

rolling standard deviation set the starting point

[ Edited ]

How to use the proc SQL to get the rolling 3 years standard deviation and set the starting year is 1984(exclude the first three zero value of X but keep the zero value of year 1986) ? Thanks a lot.

 




data have; infile cards dlm=',' truncover; input year x; cards; 1980,0 1981,0 1982,0 1983,0 1984,5 1985,7 1986,0 1987,6 1988,2 1989,1 ; run;

 


Accepted Solutions
Solution
‎12-22-2016 02:55 PM
Valued Guide
Posts: 797

Re: rolling standard deviation set the starting point

You apparently want to include data only when the first non-zero value is encountered.  And then you want 3-yr  trailing STD, which means output doesn't start until the  2nd record following the first non-zero:

 

data want (drop=n_nonzeroes);

  set have;

  n_nonzeroes+(x^=0);

   if n_nonzeroes=0 then delete;    /* No input for std until first non-zero record*/

  std3yr=std(x,lag(x),lag2(x));

  if lag2(n_nonzeroes)>0;         /* No output until 2nd record following first non-zero*/

run;

 

 

View solution in original post


All Replies
Super User
Posts: 17,824

Re: rolling standard deviation set the starting point

Do you have SAS ETS? That's a good way to calculate moving statistics.

 

 

Super User
Posts: 10,500

Re: rolling standard deviation set the starting point

SQL really doesn't have a good concept of order in the data ("first 3") other than to apply a sort order on output. I am not quite sure how to interpret what to keep/exclude.

Does this get a start on what you want:

data want;
   set have;
   lx1=lag(x);lx2=lag2(x);lx3=lag3(x);
   STD3yr = std(lx1,lx2,lx3);
   drop lx:;
run;
Super User
Posts: 17,824

Re: rolling standard deviation set the starting point

Here are two other ways, which account for the 0 not being included. 

 

One is SQL and one is temporary arrays, each has it's own benefits.

 

data have;
infile cards dlm=',' truncover;
input year x;
cards;
1980,0
1981,0
1982,0
1983,0
1984,5
1985,7
1986,0
1987,6
1988,2
1989,1
;
run;

/*Using a temporary array  - this assumes you have all records*/
data want1;
array p{0:2} _temporary_;
set have; 
p{mod(_n_,3)} = x;
if nmiss(of p(*)) = 0 and max(of p(*)) ne 0 then stdev = std(of p{*});

run;


proc print data=want1; run;

/*SQL approach*/
proc sql;
create table want2 as
select a.year, a.x, case when count(b.x) =3 and max(b.x) ne 0 then std(b.x) 
else . end as stdev 
from have as a
left join have as b
on b.year between a.year-2 and a.year
group by a.year, a.x;
quit;

proc print data=want2; run;
Super User
Posts: 9,681

Re: rolling standard deviation set the starting point

Do you have to use SQL ? 




data have;
infile cards dlm=',' truncover;
input  year x;
cards;
1980,0
1981,0
1982,0
1983,0
1984,5
1985,7
1986,0
1987,6
1988,2
1989,1

;
run;
proc sql;
select *,case when (select count(*) from have where year lt a.year) le 4 then .
         else (select std(x) from have where year between a.year-2 and a.year) end as std
 from have as a;
quit;
Occasional Contributor
Posts: 12

Re: rolling standard deviation set the starting point

The result is a little bit different. I am thinking that the first stdandard deviation is for 5,7,and 0(starting from 1984 to 1986), the second one should be for 7,0 and 6( from 1985 to 1987).

Super User
Posts: 10,500

Re: rolling standard deviation set the starting point


Jennifer925 wrote:

The result is a little bit different. I am thinking that the first stdandard deviation is for 5,7,and 0(starting from 1984 to 1986), the second one should be for 7,0 and 6( from 1985 to 1987).


Not sure which post you are responding to with this comment. You can use the QUOTE link at the top of the text box to include text from the one you are responding to.

 

The code I posted with Lag has the STD for 5,7,0 in the result for 1986. You could comment out or remove the DROP statement to see which values are used for calculating any given row. If the results aren't as you need perhaps adjust the lag interval. Maybe you mean lag1, lag2 and X instead of the lag3 I used.

If you don't want a result for given years you could add something like
if year > (which ever boundary you want) then STD3yr = ... ;

As I mentioned, I am still not sure which years you want results for.

Occasional Contributor
Posts: 12

Re: rolling standard deviation set the starting point

proc sql;
select *,case when (select count(*) from have where year lt a.year) le 5 then .
else (select var(x) from have where year between a.year-2 and a.year) end as var
from have as a;
quit;

 

I just used this code get the result I wanted. But I am trying to use a function to describe the number 5( count (X=0)before starting point plus 1),  I have a lot of different categories and their starting points are different. Could you help me out of these problem? Thanks.

Super User
Posts: 17,824

Re: rolling standard deviation set the starting point

 


Jennifer925 wrote:

proc sql;
select *,case when (select count(*) from have where year lt a.year) le 5 then .
else (select var(x) from have where year between a.year-2 and a.year) end as var
from have as a;
quit;

 

I just used this code get the result I wanted. But I am trying to use a function to describe the number 5( count (X=0)before starting point plus 1),  I have a lot of different categories and their starting points are different. Could you help me out of these problem? Thanks.


Do you have a SAS ETS licence? 

 Run the following to see:

 

proc setinit;

run; 

 

If you do, proc Expand is your best bet, see the third example in the documentation.  

Occasional Contributor
Posts: 12

Re: rolling standard deviation set the starting point

I do not have the license of SAS ETS.
Solution
‎12-22-2016 02:55 PM
Valued Guide
Posts: 797

Re: rolling standard deviation set the starting point

You apparently want to include data only when the first non-zero value is encountered.  And then you want 3-yr  trailing STD, which means output doesn't start until the  2nd record following the first non-zero:

 

data want (drop=n_nonzeroes);

  set have;

  n_nonzeroes+(x^=0);

   if n_nonzeroes=0 then delete;    /* No input for std until first non-zero record*/

  std3yr=std(x,lag(x),lag2(x));

  if lag2(n_nonzeroes)>0;         /* No output until 2nd record following first non-zero*/

run;

 

 

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 254 views
  • 0 likes
  • 5 in conversation