BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jennifer925
Fluorite | Level 6

How to use the proc SQL to get the rolling 3 years standard deviation and set the starting year is 1984(exclude the first three zero value of X but keep the zero value of year 1986) ? Thanks a lot.

 




data have; infile cards dlm=',' truncover; input year x; cards; 1980,0 1981,0 1982,0 1983,0 1984,5 1985,7 1986,0 1987,6 1988,2 1989,1 ; run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You apparently want to include data only when the first non-zero value is encountered.  And then you want 3-yr  trailing STD, which means output doesn't start until the  2nd record following the first non-zero:

 

data want (drop=n_nonzeroes);

  set have;

  n_nonzeroes+(x^=0);

   if n_nonzeroes=0 then delete;    /* No input for std until first non-zero record*/

  std3yr=std(x,lag(x),lag2(x));

  if lag2(n_nonzeroes)>0;         /* No output until 2nd record following first non-zero*/

run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

10 REPLIES 10
Reeza
Super User

Do you have SAS ETS? That's a good way to calculate moving statistics.

 

 

ballardw
Super User

SQL really doesn't have a good concept of order in the data ("first 3") other than to apply a sort order on output. I am not quite sure how to interpret what to keep/exclude.

Does this get a start on what you want:

data want;
   set have;
   lx1=lag(x);lx2=lag2(x);lx3=lag3(x);
   STD3yr = std(lx1,lx2,lx3);
   drop lx:;
run;
Reeza
Super User

Here are two other ways, which account for the 0 not being included. 

 

One is SQL and one is temporary arrays, each has it's own benefits.

 

data have;
infile cards dlm=',' truncover;
input year x;
cards;
1980,0
1981,0
1982,0
1983,0
1984,5
1985,7
1986,0
1987,6
1988,2
1989,1
;
run;

/*Using a temporary array  - this assumes you have all records*/
data want1;
array p{0:2} _temporary_;
set have; 
p{mod(_n_,3)} = x;
if nmiss(of p(*)) = 0 and max(of p(*)) ne 0 then stdev = std(of p{*});

run;


proc print data=want1; run;

/*SQL approach*/
proc sql;
create table want2 as
select a.year, a.x, case when count(b.x) =3 and max(b.x) ne 0 then std(b.x) 
else . end as stdev 
from have as a
left join have as b
on b.year between a.year-2 and a.year
group by a.year, a.x;
quit;

proc print data=want2; run;
Ksharp
Super User
Do you have to use SQL ? 




data have;
infile cards dlm=',' truncover;
input  year x;
cards;
1980,0
1981,0
1982,0
1983,0
1984,5
1985,7
1986,0
1987,6
1988,2
1989,1

;
run;
proc sql;
select *,case when (select count(*) from have where year lt a.year) le 4 then .
         else (select std(x) from have where year between a.year-2 and a.year) end as std
 from have as a;
quit;
Jennifer925
Fluorite | Level 6

The result is a little bit different. I am thinking that the first stdandard deviation is for 5,7,and 0(starting from 1984 to 1986), the second one should be for 7,0 and 6( from 1985 to 1987).

ballardw
Super User

@Jennifer925 wrote:

The result is a little bit different. I am thinking that the first stdandard deviation is for 5,7,and 0(starting from 1984 to 1986), the second one should be for 7,0 and 6( from 1985 to 1987).


Not sure which post you are responding to with this comment. You can use the QUOTE link at the top of the text box to include text from the one you are responding to.

 

The code I posted with Lag has the STD for 5,7,0 in the result for 1986. You could comment out or remove the DROP statement to see which values are used for calculating any given row. If the results aren't as you need perhaps adjust the lag interval. Maybe you mean lag1, lag2 and X instead of the lag3 I used.

If you don't want a result for given years you could add something like
if year > (which ever boundary you want) then STD3yr = ... ;

As I mentioned, I am still not sure which years you want results for.

Jennifer925
Fluorite | Level 6

proc sql;
select *,case when (select count(*) from have where year lt a.year) le 5 then .
else (select var(x) from have where year between a.year-2 and a.year) end as var
from have as a;
quit;

 

I just used this code get the result I wanted. But I am trying to use a function to describe the number 5( count (X=0)before starting point plus 1),  I have a lot of different categories and their starting points are different. Could you help me out of these problem? Thanks.

Reeza
Super User

 


@Jennifer925 wrote:

proc sql;
select *,case when (select count(*) from have where year lt a.year) le 5 then .
else (select var(x) from have where year between a.year-2 and a.year) end as var
from have as a;
quit;

 

I just used this code get the result I wanted. But I am trying to use a function to describe the number 5( count (X=0)before starting point plus 1),  I have a lot of different categories and their starting points are different. Could you help me out of these problem? Thanks.


Do you have a SAS ETS licence? 

 Run the following to see:

 

proc setinit;

run; 

 

If you do, proc Expand is your best bet, see the third example in the documentation.  

Jennifer925
Fluorite | Level 6
I do not have the license of SAS ETS.
mkeintz
PROC Star

You apparently want to include data only when the first non-zero value is encountered.  And then you want 3-yr  trailing STD, which means output doesn't start until the  2nd record following the first non-zero:

 

data want (drop=n_nonzeroes);

  set have;

  n_nonzeroes+(x^=0);

   if n_nonzeroes=0 then delete;    /* No input for std until first non-zero record*/

  std3yr=std(x,lag(x),lag2(x));

  if lag2(n_nonzeroes)>0;         /* No output until 2nd record following first non-zero*/

run;

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1608 views
  • 0 likes
  • 5 in conversation