How to use the proc SQL to get the rolling 3 years standard deviation and set the starting year is 1984(exclude the first three zero value of X but keep the zero value of year 1986) ? Thanks a lot.
data have; infile cards dlm=',' truncover; input year x; cards; 1980,0 1981,0 1982,0 1983,0 1984,5 1985,7 1986,0 1987,6 1988,2 1989,1 ; run;
You apparently want to include data only when the first non-zero value is encountered. And then you want 3-yr trailing STD, which means output doesn't start until the 2nd record following the first non-zero:
data want (drop=n_nonzeroes);
set have;
n_nonzeroes+(x^=0);
if n_nonzeroes=0 then delete; /* No input for std until first non-zero record*/
std3yr=std(x,lag(x),lag2(x));
if lag2(n_nonzeroes)>0; /* No output until 2nd record following first non-zero*/
run;
Do you have SAS ETS? That's a good way to calculate moving statistics.
SQL really doesn't have a good concept of order in the data ("first 3") other than to apply a sort order on output. I am not quite sure how to interpret what to keep/exclude.
Does this get a start on what you want:
data want; set have; lx1=lag(x);lx2=lag2(x);lx3=lag3(x); STD3yr = std(lx1,lx2,lx3); drop lx:; run;
Here are two other ways, which account for the 0 not being included.
One is SQL and one is temporary arrays, each has it's own benefits.
data have;
infile cards dlm=',' truncover;
input year x;
cards;
1980,0
1981,0
1982,0
1983,0
1984,5
1985,7
1986,0
1987,6
1988,2
1989,1
;
run;
/*Using a temporary array - this assumes you have all records*/
data want1;
array p{0:2} _temporary_;
set have;
p{mod(_n_,3)} = x;
if nmiss(of p(*)) = 0 and max(of p(*)) ne 0 then stdev = std(of p{*});
run;
proc print data=want1; run;
/*SQL approach*/
proc sql;
create table want2 as
select a.year, a.x, case when count(b.x) =3 and max(b.x) ne 0 then std(b.x)
else . end as stdev
from have as a
left join have as b
on b.year between a.year-2 and a.year
group by a.year, a.x;
quit;
proc print data=want2; run;
Do you have to use SQL ? data have; infile cards dlm=',' truncover; input year x; cards; 1980,0 1981,0 1982,0 1983,0 1984,5 1985,7 1986,0 1987,6 1988,2 1989,1 ; run; proc sql; select *,case when (select count(*) from have where year lt a.year) le 4 then . else (select std(x) from have where year between a.year-2 and a.year) end as std from have as a; quit;
The result is a little bit different. I am thinking that the first stdandard deviation is for 5,7,and 0(starting from 1984 to 1986), the second one should be for 7,0 and 6( from 1985 to 1987).
@Jennifer925 wrote:
The result is a little bit different. I am thinking that the first stdandard deviation is for 5,7,and 0(starting from 1984 to 1986), the second one should be for 7,0 and 6( from 1985 to 1987).
Not sure which post you are responding to with this comment. You can use the QUOTE link at the top of the text box to include text from the one you are responding to.
The code I posted with Lag has the STD for 5,7,0 in the result for 1986. You could comment out or remove the DROP statement to see which values are used for calculating any given row. If the results aren't as you need perhaps adjust the lag interval. Maybe you mean lag1, lag2 and X instead of the lag3 I used.
If you don't want a result for given years you could add something like
if year > (which ever boundary you want) then STD3yr = ... ;
As I mentioned, I am still not sure which years you want results for.
proc sql;
select *,case when (select count(*) from have where year lt a.year) le 5 then .
else (select var(x) from have where year between a.year-2 and a.year) end as var
from have as a;
quit;
I just used this code get the result I wanted. But I am trying to use a function to describe the number 5( count (X=0)before starting point plus 1), I have a lot of different categories and their starting points are different. Could you help me out of these problem? Thanks.
@Jennifer925 wrote:
proc sql;
select *,case when (select count(*) from have where year lt a.year) le 5 then .
else (select var(x) from have where year between a.year-2 and a.year) end as var
from have as a;
quit;
I just used this code get the result I wanted. But I am trying to use a function to describe the number 5( count (X=0)before starting point plus 1), I have a lot of different categories and their starting points are different. Could you help me out of these problem? Thanks.
Do you have a SAS ETS licence?
Run the following to see:
proc setinit;
run;
If you do, proc Expand is your best bet, see the third example in the documentation.
You apparently want to include data only when the first non-zero value is encountered. And then you want 3-yr trailing STD, which means output doesn't start until the 2nd record following the first non-zero:
data want (drop=n_nonzeroes);
set have;
n_nonzeroes+(x^=0);
if n_nonzeroes=0 then delete; /* No input for std until first non-zero record*/
std3yr=std(x,lag(x),lag2(x));
if lag2(n_nonzeroes)>0; /* No output until 2nd record following first non-zero*/
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.