rolling standard deviation separated by many categories

Solved
Occasional Contributor
Posts: 12

rolling standard deviation separated by many categories

Hello all,

The ture data has many categories(mor than 400). I want the result to show the rolling 3years std by  treatment, but each treatment has different starting points. Now assuming two treatments, could you do me a great favor to get the result as follows(either one is fine)? Thanks a lot.

data have;

infile cards dlm=',' truncover;

input treatment year x;

cards;

1,1980,0

1,1981,0

1,1982,0

1,1983,0

1,1984,5

1,1985,7

1,1986,0

1,1987,6

1,1988,2

1,1989,1

2,1980,0

2,1981,0

2,1982,0

2,1983,3

2,1984,4

2,1985,6

2,1986,0

2,1987,7

2,1988,3

2,1989,2

2,1990,1

2,1991,1;

run;

I hope the result like this:

treatment    year   x      std3yr

1      1980  0      .

1      1981  0      .

1      1982  0      .

1      1983  0      .

1      1984  5      .

1      1985  7      .

1      1986   0      3.6055512755

1      1987   6      3.7859388972

1      1988   2      3.0550504633

1      1989   1      2.6457513111

2      1980  0      .

2      1981  0      .

2      1982  0      .

2      1983  3      .

2      1984  4      .

2      1985   6      1.5275252317

2      1986   0      3.0550504633

2      1987   7      3.7859388972

2      1988   3      3.5118845843

2      1989   2      2.6457513111

2      1990   1      1

2      1991   1      0.5773502692

Or

treatment    year   x      std3yr

1      1986   0      3.6055512755

1      1987   6      3.7859388972

1      1988   2      3.0550504633

1      1989   1      2.6457513111

2      1985   6      1.5275252317

2      1986   0      3.0550504633

2      1987   7      3.7859388972

2      1988   3      3.5118845843

2      1989   2      2.6457513111

2      1990   1      1

2      1991   1      0.5773502692

Accepted Solutions
Solution
‎01-03-2017 02:38 PM
Super User
Posts: 10,766

Re: rolling standard deviation separated by many categories

```

data have;
infile cards dlm=',' truncover;
input treatment year x;
cards;
1,1980,0
1,1981,0
1,1982,0
1,1983,0
1,1984,5
1,1985,7
1,1986,0
1,1987,6
1,1988,2
1,1989,1
2,1980,0
2,1981,0
2,1982,0
2,1983,3
2,1984,4
2,1985,6
2,1986,0
2,1987,7
2,1988,3
2,1989,2
2,1990,1
2,1991,1
;
run;
data want;
set have;
by treatment;
retain found ;
lag_x=lag(x);
lag2_x=lag2(x);
if first.treatment then do;found=0;count=0;end;
if x ne 0 then found=1;
if found then count+1;
if count ge 3 then std3=std(x,lag_x,lag2_x);
drop lag:;
run;

```

All Replies
Super User
Posts: 23,663

Re: rolling standard deviation separated by many categories

The solutions previously provide can be easily adapted with a BY statement. Which method did you choose to go with, rather than starting from scratch.

Occasional Contributor
Posts: 12

Re: rolling standard deviation separated by many categories

@Reeza

I have tried by statement, but the result is messy.
Super User
Posts: 23,663

Re: rolling standard deviation separated by many categories

You posted this question for a single category earlier. Several solutions were provided. Which one did you want to work with? If you post that we can help modify it for multiple groups.

Rather than revisit every possible solution, it's better to start with something you have and understand.

Occasional Contributor
Posts: 12

Re: rolling standard deviation separated by many categories

@Reeza

I prefer using PROC SQL, could you do me a great favor to modify the code?Also, you can use the others. Thanks.
Super User
Posts: 23,663

Re: rolling standard deviation separated by many categories

I couldn't think of a way to identify the starting records in SQL while also including 0's in the data.

SQL doesn't have a concept of rows that allows counting, unless you use Cursor logic, which isn't available in SAS.

I'm sure there's a way, but its probably cumbersome...and more code than I'm willing to write at the moment.

Here's a data step solution, that's relatively straightforward.

``````data have;
infile cards dlm=',' truncover;
input treatment year x;
cards;
1,1980,0
1,1981,0
1,1982,0
1,1983,0
1,1984,5
1,1985,7
1,1986,0
1,1987,6
1,1988,2
1,1989,1
2,1980,0
2,1981,0
2,1982,0
2,1983,3
2,1984,4
2,1985,6
2,1986,0
2,1987,7
2,1988,3
2,1989,2
2,1990,1
2,1991,1
;
run;

data want1;
/*Set temporary array to store the last 3 records*/
array p{0:2} _temporary_;

set have;
by treatment; /*BY GROUP PROCESSING*/

retain count;/*Keep value of count from row to row*/

p{mod(_n_,3)} = x; /*Assign value to the array for calculations*/

/*If first treatment record then reset
the array to missing and set the counter to 0*/
if first.treatment then do;
call missing (of p(*));
count=0;
end;

/*Increment Counter if the value if starting record is found*/
if x ne 0 or count>0 then count+1;

/*Calculate STDEV*/
if nmiss(of p(*)) = 0 and max(of p(*)) ne 0 and count>= 3
then stdev = std(of p{*});

run;

proc print data=want1; run;``````
Super User
Posts: 13,498

Re: rolling standard deviation separated by many categories

Jennifer925 wrote:

The ture data has many categories(mor than 400). I want the result to show the rolling 3years std by  treatment, but each treatment has different starting points. Now assuming two treatments, could you do me a great favor to get the result as follows(either one is fine)? Thanks a lot.

Are the 400 categories levels of the treatment variable?

Do you have, or can you make, a dataset with the "different starting point" for each treatment.

For example it looks like you might have
Treatment    Startyear

1                  1986

2                  1985

If so then then you could combine the rule data with the actual data by treatment, and only calculate the value for years greater than or equal to the start year.

Occasional Contributor
Posts: 12

Re: rolling standard deviation separated by many categories

@ballardw
yes, 400 categories are the level of the treatment.
Posts: 1,337

Re: rolling standard deviation separated by many categories

Jennifer:

This is the same problem you submitted earlier, but with treatment as a by-group.  Think about the logic of the single treatment solution that you accepted:

data want (drop=n_nonzeroes);

set have;

n_nonzeroes+(x^=0);

if n_nonzeroes=0 then delete; /* No input for std until first non-zero record*/

std3yr=std(x,lag(x),lag2(x));

if lag2(n_nonzeroes)>0; /* No output until 2nd record following first non-zero*/

run;

All you have to do is

1. Reset n_nonzeroes to 0 at the beginning of each BY group - a perfectly ordinary technique of SAS by-group programming.
2. And change the LAG2 test to include a test for same treatment.  The latter is neccessary because otherwise the "lag2(n_nonzeroes)>0" test for a given treatment could be contaminated by results from the prior treatment.

The bold-italics below show the changes:

data want (drop=n_nonzeroes);

set have;

by treatment;

if first.treatment=1 then n_nonzeroes=0;

n_nonzeroes+(x^=0);

if n_nonzeroes=0 then delete;

std3yr=std(x,lag(x),lag2(x));

if lag2(n_nonzeroes)>0 and lag2(treatment)=treatment ; /* No output until 2nd record following first non-zero */

run;

I notice that unlike your original question, your WANT sample now includes the records with leading zeroes in each by group. That actually makes the program a bit more compact, since there are no subsetting IF's or DELETE statements.:

data want2 (drop=n_nonzeroes);

set have;

by treatment;

if first.treatment=1 then n_nonzeroes=0;

n_nonzeroes+(x^=0);

std3yr=ifn(lag2(n_nonzeroes)>0 and lag2(treatment)=treatment,std(x,lag(x),lag2(x)),.);

run;

Solution
‎01-03-2017 02:38 PM
Super User
Posts: 10,766

Re: rolling standard deviation separated by many categories

```

data have;
infile cards dlm=',' truncover;
input treatment year x;
cards;
1,1980,0
1,1981,0
1,1982,0
1,1983,0
1,1984,5
1,1985,7
1,1986,0
1,1987,6
1,1988,2
1,1989,1
2,1980,0
2,1981,0
2,1982,0
2,1983,3
2,1984,4
2,1985,6
2,1986,0
2,1987,7
2,1988,3
2,1989,2
2,1990,1
2,1991,1
;
run;
data want;
set have;
by treatment;
retain found ;
lag_x=lag(x);
lag2_x=lag2(x);
if first.treatment then do;found=0;count=0;end;
if x ne 0 then found=1;
if found then count+1;
if count ge 3 then std3=std(x,lag_x,lag2_x);
drop lag:;
run;

```
☑ This topic is solved.