Hello all,
The ture data has many categories(mor than 400). I want the result to show the rolling 3years std by treatment, but each treatment has different starting points. Now assuming two treatments, could you do me a great favor to get the result as follows(either one is fine)? Thanks a lot.
data have;
infile cards dlm=',' truncover;
input treatment year x;
cards;
1,1980,0
1,1981,0
1,1982,0
1,1983,0
1,1984,5
1,1985,7
1,1986,0
1,1987,6
1,1988,2
1,1989,1
2,1980,0
2,1981,0
2,1982,0
2,1983,3
2,1984,4
2,1985,6
2,1986,0
2,1987,7
2,1988,3
2,1989,2
2,1990,1
2,1991,1;
run;
I hope the result like this:
treatment year x std3yr
1 1980 0 .
1 1981 0 .
1 1982 0 .
1 1983 0 .
1 1984 5 .
1 1985 7 .
1 1986 0 3.6055512755
1 1987 6 3.7859388972
1 1988 2 3.0550504633
1 1989 1 2.6457513111
2 1980 0 .
2 1981 0 .
2 1982 0 .
2 1983 3 .
2 1984 4 .
2 1985 6 1.5275252317
2 1986 0 3.0550504633
2 1987 7 3.7859388972
2 1988 3 3.5118845843
2 1989 2 2.6457513111
2 1990 1 1
2 1991 1 0.5773502692
Or
treatment year x std3yr
1 1986 0 3.6055512755
1 1987 6 3.7859388972
1 1988 2 3.0550504633
1 1989 1 2.6457513111
2 1985 6 1.5275252317
2 1986 0 3.0550504633
2 1987 7 3.7859388972
2 1988 3 3.5118845843
2 1989 2 2.6457513111
2 1990 1 1
2 1991 1 0.5773502692
data have; infile cards dlm=',' truncover; input treatment year x; cards; 1,1980,0 1,1981,0 1,1982,0 1,1983,0 1,1984,5 1,1985,7 1,1986,0 1,1987,6 1,1988,2 1,1989,1 2,1980,0 2,1981,0 2,1982,0 2,1983,3 2,1984,4 2,1985,6 2,1986,0 2,1987,7 2,1988,3 2,1989,2 2,1990,1 2,1991,1 ; run; data want; set have; by treatment; retain found ; lag_x=lag(x); lag2_x=lag2(x); if first.treatment then do;found=0;count=0;end; if x ne 0 then found=1; if found then count+1; if count ge 3 then std3=std(x,lag_x,lag2_x); drop lag:; run;
The solutions previously provide can be easily adapted with a BY statement. Which method did you choose to go with, rather than starting from scratch.
You posted this question for a single category earlier. Several solutions were provided. Which one did you want to work with? If you post that we can help modify it for multiple groups.
Rather than revisit every possible solution, it's better to start with something you have and understand.
I couldn't think of a way to identify the starting records in SQL while also including 0's in the data.
SQL doesn't have a concept of rows that allows counting, unless you use Cursor logic, which isn't available in SAS.
I'm sure there's a way, but its probably cumbersome...and more code than I'm willing to write at the moment.
Here's a data step solution, that's relatively straightforward.
data have;
infile cards dlm=',' truncover;
input treatment year x;
cards;
1,1980,0
1,1981,0
1,1982,0
1,1983,0
1,1984,5
1,1985,7
1,1986,0
1,1987,6
1,1988,2
1,1989,1
2,1980,0
2,1981,0
2,1982,0
2,1983,3
2,1984,4
2,1985,6
2,1986,0
2,1987,7
2,1988,3
2,1989,2
2,1990,1
2,1991,1
;
run;
data want1;
/*Set temporary array to store the last 3 records*/
array p{0:2} _temporary_;
set have;
by treatment; /*BY GROUP PROCESSING*/
retain count;/*Keep value of count from row to row*/
p{mod(_n_,3)} = x; /*Assign value to the array for calculations*/
/*If first treatment record then reset
the array to missing and set the counter to 0*/
if first.treatment then do;
call missing (of p(*));
count=0;
end;
/*Increment Counter if the value if starting record is found*/
if x ne 0 or count>0 then count+1;
/*Calculate STDEV*/
if nmiss(of p(*)) = 0 and max(of p(*)) ne 0 and count>= 3
then stdev = std(of p{*});
run;
proc print data=want1; run;
@Jennifer925 wrote:
The ture data has many categories(mor than 400). I want the result to show the rolling 3years std by treatment, but each treatment has different starting points. Now assuming two treatments, could you do me a great favor to get the result as follows(either one is fine)? Thanks a lot.
Are the 400 categories levels of the treatment variable?
Do you have, or can you make, a dataset with the "different starting point" for each treatment.
For example it looks like you might have
Treatment Startyear
1 1986
2 1985
If so then then you could combine the rule data with the actual data by treatment, and only calculate the value for years greater than or equal to the start year.
Jennifer:
This is the same problem you submitted earlier, but with treatment as a by-group. Think about the logic of the single treatment solution that you accepted:
data want (drop=n_nonzeroes);
set have;
n_nonzeroes+(x^=0);
if n_nonzeroes=0 then delete; /* No input for std until first non-zero record*/
std3yr=std(x,lag(x),lag2(x));
if lag2(n_nonzeroes)>0; /* No output until 2nd record following first non-zero*/
run;
All you have to do is
data want (drop=n_nonzeroes);
set have;
by treatment;
if first.treatment=1 then n_nonzeroes=0;
n_nonzeroes+(x^=0);
if n_nonzeroes=0 then delete;
std3yr=std(x,lag(x),lag2(x));
if lag2(n_nonzeroes)>0 and lag2(treatment)=treatment ; /* No output until 2nd record following first non-zero */
run;
I notice that unlike your original question, your WANT sample now includes the records with leading zeroes in each by group. That actually makes the program a bit more compact, since there are no subsetting IF's or DELETE statements.:
data want2 (drop=n_nonzeroes);
set have;
by treatment;
if first.treatment=1 then n_nonzeroes=0;
n_nonzeroes+(x^=0);
std3yr=ifn(lag2(n_nonzeroes)>0 and lag2(treatment)=treatment,std(x,lag(x),lag2(x)),.);
run;
data have; infile cards dlm=',' truncover; input treatment year x; cards; 1,1980,0 1,1981,0 1,1982,0 1,1983,0 1,1984,5 1,1985,7 1,1986,0 1,1987,6 1,1988,2 1,1989,1 2,1980,0 2,1981,0 2,1982,0 2,1983,3 2,1984,4 2,1985,6 2,1986,0 2,1987,7 2,1988,3 2,1989,2 2,1990,1 2,1991,1 ; run; data want; set have; by treatment; retain found ; lag_x=lag(x); lag2_x=lag2(x); if first.treatment then do;found=0;count=0;end; if x ne 0 then found=1; if found then count+1; if count ge 3 then std3=std(x,lag_x,lag2_x); drop lag:; run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.