BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jennifer925
Fluorite | Level 6

Hello all,

 

The ture data has many categories(mor than 400). I want the result to show the rolling 3years std by  treatment, but each treatment has different starting points. Now assuming two treatments, could you do me a great favor to get the result as follows(either one is fine)? Thanks a lot.

 

 

data have;

infile cards dlm=',' truncover;

input treatment year x;

cards;

1,1980,0

1,1981,0

1,1982,0

1,1983,0

1,1984,5

1,1985,7

1,1986,0

1,1987,6

1,1988,2

1,1989,1

2,1980,0

2,1981,0

2,1982,0

2,1983,3

2,1984,4

2,1985,6

2,1986,0

2,1987,7

2,1988,3

2,1989,2

2,1990,1

2,1991,1;

run;

 

 

I hope the result like this:

treatment    year   x      std3yr

1      1980  0      .

1      1981  0      .

1      1982  0      .

1      1983  0      .

1      1984  5      .

1      1985  7      .

1      1986   0      3.6055512755

1      1987   6      3.7859388972

1      1988   2      3.0550504633

1      1989   1      2.6457513111

2      1980  0      .

2      1981  0      .

2      1982  0      .

2      1983  3      .

2      1984  4      .

2      1985   6      1.5275252317

2      1986   0      3.0550504633

2      1987   7      3.7859388972

2      1988   3      3.5118845843

2      1989   2      2.6457513111

2      1990   1      1

2      1991   1      0.5773502692

 

Or

 

treatment    year   x      std3yr

1      1986   0      3.6055512755

1      1987   6      3.7859388972

1      1988   2      3.0550504633

1      1989   1      2.6457513111

2      1985   6      1.5275252317

2      1986   0      3.0550504633

2      1987   7      3.7859388972

2      1988   3      3.5118845843

2      1989   2      2.6457513111

2      1990   1      1

2      1991   1      0.5773502692

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

 
data have;
infile cards dlm=',' truncover;
input treatment year x;
cards;
1,1980,0
1,1981,0
1,1982,0
1,1983,0
1,1984,5
1,1985,7
1,1986,0
1,1987,6
1,1988,2
1,1989,1
2,1980,0
2,1981,0
2,1982,0
2,1983,3
2,1984,4
2,1985,6
2,1986,0
2,1987,7
2,1988,3
2,1989,2
2,1990,1
2,1991,1
;
run;
data want;
 set have;
 by treatment;
 retain found ;
 lag_x=lag(x);
 lag2_x=lag2(x);
 if first.treatment then do;found=0;count=0;end;
 if x ne 0 then found=1;
 if found then count+1;
 if count ge 3 then std3=std(x,lag_x,lag2_x);
 drop lag:;
run;

View solution in original post

9 REPLIES 9
Reeza
Super User

The solutions previously provide can be easily adapted with a BY statement. Which method did you choose to go with, rather than starting from scratch. 

Jennifer925
Fluorite | Level 6
@Reeza

I have tried by statement, but the result is messy.
Reeza
Super User

You posted this question for a single category earlier. Several solutions were provided. Which one did you want to work with? If you post that we can help modify it for multiple groups.

 

Rather than revisit every possible solution, it's better to start with something you have and understand. 

Jennifer925
Fluorite | Level 6
@Reeza

I prefer using PROC SQL, could you do me a great favor to modify the code?Also, you can use the others. Thanks.
Reeza
Super User

I couldn't think of a way to identify the starting records in SQL while also including 0's in the data.

SQL doesn't have a concept of rows that allows counting, unless you use Cursor logic, which isn't available in SAS. 

I'm sure there's a way, but its probably cumbersome...and more code than I'm willing to write at the moment. 

 

Here's a data step solution, that's relatively straightforward.

 

data have;
infile cards dlm=',' truncover;
input treatment year x;
cards;
1,1980,0
1,1981,0
1,1982,0
1,1983,0
1,1984,5
1,1985,7
1,1986,0
1,1987,6
1,1988,2
1,1989,1
2,1980,0
2,1981,0
2,1982,0
2,1983,3
2,1984,4
2,1985,6
2,1986,0
2,1987,7
2,1988,3
2,1989,2
2,1990,1
2,1991,1
;
run;

data want1;
/*Set temporary array to store the last 3 records*/
array p{0:2} _temporary_;

set have; 
by treatment; /*BY GROUP PROCESSING*/

retain count;/*Keep value of count from row to row*/

p{mod(_n_,3)} = x; /*Assign value to the array for calculations*/

/*If first treatment record then reset 
the array to missing and set the counter to 0*/
if first.treatment then do;
    call missing (of p(*));
    count=0;
end;

/*Increment Counter if the value if starting record is found*/
if x ne 0 or count>0 then count+1;

/*Calculate STDEV*/
if nmiss(of p(*)) = 0 and max(of p(*)) ne 0 and count>= 3 
    then stdev = std(of p{*});

run;


proc print data=want1; run;
ballardw
Super User

@Jennifer925 wrote:

 

The ture data has many categories(mor than 400). I want the result to show the rolling 3years std by  treatment, but each treatment has different starting points. Now assuming two treatments, could you do me a great favor to get the result as follows(either one is fine)? Thanks a lot.


Are the 400 categories levels of the treatment variable?

Do you have, or can you make, a dataset with the "different starting point" for each treatment.

For example it looks like you might have
Treatment    Startyear

1                  1986

2                  1985

 

If so then then you could combine the rule data with the actual data by treatment, and only calculate the value for years greater than or equal to the start year.

Jennifer925
Fluorite | Level 6
@ballardw
yes, 400 categories are the level of the treatment.
mkeintz
PROC Star

Jennifer:

 

This is the same problem you submitted earlier, but with treatment as a by-group.  Think about the logic of the single treatment solution that you accepted:

 

data want (drop=n_nonzeroes);

  set have;

   n_nonzeroes+(x^=0);

  if n_nonzeroes=0 then delete; /* No input for std until first non-zero record*/

  std3yr=std(x,lag(x),lag2(x));

  if lag2(n_nonzeroes)>0; /* No output until 2nd record following first non-zero*/

run;

 

All you have to do is

  1. Reset n_nonzeroes to 0 at the beginning of each BY group - a perfectly ordinary technique of SAS by-group programming.
  2. And change the LAG2 test to include a test for same treatment.  The latter is neccessary because otherwise the "lag2(n_nonzeroes)>0" test for a given treatment could be contaminated by results from the prior treatment.

    The bold-italics below show the changes:

 

data want (drop=n_nonzeroes);

  set have;

  by treatment;

  if first.treatment=1 then n_nonzeroes=0;

  n_nonzeroes+(x^=0);

  if n_nonzeroes=0 then delete;

  std3yr=std(x,lag(x),lag2(x));

  if lag2(n_nonzeroes)>0 and lag2(treatment)=treatment ; /* No output until 2nd record following first non-zero */

run;

 

I notice that unlike your original question, your WANT sample now includes the records with leading zeroes in each by group. That actually makes the program a bit more compact, since there are no subsetting IF's or DELETE statements.:

 

data want2 (drop=n_nonzeroes);

  set have;

  by treatment;

  if first.treatment=1 then n_nonzeroes=0;

  n_nonzeroes+(x^=0);

  std3yr=ifn(lag2(n_nonzeroes)>0 and lag2(treatment)=treatment,std(x,lag(x),lag2(x)),.);

run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User

 
data have;
infile cards dlm=',' truncover;
input treatment year x;
cards;
1,1980,0
1,1981,0
1,1982,0
1,1983,0
1,1984,5
1,1985,7
1,1986,0
1,1987,6
1,1988,2
1,1989,1
2,1980,0
2,1981,0
2,1982,0
2,1983,3
2,1984,4
2,1985,6
2,1986,0
2,1987,7
2,1988,3
2,1989,2
2,1990,1
2,1991,1
;
run;
data want;
 set have;
 by treatment;
 retain found ;
 lag_x=lag(x);
 lag2_x=lag2(x);
 if first.treatment then do;found=0;count=0;end;
 if x ne 0 then found=1;
 if found then count+1;
 if count ge 3 then std3=std(x,lag_x,lag2_x);
 drop lag:;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1277 views
  • 3 likes
  • 5 in conversation