Hello!
I am tying to create a new variable that applies a set of rules to an ordered variable in my dataset. We apply the rules to each row of a variable successively. If rule 1 fails, we move on to rule 2, if rule 2 fails we move on to rule 3 if rule 3 fails we move on to rule 4 and if rule 4 fails we stop applying rules at that row and move on to the next group.
My data consists of weekly counts by group. The idea is to apply the rules successively to the main row plus the rows above and below for each group.
The rules go like this:
My main issue here is that I'm not sure how to indicate to sas that I want to call the rows above/below my main row..
Here is an example of my data. I want to apply the rule to the count column and then create a new column called "rule" that applies the rules.
data have;
input group count rule;
datalines;
1 14.81 1
1 69.33 1
1 17.70 1
1 7.5 1
1 3.1 1
1 5.41 1
1 1.75 2
1 1.97 2
1 3.12 1
1 2.66 2
1 0.83 2
1 1.06 3
1 0.60 4
1 -0.30 0
1 1.75 0
1 -0.8 0
;
run;
My SAS skills are proficient enough to get the job done but still fairly rudimentary, so any help or direction would appreciated 🙂
It is easier to look back than ahead in SAS code.
One way to approach is to roll the values out into an array and then scan that to apply your rules.
If I assume that for RULE 2 you actually meant to scan the 2 weeks before and 2 weeks after the current week for run lengths of 2 or more weeks then this code replicates rule 1 and rule2. The other rules are just simple modifications to the rule 2 rule.
data want;
array counts[102] ;
do week=1 by 1 until(last.group);
set have ;
by group week;
counts[week]=count;
end;
maxweek=week;
do week=1 to maxweek ;
count=counts[week];
rule=.;
if 3 <= count then rule=1;
if missing(rule) then do;
runlength=0;
do check=max(1,week-2) to min(maxweek,week+2) until(runlength>1);
if 2 <= counts[check] then runlength+1;
else runlength=0;
end;
if runlength > 1 then rule=2;
end;
if missing(rule) then do;
* similar logic for rule 3 ;
end;
output;
end;
keep group week rule ;
run;
You want this to work by group right? Even though you only have one group in your data?
Yes sorry, I want to apply the rules by group. So there would be more lines in the data with additional groups.
It is easier to look back than ahead in SAS code.
One way to approach is to roll the values out into an array and then scan that to apply your rules.
If I assume that for RULE 2 you actually meant to scan the 2 weeks before and 2 weeks after the current week for run lengths of 2 or more weeks then this code replicates rule 1 and rule2. The other rules are just simple modifications to the rule 2 rule.
data want;
array counts[102] ;
do week=1 by 1 until(last.group);
set have ;
by group week;
counts[week]=count;
end;
maxweek=week;
do week=1 to maxweek ;
count=counts[week];
rule=.;
if 3 <= count then rule=1;
if missing(rule) then do;
runlength=0;
do check=max(1,week-2) to min(maxweek,week+2) until(runlength>1);
if 2 <= counts[check] then runlength+1;
else runlength=0;
end;
if runlength > 1 then rule=2;
end;
if missing(rule) then do;
* similar logic for rule 3 ;
end;
output;
end;
keep group week rule ;
run;
Thanks @Tom -- this seems to work with the example data! I haven't applied it to my own dataset yet.
Just to clarify what's happening here. You're essentially creating a new variable that represents the value for each week in each group (e.g. variable count23 = value for week 23 in each group). Then we can go scan across each of the variables and apply the rules instead of scanning through one variable vertically.
So essentially it is making a temporary array of all of the values for this group.
The first DO loop loads the array. Then the second pulls the individual values back out and so the rule checking can be down with the values in the array.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.