This was mentioned as "Henderson-Whitlock Original Form of DoW-loop" at SAScommunity.org. Since the sum statement initializes its variable to 0, why have the sum and count variables been explicitly initialized to 0?
data a;
  input id $ var;
  datalines;
A 1
A 2
B 3
B 4
B 5
;

data b;
  count = 0;
  sum = 0;
  do until ( last.id );
    set a;
    by id;
    count + 1;
    sum + var;
  end;
  mean = sum / count;
run;
That's true. The sum statement not only implies the initialization to 0, but also a RETAIN for the variable being incremented.
This implicit RETAIN conflicts with the DOW loop, one of whose major purposes is to have a "RETAIN effect" within the loop (i.e. within one iteration of the data step), but to let the standard data step behavior at the beginning of each iteration of the data step set (unretained) variables automatically to missing.
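To see the implicit RETAIN at work, here is a minimal sketch that runs the DOW loop from the question with sum statements but without the explicit initialization. The data set name b_retained is made up for illustration; everything else is taken from the question.

```sas
data a;
  input id $ var;
  datalines;
A 1
A 2
B 3
B 4
B 5
;

/* Sum statements without explicit initialization:            */
/* count and sum are implicitly retained, so they are NOT      */
/* reset to missing when the data step starts a new iteration. */
data b_retained;
  do until ( last.id );
    set a;
    by id;
    count + 1;
    sum + var;
  end;
  mean = sum / count;
run;

proc print data=b_retained;
run;
```

With this sample data, group A comes out as count=2, sum=3, mean=1.5, but group B shows count=5, sum=15, mean=3, because both accumulators carry over from group A.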
Therefore, in many DOW loops in practice the sum statement is avoided and the SUM function is used instead. Thus, the assignment statement looks a bit less elegant, but the advantage is that you can omit the initialization, because the SUM function does not imply a RETAIN. It shares with the sum statement the desired property of "missing plus x equals x" (unlike an assignment such as count=count+1).
There is only one case where the results are different: If all values added are missing, the sum statement returns 0 (due to the implicit initialization), whereas the SUM function returns a missing value, which is actually the more accurate result in many cases. (The "Missing values were generated ..." notes in the log are the downside.)
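The all-missing case can be checked with a small sketch like the one below. The data set names (miss, stmt, func) and the variable name total are hypothetical and chosen only for this illustration.

```sas
/* One BY group whose values are all missing */
data miss;
  input id $ var;
  datalines;
C .
C .
;

/* Sum statement: total is implicitly initialized to 0 and    */
/* missing values of var are ignored, so total stays 0.       */
data stmt;
  do until ( last.id );
    set miss;
    by id;
    total + var;
  end;
run;

/* SUM function: total starts out missing, and a SUM call     */
/* whose arguments are all missing returns missing.           */
data func;
  do until ( last.id );
    set miss;
    by id;
    total = sum(total, var);
  end;
run;
```

Here stmt ends up with total=0, while func ends up with a missing total, and the log for the second step contains the "Missing values were generated ..." note.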
Take a look at the sum and mean for ID b when you run the code without the initialization.
@ballardw. I checked that before posting this. It gives the correct result for the first BY group (A) but not for the second BY group (B): for the second group, var is accumulated over both A and B. With the following version, however, no initialization is required.
data a;
  input id $ var;
  datalines;
A 1
A 2
B 3
B 4
B 5
;

data b;
  do until ( last.id );
    set a;
    by id;
    count = sum(count, 1);
    sum = sum(sum, var);
  end;
  mean = sum / count;
run;
@FreelanceReinh. Thanks. This helped me see what is going on inside one iteration of the DOW loop.
data b;
  do until ( last.id );
    put _all_;
    set a;
    put _all_;
    by id;
    put _all_;
    count = sum(count, 1);
    put _all_;
    sum = sum(sum, var);
    put _all_;
  end;
  mean = sum / count;
run;