sum statement within DO UNTIL()

Super Contributor
Posts: 271
The code below was presented as the "Henderson-Whitlock Original Form of DoW-loop" at SAScommunity.org. Since the sum statement initializes its variable to 0, why have the sum and count variables been explicitly initialized to 0?

 

data a;
	input id $ var;
	datalines;
A 1 
A 2 
B 3 
B 4 
B 5 
;

data b;
	count= 0;
	sum = 0;

	do until ( last.id );
		set a;
		by id;
		count+1;
		sum+var;
	end;

	mean = sum / count;
run;

Accepted Solutions
Solution
‎03-23-2016 03:37 PM
Trusted Advisor
Posts: 1,117

Re: sum statement within DO UNTIL()

Posted in reply to SAS_inquisitive

That's true. The sum statement not only implies the initialization to 0, but also a RETAIN for the variable being incremented.

This implicit RETAIN conflicts with the DOW loop, one of whose major purposes is to have a "RETAIN effect" within the loop (i.e. within one iteration of the data step), but to let the standard data step behavior at the beginning of each iteration of the data step set (unretained) variables automatically to missing.
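
A minimal sketch of this conflict, assuming the sample data from the question. The data set name b_retained is made up for illustration. With the two initializing assignments left out, the implicit RETAIN carries count and sum from group A into group B:

```sas
data a;
	input id $ var;
	datalines;
A 1
A 2
B 3
B 4
B 5
;

data b_retained;   /* hypothetical name, for illustration only */
	/* No "count = 0; sum = 0;" here: the sum statements below
	   imply RETAIN, so both totals survive into the next BY group. */
	do until ( last.id );
		set a;
		by id;
		count + 1;
		sum + var;
	end;

	mean = sum / count;
run;

proc print data=b_retained;
run;
```

For this input, group A comes out as count=2, sum=3, mean=1.5, but group B comes out as count=5, sum=15, mean=3 rather than the intended count=3, sum=12, mean=4.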

 

Therefore, in many DOW loops in practice the sum statement is avoided and the SUM function is used instead. Thus, the assignment statement looks a bit less elegant, but the advantage is that you can omit the initialization, because the SUM function does not imply a RETAIN. It shares with the sum statement the desired property of "missing plus x equals x" (unlike an assignment such as count=count+1).
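
A small sketch of the "missing plus x equals x" property, outside any DOW loop. The variable names x and y are arbitrary; both start out missing because nothing retains or initializes them:

```sas
data _null_;
	x = sum(x, 1);   /* SUM function ignores the missing argument */
	y = y + 1;       /* plain arithmetic propagates the missing value */
	put x= y=;       /* writes x=1 y=. to the log */
run;
```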

 

There is only one case where the results are different: If all values added are missing, the sum statement returns 0 (due to the implicit initialization), whereas the SUM function returns a missing value, which is actually the more accurate result in many cases. (The "Missing values were generated ..." notes in the log are the downside.)
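
A sketch of that one differing case, assuming a made-up data set all_missing in which every value of var is missing. The names all_missing, compare, sum_stmt and sum_func are hypothetical:

```sas
data all_missing;   /* hypothetical test data: var is always missing */
	input id $ var;
	datalines;
A .
A .
;

data compare;
	do until ( last.id );
		set all_missing;
		by id;
		sum_stmt + var;                  /* sum statement: starts at 0, skips missing */
		sum_func = sum(sum_func, var);   /* SUM function: all arguments missing */
	end;

	put id= sum_stmt= sum_func=;   /* writes id=A sum_stmt=0 sum_func=. to the log */
run;
```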

 


All Replies
Super User
Posts: 11,343

Re: sum statement within DO UNTIL()

Posted in reply to SAS_inquisitive

Take a look at the sum and mean for ID B when you run the code without the initialization.

 

 

Super Contributor
Posts: 271

Re: sum statement within DO UNTIL()

@ballardw. I checked that before posting this. It gives the correct result for the first BY group (A) but not for the second (B): for the second group, var is accumulated across both A and B. However, the version below, which uses the SUM function, requires no initialization.

 

data a;
	input id $ var;
	datalines;
A 1 
A 2 
B 3 
B 4 
B 5 
;

data b;
	do until ( last.id );
		set a;
		by id;
		count=sum(count,1);
		sum=sum(sum,var);
	end;

	mean = sum / count;
run;

Super Contributor
Posts: 271

Re: sum statement within DO UNTIL()

Posted in reply to FreelanceReinhard

@FreelanceReinhard. Thanks. This helped me see what is going on inside one iteration of the DOW loop.

 

data b;
	do until ( last.id );
		put _all_;
		set a;
		put _all_;
		by id;
		put _all_;
		count=sum(count,1);
		put _all_;
		sum=sum(sum,var);
		put _all_;
	end;

	mean = sum / count;
run;