sum statement within DO UNTIL()

03-23-2016 12:02 PM

This was mentioned as the "Henderson-Whitlock Original Form of DoW-loop" at SAScommunity.org. Since the sum statement initializes its variable to 0, why have the sum and count variables been explicitly initialized to 0?

```
data a;
  input id $ var;
  datalines;
A 1
A 2
B 3
B 4
B 5
;

data b;
  count = 0;
  sum = 0;
  do until ( last.id );
    set a;
    by id;
    count+1;
    sum+var;
  end;
  mean = sum / count;
run;
```

All Replies

03-23-2016 12:28 PM

Take a look at the sum and mean for ID B when you run the code without the initialization.

03-23-2016 12:40 PM

@ballardw. I checked that before posting this. Without the initialization, it gives the correct result for the first BY group (A) but not for the second (B): for the second group, var is accumulated across both A and B. With the SUM function, however, no initialization is required:

```
data a;
  input id $ var;
  datalines;
A 1
A 2
B 3
B 4
B 5
;

data b;
  do until ( last.id );
    set a;
    by id;
    count=sum(count,1);
    sum=sum(sum,var);
  end;
  mean = sum / count;
run;
```

Solution

03-23-2016 03:37 PM

03-23-2016 02:25 PM

That's true. The sum statement not only implies the initialization to 0, but also a RETAIN for the variable being incremented.

This implicit RETAIN conflicts with the DOW loop. One of the loop's major purposes is to produce a "RETAIN effect" *within* the loop (i.e. within one iteration of the data step) while letting the standard data step behavior reset (unretained) variables to missing at the beginning of each iteration of the data step.
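To make the conflict concrete, here is a minimal sketch of the original loop with the two initializations removed. The dataset name `b_noinit` is made up for this illustration; dataset `a` is the one from the question, repeated so the example runs on its own.

```
data a;
  input id $ var;
  datalines;
A 1
A 2
B 3
B 4
B 5
;

/* b_noinit is a hypothetical name; same loop as in the question, */
/* but without count = 0 and sum = 0 before the DO UNTIL.         */
data b_noinit;
  do until ( last.id );
    set a;
    by id;
    count+1;      /* sum statement: count is implicitly retained */
    sum+var;      /* sum statement: sum is implicitly retained   */
  end;
  mean = sum / count;
run;

/* Resulting observations:              */
/*   id=A  count=2  sum=3   mean=1.5    */
/*   id=B  count=5  sum=15  mean=3      */
```

Group A comes out right only because the accumulators start at 0. Group B inherits A's totals, so its mean is 3 instead of 12/3 = 4.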

Therefore, in practice many DOW loops avoid the sum statement and use the SUM *function* instead. The assignment statement looks a bit less elegant, but the advantage is that you can omit the initialization, because the SUM function does not imply a RETAIN. It shares with the sum statement the desired property of "missing plus x equals x" (unlike an assignment such as count=count+1).
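The "missing plus x equals x" property can be checked in isolation with a small sketch like the following. The dataset and variable names (`demo`, `x`, `via_function`, `via_assignment`) are made up for this illustration.

```
/* demo and its variables are hypothetical names for illustration */
data demo;
  x = .;                       /* start from a missing value          */
  via_function   = sum(x, 1);  /* SUM function ignores the missing: 1 */
  via_assignment = x + 1;      /* plain arithmetic propagates it: .   */
run;
```

This is why count=sum(count,1) works in the first pass through the loop, where count is still missing, while count=count+1 would stay missing throughout.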

There is only one case where the results are different: If all values added are missing, the sum statement returns 0 (due to the implicit initialization), whereas the SUM function returns a missing value, which is actually the more accurate result in many cases. (The "Missing values were generated ..." notes in the log are the downside.)
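A small sketch of that one differing case, assuming a made-up input dataset `m` in which every value of var in group A is missing. The names `m`, `stmt`, `func` and `total` are hypothetical.

```
/* m is a hypothetical dataset: group A has only missing values */
data m;
  input id $ var;
  datalines;
A .
A .
B 3
B 4
;

/* Sum statement with explicit reset, as in the original form */
data stmt;
  total = 0;
  do until ( last.id );
    set m;
    by id;
    total+var;                 /* missing var is ignored, total stays 0 */
  end;
run;

/* SUM function, no initialization needed */
data func;
  do until ( last.id );
    set m;
    by id;
    total = sum(total, var);   /* all arguments missing gives missing */
  end;
run;

/* stmt:  id=A total=0   id=B total=7 */
/* func:  id=A total=.   id=B total=7 */
```

Whether 0 or missing is the better answer for group A depends on what the total is used for afterwards.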

03-23-2016 03:33 PM

@FreelanceReinhard. Thanks. This helped me see what is going on inside one iteration of the DOW loop.

```
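data b;
  do until ( last.id );
    put _all_;            /* state on entering the loop body, before SET */
    set a;
    put _all_;            /* after SET reads the next observation */
    by id;
    put _all_;
    count=sum(count,1);
    put _all_;            /* after updating count */
    sum=sum(sum,var);
    put _all_;            /* after updating sum */
  end;
  mean = sum / count;
run;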
```