Solved: Re: How does this work? Weird DO loop.

StacieOsborne · Posted 09-17-2022 09:17 AM

Hello, SAS Gurus!

Could you please help me?
First, I couldn’t understand what this (given, not mine) piece of code accomplishes. I got an explanation that the goal was to delete previous observations of time series until the beginning of the series for that case_nbr. Let’s say, we are talking about 2022, January to September. The variable reason can be either a character, like “P” or “22” or blank. We do not like blanks and try to find the original reason as of January 2022. If it’s not there, we’d rather start the series with the first record of non-blank reason, even if that’s August 2022.

/* I am not sure what this thing does... */
data want ;
do _n_= 0 by 0 until(last.case_nbr) ;
set W ;
by case_nbr ;
if reason ne '' then _n_ = 1 ;
if _n_ then output ;
end;
run;

I have seen and used:
If first.var
If last.var
Do while
Do until
Do _n_ = 1 by 1 until(eof)
and so on. But never do _n_= 0 by 0 until(last.var ) ;

Quentin · Posted 09-17-2022 09:56 AM

Hi,

Thanks for posting the full code here. Much easier to discuss here than on LinkedIn, and more folks to join in the fun.

This is an interesting use of the DOW-loop construct. This is an unofficial name for explicitly looping over a SET statement, often in by groups. Paul Dorfman wrote a classic paper on DOW-loop: https://www.lexjansen.com/nesug/nesug08/hw/hw02.pdf.

At the top of the the DOW-loop, they set _N_ to 0, and tell the DOW loop to keep looping until it has read the last record for a case_nbr. And they use BY 0 because they don't want the loop iteration to increment _N_.

Inside the loop it reads each record within the case_nbr. If it finds a record with a record where REASON is not missing, it sets _N_=1. So _N_ is being used as a yes/no flag that means "I have found a record with non-missing reason for this case_nbr".

At the bottom of the loop, if _N_=1, it outputs the record.

The end result of this logic is that for each case_nbr, it will start outputting records when it finds the first record with non-missing REASON. That's hard for me to say in words, here's some example data and code, and I added a PUT statement to your code:

data have ;
  input case_nbr recno reason $3. ;
  cards ;
1 1    
1 2    
1 3 foo
1 4    
2 1 foo
2 2    
2 3    
2 4    
3 1    
3 2    
3 3    
3 4 foo
4 1    
4 2    
4 3    
4 4    
;
run ;


data want ;
  do _n_= 0 by 0 until(last.case_nbr) ;
    set have ;
    by case_nbr ;
    if reason ne '' then _n_ = 1 ;
    put (case_nbr recno reason _n_)(=) ;
    if _n_ then output ;
  end;
run;

The log is:

24   data want ;
25     do _n_= 0 by 0 until(last.case_nbr) ;
26       set have ;
27       by case_nbr ;
28       if reason ne '' then _n_ = 1 ;
29       put (case_nbr recno reason _n_)(=) ;
30       if _n_ then output ;
31     end;
32   run;

case_nbr=1 recno=1 reason=  _N_=0
case_nbr=1 recno=2 reason=  _N_=0
case_nbr=1 recno=3 reason=foo _N_=1
case_nbr=1 recno=4 reason=  _N_=1
case_nbr=2 recno=1 reason=foo _N_=1
case_nbr=2 recno=2 reason=  _N_=1
case_nbr=2 recno=3 reason=  _N_=1
case_nbr=2 recno=4 reason=  _N_=1
case_nbr=3 recno=1 reason=  _N_=0
case_nbr=3 recno=2 reason=  _N_=0
case_nbr=3 recno=3 reason=  _N_=0
case_nbr=3 recno=4 reason=foo _N_=1
case_nbr=4 recno=1 reason=  _N_=0
case_nbr=4 recno=2 reason=  _N_=0
case_nbr=4 recno=3 reason=  _N_=0
case_nbr=4 recno=4 reason=  _N_=0
NOTE: There were 16 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 7 observations and 3 variables.

If you look at _N_, you can see Case 1 will have records 3 and 4 output, because the first non-missing REASON is record 3. Case 2 has all records output. Case 3 has only its last record output. Case 4 will have no records output, because all records have a missing reason.

If you were doing this without a DOW-loop approach, you might use BY group processing and set a retained flag to select records, e.g.:

data want ;
  set have ;
  by case_nbr ;
  retain flag ;
  if first.case_nbr then flag=0 ;
  if reason ne '' then flag=1 ;
  if flag=1 ;
run ;

HTH!

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

View solution in original post

Kurt_Bremser · Posted 09-17-2022 09:48 AM

The code reuses the automatic variable _N_ as a logical flag (one does not need to drop automatic variables).

Since there is no increment (BY 0), the variable stays at 0 throughout the loop, until a non-missing REASON is encountered. The loop terminates with last.case_nbr.

After the loop, the(Boolean) value is used to control the output.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Quentin · Posted 09-17-2022 09:56 AM

Hi,

Thanks for posting the full code here. Much easier to discuss here than on LinkedIn, and more folks to join in the fun.

This is an interesting use of the DOW-loop construct. This is an unofficial name for explicitly looping over a SET statement, often in by groups. Paul Dorfman wrote a classic paper on DOW-loop: https://www.lexjansen.com/nesug/nesug08/hw/hw02.pdf.

At the top of the the DOW-loop, they set _N_ to 0, and tell the DOW loop to keep looping until it has read the last record for a case_nbr. And they use BY 0 because they don't want the loop iteration to increment _N_.

Inside the loop it reads each record within the case_nbr. If it finds a record with a record where REASON is not missing, it sets _N_=1. So _N_ is being used as a yes/no flag that means "I have found a record with non-missing reason for this case_nbr".

At the bottom of the loop, if _N_=1, it outputs the record.

The end result of this logic is that for each case_nbr, it will start outputting records when it finds the first record with non-missing REASON. That's hard for me to say in words, here's some example data and code, and I added a PUT statement to your code:

data have ;
  input case_nbr recno reason $3. ;
  cards ;
1 1    
1 2    
1 3 foo
1 4    
2 1 foo
2 2    
2 3    
2 4    
3 1    
3 2    
3 3    
3 4 foo
4 1    
4 2    
4 3    
4 4    
;
run ;


data want ;
  do _n_= 0 by 0 until(last.case_nbr) ;
    set have ;
    by case_nbr ;
    if reason ne '' then _n_ = 1 ;
    put (case_nbr recno reason _n_)(=) ;
    if _n_ then output ;
  end;
run;

The log is:

24   data want ;
25     do _n_= 0 by 0 until(last.case_nbr) ;
26       set have ;
27       by case_nbr ;
28       if reason ne '' then _n_ = 1 ;
29       put (case_nbr recno reason _n_)(=) ;
30       if _n_ then output ;
31     end;
32   run;

case_nbr=1 recno=1 reason=  _N_=0
case_nbr=1 recno=2 reason=  _N_=0
case_nbr=1 recno=3 reason=foo _N_=1
case_nbr=1 recno=4 reason=  _N_=1
case_nbr=2 recno=1 reason=foo _N_=1
case_nbr=2 recno=2 reason=  _N_=1
case_nbr=2 recno=3 reason=  _N_=1
case_nbr=2 recno=4 reason=  _N_=1
case_nbr=3 recno=1 reason=  _N_=0
case_nbr=3 recno=2 reason=  _N_=0
case_nbr=3 recno=3 reason=  _N_=0
case_nbr=3 recno=4 reason=foo _N_=1
case_nbr=4 recno=1 reason=  _N_=0
case_nbr=4 recno=2 reason=  _N_=0
case_nbr=4 recno=3 reason=  _N_=0
case_nbr=4 recno=4 reason=  _N_=0
NOTE: There were 16 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 7 observations and 3 variables.

If you look at _N_, you can see Case 1 will have records 3 and 4 output, because the first non-missing REASON is record 3. Case 2 has all records output. Case 3 has only its last record output. Case 4 will have no records output, because all records have a missing reason.

If you were doing this without a DOW-loop approach, you might use BY group processing and set a retained flag to select records, e.g.:

data want ;
  set have ;
  by case_nbr ;
  retain flag ;
  if first.case_nbr then flag=0 ;
  if reason ne '' then flag=1 ;
  if flag=1 ;
run ;

HTH!

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

Kurt_Bremser · Posted 09-17-2022 10:39 AM

I see I did not realize that the OUTPUT was part of the loop. So it writes all observations for a group after the first non-missing value is encountered.
Please use proper indentation in your code, and post it into a window opened with the „little running man“ button.
Spaghetti code will always get people in trouble.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PGStats · Posted 09-17-2022 02:12 PM

Using tricks is fun but can be obfuscating...

/* Easier to see what this thing does... */
data want ;
do until(last.case_nbr) ;
	set W ; by case_nbr ;
	if not missing(reason) then started = 1 ;
	if started then output ;
	end ;
drop started ;
run ;

PG

SASKiwi · Posted 09-17-2022 07:45 PM

@PGStats - Exactly. If you work in a team who is maintaining a suite of SAS applications then you should always (try to) use coding techniques that are obvious and easy to understand by others with less SAS experience. Just because you can code an esoteric method doesn't mean that you should. If you are coding just for yourself then that's fine, but not if you expect others to easily follow what you have done.

Quentin · Posted 09-18-2022 12:26 PM

The hard part is to decide what qualifies as esoteric. To some, the DOW loop itself is esoteric. Hashing may be seen as esoteric. Perhaps regular expressions...

While I personally don't write code that hijacks _n_, that's partly because I don't work with data where the efficiency gain of avoiding introducing a single variable into the PDV would be valuable to me. But I don't see the inherited code as unreasonable.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

SASKiwi · Posted 09-18-2022 04:01 PM

@Quentin - @PGStats 's version is reasonably acceptable in my view, the OPs's not so much 🙂 ...

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away