Hi,
Thanks for posting the full code here. Much easier to discuss here than on LinkedIn, and more folks to join in the fun.
This is an interesting use of the DOW-loop construct. This is an unofficial name for explicitly looping over a SET statement, often in by groups. Paul Dorfman wrote a classic paper on DOW-loop: https://www.lexjansen.com/nesug/nesug08/hw/hw02.pdf.
At the top of the the DOW-loop, they set _N_ to 0, and tell the DOW loop to keep looping until it has read the last record for a case_nbr. And they use BY 0 because they don't want the loop iteration to increment _N_.
Inside the loop it reads each record within the case_nbr. If it finds a record with a record where REASON is not missing, it sets _N_=1. So _N_ is being used as a yes/no flag that means "I have found a record with non-missing reason for this case_nbr".
At the bottom of the loop, if _N_=1, it outputs the record.
The end result of this logic is that for each case_nbr, it will start outputting records when it finds the first record with non-missing REASON. That's hard for me to say in words, here's some example data and code, and I added a PUT statement to your code:
data have ;
input case_nbr recno reason $3. ;
cards ;
1 1
1 2
1 3 foo
1 4
2 1 foo
2 2
2 3
2 4
3 1
3 2
3 3
3 4 foo
4 1
4 2
4 3
4 4
;
run ;
data want ;
do _n_= 0 by 0 until(last.case_nbr) ;
set have ;
by case_nbr ;
if reason ne '' then _n_ = 1 ;
put (case_nbr recno reason _n_)(=) ;
if _n_ then output ;
end;
run;
The log is:
24 data want ; 25 do _n_= 0 by 0 until(last.case_nbr) ; 26 set have ; 27 by case_nbr ; 28 if reason ne '' then _n_ = 1 ; 29 put (case_nbr recno reason _n_)(=) ; 30 if _n_ then output ; 31 end; 32 run; case_nbr=1 recno=1 reason= _N_=0 case_nbr=1 recno=2 reason= _N_=0 case_nbr=1 recno=3 reason=foo _N_=1 case_nbr=1 recno=4 reason= _N_=1 case_nbr=2 recno=1 reason=foo _N_=1 case_nbr=2 recno=2 reason= _N_=1 case_nbr=2 recno=3 reason= _N_=1 case_nbr=2 recno=4 reason= _N_=1 case_nbr=3 recno=1 reason= _N_=0 case_nbr=3 recno=2 reason= _N_=0 case_nbr=3 recno=3 reason= _N_=0 case_nbr=3 recno=4 reason=foo _N_=1 case_nbr=4 recno=1 reason= _N_=0 case_nbr=4 recno=2 reason= _N_=0 case_nbr=4 recno=3 reason= _N_=0 case_nbr=4 recno=4 reason= _N_=0 NOTE: There were 16 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 7 observations and 3 variables.
If you look at _N_, you can see Case 1 will have records 3 and 4 output, because the first non-missing REASON is record 3. Case 2 has all records output. Case 3 has only its last record output. Case 4 will have no records output, because all records have a missing reason.
If you were doing this without a DOW-loop approach, you might use BY group processing and set a retained flag to select records, e.g.:
data want ;
set have ;
by case_nbr ;
retain flag ;
if first.case_nbr then flag=0 ;
if reason ne '' then flag=1 ;
if flag=1 ;
run ;
HTH!
The code reuses the automatic variable _N_ as a logical flag (one does not need to drop automatic variables).
Since there is no increment (BY 0), the variable stays at 0 throughout the loop, until a non-missing REASON is encountered. The loop terminates with last.case_nbr.
After the loop, the(Boolean) value is used to control the output.
Hi,
Thanks for posting the full code here. Much easier to discuss here than on LinkedIn, and more folks to join in the fun.
This is an interesting use of the DOW-loop construct. This is an unofficial name for explicitly looping over a SET statement, often in by groups. Paul Dorfman wrote a classic paper on DOW-loop: https://www.lexjansen.com/nesug/nesug08/hw/hw02.pdf.
At the top of the the DOW-loop, they set _N_ to 0, and tell the DOW loop to keep looping until it has read the last record for a case_nbr. And they use BY 0 because they don't want the loop iteration to increment _N_.
Inside the loop it reads each record within the case_nbr. If it finds a record with a record where REASON is not missing, it sets _N_=1. So _N_ is being used as a yes/no flag that means "I have found a record with non-missing reason for this case_nbr".
At the bottom of the loop, if _N_=1, it outputs the record.
The end result of this logic is that for each case_nbr, it will start outputting records when it finds the first record with non-missing REASON. That's hard for me to say in words, here's some example data and code, and I added a PUT statement to your code:
data have ;
input case_nbr recno reason $3. ;
cards ;
1 1
1 2
1 3 foo
1 4
2 1 foo
2 2
2 3
2 4
3 1
3 2
3 3
3 4 foo
4 1
4 2
4 3
4 4
;
run ;
data want ;
do _n_= 0 by 0 until(last.case_nbr) ;
set have ;
by case_nbr ;
if reason ne '' then _n_ = 1 ;
put (case_nbr recno reason _n_)(=) ;
if _n_ then output ;
end;
run;
The log is:
24 data want ; 25 do _n_= 0 by 0 until(last.case_nbr) ; 26 set have ; 27 by case_nbr ; 28 if reason ne '' then _n_ = 1 ; 29 put (case_nbr recno reason _n_)(=) ; 30 if _n_ then output ; 31 end; 32 run; case_nbr=1 recno=1 reason= _N_=0 case_nbr=1 recno=2 reason= _N_=0 case_nbr=1 recno=3 reason=foo _N_=1 case_nbr=1 recno=4 reason= _N_=1 case_nbr=2 recno=1 reason=foo _N_=1 case_nbr=2 recno=2 reason= _N_=1 case_nbr=2 recno=3 reason= _N_=1 case_nbr=2 recno=4 reason= _N_=1 case_nbr=3 recno=1 reason= _N_=0 case_nbr=3 recno=2 reason= _N_=0 case_nbr=3 recno=3 reason= _N_=0 case_nbr=3 recno=4 reason=foo _N_=1 case_nbr=4 recno=1 reason= _N_=0 case_nbr=4 recno=2 reason= _N_=0 case_nbr=4 recno=3 reason= _N_=0 case_nbr=4 recno=4 reason= _N_=0 NOTE: There were 16 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 7 observations and 3 variables.
If you look at _N_, you can see Case 1 will have records 3 and 4 output, because the first non-missing REASON is record 3. Case 2 has all records output. Case 3 has only its last record output. Case 4 will have no records output, because all records have a missing reason.
If you were doing this without a DOW-loop approach, you might use BY group processing and set a retained flag to select records, e.g.:
data want ;
set have ;
by case_nbr ;
retain flag ;
if first.case_nbr then flag=0 ;
if reason ne '' then flag=1 ;
if flag=1 ;
run ;
HTH!
Using tricks is fun but can be obfuscating...
/* Easier to see what this thing does... */
data want ;
do until(last.case_nbr) ;
set W ; by case_nbr ;
if not missing(reason) then started = 1 ;
if started then output ;
end ;
drop started ;
run ;
@PGStats - Exactly. If you work in a team who is maintaining a suite of SAS applications then you should always (try to) use coding techniques that are obvious and easy to understand by others with less SAS experience. Just because you can code an esoteric method doesn't mean that you should. If you are coding just for yourself then that's fine, but not if you expect others to easily follow what you have done.
The hard part is to decide what qualifies as esoteric. To some, the DOW loop itself is esoteric. Hashing may be seen as esoteric. Perhaps regular expressions...
While I personally don't write code that hijacks _n_, that's partly because I don't work with data where the efficiency gain of avoiding introducing a single variable into the PDV would be valuable to me. But I don't see the inherited code as unreasonable.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.