- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Could you please help me?
First, I couldn’t understand what this (given, not mine) piece of code accomplishes. I got an explanation that the goal was to delete previous observations of time series until the beginning of the series for that case_nbr. Let’s say, we are talking about 2022, January to September. The variable reason can be either a character, like “P” or “22” or blank. We do not like blanks and try to find the original reason as of January 2022. If it’s not there, we’d rather start the series with the first record of non-blank reason, even if that’s August 2022.
/* I am not sure what this thing does... */
data want ;
do _n_= 0 by 0 until(last.case_nbr) ;
set W ;
by case_nbr ;
if reason ne '' then _n_ = 1 ;
if _n_ then output ;
end;
run;
I have seen and used:
If first.var
If last.var
Do while
Do until
Do _n_ = 1 by 1 until(eof)
and so on. But never do _n_= 0 by 0 until(last.var ) ;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Thanks for posting the full code here. Much easier to discuss here than on LinkedIn, and more folks to join in the fun.
This is an interesting use of the DOW-loop construct. This is an unofficial name for explicitly looping over a SET statement, often in by groups. Paul Dorfman wrote a classic paper on DOW-loop: https://www.lexjansen.com/nesug/nesug08/hw/hw02.pdf.
At the top of the the DOW-loop, they set _N_ to 0, and tell the DOW loop to keep looping until it has read the last record for a case_nbr. And they use BY 0 because they don't want the loop iteration to increment _N_.
Inside the loop it reads each record within the case_nbr. If it finds a record with a record where REASON is not missing, it sets _N_=1. So _N_ is being used as a yes/no flag that means "I have found a record with non-missing reason for this case_nbr".
At the bottom of the loop, if _N_=1, it outputs the record.
The end result of this logic is that for each case_nbr, it will start outputting records when it finds the first record with non-missing REASON. That's hard for me to say in words, here's some example data and code, and I added a PUT statement to your code:
data have ;
input case_nbr recno reason $3. ;
cards ;
1 1
1 2
1 3 foo
1 4
2 1 foo
2 2
2 3
2 4
3 1
3 2
3 3
3 4 foo
4 1
4 2
4 3
4 4
;
run ;
data want ;
do _n_= 0 by 0 until(last.case_nbr) ;
set have ;
by case_nbr ;
if reason ne '' then _n_ = 1 ;
put (case_nbr recno reason _n_)(=) ;
if _n_ then output ;
end;
run;
The log is:
24 data want ; 25 do _n_= 0 by 0 until(last.case_nbr) ; 26 set have ; 27 by case_nbr ; 28 if reason ne '' then _n_ = 1 ; 29 put (case_nbr recno reason _n_)(=) ; 30 if _n_ then output ; 31 end; 32 run; case_nbr=1 recno=1 reason= _N_=0 case_nbr=1 recno=2 reason= _N_=0 case_nbr=1 recno=3 reason=foo _N_=1 case_nbr=1 recno=4 reason= _N_=1 case_nbr=2 recno=1 reason=foo _N_=1 case_nbr=2 recno=2 reason= _N_=1 case_nbr=2 recno=3 reason= _N_=1 case_nbr=2 recno=4 reason= _N_=1 case_nbr=3 recno=1 reason= _N_=0 case_nbr=3 recno=2 reason= _N_=0 case_nbr=3 recno=3 reason= _N_=0 case_nbr=3 recno=4 reason=foo _N_=1 case_nbr=4 recno=1 reason= _N_=0 case_nbr=4 recno=2 reason= _N_=0 case_nbr=4 recno=3 reason= _N_=0 case_nbr=4 recno=4 reason= _N_=0 NOTE: There were 16 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 7 observations and 3 variables.
If you look at _N_, you can see Case 1 will have records 3 and 4 output, because the first non-missing REASON is record 3. Case 2 has all records output. Case 3 has only its last record output. Case 4 will have no records output, because all records have a missing reason.
If you were doing this without a DOW-loop approach, you might use BY group processing and set a retained flag to select records, e.g.:
data want ;
set have ;
by case_nbr ;
retain flag ;
if first.case_nbr then flag=0 ;
if reason ne '' then flag=1 ;
if flag=1 ;
run ;
HTH!
Next up: Troy Martin Hughes presents Calling Open-Source Python Functions within SAS PROC FCMP: A Google Maps API Geocoding Adventure on Wednesday April 23.
Register now at https://www.basug.org/events.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The code reuses the automatic variable _N_ as a logical flag (one does not need to drop automatic variables).
Since there is no increment (BY 0), the variable stays at 0 throughout the loop, until a non-missing REASON is encountered. The loop terminates with last.case_nbr.
After the loop, the(Boolean) value is used to control the output.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Thanks for posting the full code here. Much easier to discuss here than on LinkedIn, and more folks to join in the fun.
This is an interesting use of the DOW-loop construct. This is an unofficial name for explicitly looping over a SET statement, often in by groups. Paul Dorfman wrote a classic paper on DOW-loop: https://www.lexjansen.com/nesug/nesug08/hw/hw02.pdf.
At the top of the the DOW-loop, they set _N_ to 0, and tell the DOW loop to keep looping until it has read the last record for a case_nbr. And they use BY 0 because they don't want the loop iteration to increment _N_.
Inside the loop it reads each record within the case_nbr. If it finds a record with a record where REASON is not missing, it sets _N_=1. So _N_ is being used as a yes/no flag that means "I have found a record with non-missing reason for this case_nbr".
At the bottom of the loop, if _N_=1, it outputs the record.
The end result of this logic is that for each case_nbr, it will start outputting records when it finds the first record with non-missing REASON. That's hard for me to say in words, here's some example data and code, and I added a PUT statement to your code:
data have ;
input case_nbr recno reason $3. ;
cards ;
1 1
1 2
1 3 foo
1 4
2 1 foo
2 2
2 3
2 4
3 1
3 2
3 3
3 4 foo
4 1
4 2
4 3
4 4
;
run ;
data want ;
do _n_= 0 by 0 until(last.case_nbr) ;
set have ;
by case_nbr ;
if reason ne '' then _n_ = 1 ;
put (case_nbr recno reason _n_)(=) ;
if _n_ then output ;
end;
run;
The log is:
24 data want ; 25 do _n_= 0 by 0 until(last.case_nbr) ; 26 set have ; 27 by case_nbr ; 28 if reason ne '' then _n_ = 1 ; 29 put (case_nbr recno reason _n_)(=) ; 30 if _n_ then output ; 31 end; 32 run; case_nbr=1 recno=1 reason= _N_=0 case_nbr=1 recno=2 reason= _N_=0 case_nbr=1 recno=3 reason=foo _N_=1 case_nbr=1 recno=4 reason= _N_=1 case_nbr=2 recno=1 reason=foo _N_=1 case_nbr=2 recno=2 reason= _N_=1 case_nbr=2 recno=3 reason= _N_=1 case_nbr=2 recno=4 reason= _N_=1 case_nbr=3 recno=1 reason= _N_=0 case_nbr=3 recno=2 reason= _N_=0 case_nbr=3 recno=3 reason= _N_=0 case_nbr=3 recno=4 reason=foo _N_=1 case_nbr=4 recno=1 reason= _N_=0 case_nbr=4 recno=2 reason= _N_=0 case_nbr=4 recno=3 reason= _N_=0 case_nbr=4 recno=4 reason= _N_=0 NOTE: There were 16 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 7 observations and 3 variables.
If you look at _N_, you can see Case 1 will have records 3 and 4 output, because the first non-missing REASON is record 3. Case 2 has all records output. Case 3 has only its last record output. Case 4 will have no records output, because all records have a missing reason.
If you were doing this without a DOW-loop approach, you might use BY group processing and set a retained flag to select records, e.g.:
data want ;
set have ;
by case_nbr ;
retain flag ;
if first.case_nbr then flag=0 ;
if reason ne '' then flag=1 ;
if flag=1 ;
run ;
HTH!
Next up: Troy Martin Hughes presents Calling Open-Source Python Functions within SAS PROC FCMP: A Google Maps API Geocoding Adventure on Wednesday April 23.
Register now at https://www.basug.org/events.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please use proper indentation in your code, and post it into a window opened with the „little running man“ button.
Spaghetti code will always get people in trouble.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Using tricks is fun but can be obfuscating...
/* Easier to see what this thing does... */
data want ;
do until(last.case_nbr) ;
set W ; by case_nbr ;
if not missing(reason) then started = 1 ;
if started then output ;
end ;
drop started ;
run ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@PGStats - Exactly. If you work in a team who is maintaining a suite of SAS applications then you should always (try to) use coding techniques that are obvious and easy to understand by others with less SAS experience. Just because you can code an esoteric method doesn't mean that you should. If you are coding just for yourself then that's fine, but not if you expect others to easily follow what you have done.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The hard part is to decide what qualifies as esoteric. To some, the DOW loop itself is esoteric. Hashing may be seen as esoteric. Perhaps regular expressions...
While I personally don't write code that hijacks _n_, that's partly because I don't work with data where the efficiency gain of avoiding introducing a single variable into the PDV would be valuable to me. But I don't see the inherited code as unreasonable.
Next up: Troy Martin Hughes presents Calling Open-Source Python Functions within SAS PROC FCMP: A Google Maps API Geocoding Adventure on Wednesday April 23.
Register now at https://www.basug.org/events.