BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
StacieOsborne
Calcite | Level 5
Hello, SAS Gurus!

Could you please help me?
First, I couldn’t understand what this (given, not mine) piece of code accomplishes. I got an explanation that the goal was to delete previous observations of time series until the beginning of the series for that case_nbr. Let’s say, we are talking about 2022, January to September. The variable reason can be either a character, like “P” or “22” or blank. We do not like blanks and try to find the original reason as of January 2022. If it’s not there, we’d rather start the series with the first record of non-blank reason, even if that’s August 2022.

/* I am not sure what this thing does... */
data want ;
do _n_= 0 by 0 until(last.case_nbr) ;
set W ;
by case_nbr ;
if reason ne '' then _n_ = 1 ;
if _n_ then output ;
end;
run;

I have seen and used:
If first.var
If last.var
Do while
Do until
Do _n_ = 1 by 1 until(eof)
and so on. But never do _n_= 0 by 0 until(last.var ) ;
1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

Hi,

 

Thanks for posting the full code here.  Much easier to discuss here than on LinkedIn, and more folks to join in the fun.

 

This is an interesting use of the DOW-loop construct.  This is an unofficial name for explicitly looping over a SET statement, often in by groups.  Paul Dorfman wrote a classic paper on DOW-loop: https://www.lexjansen.com/nesug/nesug08/hw/hw02.pdf.

 

At the top of the the DOW-loop, they set _N_ to 0, and tell the DOW loop to keep looping until it has read the last record for a case_nbr.  And they use BY 0 because they don't want the loop iteration to increment _N_.

 

Inside the loop it reads each record within the case_nbr.  If it finds a record with a record where REASON is not missing, it sets _N_=1.  So _N_ is being used as a yes/no flag that means "I have found a record with non-missing reason for this case_nbr".

 

At the bottom of the loop, if _N_=1, it outputs the record. 

 

The end result of this logic is that for each case_nbr, it will start outputting records when it finds the first record with non-missing REASON.  That's hard for me to say in words, here's some example data and code, and I added a PUT statement to your code:

 

data have ;
  input case_nbr recno reason $3. ;
  cards ;
1 1    
1 2    
1 3 foo
1 4    
2 1 foo
2 2    
2 3    
2 4    
3 1    
3 2    
3 3    
3 4 foo
4 1    
4 2    
4 3    
4 4    
;
run ;


data want ;
  do _n_= 0 by 0 until(last.case_nbr) ;
    set have ;
    by case_nbr ;
    if reason ne '' then _n_ = 1 ;
    put (case_nbr recno reason _n_)(=) ;
    if _n_ then output ;
  end;
run;

The log is:

24   data want ;
25     do _n_= 0 by 0 until(last.case_nbr) ;
26       set have ;
27       by case_nbr ;
28       if reason ne '' then _n_ = 1 ;
29       put (case_nbr recno reason _n_)(=) ;
30       if _n_ then output ;
31     end;
32   run;

case_nbr=1 recno=1 reason=  _N_=0
case_nbr=1 recno=2 reason=  _N_=0
case_nbr=1 recno=3 reason=foo _N_=1
case_nbr=1 recno=4 reason=  _N_=1
case_nbr=2 recno=1 reason=foo _N_=1
case_nbr=2 recno=2 reason=  _N_=1
case_nbr=2 recno=3 reason=  _N_=1
case_nbr=2 recno=4 reason=  _N_=1
case_nbr=3 recno=1 reason=  _N_=0
case_nbr=3 recno=2 reason=  _N_=0
case_nbr=3 recno=3 reason=  _N_=0
case_nbr=3 recno=4 reason=foo _N_=1
case_nbr=4 recno=1 reason=  _N_=0
case_nbr=4 recno=2 reason=  _N_=0
case_nbr=4 recno=3 reason=  _N_=0
case_nbr=4 recno=4 reason=  _N_=0
NOTE: There were 16 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 7 observations and 3 variables.

 

If you look at _N_, you can see Case 1 will have records 3 and 4 output, because the first non-missing REASON is record 3.  Case 2 has all records output.  Case 3 has only its last record output. Case 4 will have no records output, because all records have a missing reason.

 

If you were doing this without a DOW-loop approach, you might use BY group processing and set a retained flag to select records, e.g.:

 

data want ;
  set have ;
  by case_nbr ;
  retain flag ;
  if first.case_nbr then flag=0 ;
  if reason ne '' then flag=1 ;
  if flag=1 ;
run ;

 

HTH!

 

 

 

 

 

View solution in original post

7 REPLIES 7
Kurt_Bremser
Super User

The code reuses the automatic variable _N_ as a logical flag (one does not need to drop automatic variables).

Since there is no increment (BY 0), the variable stays at 0 throughout the loop, until a non-missing REASON is encountered. The loop terminates with last.case_nbr.

After the loop, the(Boolean) value is used to control the output.

Quentin
Super User

Hi,

 

Thanks for posting the full code here.  Much easier to discuss here than on LinkedIn, and more folks to join in the fun.

 

This is an interesting use of the DOW-loop construct.  This is an unofficial name for explicitly looping over a SET statement, often in by groups.  Paul Dorfman wrote a classic paper on DOW-loop: https://www.lexjansen.com/nesug/nesug08/hw/hw02.pdf.

 

At the top of the the DOW-loop, they set _N_ to 0, and tell the DOW loop to keep looping until it has read the last record for a case_nbr.  And they use BY 0 because they don't want the loop iteration to increment _N_.

 

Inside the loop it reads each record within the case_nbr.  If it finds a record with a record where REASON is not missing, it sets _N_=1.  So _N_ is being used as a yes/no flag that means "I have found a record with non-missing reason for this case_nbr".

 

At the bottom of the loop, if _N_=1, it outputs the record. 

 

The end result of this logic is that for each case_nbr, it will start outputting records when it finds the first record with non-missing REASON.  That's hard for me to say in words, here's some example data and code, and I added a PUT statement to your code:

 

data have ;
  input case_nbr recno reason $3. ;
  cards ;
1 1    
1 2    
1 3 foo
1 4    
2 1 foo
2 2    
2 3    
2 4    
3 1    
3 2    
3 3    
3 4 foo
4 1    
4 2    
4 3    
4 4    
;
run ;


data want ;
  do _n_= 0 by 0 until(last.case_nbr) ;
    set have ;
    by case_nbr ;
    if reason ne '' then _n_ = 1 ;
    put (case_nbr recno reason _n_)(=) ;
    if _n_ then output ;
  end;
run;

The log is:

24   data want ;
25     do _n_= 0 by 0 until(last.case_nbr) ;
26       set have ;
27       by case_nbr ;
28       if reason ne '' then _n_ = 1 ;
29       put (case_nbr recno reason _n_)(=) ;
30       if _n_ then output ;
31     end;
32   run;

case_nbr=1 recno=1 reason=  _N_=0
case_nbr=1 recno=2 reason=  _N_=0
case_nbr=1 recno=3 reason=foo _N_=1
case_nbr=1 recno=4 reason=  _N_=1
case_nbr=2 recno=1 reason=foo _N_=1
case_nbr=2 recno=2 reason=  _N_=1
case_nbr=2 recno=3 reason=  _N_=1
case_nbr=2 recno=4 reason=  _N_=1
case_nbr=3 recno=1 reason=  _N_=0
case_nbr=3 recno=2 reason=  _N_=0
case_nbr=3 recno=3 reason=  _N_=0
case_nbr=3 recno=4 reason=foo _N_=1
case_nbr=4 recno=1 reason=  _N_=0
case_nbr=4 recno=2 reason=  _N_=0
case_nbr=4 recno=3 reason=  _N_=0
case_nbr=4 recno=4 reason=  _N_=0
NOTE: There were 16 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 7 observations and 3 variables.

 

If you look at _N_, you can see Case 1 will have records 3 and 4 output, because the first non-missing REASON is record 3.  Case 2 has all records output.  Case 3 has only its last record output. Case 4 will have no records output, because all records have a missing reason.

 

If you were doing this without a DOW-loop approach, you might use BY group processing and set a retained flag to select records, e.g.:

 

data want ;
  set have ;
  by case_nbr ;
  retain flag ;
  if first.case_nbr then flag=0 ;
  if reason ne '' then flag=1 ;
  if flag=1 ;
run ;

 

HTH!

 

 

 

 

 

Kurt_Bremser
Super User
I see I did not realize that the OUTPUT was part of the loop. So it writes all observations for a group after the first non-missing value is encountered.
Please use proper indentation in your code, and post it into a window opened with the „little running man“ button.
Spaghetti code will always get people in trouble.
PGStats
Opal | Level 21

Using tricks is fun but can be obfuscating...

 

/* Easier to see what this thing does... */
data want ;
do until(last.case_nbr) ;
	set W ; by case_nbr ;
	if not missing(reason) then started = 1 ;
	if started then output ;
	end ;
drop started ;
run ;
PG
SASKiwi
PROC Star

@PGStats - Exactly. If you work in a team who is maintaining a suite of SAS applications then you should always (try to) use coding techniques that are obvious and easy to understand by others with less SAS experience. Just because you can code an esoteric method doesn't mean that you should. If you are coding just for yourself then that's fine, but not if you expect others to easily follow what you have done.

Quentin
Super User

The hard part is to decide what qualifies as esoteric.  To some, the DOW loop itself is esoteric.  Hashing may be seen  as esoteric.  Perhaps regular expressions...  

 

While I personally don't write code that hijacks _n_, that's partly because I don't work with data where the efficiency gain of avoiding introducing a single variable into the PDV would be valuable to me.  But I don't see the inherited code as unreasonable.

 

SASKiwi
PROC Star

@Quentin - @PGStats 's version is reasonably acceptable in my view, the OPs's not so much 🙂 ...

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2070 views
  • 5 likes
  • 5 in conversation