BookmarkSubscribeRSS Feed
MigMaster12
Calcite | Level 5

Hello,

 

I have the following hypothetical data scenario:

 

MigMaster12_0-1610758375933.png

 

For each id, a diagnosis (value=1) is made after 2 consecutive "YES" hits, which can also be interspersed with missing results in between the 2 consecutive "YES" hits. 

 

If that criteria is met, then the diagnosis will be made at the first of the 2 sessions, and the diagnosis will be kept in all subsequent sessions regardless of the results.

 

I was thinking to achieve the diagnosis column I desire, a conditional DO loop with WHERE the first instance of Result="YES" and LAG(Result)="YES" can be used. But to my best knowledge of SAS, there isn't such a kind of DO loop? Only DO WHILE & DO UNTIL? And I'm stuck on how to dealing with the missing values that could be in between the consecutive "YES" hits. 

 

Can anyone chime in with suggestions/advice on how to achieve my desired outcome? Thank you!

4 REPLIES 4
mkeintz
PROC Star

Yes, you can do it with a (conditional) lag:

 

You haven't provided a working DATA step with your data, so this is not guaranteed to work:

 

data want (drop=i _:);
  do i=1 by 1 until (last.id);
    set have;
    by id;
    if result^=' ' then do;
      _last_result=lag(result);
      _last_id=lag(id);   /* ID for last non-missing result*/
      _last_i=lag(i);
      if result='YES' and _last_result=result and _last_id=id and _start_diag_1=. then _start_diag_1=_last_i;
    end;
  end;

  do i=1 by 1 until (last.id);
    set have;
    by id;
    diagnosis=ifn(_start_diag_1=. or i<_start_diag_1,0,1);
    output;
  end;
run;

Note the lag functions are only run when RESULT^=' ', so the lag(result) will never return a blank (that's what it means when SAS says the lag function is a FIFO queue).  

 

But if your data:

  1. Has at least 1 non-missing result for each ID
  2. Has a variable (like SESSION) that never has a duplicate within each ID

then this simpler program would be my preference.  It doesn't use lags, but the strategy is similar:

 

data want (drop=_:);

  do until (last.id);
    set have (where=(result^=' '));
    by id result notsorted;
    if _first_session^=. then continue;
    if result='YES' and not (first.result=1 and last.result=1) then _first_session=session;
  end;

  diagnosis=0;
  do until (last.id);
    set have;
    by id;
    if session=_first_session then diagnosis=1;
    output;
  end;

run;

Because the first DO loop only read non-missing results, it merely has to look for consecutive 'YES' results.  Note th e

if result='YES' and not (first.result=1 and last.result=1) then _first_session=session;

merely tests that the current record is a "YES" and it is not a singleton.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

You can simplify the first code I offered by reframing the code from using 2 do until (last.id) loops into a single such loop, by using the HAVE dataset mentioned twice in the set statement.  

 

data want (drop=_:);
  set have (in=firstpass) have (in=secondpass);
  by id;

  if first.id then call missing (_nr1,_nr2,_first_y_pair);
  if first.id then diagnosis=0;
  retain diagnosis _first_y_pair;

  _nr1+(firstpass=1 and result in ('YES','NO'));
  if _nr1^=lag(_nr1) then 
    if lag(id)=id and lag(result)='YES' and result='YES' and _first_y_pair=. then _first_y_pair=_nr1-1;

  if secondpass;
  _nr2+(result^=' ');

  if _nr2=_first_y_pair then diagnosis=1;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
MigMaster12
Calcite | Level 5

Hi there,

 

Thank you for your suggested solutions; the very last 1 you posted worked.

 

However, I do have another do-loop related question that I hope you won't mind taking a look at: if we are using a similar dataset as in this topic, but instead my desired output is where I want to execute a DO function only for the very 1st instance of a "YES" test result:

 

data have;
input id session result desired_output;
datalines;
1 4 yes 1
1 6 yes 0
1 7 no 0
1 9 yes 0
1 12 . 0
2 1 yes 1
2 2 . 0
2 7 . 0
2 8 yes 0
2 9 yes 0
;
run;

 

I tried both a DO UNTIL(result="no" or result=.) & a DO WHILE(result="yes"), desired_output=1 code, but that doesn't seem to work. 

 

Any ideas on how to limit the DO statement to just the 1st instance?

 

Thank you again!

PeterClemmensen
Tourmaline | Level 20

This problem is similar to the one posted in your thread SAS conditionally updating previous row's observation values so mu solution there will work here as well 🙂

 

data want(drop = i r);
   set have curobs = c;
   by id;

   if first.id then diagnosis = 0;

   if diagnosis = 0 and Result = "YES" then do p = (c + 1) by 1;
      if p > n then leave;
      set have (keep = id Result rename = (id = i Result = r)) point = p nobs = n;
      if id ne i then leave;
      if Result = "YES" and r = "NO" then leave;
      if Result = "YES" and r = "YES" then do;
         diagnosis = 1;
         leave;
      end;
   end;

   retain diagnosis;
run;

 

Result:

 

id session Result diagnosis 
1  1              0 
1  4       YES    1 
1  6       YES    1 
1  7       NO     1 
1  9       YES    1 
1  11      YES    1 
1  12             1 
2  1       YES    1 
2  2              1 
2  3              1 
2  7              1 
2  8       YES    1 
2  9       YES    1 
 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1678 views
  • 2 likes
  • 3 in conversation