BookmarkSubscribeRSS Feed
MigMaster12
Calcite | Level 5

Hello,

 

I have the following hypothetical data scenario:

 

MigMaster12_0-1610758375933.png

 

For each id, a diagnosis (value=1) is made after 2 consecutive "YES" hits, which can also be interspersed with missing results in between the 2 consecutive "YES" hits. 

 

If that criteria is met, then the diagnosis will be made at the first of the 2 sessions, and the diagnosis will be kept in all subsequent sessions regardless of the results.

 

I was thinking to achieve the diagnosis column I desire, a conditional DO loop with WHERE the first instance of Result="YES" and LAG(Result)="YES" can be used. But to my best knowledge of SAS, there isn't such a kind of DO loop? Only DO WHILE & DO UNTIL? And I'm stuck on how to dealing with the missing values that could be in between the consecutive "YES" hits. 

 

Can anyone chime in with suggestions/advice on how to achieve my desired outcome? Thank you!

4 REPLIES 4
mkeintz
PROC Star

Yes, you can do it with a (conditional) lag:

 

You haven't provided a working DATA step with your data, so this is not guaranteed to work:

 

data want (drop=i _:);
  do i=1 by 1 until (last.id);
    set have;
    by id;
    if result^=' ' then do;
      _last_result=lag(result);
      _last_id=lag(id);   /* ID for last non-missing result*/
      _last_i=lag(i);
      if result='YES' and _last_result=result and _last_id=id and _start_diag_1=. then _start_diag_1=_last_i;
    end;
  end;

  do i=1 by 1 until (last.id);
    set have;
    by id;
    diagnosis=ifn(_start_diag_1=. or i<_start_diag_1,0,1);
    output;
  end;
run;

Note the lag functions are only run when RESULT^=' ', so the lag(result) will never return a blank (that's what it means when SAS says the lag function is a FIFO queue).  

 

But if your data:

  1. Has at least 1 non-missing result for each ID
  2. Has a variable (like SESSION) that never has a duplicate within each ID

then this simpler program would be my preference.  It doesn't use lags, but the strategy is similar:

 

data want (drop=_:);

  do until (last.id);
    set have (where=(result^=' '));
    by id result notsorted;
    if _first_session^=. then continue;
    if result='YES' and not (first.result=1 and last.result=1) then _first_session=session;
  end;

  diagnosis=0;
  do until (last.id);
    set have;
    by id;
    if session=_first_session then diagnosis=1;
    output;
  end;

run;

Because the first DO loop only read non-missing results, it merely has to look for consecutive 'YES' results.  Note th e

if result='YES' and not (first.result=1 and last.result=1) then _first_session=session;

merely tests that the current record is a "YES" and it is not a singleton.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

You can simplify the first code I offered by reframing the code from using 2 do until (last.id) loops into a single such loop, by using the HAVE dataset mentioned twice in the set statement.  

 

data want (drop=_:);
  set have (in=firstpass) have (in=secondpass);
  by id;

  if first.id then call missing (_nr1,_nr2,_first_y_pair);
  if first.id then diagnosis=0;
  retain diagnosis _first_y_pair;

  _nr1+(firstpass=1 and result in ('YES','NO'));
  if _nr1^=lag(_nr1) then 
    if lag(id)=id and lag(result)='YES' and result='YES' and _first_y_pair=. then _first_y_pair=_nr1-1;

  if secondpass;
  _nr2+(result^=' ');

  if _nr2=_first_y_pair then diagnosis=1;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
MigMaster12
Calcite | Level 5

Hi there,

 

Thank you for your suggested solutions; the very last 1 you posted worked.

 

However, I do have another do-loop related question that I hope you won't mind taking a look at: if we are using a similar dataset as in this topic, but instead my desired output is where I want to execute a DO function only for the very 1st instance of a "YES" test result:

 

data have;
input id session result desired_output;
datalines;
1 4 yes 1
1 6 yes 0
1 7 no 0
1 9 yes 0
1 12 . 0
2 1 yes 1
2 2 . 0
2 7 . 0
2 8 yes 0
2 9 yes 0
;
run;

 

I tried both a DO UNTIL(result="no" or result=.) & a DO WHILE(result="yes"), desired_output=1 code, but that doesn't seem to work. 

 

Any ideas on how to limit the DO statement to just the 1st instance?

 

Thank you again!

PeterClemmensen
Tourmaline | Level 20

This problem is similar to the one posted in your thread SAS conditionally updating previous row's observation values so mu solution there will work here as well 🙂

 

data want(drop = i r);
   set have curobs = c;
   by id;

   if first.id then diagnosis = 0;

   if diagnosis = 0 and Result = "YES" then do p = (c + 1) by 1;
      if p > n then leave;
      set have (keep = id Result rename = (id = i Result = r)) point = p nobs = n;
      if id ne i then leave;
      if Result = "YES" and r = "NO" then leave;
      if Result = "YES" and r = "YES" then do;
         diagnosis = 1;
         leave;
      end;
   end;

   retain diagnosis;
run;

 

Result:

 

id session Result diagnosis 
1  1              0 
1  4       YES    1 
1  6       YES    1 
1  7       NO     1 
1  9       YES    1 
1  11      YES    1 
1  12             1 
2  1       YES    1 
2  2              1 
2  3              1 
2  7              1 
2  8       YES    1 
2  9       YES    1 
 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 816 views
  • 2 likes
  • 3 in conversation