Hello,
I have the following hypothetical data scenario:
For each id, a diagnosis (value=1) is made after 2 consecutive "YES" hits, which can also be interspersed with missing results in between the 2 consecutive "YES" hits.
If that criteria is met, then the diagnosis will be made at the first of the 2 sessions, and the diagnosis will be kept in all subsequent sessions regardless of the results.
I was thinking to achieve the diagnosis column I desire, a conditional DO loop with WHERE the first instance of Result="YES" and LAG(Result)="YES" can be used. But to my best knowledge of SAS, there isn't such a kind of DO loop? Only DO WHILE & DO UNTIL? And I'm stuck on how to dealing with the missing values that could be in between the consecutive "YES" hits.
Can anyone chime in with suggestions/advice on how to achieve my desired outcome? Thank you!
Yes, you can do it with a (conditional) lag:
You haven't provided a working DATA step with your data, so this is not guaranteed to work:
data want (drop=i _:);
do i=1 by 1 until (last.id);
set have;
by id;
if result^=' ' then do;
_last_result=lag(result);
_last_id=lag(id); /* ID for last non-missing result*/
_last_i=lag(i);
if result='YES' and _last_result=result and _last_id=id and _start_diag_1=. then _start_diag_1=_last_i;
end;
end;
do i=1 by 1 until (last.id);
set have;
by id;
diagnosis=ifn(_start_diag_1=. or i<_start_diag_1,0,1);
output;
end;
run;
Note the lag functions are only run when RESULT^=' ', so the lag(result) will never return a blank (that's what it means when SAS says the lag function is a FIFO queue).
But if your data:
then this simpler program would be my preference. It doesn't use lags, but the strategy is similar:
data want (drop=_:);
do until (last.id);
set have (where=(result^=' '));
by id result notsorted;
if _first_session^=. then continue;
if result='YES' and not (first.result=1 and last.result=1) then _first_session=session;
end;
diagnosis=0;
do until (last.id);
set have;
by id;
if session=_first_session then diagnosis=1;
output;
end;
run;
Because the first DO loop only read non-missing results, it merely has to look for consecutive 'YES' results. Note th e
if result='YES' and not (first.result=1 and last.result=1) then _first_session=session;
merely tests that the current record is a "YES" and it is not a singleton.
You can simplify the first code I offered by reframing the code from using 2 do until (last.id) loops into a single such loop, by using the HAVE dataset mentioned twice in the set statement.
data want (drop=_:);
set have (in=firstpass) have (in=secondpass);
by id;
if first.id then call missing (_nr1,_nr2,_first_y_pair);
if first.id then diagnosis=0;
retain diagnosis _first_y_pair;
_nr1+(firstpass=1 and result in ('YES','NO'));
if _nr1^=lag(_nr1) then
if lag(id)=id and lag(result)='YES' and result='YES' and _first_y_pair=. then _first_y_pair=_nr1-1;
if secondpass;
_nr2+(result^=' ');
if _nr2=_first_y_pair then diagnosis=1;
run;
Hi there,
Thank you for your suggested solutions; the very last 1 you posted worked.
However, I do have another do-loop related question that I hope you won't mind taking a look at: if we are using a similar dataset as in this topic, but instead my desired output is where I want to execute a DO function only for the very 1st instance of a "YES" test result:
data have;
input id session result desired_output;
datalines;
1 4 yes 1
1 6 yes 0
1 7 no 0
1 9 yes 0
1 12 . 0
2 1 yes 1
2 2 . 0
2 7 . 0
2 8 yes 0
2 9 yes 0
;
run;
I tried both a DO UNTIL(result="no" or result=.) & a DO WHILE(result="yes"), desired_output=1 code, but that doesn't seem to work.
Any ideas on how to limit the DO statement to just the 1st instance?
Thank you again!
This problem is similar to the one posted in your thread SAS conditionally updating previous row's observation values so mu solution there will work here as well 🙂
data want(drop = i r);
set have curobs = c;
by id;
if first.id then diagnosis = 0;
if diagnosis = 0 and Result = "YES" then do p = (c + 1) by 1;
if p > n then leave;
set have (keep = id Result rename = (id = i Result = r)) point = p nobs = n;
if id ne i then leave;
if Result = "YES" and r = "NO" then leave;
if Result = "YES" and r = "YES" then do;
diagnosis = 1;
leave;
end;
end;
retain diagnosis;
run;
Result:
id session Result diagnosis 1 1 0 1 4 YES 1 1 6 YES 1 1 7 NO 1 1 9 YES 1 1 11 YES 1 1 12 1 2 1 YES 1 2 2 1 2 3 1 2 7 1 2 8 YES 1 2 9 YES 1
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.