Hello SAS users, I am working on a longitudinal study data where I need to code missing value indicator for an outcome. So for each person ID, the outcome was measured several times throughout different time point. If the outcome is not missing, we say the outcome is observed (Indicator="O"), if some of the outcomes are missing but later the outcome appeared within the same person ID, then we say the outcome is intermittently missing (Indicator="I"). If the outcomes are missing at one point till the end of the person ID, we say this person is dropout (Indicator="D"). Here is how the data looks like:
ID | Time | Outcome | Indicator |
1 | 3 | 2 | O |
1 | 6 | . | I |
1 | 9 | 3 | O |
1 | 12 | 1 | O |
1 | 15 | . | I |
1 | 18 | 1 | O |
2 | 3 | 2 | O |
2 | 6 | 3 | O |
2 | 9 | . | I |
2 | 12 | . | I |
2 | 15 | 3 | O |
2 | 18 | 2 | O |
3 | 3 | 1 | O |
3 | 6 | 1 | O |
3 | 9 | . | D |
3 | 12 | . | D |
3 | 15 | . | D |
3 | 18 | . | D |
4 | 3 | 3 | O |
4 | 6 | . | I |
4 | 9 | 2 | O |
4 | 12 | . | D |
4 | 15 | . | D |
4 | 18 | . | D |
Would you mind helping me figure a way to code the indicator variable that contains "observed", "Intermittently missing", "Dropout"? The original data only has ID, time, and outcome; I added the indicator variable for the demonstration.
Thank you.
Like this?
data HAVE;
input ID TIME OUTCOME ;
cards;
1 3 2 O
1 6 . I
1 9 3 O
1 12 1 O
1 15 . I
1 18 1 O
2 3 2 O
2 6 3 O
2 9 . I
2 12 . I
2 15 3 O
2 18 2 O
3 3 1 O
3 6 1 O
3 9 . D
3 12 . D
3 15 . D
3 18 . D
4 3 3 O
4 6 . I
4 9 2 O
4 12 . D
4 15 . D
4 18 . D
run;
proc sql nothreads;
create table LASTOUTCOME as
select ID, max(N*(OUTCOME>0)) as LASTOUTCOME
from (select *, monotonic() as N from HAVE)
group by ID
order by ID ;
quit;
data WANT;
merge HAVE LASTOUTCOME;
by ID;
length INDICATOR $1;
INDICATOR=ifc(OUTCOME,'O', ifc(_N_<=LASTOUTCOME ,'I', 'D' ));
run;
ID | TIME | OUTCOME | LASTOUTCOME | INDICATOR |
---|---|---|---|---|
1 | 3 | 2 | 6 | O |
1 | 6 | . | 6 | I |
1 | 9 | 3 | 6 | O |
1 | 12 | 1 | 6 | O |
1 | 15 | . | 6 | I |
1 | 18 | 1 | 6 | O |
2 | 3 | 2 | 12 | O |
2 | 6 | 3 | 12 | O |
2 | 9 | . | 12 | I |
2 | 12 | . | 12 | I |
2 | 15 | 3 | 12 | O |
2 | 18 | 2 | 12 | O |
3 | 3 | 1 | 14 | O |
3 | 6 | 1 | 14 | O |
3 | 9 | . | 14 | D |
3 | 12 | . | 14 | D |
3 | 15 | . | 14 | D |
3 | 18 | . | 14 | D |
4 | 3 | 3 | 21 | O |
4 | 6 | . | 21 | I |
4 | 9 | 2 | 21 | O |
4 | 12 | . | 21 | D |
4 | 15 | . | 21 | D |
4 | 18 | . | 21 | D |
Read all records for each ID twice:
This is untested, in the absence of a working DATA step with sample data.
data want (drop=i last_observed);
do i=1 by 1 until (last.id);
set have;
by id;
if outcome^=. then last_observed=i;
end;
do i=1 by 1 until (last.id);
if i>last_observed then indicator='D';
else if missing(outcome) then indicator='I';
else indicator='O';
output;
end;
run;
Like this?
data HAVE;
input ID TIME OUTCOME ;
cards;
1 3 2 O
1 6 . I
1 9 3 O
1 12 1 O
1 15 . I
1 18 1 O
2 3 2 O
2 6 3 O
2 9 . I
2 12 . I
2 15 3 O
2 18 2 O
3 3 1 O
3 6 1 O
3 9 . D
3 12 . D
3 15 . D
3 18 . D
4 3 3 O
4 6 . I
4 9 2 O
4 12 . D
4 15 . D
4 18 . D
run;
proc sql nothreads;
create table LASTOUTCOME as
select ID, max(N*(OUTCOME>0)) as LASTOUTCOME
from (select *, monotonic() as N from HAVE)
group by ID
order by ID ;
quit;
data WANT;
merge HAVE LASTOUTCOME;
by ID;
length INDICATOR $1;
INDICATOR=ifc(OUTCOME,'O', ifc(_N_<=LASTOUTCOME ,'I', 'D' ));
run;
ID | TIME | OUTCOME | LASTOUTCOME | INDICATOR |
---|---|---|---|---|
1 | 3 | 2 | 6 | O |
1 | 6 | . | 6 | I |
1 | 9 | 3 | 6 | O |
1 | 12 | 1 | 6 | O |
1 | 15 | . | 6 | I |
1 | 18 | 1 | 6 | O |
2 | 3 | 2 | 12 | O |
2 | 6 | 3 | 12 | O |
2 | 9 | . | 12 | I |
2 | 12 | . | 12 | I |
2 | 15 | 3 | 12 | O |
2 | 18 | 2 | 12 | O |
3 | 3 | 1 | 14 | O |
3 | 6 | 1 | 14 | O |
3 | 9 | . | 14 | D |
3 | 12 | . | 14 | D |
3 | 15 | . | 14 | D |
3 | 18 | . | 14 | D |
4 | 3 | 3 | 21 | O |
4 | 6 | . | 21 | I |
4 | 9 | 2 | 21 | O |
4 | 12 | . | 21 | D |
4 | 15 | . | 21 | D |
4 | 18 | . | 21 | D |
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.