I have a long (i.e., not wide) dataset w/ 3 timepoints per participant:
baseline_arm_1 |
fu_1_arm_1 |
fu_2_arm_1 |
I want to create a subset that only includes those who have completed all 3 timepoints. In the image below, the orange rows would be included and the yellow rows would be excluded.
I've tried various version of subsetting using WHERE, IN, OR, AND, etc.
Seems like it should be pretty straight forward, but I can't figure it out.
Thanks.
It is complicated because you essentially need to process the data for each participant twice. Once to check if all three events occurred and a second time to decide whether to output them.
You could do that in a data step using two DO loops.
data want;
do until(last.participant);
set have;
by participant;
if redcap_event_name='baseline_arm_1' then baseline=1;
if redcap_event_name='fu_arm_1' then fu1=1;
if redcap_event_name='fu_arm_2' then fu2=1;
end;
do until(last.participant);
set have;
by participant;
if baseline and fu1 and fu2 then output;
end;
run;
/* UNTESTED CODE */
proc freq data=have;
tables record_id/out=counts;
run;
data want;
merge have counts;
by record_id;
if count=3;
run;
If you want tested code, please supply the data (not as a screen capture, we can't test code against a screen capture) but as SAS data step code, which you can type yourself or use these instructions. And you should provide data this way in your future questions.
proc sql;
create table want as
select * from have
where record_id in (select record_id from have group by record_id having count (distinctr edcap_event_name) =3)
order by 1, 2;
quit;
Assuming that the desired complete sets always occur in the sequence
baseline_arm_1
fu_1_arm_1
fu_1_arm_2
you can do a "lookahead" merge to determine whether the current record and the two subsequent records should be output (untested)::
data want (drop=_:);
merge have
have (firstobs=2 keep=redcap_event_name rename=(redcap_event_name=_nxt_evnt))
have (firstobs=3 keep=participant redcap_event_name rename=(participant=_nxt_part2 redcap_event_name=_nxt_evnt2));
retain _counter 0;
if _counter<=0 then do; /*If not already in a threesome, then check the lookaheads */
if redcap_event_name='baseline_arm_1' and _nxt_evnt='fu_1_arm_1' and _nxt_evnt2='fu_2_arm_2'
and _nxtpart2=participant then _counter=3;
end;
if _counter>0;
_counter+(-1);
run;
Editted note: slightly cleaner code below:
data want (drop=_:);
merge have
have (firstobs=2 keep=redcap_event_name rename=(redcap_event_name=_nxt_evnt))
have (firstobs=3 keep=participant redcap_event_name rename=(participant=_nxt_part2 redcap_event_name=_nxt_evnt2));
retain _counter 0;
if redcap_event_name='baseline_arm_1' then do; /*If this is baseline, check the lookaheads */
if _nxt_evnt='fu_1_arm_1' and _nxt_evnt2='fu_2_arm_2' and _nxtpart2=participant then _counter=3;
end;
if _counter>0;
_counter+(-1);
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.