BookmarkSubscribeRSS Feed
_maldini_
Barite | Level 11

I have a long (i.e., not wide) dataset w/ 3 timepoints per participant:

baseline_arm_1
fu_1_arm_1
fu_2_arm_1

I want to create a subset that only includes those who have completed all 3 timepoints. In the image below, the orange rows would be included and the yellow rows would be excluded.

Screen Shot 2022-05-26 at 9.29.45 AM.png

I've tried various version of subsetting using WHERE, IN, OR, AND, etc.

Seems like it should be pretty straight forward, but I can't figure it out. 

 

Thanks.

4 REPLIES 4
Tom
Super User Tom
Super User

It is complicated because you essentially need to process the data for each participant twice.  Once to check if all three events occurred and a second time to decide whether to output them.

 

You could do that in a data step using two DO loops.

data want;
  do until(last.participant);
    set have;
    by participant;
    if redcap_event_name='baseline_arm_1' then baseline=1;
    if redcap_event_name='fu_arm_1' then fu1=1;
    if redcap_event_name='fu_arm_2' then fu2=1;
  end;
  do until(last.participant);
    set have;
    by participant;
    if baseline and fu1 and fu2 then output;
  end;
run;
PaigeMiller
Diamond | Level 26
/* UNTESTED CODE */
proc freq data=have;
    tables record_id/out=counts;
run;
data want;
    merge have counts;
    by record_id;
    if count=3;
run;

If you want tested code, please supply the data (not as a screen capture, we can't test code against a screen capture) but as SAS data step code, which you can type yourself or use these instructions. And you should provide data this way in your future questions.

--
Paige Miller
Reeza
Super User

proc sql;

create table want as

select * from have

where record_id in (select record_id from have group by record_id having count (distinctr edcap_event_name) =3)

order by 1, 2;

quit;

mkeintz
PROC Star

Assuming that the desired complete sets always occur in the sequence

    baseline_arm_1
    fu_1_arm_1

    fu_1_arm_2

you can do a "lookahead" merge to determine whether the current record and the two subsequent records should be output (untested)::

 

data want (drop=_:);

  merge have
        have (firstobs=2 keep=redcap_event_name rename=(redcap_event_name=_nxt_evnt))
        have (firstobs=3 keep=participant redcap_event_name rename=(participant=_nxt_part2 redcap_event_name=_nxt_evnt2));
  
  retain _counter 0;
  if _counter<=0 then do;  /*If not already in a threesome, then check the lookaheads */
    if redcap_event_name='baseline_arm_1'  and _nxt_evnt='fu_1_arm_1'  and _nxt_evnt2='fu_2_arm_2'
       and _nxtpart2=participant then _counter=3;
  end;

  if _counter>0;
  _counter+(-1);
run;

 

Editted note:  slightly cleaner code below:

 

data want (drop=_:);

  merge have
        have (firstobs=2 keep=redcap_event_name rename=(redcap_event_name=_nxt_evnt))
        have (firstobs=3 keep=participant redcap_event_name rename=(participant=_nxt_part2 redcap_event_name=_nxt_evnt2));
  
  retain _counter 0;
  if redcap_event_name='baseline_arm_1' then do;  /*If this is baseline, check the lookaheads */
     if _nxt_evnt='fu_1_arm_1'  and _nxt_evnt2='fu_2_arm_2' and _nxtpart2=participant then _counter=3;
  end;

  if _counter>0;
  _counter+(-1);
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1132 views
  • 4 likes
  • 5 in conversation