BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Manhort
Obsidian | Level 7

Hello all,

 

I have a large dataset. Individuals can appear on more than 1 row based on an ID. I need to subset the data to have individuals that are given both treatment A and B.

 

Example data:
ID treatment
1 A
1 B
1 A
2 A
2 A
3 B
3 B
3 B
3 B
4 A
4 B
5 A
5 A
6 B


Desired Result:

ID treatment
1 A
1 B
1 A
4 A
4 B

 

 

 

Any help is appreciated. 
Here's the input version.


DATA have;
INPUT id treatment $;
CARDS;
1 A
1 B
1 A
2 A
2 A
3 B
3 B
3 B
3 B
4 A
4 B
5 A
5 A
6 B
;
RUN;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You could ask PROC SQL to do exactly what you just described.

proc sql;
create table want as
select a.* from have a
where exists (select id from have b where a.id=b.id and b.treatment='A')
  and exists (select id from have c where a.id=c.id and c.treatment='B')
;
quit;

Or if there are just two treatments perhaps it is easier to use GROUP BY instead?

proc sql;
create table want2 as
select * from have
group by id
having count(distinct treatment)=2
;
quit;

But if your dataset is really large (and already sorted by ID) then a data step will be faster.

data want3;
  merge have(in=in1)
        have(in=in2 keep=id treatment rename=(treatment=trtA) where=(trtA='A'))
        have(in=in3 keep=id treatment rename=(treatment=trtB) where=(trtB='B'))
 ;
 by id;
 if first.id then selected = (in1 and in2 and in3);
 retain selected ;
 if in1 and selected then output;
 drop trtA trtB selected;
run;

Or perhaps a data step with a double DOW loop?

data want4;
  do until (last.id);
    set have;
    by id;
    A=sum(A,treatment='A');    
    B=sum(B,treatment='B');
  end;
  do until (last.id);
    set have;
    by id;
    if A and B then output;
  end;
  drop A B ;
run;

View solution in original post

3 REPLIES 3
Tom
Super User Tom
Super User

You could ask PROC SQL to do exactly what you just described.

proc sql;
create table want as
select a.* from have a
where exists (select id from have b where a.id=b.id and b.treatment='A')
  and exists (select id from have c where a.id=c.id and c.treatment='B')
;
quit;

Or if there are just two treatments perhaps it is easier to use GROUP BY instead?

proc sql;
create table want2 as
select * from have
group by id
having count(distinct treatment)=2
;
quit;

But if your dataset is really large (and already sorted by ID) then a data step will be faster.

data want3;
  merge have(in=in1)
        have(in=in2 keep=id treatment rename=(treatment=trtA) where=(trtA='A'))
        have(in=in3 keep=id treatment rename=(treatment=trtB) where=(trtB='B'))
 ;
 by id;
 if first.id then selected = (in1 and in2 and in3);
 retain selected ;
 if in1 and selected then output;
 drop trtA trtB selected;
run;

Or perhaps a data step with a double DOW loop?

data want4;
  do until (last.id);
    set have;
    by id;
    A=sum(A,treatment='A');    
    B=sum(B,treatment='B');
  end;
  do until (last.id);
    set have;
    by id;
    if A and B then output;
  end;
  drop A B ;
run;
GeorgeBonanza
Obsidian | Level 7
Very interesting approaches and all four are different than the way I would've done it. You learn something new every day. Thanks for sharing.
Manhort
Obsidian | Level 7
Thanks @Tom. I found the second method you gave to be very useful.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 855 views
  • 2 likes
  • 3 in conversation