BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Manhort
Obsidian | Level 7

Hello all,

 

I have a large dataset. Individuals can appear on more than 1 row based on an ID. I need to subset the data to have individuals that are given both treatment A and B.

 

Example data:
ID treatment
1 A
1 B
1 A
2 A
2 A
3 B
3 B
3 B
3 B
4 A
4 B
5 A
5 A
6 B


Desired Result:

ID treatment
1 A
1 B
1 A
4 A
4 B

 

 

 

Any help is appreciated. 
Here's the input version.


DATA have;
INPUT id treatment $;
CARDS;
1 A
1 B
1 A
2 A
2 A
3 B
3 B
3 B
3 B
4 A
4 B
5 A
5 A
6 B
;
RUN;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

You could ask PROC SQL to do exactly what you just described.

proc sql;
create table want as
select a.* from have a
where exists (select id from have b where a.id=b.id and b.treatment='A')
  and exists (select id from have c where a.id=c.id and c.treatment='B')
;
quit;

Or if there are just two treatments perhaps it is easier to use GROUP BY instead?

proc sql;
create table want2 as
select * from have
group by id
having count(distinct treatment)=2
;
quit;

But if your dataset is really large (and already sorted by ID) then a data step will be faster.

data want3;
  merge have(in=in1)
        have(in=in2 keep=id treatment rename=(treatment=trtA) where=(trtA='A'))
        have(in=in3 keep=id treatment rename=(treatment=trtB) where=(trtB='B'))
 ;
 by id;
 if first.id then selected = (in1 and in2 and in3);
 retain selected ;
 if in1 and selected then output;
 drop trtA trtB selected;
run;

Or perhaps a data step with a double DOW loop?

data want4;
  do until (last.id);
    set have;
    by id;
    A=sum(A,treatment='A');    
    B=sum(B,treatment='B');
  end;
  do until (last.id);
    set have;
    by id;
    if A and B then output;
  end;
  drop A B ;
run;

View solution in original post

3 REPLIES 3
Tom
Super User Tom
Super User

You could ask PROC SQL to do exactly what you just described.

proc sql;
create table want as
select a.* from have a
where exists (select id from have b where a.id=b.id and b.treatment='A')
  and exists (select id from have c where a.id=c.id and c.treatment='B')
;
quit;

Or if there are just two treatments perhaps it is easier to use GROUP BY instead?

proc sql;
create table want2 as
select * from have
group by id
having count(distinct treatment)=2
;
quit;

But if your dataset is really large (and already sorted by ID) then a data step will be faster.

data want3;
  merge have(in=in1)
        have(in=in2 keep=id treatment rename=(treatment=trtA) where=(trtA='A'))
        have(in=in3 keep=id treatment rename=(treatment=trtB) where=(trtB='B'))
 ;
 by id;
 if first.id then selected = (in1 and in2 and in3);
 retain selected ;
 if in1 and selected then output;
 drop trtA trtB selected;
run;

Or perhaps a data step with a double DOW loop?

data want4;
  do until (last.id);
    set have;
    by id;
    A=sum(A,treatment='A');    
    B=sum(B,treatment='B');
  end;
  do until (last.id);
    set have;
    by id;
    if A and B then output;
  end;
  drop A B ;
run;
GeorgeBonanza
Obsidian | Level 7
Very interesting approaches and all four are different than the way I would've done it. You learn something new every day. Thanks for sharing.
Manhort
Obsidian | Level 7
Thanks @Tom. I found the second method you gave to be very useful.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 585 views
  • 2 likes
  • 3 in conversation