BookmarkSubscribeRSS Feed
marleeakerson
Calcite | Level 5

Hello,

 

Using this sample data: 

 

ID         Code

1              A

1              B

1              C

2              A

2              B

3              A

3              B

3              C

 

I am trying to create a new table that pulls observations where an ID has both Code 'A' and Code 'C' associated with it. 

 

So the final data would look like: 

ID         Code

1              A

1              B

1              C

3              A

3              B

3              C

 

Thank you!

5 REPLIES 5
Kurt_Bremser
Super User

Run a double DO loop, since it is sorted:

data want;
do until (last.id);
  set have;
  by id;
  if code = 'A' then _a = 1;
  if code = 'C' then _c = 1;
end;
do until (last.id);
  set have;
  by id;
  if _a and _c then output;
end;
drop _:;
run;
novinosrin
Tourmaline | Level 20

Hi @marleeakerson  If you wanna try the PROC SQL aka "Readymeals" solution, it's worth having fun. 

data have;
 input ID         Code $;
 cards;
1              A

1              B

1              C

2              A

2              B
2			   A

3              A

3              B

3              C
;



proc sql;
 create table want as
 select *
 from have
 group by id
 having count(distinct ifc(code in ('A','C'),code,' '))=2
 order by id,code;
quit;

Sir @ChrisHemedinger  How on earth you folks could be so genius and creative to think of IFC/IFN making it too user friendly(aka readymeals) to a fault. Kudos!!!*1E6. Just an incredible thought to envision it in the first place. If the designer would accept an ordinary bloke's appreciation, please let him/her know the privileges we enjoy.

FreelanceReinh
Jade | Level 19

Hi @novinosrin: I think the condition

having max(code='A') & max(code='C')

(or with SUM instead of MAX) would improve performance. (At least it did on my computer with 5.5 million obs. in HAVE of which 48% were selected: It was more than three times faster, but still much slower [about factor 2] than the double DOW loop.)

novinosrin
Tourmaline | Level 20

Yes I agree. Performance being the criteria, DOW  is indeed the best. It's just that I find some of the Proc SQL functionality very trendy and user-friendly while being seated among a group of general business users. They just enjoy fastfood and done. 🙂 Even beginners seem to grasp so quickly. I guess that accomplishes much of our credit risk queries at ease. 

ChrisHemedinger
Community Manager

Love to see the different creative solutions here.  

 

IFC (and IFN) is a SAS function, not native to SQL.  Sometimes we need a SAS function to do the job -- there are so many available and SQL doesn't have analogous functions for all of them.  But you might pay a performance penalty for using a SAS function in PROC SQL especially when working with database data.  SAS tries to push as much work to the database as possible, but a unique SAS function will force SAS to bring at least some of the data over and increase the I/O cost.

SAS For Dummies 3rd Edition! Check out the new edition, covering SAS 9.4, SAS Viya, and all of the modern ways to use SAS!