BookmarkSubscribeRSS Feed
marleeakerson
Calcite | Level 5

Hello,

 

Using this sample data: 

 

ID         Code

1              A

1              B

1              C

2              A

2              B

3              A

3              B

3              C

 

I am trying to create a new table that pulls observations where an ID has both Code 'A' and Code 'C' associated with it. 

 

So the final data would look like: 

ID         Code

1              A

1              B

1              C

3              A

3              B

3              C

 

Thank you!

5 REPLIES 5
Kurt_Bremser
Super User

Run a double DO loop, since it is sorted:

data want;
do until (last.id);
  set have;
  by id;
  if code = 'A' then _a = 1;
  if code = 'C' then _c = 1;
end;
do until (last.id);
  set have;
  by id;
  if _a and _c then output;
end;
drop _:;
run;
novinosrin
Tourmaline | Level 20

Hi @marleeakerson  If you wanna try the PROC SQL aka "Readymeals" solution, it's worth having fun. 

data have;
 input ID         Code $;
 cards;
1              A

1              B

1              C

2              A

2              B
2			   A

3              A

3              B

3              C
;



proc sql;
 create table want as
 select *
 from have
 group by id
 having count(distinct ifc(code in ('A','C'),code,' '))=2
 order by id,code;
quit;

Sir @ChrisHemedinger  How on earth you folks could be so genius and creative to think of IFC/IFN making it too user friendly(aka readymeals) to a fault. Kudos!!!*1E6. Just an incredible thought to envision it in the first place. If the designer would accept an ordinary bloke's appreciation, please let him/her know the privileges we enjoy.

FreelanceReinh
Jade | Level 19

Hi @novinosrin: I think the condition

having max(code='A') & max(code='C')

(or with SUM instead of MAX) would improve performance. (At least it did on my computer with 5.5 million obs. in HAVE of which 48% were selected: It was more than three times faster, but still much slower [about factor 2] than the double DOW loop.)

novinosrin
Tourmaline | Level 20

Yes I agree. Performance being the criteria, DOW  is indeed the best. It's just that I find some of the Proc SQL functionality very trendy and user-friendly while being seated among a group of general business users. They just enjoy fastfood and done. 🙂 Even beginners seem to grasp so quickly. I guess that accomplishes much of our credit risk queries at ease. 

ChrisHemedinger
Community Manager

Love to see the different creative solutions here.  

 

IFC (and IFN) is a SAS function, not native to SQL.  Sometimes we need a SAS function to do the job -- there are so many available and SQL doesn't have analogous functions for all of them.  But you might pay a performance penalty for using a SAS function in PROC SQL especially when working with database data.  SAS tries to push as much work to the database as possible, but a unique SAS function will force SAS to bring at least some of the data over and increase the I/O cost.

Check out SAS Innovate on-demand content! Watch the main stage sessions, keynotes, and over 20 technical breakout sessions!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 685 views
  • 3 likes
  • 5 in conversation