BookmarkSubscribeRSS Feed
AJ_Brien
Quartz | Level 8

Hello,

 

I'm trying to get a certain output through a join.

 

Code for creating Input datasets:

data rank;
infile datalines; 
input rmg $4. acc;
datalines; 
123	1
124	2
125	3
126	4
127	5
; 
run;

data rank1;
infile datalines; 
input rmg $4. acc;
datalines; 
123	1
124	2
125	3
123	7
124	8
125	9
136	10
138	11
; 
run;

What I'm looking to do is get all the data from table rank and get only that data from table rank1 where rank.rmg=rank1.rmg but rank.acc is not in rank1.acc to avoid duplication. So for eg, for rmg 123 in rank, the code should include 123 from rank, but in addition should also pull 123  from rank1 where acc is not 1.

 

So this is the output that I'm looking to get:

rmgacc
1231
1237
1242
1248
1253
1259
1264
1275

 

When I try this code:

proc sql;
create table new as
select rank.*, rank1.rmg, rank1.acc from rank left join rank1
on rank.rmg=rank1.rmg
where rank1.acc not in (select distinct acc from rank);
quit;

I get this output:

rmgacc
1231
1231
1242
1242
1253
1253
1264
1275

 

Then when I try this code:

proc sql;
create table new as
select rank.*, rank1.rmg as rmg1, rank1.acc as acc1 from rank left join rank1
on rank.rmg=rank1.rmg
where rank1.acc not in (select distinct acc from rank);
quit;

I get this output:

rmgaccrmg1acc1
12311237
12421248
12531259
1264  
1275  

 

What should I be changing to get the desired output? Appreciate any suggestions.

Thank you!

13 REPLIES 13
hashman
Ammonite | Level 13

@AJ_Brien:

Methinks that judging from your input/output data logic, what you're looking for is:

data rank ;                                                                                                                                                                                                                                                     
  input rmg :$3. acc ;                                                                                                                                                                                                                                          
  cards ;                                                                                                                                                                                                                                                       
123  1                                                                                                                                                                                                                                                          
124  2                                                                                                                                                                                                                                                          
125  3                                                                                                                                                                                                                                                          
126  4                                                                                                                                                                                                                                                          
127  5                                                                                                                                                                                                                                                          
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
data rank1;                                                                                                                                                                                                                                                     
  input rmg :$3. acc ;                                                                                                                                                                                                                                          
  cards ;                                                                                                                                                                                                                                                       
123   1                                                                                                                                                                                                                                                         
124   2                                                                                                                                                                                                                                                         
125   3                                                                                                                                                                                                                                                         
123   7                                                                                                                                                                                                                                                         
124   8                                                                                                                                                                                                                                                         
125   9                                                                                                                                                                                                                                                         
136  10                                                                                                                                                                                                                                                         
138  11                                                                                                                                                                                                                                                         
;                                                                                                                                                                                                                                                               
run ;                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                
proc sql ;                                                                                                                                                                                                                                                      
  create table want as                                                                                                                                                                                                                                          
  select * from rank                                                                                                                                                                                                                                            
  union                                                                                                                                                                                                                                                         
  select * from rank1 where rmg in (select rmg from rank)                                                                                                                                                                                                       
  ;                                                                                                                                                                                                                                                             
quit ;                                                        

UNION (without ALL) unduplicates the rows with the same RMG and ACC.

 

Kind regards

Paul D. 

AJ_Brien
Quartz | Level 8
Thank you for your reply.

The only drawback I see with this approach is that is the moment I add additional columns in the 1st dataset (which I probably do need to do), it gives the error: Corresponding select-list expressions are not compatible, that's the reason I was trying to go with left join.
SASKiwi
PROC Star

Try OUTER UNION CORRESPONDING instead of UNION in @hashman 's example then.

AJ_Brien
Quartz | Level 8

thank you for your reply, but outer union corresponding would create additional columns for me.

So to recap, I made slight changes to my input dataset to accomodate more columns, and change their names too:

data rank;/*base*/
infile datalines;
input check $2. rmgid $4. accno blah $2.;
datalines;
y 123 1 y
n 124 2 n
y 125 3 y
n 126 4 n
y 127 5 y
;
run;

data rank1;
infile datalines;
input rmg $4. acc;
datalines;
123 1
124 2
125 3
123 7
124 8
125 9
136 10
138 11
;
run;

and the output I'm trying for is:

check

rmgid

accno

blah

Y

123

1

Y

N

124

2

N

Y

125

3

Y

N

126

4

N

Y

127

5

Y

 

123

7

 

 

124

8

 

 

125

9

 



Basically trying to union them such that I get all the data from table rank and get only that data from table rank1 where rank.rmgid=rank1.rmg but rank.accno is not in rank1.acc to avoid duplication.

SASKiwi
PROC Star

Does SELECT DISTINCT instead of just SELECT give you the result you want?

AJ_Brien
Quartz | Level 8
I can't use distinct in the select statements because the rmgs are associated with different acc and I need to include them all.
The issue here is that I need the result to look kind of like it does for union (rows added below, no additional columns created), but the concern is that the 1st table contains differennt columns than the 2nd one, so simple union isn't working.
SASKiwi
PROC Star

Cheating.....

proc sql ;                                                                                                                                                                                                                                                      
  create table want as                                                                                                                                                                                                                                          
  select * from rank
  union  
  select  ' ' as check
         ,rmgid
         ,accno
         ,' ' as blah
  from rank1
  where rmgid in (select rmgid from rank) 
  and accno not in (select accno from rank) 
  order by accno 
  ;                                                                                                                                                                                                                                                             
quit ;
AJ_Brien
Quartz | Level 8
Strangely, that's giving me this result:
check rmgid accno blah
Y 123 1 Y
N 124 2 N
Y 125 3 Y
N 126 4 N
Y 127 5 Y

Feels like the where conditions are not working for some reason.
But I have another question, and you'll hate me for saying this, but is this scalable? Meaning in this scenario I luckily happen to have just 4 columns in rank, but what if I have 100 columns or so. Trying to understand the best way to perform this logic and become better. Thank you for your help so far!
SASKiwi
PROC Star

Works fine for me with this complete program:

data rank;
infile datalines;
input check $2. rmgid $4. accno blah $2.;
datalines;
y 123 1 y
n 124 2 n
y 125 3 y
n 126 4 n
y 127 5 y
;
run;                                                                                                                                                                                                                                                      
                                                                                                                                                                                                                                                                
data rank1;
infile datalines;
input rmgid $4. accno;
datalines;
123 1
124 2
125 3
123 7
124 8
125 9
136 10
138 11
;
run;

proc sql ;                                                                                                                                                                                                                                                      
  create table want as                                                                                                                                                                                                                                          
  select * from rank
  union  
  select  ' ' as check
         ,rmgid
         ,accno
         ,' ' as blah
  from rank1
  where rmgid in (select rmgid from rank) 
  and accno not in (select accno from rank) 
  order by accno 
  ;                                                                                                                                                                                                                                                             
quit ;

If you want to scale this solution then I'd probably start looking at using DATA steps as they are a whole lot more flexible for stacking tables.

hashman
Ammonite | Level 13

@SASKiwi:

>If you want to scale this solution then I'd probably start looking at using DATA steps as they are a whole lot more flexible for stacking tables.<

 

You snatched it off the tip of my proverbial online tongue ;). 

And thank you for joining (no pun intended) the thread with your astute responses. 

 

Kind regards

Paul D.

 

SASKiwi
PROC Star

@hashman  - Thanks for providing the original UNION solution - I just started playing with it...

AJ_Brien
Quartz | Level 8
thank you so much for all your help @SASKiwi @hashman! I get to learn so much from this community 🙂
AJ_Brien
Quartz | Level 8
I tried another way: this might get me closer to where I want to be, sadly one of the queries for creating table temp1 gives me a blank output for some reason!?! Appreciate any help I can get with this, I've tried a number of ways with this query, but no result so far! This seemed easy initially, but feels tricky now

proc sql ;
create table temp as
select rmgid, accno from rank;

create table temp1 as /*this gives me a blank output, not sure why*/
select rmg as rmgid,acc as accno from rank1
where rmg in (select rmgid from rank)
and acc not in (select distinct accno from rank);

insert into temp
select rmgid, accno from temp1;
quit;

proc sql;
create table want as
select *, temp.rmgid,temp.accno
from rank full join temp
on rank.accno=temp.accno;
quit;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1215 views
  • 2 likes
  • 3 in conversation