Hi, i would like to merge two tables by two variables A1 and A2, but keep the observations that only have A1 in common in the defenitive table. It would look like this :
Table 1 | ||
A1 | A2 | Y1 |
A | 1 | X |
A | 2 | X |
A | 3 | X |
A | 4 | X |
B | 1 | X |
C | 1 | X |
C | 2 | X |
D | 1 | X |
D | 2 | X |
Table 2 | ||
A1 | A2 | Y2 |
A | 1 | Y |
A | 5 | Y |
B | 2 | Y |
C | 1 | Y |
C | 2 | Y |
D | 1 | Y |
D | 3 | Y |
E | 3 | Y |
E | 4 | Y |
F | 2 | Y |
F | 1 | Y |
G | 2 | Y |
G | 3 | Y |
The definitive table would looke like it :
Have | |||
A1 | A2 | Y1 | Y2 |
A | 1 | X | Y |
A | 2 | X | . |
A | 3 | X | . |
A | 4 | X | . |
A | 5 | . | Y |
B | 1 | X | |
B |
2 | . | Y |
C | 1 | X | Y |
C | 2 | X | Y |
D | 1 | X | |
D | 2 | X | |
D | 3 | . | Y |
Thank you for your help 😄
Why is Y2 missing in the first obs of the desired output?
My bad it should not be missing, i just modified it.
Ok. If your actual data is sorted, a simple Merge Statement is enough?
I know that your sample data is not (but almost though so I have a feeling your actual data may be)
data Table1;
input A1 $ A2 Y1 $;
datalines;
A 1 X
A 2 X
A 3 X
A 4 X
B 1 X
C 1 X
C 2 X
D 1 X
D 2 X
;
data Table2;
input A1 $ A2 Y2 $;
datalines;
A 1 Y
A 5 Y
B 2 Y
C 1 Y
C 2 Y
D 1 Y
D 3 Y
E 3 Y
E 4 Y
F 1 Y
F 2 Y
G 2 Y
G 3 Y
;
data want;
merge Table1 Table2;
by A1 A2;
run;
Isn't the `in=` operator missing?
data want;
merge Table1(in = t1) Table2;
by A1 A2;
if t1;
run;
Bart
First we need a lookup table that identifies the a1 groups we want to keep:
data l;
merge
table1 (in=t1 keep=a1)
table2 (in=t2 keep=a1)
;
by a1;
if t1 and t2;
run;
Next, merge the tables, and select those groups in the lookup:
data want;
merge
table1
table2
;
by a1 a2;
if _n_ = 1
then do;
declare hash l (dataset:"l");
l.definekey("a1");
l.definedone();
end;
if l.check() = 0;
run;
*sql;
proc sql;
create table want1 as
select * from
(select a.a1,
a.a2,
a.y1,
b.y2
from table1 a left join table2 b
on a.a1=b.a1 and a.a2=b.a2
union
select b.a1,
b.a2,
a.y1,
b.y2
from table1 a right join table2 b
on a.a1=b.a1 and a.a2=b.a2)
where a1 in (select distinct a1 from table1);
;
quit;
*hash;
data want2;
length a1 $8 a2 8 y1 y2 $8;
if _N_=1 then do;
if 0 then set table2;
dcl hash h(dataset:"table2",multidata:"y")
x(multidata:"y")
a(multidata:"y",ordered:"y")
u();
h.definekey("a1","a2");
h.definedata("a1","a2","y2");
h.definedone();
x.definekey("a1","a2");
x.definedata("a1","a2","y1");
x.definedone();
a.definekey("a1","a2");
a.definedata("a1","a2","y1","y2");
a.definedone();
u.definekey("a1");
u.definedone();
dcl hiter hi("h") ai("a");
end;
do until(last);
set table1 end=last;
x.add();
_iorc_=u.add();
_iorc_=h.find();
if _iorc_ ne 0 then call missing(y2);
a.replace();
end;
if last;
do while(hi.next()=0);
_iorc_=x.find();
if _iorc_ ne 0 then call missing(y1);
a.replace();
end;
if last;
do while(ai.next()=0);
if u.find()=0 then output;
end;
h.clear();
x.clear();
a.clear();
u.clear();
run;
Based on strictly on what is stated I infer that the edge case of two groups of rows with a common A1 but completely disjoint in A2 would be in the result set.
Proc SQL can use a group COUNT to determine when a join has rows from both left and right tables. COALESCE must be used ensure the join key is propagated into the result set.
Example:
Added A1='Z' to LEFT with no such A1 in RIGHT, showing, that according to 'rules' Z should not be in result.
data Table1; input A1 $ A2 Y1 $; datalines; A 1 X A 2 X A 3 X A 4 X B 1 X C 1 X C 2 X D 1 X D 2 X Z 1 X Z 2 X ; data Table2; input A1 $ A2 Y2 $; datalines; A 1 Y A 5 Y B 2 Y C 1 Y C 2 Y D 1 Y D 3 Y E 3 Y E 4 Y F 1 Y F 2 Y G 2 Y G 3 Y ; proc sql; create table want as select coalesce (left.a1, right.a1) as a1 , coalesce (left.a2, right.a2) as a2 , left.y1 , right.y2 from table1 as left full join table2 as right on left.a1 = right.a1 & left.a2 = right.a2 group by calculated a1 having count (left.a1) > 0 and count (right.a1) > 0 order by a1, a2 ;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.