Merge by two variables (keeping the common observations of one only)

Mathis1 · Posted 12-07-2020 04:23 AM

Hi, i would like to merge two tables by two variables A1 and A2, but keep the observations that only have A1 in common in the defenitive table. It would look like this :

Table 1
A1	A2	Y1
A	1	X
A	2	X
A	3	X
A	4	X
B	1	X
C	1	X
C	2	X
D	1	X
D	2	X

Table 2
A1	A2	Y2
A	1	Y
A	5	Y
B	2	Y
C	1	Y
C	2	Y
D	1	Y
D	3	Y
E	3	Y
E	4	Y
F	2	Y
F	1	Y
G	2	Y
G	3	Y

The definitive table would looke like it :

Have
A1	A2	Y1	Y2
A	1	X	Y
A	2	X	.
A	3	X	.
A	4	X	.
A	5	.	Y
B	1	X
B	2	.	Y
C	1	X	Y
C	2	X	Y
D	1	X
D	2	X
D	3	.	Y

Thank you for your help 😄

PeterClemmensen · Posted 12-07-2020 05:18 AM

Why is Y2 missing in the first obs of the desired output?

The DATA to DATA Step Macro
Blog: SASnrd

Mathis1 · Posted 12-07-2020 05:22 AM

My bad it should not be missing, i just modified it.

PeterClemmensen · Posted 12-07-2020 06:47 AM

Ok. If your actual data is sorted, a simple Merge Statement is enough?

I know that your sample data is not (but almost though so I have a feeling your actual data may be)

data Table1;   
input A1 $ A2 Y1 $;
datalines;
A 1 X
A 2 X
A 3 X
A 4 X
B 1 X
C 1 X
C 2 X
D 1 X
D 2 X
;

data Table2;   
input A1 $ A2 Y2 $;
datalines;
A 1 Y
A 5 Y
B 2 Y
C 1 Y
C 2 Y
D 1 Y
D 3 Y
E 3 Y
E 4 Y
F 1 Y
F 2 Y
G 2 Y
G 3 Y
;

data want;
   merge Table1 Table2;
   by A1 A2;
run;

The DATA to DATA Step Macro
Blog: SASnrd

yabwon · Posted 12-07-2020 06:50 AM

Isn't the `in=` operator missing?

data want;
   merge Table1(in = t1) Table2;
   by A1 A2;
   if t1;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

Kurt_Bremser · Posted 12-07-2020 07:22 AM

First we need a lookup table that identifies the a1 groups we want to keep:

data l;
merge
  table1 (in=t1 keep=a1)
  table2 (in=t2 keep=a1)
;
by a1;
if t1 and t2;
run;

Next, merge the tables, and select those groups in the lookup:

data want;
merge
  table1
  table2
;
by a1 a2;
if _n_ = 1
then do;
  declare hash l (dataset:"l");
  l.definekey("a1");
  l.definedone();
end;
if l.check() = 0;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

hhinohar · Posted 12-07-2020 07:27 AM

*sql;
proc sql;
create table want1 as
select * from
  (select a.a1,
          a.a2,
          a.y1,
          b.y2 
            from table1 a left join table2 b 
              on a.a1=b.a1 and a.a2=b.a2
  union 
  select b.a1,
         b.a2,
         a.y1,
         b.y2
           from table1 a right join table2 b 
             on a.a1=b.a1 and a.a2=b.a2)
    where a1 in (select distinct a1 from table1);
  ;
quit;

*hash;
data want2;
  length a1 $8 a2 8 y1 y2 $8;
  if _N_=1 then do;
      if 0 then set table2;
      dcl hash h(dataset:"table2",multidata:"y") 
                x(multidata:"y") 
                a(multidata:"y",ordered:"y") 
                u();
      h.definekey("a1","a2");
      h.definedata("a1","a2","y2");
      h.definedone();
      
      x.definekey("a1","a2");
      x.definedata("a1","a2","y1");
      x.definedone();
      
      a.definekey("a1","a2");
      a.definedata("a1","a2","y1","y2");
      a.definedone();
      
      u.definekey("a1");
      u.definedone();

      dcl hiter hi("h") ai("a");		
  end;
  do until(last);
      set table1 end=last;
      x.add();
      _iorc_=u.add();
      _iorc_=h.find();
      if _iorc_ ne 0 then call missing(y2);
      a.replace();
  end;
  if last;
  do while(hi.next()=0);
      _iorc_=x.find();
      if _iorc_ ne 0 then call missing(y1);
      a.replace();
  end;
  if last;
  do while(ai.next()=0);
      if u.find()=0 then output;
  end;
  
  h.clear();
  x.clear();
  a.clear();
  u.clear();
run;

RichardDeVen · Posted 12-07-2020 12:42 PM

Based on strictly on what is stated I infer that the edge case of two groups of rows with a common A1 but completely disjoint in A2 would be in the result set.

Proc SQL can use a group COUNT to determine when a join has rows from both left and right tables. COALESCE must be used ensure the join key is propagated into the result set.

Example:

Added A1='Z' to LEFT with no such A1 in RIGHT, showing, that according to 'rules' Z should not be in result.

data Table1;   
input A1 $ A2 Y1 $;
datalines;
A 1 X
A 2 X
A 3 X
A 4 X
B 1 X
C 1 X
C 2 X
D 1 X
D 2 X
Z 1 X
Z 2 X
;

data Table2;   
input A1 $ A2 Y2 $;
datalines;
A 1 Y
A 5 Y
B 2 Y
C 1 Y
C 2 Y
D 1 Y
D 3 Y
E 3 Y
E 4 Y
F 1 Y
F 2 Y
G 2 Y
G 3 Y
;

proc sql;
  create table want as
  select 
    coalesce (left.a1, right.a1) as a1
  , coalesce (left.a2, right.a2) as a2
  , left.y1
  , right.y2
  from 
    table1 as left
  full join 
    table2 as right
  on
    left.a1 = right.a1 &
    left.a2 = right.a2
  group by
    calculated a1
  having 
    count (left.a1) > 0 and 
    count (right.a1) > 0
  order by
    a1, a2
  ;

Merge by two variables (keeping the common observations of one only)

Re: Merge by two variables (keeping the common observations of one only)

Re: Merge by two variables (keeping the common observations of one only)

Re: Merge by two variables (keeping the common observations of one only)

Re: Merge by two variables (keeping the common observations of one only)

Re: Merge by two variables (keeping the common observations of one only)

Re: Merge by two variables (keeping the common observations of one only)

Re: Merge by two variables (keeping the common observations of one only)

Classroom Training Available!