BookmarkSubscribeRSS Feed
Mathis1
Quartz | Level 8

Hi, i would like to merge two tables by two variables A1 and A2, but keep the observations that only have A1 in common in the defenitive table. It would look like this : 

 

Table 1     
A1 A2 Y1
A 1 X
A 2 X
A 3 X
A 4 X
B 1 X
C 1 X
C 2 X
D 1 X
D 2 X

 

Table 2    
A1 A2 Y2
A 1 Y
A 5 Y
B 2 Y
C 1 Y
C 2 Y
D 1 Y
D 3 Y
E 3 Y
E 4 Y
F 2 Y
F 1 Y
G 2 Y
G 3 Y

 

 

The definitive table would looke like it :


Have       
A1 A2 Y1 Y2
A 1 X Y
A 2 X .
A 3 X .
A 4 X .
A 5 . Y
B 1 X  

 

B

2 . Y
C 1 X Y
C 2 X Y
D 1 X  
D 2 X  
D 3 . Y

 

Thank you for your help 😄

 

7 REPLIES 7
PeterClemmensen
Tourmaline | Level 20

Why is Y2 missing in the first obs of the desired output?

Mathis1
Quartz | Level 8

My bad it should not be missing, i just modified it.

PeterClemmensen
Tourmaline | Level 20

Ok. If your actual data is sorted, a simple Merge Statement is enough?

 

I know that your sample data is not (but almost though so I have a feeling your actual data may be)

 

data Table1;   
input A1 $ A2 Y1 $;
datalines;
A 1 X
A 2 X
A 3 X
A 4 X
B 1 X
C 1 X
C 2 X
D 1 X
D 2 X
;

data Table2;   
input A1 $ A2 Y2 $;
datalines;
A 1 Y
A 5 Y
B 2 Y
C 1 Y
C 2 Y
D 1 Y
D 3 Y
E 3 Y
E 4 Y
F 1 Y
F 2 Y
G 2 Y
G 3 Y
;

data want;
   merge Table1 Table2;
   by A1 A2;
run;
yabwon
Onyx | Level 15

Isn't the `in=` operator missing?

data want;
   merge Table1(in = t1) Table2;
   by A1 A2;
   if t1;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Kurt_Bremser
Super User

First we need a lookup table that identifies the a1 groups we want to keep:

data l;
merge
  table1 (in=t1 keep=a1)
  table2 (in=t2 keep=a1)
;
by a1;
if t1 and t2;
run;

Next, merge the tables, and select those groups in the lookup:

data want;
merge
  table1
  table2
;
by a1 a2;
if _n_ = 1
then do;
  declare hash l (dataset:"l");
  l.definekey("a1");
  l.definedone();
end;
if l.check() = 0;
run;
hhinohar
Quartz | Level 8
*sql;
proc sql;
create table want1 as
select * from
  (select a.a1,
          a.a2,
          a.y1,
          b.y2 
            from table1 a left join table2 b 
              on a.a1=b.a1 and a.a2=b.a2
  union 
  select b.a1,
         b.a2,
         a.y1,
         b.y2
           from table1 a right join table2 b 
             on a.a1=b.a1 and a.a2=b.a2)
    where a1 in (select distinct a1 from table1);
  ;
quit;

*hash;
data want2;
  length a1 $8 a2 8 y1 y2 $8;
  if _N_=1 then do;
      if 0 then set table2;
      dcl hash h(dataset:"table2",multidata:"y") 
                x(multidata:"y") 
                a(multidata:"y",ordered:"y") 
                u();
      h.definekey("a1","a2");
      h.definedata("a1","a2","y2");
      h.definedone();
      
      x.definekey("a1","a2");
      x.definedata("a1","a2","y1");
      x.definedone();
      
      a.definekey("a1","a2");
      a.definedata("a1","a2","y1","y2");
      a.definedone();
      
      u.definekey("a1");
      u.definedone();

      dcl hiter hi("h") ai("a");		
  end;
  do until(last);
      set table1 end=last;
      x.add();
      _iorc_=u.add();
      _iorc_=h.find();
      if _iorc_ ne 0 then call missing(y2);
      a.replace();
  end;
  if last;
  do while(hi.next()=0);
      _iorc_=x.find();
      if _iorc_ ne 0 then call missing(y1);
      a.replace();
  end;
  if last;
  do while(ai.next()=0);
      if u.find()=0 then output;
  end;
  
  h.clear();
  x.clear();
  a.clear();
  u.clear();
run;
RichardDeVen
Barite | Level 11

Based on strictly on what is stated I infer that the edge case of two groups of rows with a common A1 but completely disjoint in A2 would be in the result set.

 

Proc SQL can use a group COUNT to determine when a join has rows from both left and right tables.  COALESCE must be used ensure the join key is propagated into the result set.

 

Example:

Added A1='Z' to LEFT with no such A1 in RIGHT, showing, that according to 'rules' Z should not be in result.

 

data Table1;   
input A1 $ A2 Y1 $;
datalines;
A 1 X
A 2 X
A 3 X
A 4 X
B 1 X
C 1 X
C 2 X
D 1 X
D 2 X
Z 1 X
Z 2 X
;

data Table2;   
input A1 $ A2 Y2 $;
datalines;
A 1 Y
A 5 Y
B 2 Y
C 1 Y
C 2 Y
D 1 Y
D 3 Y
E 3 Y
E 4 Y
F 1 Y
F 2 Y
G 2 Y
G 3 Y
;

proc sql;
  create table want as
  select 
    coalesce (left.a1, right.a1) as a1
  , coalesce (left.a2, right.a2) as a2
  , left.y1
  , right.y2
  from 
    table1 as left
  full join 
    table2 as right
  on
    left.a1 = right.a1 &
    left.a2 = right.a2
  group by
    calculated a1
  having 
    count (left.a1) > 0 and 
    count (right.a1) > 0
  order by
    a1, a2
  ;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1203 views
  • 0 likes
  • 6 in conversation