Data Merging and Record Linking

Accepted Solution Solved
Reply
Contributor
Posts: 50
Accepted Solution

Data Merging and Record Linking

This is a repost since I didn't get the answer I was hoping for previously.

 

I am trying to merge 3 datasets together so that I only have unique records at the end.

 

I have provided the code and an example of how I would like the final table to look like in the end. I have highlighted the part I am having issues with. The first table is the "base" table, and what I want to do is to either merge the keys from the other 2 tables if there is a match, or add (or append) the ones that don't have a match.It is also important to keep the "left-most" name. So in the case below, I want to keep name 'ghi' since it is in the first table, and put the key for the 2nd table in the "want" table on that record since the names match, and I also want to have the key for the 3rd table since the keys matched between 'ghi' in the 2nd table and 'vwx' in the 3rd table. I also want to not keep the 'vwx' from the 3rd table since it found a match.

 

The additional layer I need is for the query to also not add 'pqr' to the final table since it matches 'have1' to 'have2' on 'key', and 'have2' to 'have3' on 'name', which is the part i can't get right.

 

If I didn't explain that well or if anyone has any additional questions, please feel free to reach out. Thank you and I appreciated the help in advance.

 

Thanks,

Tom

 

 

 

data have1;

input name1 $ 3. key1 4.;

datalines ;

abc 111

def 112

ghi 113

;

run;

 

data have2;

input name2 $ 3. key2 4.;

datalines ;

jkl 114

ghi 115

pqr 111

;

run;

 

data have3;

input name3 $ 3. key3 4.;

datalines ;

stu 117

vwx 115

pqr 118

;

run;

 

 

 

name key1 key2 key3
abc 111 111 118
def 112    
ghi 113 115 115
jkl   114  
stu     117

Accepted Solutions
Solution
‎05-19-2017 08:26 AM
PROC Star
Posts: 1,673

Re: Data Merging and Record Linking

LIke this?

 


 proc sql;
   select coalescec(NAME1,NAME2,NAME3) as NAME
         , KEY1
         , KEY2
         , KEY3
   from HAVE1
         full outer join 
        HAVE2
         on  NAME2=NAME1 or KEY2=KEY1
         full outer join 
        HAVE3
         on  NAME3=NAME1 or KEY3=KEY1
          or NAME3=NAME2 or KEY3=KEY2
  order by NAME;
 quit;

 

NAME KEY1 KEY2 KEY3
abc 111 111 118
def 112 . .
ghi 113 115 115
jkl . 114 .
stu . . 117

 

View solution in original post


All Replies
Solution
‎05-19-2017 08:26 AM
PROC Star
Posts: 1,673

Re: Data Merging and Record Linking

LIke this?

 


 proc sql;
   select coalescec(NAME1,NAME2,NAME3) as NAME
         , KEY1
         , KEY2
         , KEY3
   from HAVE1
         full outer join 
        HAVE2
         on  NAME2=NAME1 or KEY2=KEY1
         full outer join 
        HAVE3
         on  NAME3=NAME1 or KEY3=KEY1
          or NAME3=NAME2 or KEY3=KEY2
  order by NAME;
 quit;

 

NAME KEY1 KEY2 KEY3
abc 111 111 118
def 112 . .
ghi 113 115 115
jkl . 114 .
stu . . 117

 

Contributor
Posts: 50

Re: Data Merging and Record Linking

Thank you for the quick response, it was exactly what I needed.

 

If I wanted to change this problem up a little, would I be able to do a similar process using the example tables below? In essence, I believe it's doing a similar thing, but on multiple sets of keys instead of just one. My thoughts initially are that the null values might cause issues, or it could have issues since a key could match on multiple other keys...I appreciate any feedback in advance!

 

NEW TABLES:

data have1;

input name1 $ 3. key1 4. key2 4. key3 4.;

datalines ;

aaa 111 222 .

bbb 112 . 334

ccc 113 . .

;

run;

data have2;

input name2 $ 3. key1 4. key2 4. key3 4.;

datalines ;

ddd 111 222 333

eee . 223 334

fff . 224 335

;

run;

 

data have3;

input name3 $ 3. key1 4. key2 4. key3 4.;

datalines ;

ggg . 223 .

hhh 113 225 333

iii . . 336

;

run;

 

Want:

name key1 key2 key3
aaa 111 222 333
bbb 112 223 334
ccc 113 225 333
fff   224 335
iii     336
Contributor
Posts: 50

Re: Data Merging and Record Linking

I think I figured it out...but please anyone let me know if you see anything that could cause an issue..

 

 

proc sql;

create table want as

select coalescec(NAME1,NAME2,NAME3) as NAME

, coalesce(a.KEY1,b.key1,c.key1) as key1

, coalesce(a.KEY2,b.key2,c.key2) as key2

, coalesce(a.KEY3,b.key3,c.key3) as key3

from HAVE1 a

full outer join HAVE2 b on b.KEY1=a.KEY1 or b.KEY2=a.KEY2 or b.KEY3=a.KEY3

full outer join HAVE3 c on c.key1=coalesce(a.key1,b.key1,0) or c.key2=coalesce(a.key2,b.key2,0) or c.key3=coalesce(a.key3,b.key3,0)

order by NAME

;quit;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 146 views
  • 1 like
  • 2 in conversation