BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jjb123
Obsidian | Level 7

I want to merge (using left join) using the id variable in each set and based on var1, var2, var3...etc. (var1-36).  However, I don't want to require that they all match.  For instance, I want the information to merge whether the only match is a.var2 and b.var7 or if ten match or if they all match.  Furthermore, there are missing values for some variables for almost all observations, so I want to ensure that missing matches are not captured.  Any help is appreciated.  

 

 

proc sql;
create table want as
select 		a.*, b.info
from 		have1 as a left join have2 as b
on			a.id = b.id and
 (a.var1-36 = b.var1-36)
order by           id, date;
quit; 
1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@jjb123 

If using SQL with SAS tables then here one way to go.

data have;
  input unique id var1 var2 var3 var4 info;
datalines;
1 1 . . . 1234 54
2 1 32423 3713 1234 328931 26
3 1 3713 1234 123412 3253 82
4 2 4567 . . 12 93
5 2 . . . 1267 102
6 2 12 145 86 92 96
7 3 . 8214 1479 . 123
8 3 . . . . 85
9 3 987 345 7528 93842 146
;
run;

proc sql;
  create table want as
    select  a.*, b.info as merged_info
      from  have as a left join have as b
        on  a.unique ne b.unique and a.id = b.id and
            (
              whichn(a.var1,.,b.var1,b.var2,b.var3,b.var4) >1 or
              whichn(a.var2,.,b.var1,b.var2,b.var3,b.var4) >1 or
              whichn(a.var3,.,b.var1,b.var2,b.var3,b.var4) >1 or
              whichn(a.var4,.,b.var1,b.var2,b.var3,b.var4) >1
            )
      order by unique, id
      ;
quit;

 

View solution in original post

11 REPLIES 11
Astounding
PROC Star

Are you saying you want a match of the IDs match and a.var1 is equal to any of the fields b.var1 through b.var36 ?

jjb123
Obsidian | Level 7

Sort of.  I want a match if the IDs match and any of a.var1, a.var2, a.var3 (i.e., a.var1-36) is equal to any of b.var1, b.var2, b.var3 (i.e., b.var1-36).  

ballardw
Super User

Example input data from both sets and the result needed.

 

jjb123
Obsidian | Level 7

Assume the first data set is the have dataset (it can be have1 and have2 for simplicity).  Assume the second is the desired result.  The variables I used are a little different, so I'll update my original code as well.

proc sql;
create table want as
select 		a.*, b.info as merged_info
from 		have as a left join have as b
on			a.unique ne b.unique and a.id = b.id and
 (a.var1-4 = b.var1-4)
order by           unique, id;
quit; 

image.pngimage.png

mkeintz
PROC Star

@jjb123 wrote

.... Furthermore, there are missing values for some variables for almost all observations, so I want to ensure that missing matches are not captured.  Any help is appreciated.  

 

 

So what do you want to do if the variable is missing in one  data set and not missing in the other? Does that constitute a non-match?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
jjb123
Obsidian | Level 7

I'm not sure I understand your question.  A missing value (which would only be a match if it matches to another missing value) should always be considered a non-match.  

andreas_lds
Jade | Level 19
If you want (tested) code provide data in usable form.
jjb123
Obsidian | Level 7

Here you go.

data have;
input unique id var1 var2 var3 var4 info;
datalines;
1 1 . . . 1234 54
2 1 32423 3713 1234 328931 26
3 1 3713 1234 123412 3253 82
4 2 4567 . . 12 93
5 2 . . . 1267 102
6 2 12 145 86 92 96
7 3 . 8214 1479 . 123
8 3 . . . . 85
9 3 987 345 7528 93842 146
;
andreas_lds
Jade | Level 19

Thanks, but again incomplete. In your sql code three datasets are mentioned: a, b and want. Which of the three is "have"?

Patrick
Opal | Level 21

@jjb123 

If using SQL with SAS tables then here one way to go.

data have;
  input unique id var1 var2 var3 var4 info;
datalines;
1 1 . . . 1234 54
2 1 32423 3713 1234 328931 26
3 1 3713 1234 123412 3253 82
4 2 4567 . . 12 93
5 2 . . . 1267 102
6 2 12 145 86 92 96
7 3 . 8214 1479 . 123
8 3 . . . . 85
9 3 987 345 7528 93842 146
;
run;

proc sql;
  create table want as
    select  a.*, b.info as merged_info
      from  have as a left join have as b
        on  a.unique ne b.unique and a.id = b.id and
            (
              whichn(a.var1,.,b.var1,b.var2,b.var3,b.var4) >1 or
              whichn(a.var2,.,b.var1,b.var2,b.var3,b.var4) >1 or
              whichn(a.var3,.,b.var1,b.var2,b.var3,b.var4) >1 or
              whichn(a.var4,.,b.var1,b.var2,b.var3,b.var4) >1
            )
      order by unique, id
      ;
quit;

 

jjb123
Obsidian | Level 7

This works like a charm. Thank you.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2757 views
  • 1 like
  • 6 in conversation