Obsidian | Level 7

How to merge two datasets based on exact values across multiple variables?

Hi all,

I have this dataset here which is a subset from a much larger master dataset:

Dataset A:

 ID Var1 Var2 Var3 Var4 Var5 1-2345 10 64.5 9.8 2.1 5 6-7890 10 5.4 7.5 10 6.4 9-8755 1.1 4.2 6.4 10.5 5.8 4-3210 10 15.4 3.1 11 13

I have new variables I want to add onto Dataset A based on this Dataset B:

Dataset B:

 Var1 Var2 Var3 Var4 Var5 Var6 Var7 10 64.5 9.8 2.1 5 7.7 5.5 10 5.4 7.5 10 6.4 9.4 6.8 1.1 4.2 6.4 10.5 5.8 8.2 10 10 15.4 3.1 11 13 1.2 51

The problem as you can see is that I don't have an "ID" variable on Dataset B (it's complicated) and so I don't have an identifier variable I can use to match and merge the two datasets together.

I tried the code below but my understanding is that this prioritizes the variables I'm listing in order. I'm thinking this would be an issue since Var1 contains 3 rows of "10" which could confuse the merging since it won't know which ID to merge it to (?).

``````proc sort data = dataA;
by var1 var2 var3 var4 var5;
run;

proc sort data = dataB;
by var1 var2 var3 var4 var5;
run;

data new;
merge dataA (in=A) dataB (in=B);
if A;
run;``````

How would I merge it so that it is the combination of values across "var1" - "var5" that I am merging by?

Since Dataset A originally came from Dataset B, the combination of unique values across "Var1" - "Var5" in both datasets are essentially the same. The only thing new is the "Var6" & "Var7" variables that I'm trying to add from Dataset B to Dataset A.

Any help would be appreciated, thanks.

1 ACCEPTED SOLUTION

Accepted Solutions
PROC Star

Re: How to merge two datasets based on exact values across multiple variables?

Let's say every record in A has a unique combination of var1/var2/var3/var4/var5.  And the same for B.  If so, it won't matter which is the major sort key.  But you would have to change

``````data new;
merge dataA (in=A) dataB (in=B);
if A;
run;``````

to

``````data new;
merge dataA (in=A) dataB (in=B);
by var1 var2 var3 var4 var5;
if A;
run;``````

Of course this could be done in sql as a left join of a with b   on a.var1=b.var1 and a.var2=b.var2 ... and a.var5=b.var5.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
2 REPLIES 2
PROC Star

Re: How to merge two datasets based on exact values across multiple variables?

Let's say every record in A has a unique combination of var1/var2/var3/var4/var5.  And the same for B.  If so, it won't matter which is the major sort key.  But you would have to change

``````data new;
merge dataA (in=A) dataB (in=B);
if A;
run;``````

to

``````data new;
merge dataA (in=A) dataB (in=B);
by var1 var2 var3 var4 var5;
if A;
run;``````

Of course this could be done in sql as a left join of a with b   on a.var1=b.var1 and a.var2=b.var2 ... and a.var5=b.var5.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Obsidian | Level 7

Re: How to merge two datasets based on exact values across multiple variables?

Hi @mkeintz ! Thanks for the reply. I see what you mean. I think I like your idea of doing the SQL commands instead since that's exactly what I was trying to go for (same values across each of the 5 vars).

Thanks, will try it out myself!

Discussion stats
• 2 replies
• 964 views
• 1 like
• 2 in conversation