Solved: How to merge two datasets based on exact values across multiple variab...

asgee · Posted 02-26-2021 08:27 PM

Hi all,

I have this dataset here which is a subset from a much larger master dataset:

Dataset A:

ID	Var1	Var2	Var3	Var4	Var5
1-2345	10	64.5	9.8	2.1	5
6-7890	10	5.4	7.5	10	6.4
9-8755	1.1	4.2	6.4	10.5	5.8
4-3210	10	15.4	3.1	11	13

I have new variables I want to add onto Dataset A based on this Dataset B:

Dataset B:

Var1	Var2	Var3	Var4	Var5	Var6	Var7
10	64.5	9.8	2.1	5	7.7	5.5
10	5.4	7.5	10	6.4	9.4	6.8
1.1	4.2	6.4	10.5	5.8	8.2	10
10	15.4	3.1	11	13	1.2	51

The problem as you can see is that I don't have an "ID" variable on Dataset B (it's complicated) and so I don't have an identifier variable I can use to match and merge the two datasets together.

I tried the code below but my understanding is that this prioritizes the variables I'm listing in order. I'm thinking this would be an issue since Var1 contains 3 rows of "10" which could confuse the merging since it won't know which ID to merge it to (?).

proc sort data = dataA;
by var1 var2 var3 var4 var5;
run;

proc sort data = dataB;
by var1 var2 var3 var4 var5;
run;

data new;
merge dataA (in=A) dataB (in=B);
if A;
run;

How would I merge it so that it is the combination of values across "var1" - "var5" that I am merging by?

Since Dataset A originally came from Dataset B, the combination of unique values across "Var1" - "Var5" in both datasets are essentially the same. The only thing new is the "Var6" & "Var7" variables that I'm trying to add from Dataset B to Dataset A.

Any help would be appreciated, thanks.

mkeintz · Posted 02-26-2021 08:56 PM

Let's say every record in A has a unique combination of var1/var2/var3/var4/var5. And the same for B. If so, it won't matter which is the major sort key. But you would have to change

data new;
merge dataA (in=A) dataB (in=B);
if A;
run;

to

data new;
  merge dataA (in=A) dataB (in=B);
  by var1 var2 var3 var4 var5;
  if A;
run;

Of course this could be done in sql as a left join of a with b on a.var1=b.var1 and a.var2=b.var2 ... and a.var5=b.var5.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 02-26-2021 08:56 PM

Let's say every record in A has a unique combination of var1/var2/var3/var4/var5. And the same for B. If so, it won't matter which is the major sort key. But you would have to change

data new;
merge dataA (in=A) dataB (in=B);
if A;
run;

to

data new;
  merge dataA (in=A) dataB (in=B);
  by var1 var2 var3 var4 var5;
  if A;
run;

Of course this could be done in sql as a left join of a with b on a.var1=b.var1 and a.var2=b.var2 ... and a.var5=b.var5.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

asgee · Posted 02-26-2021 09:16 PM

Hi @mkeintz ! Thanks for the reply. I see what you mean. I think I like your idea of doing the SQL commands instead since that's exactly what I was trying to go for (same values across each of the 5 vars).

Thanks, will try it out myself!

How to merge two datasets based on exact values across multiple variables?

Re: How to merge two datasets based on exact values across multiple variables?

Re: How to merge two datasets based on exact values across multiple variables?

Re: How to merge two datasets based on exact values across multiple variables?

Catch up on SAS Innovate 2026

How to merge two datasets based on exact values across multiple variables?

Re: How to merge two datasets based on exact values across multiple variables?

Re: How to merge two datasets based on exact values across multiple variables?

Re: How to merge two datasets based on exact values across multiple variables?

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away