Solved: Re: How to merge two datasets based on exact values across multiple va...

asgee · Posted 02-26-2021 08:27 PM

Hi all,

I have this dataset here which is a subset from a much larger master dataset:

Dataset A:

ID	Var1	Var2	Var3	Var4	Var5
1-2345	10	64.5	9.8	2.1	5
6-7890	10	5.4	7.5	10	6.4
9-8755	1.1	4.2	6.4	10.5	5.8
4-3210	10	15.4	3.1	11	13

I have new variables I want to add onto Dataset A based on this Dataset B:

Dataset B:

Var1	Var2	Var3	Var4	Var5	Var6	Var7
10	64.5	9.8	2.1	5	7.7	5.5
10	5.4	7.5	10	6.4	9.4	6.8
1.1	4.2	6.4	10.5	5.8	8.2	10
10	15.4	3.1	11	13	1.2	51

The problem as you can see is that I don't have an "ID" variable on Dataset B (it's complicated) and so I don't have an identifier variable I can use to match and merge the two datasets together.

I tried the code below but my understanding is that this prioritizes the variables I'm listing in order. I'm thinking this would be an issue since Var1 contains 3 rows of "10" which could confuse the merging since it won't know which ID to merge it to (?).

proc sort data = dataA;
by var1 var2 var3 var4 var5;
run;

proc sort data = dataB;
by var1 var2 var3 var4 var5;
run;

data new;
merge dataA (in=A) dataB (in=B);
if A;
run;

How would I merge it so that it is the combination of values across "var1" - "var5" that I am merging by?

Since Dataset A originally came from Dataset B, the combination of unique values across "Var1" - "Var5" in both datasets are essentially the same. The only thing new is the "Var6" & "Var7" variables that I'm trying to add from Dataset B to Dataset A.

Any help would be appreciated, thanks.

mkeintz · Posted 02-26-2021 08:56 PM

Let's say every record in A has a unique combination of var1/var2/var3/var4/var5. And the same for B. If so, it won't matter which is the major sort key. But you would have to change

data new;
merge dataA (in=A) dataB (in=B);
if A;
run;

to

data new;
  merge dataA (in=A) dataB (in=B);
  by var1 var2 var3 var4 var5;
  if A;
run;

Of course this could be done in sql as a left join of a with b on a.var1=b.var1 and a.var2=b.var2 ... and a.var5=b.var5.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

mkeintz · Posted 02-26-2021 08:56 PM

Let's say every record in A has a unique combination of var1/var2/var3/var4/var5. And the same for B. If so, it won't matter which is the major sort key. But you would have to change

data new;
merge dataA (in=A) dataB (in=B);
if A;
run;

to

data new;
  merge dataA (in=A) dataB (in=B);
  by var1 var2 var3 var4 var5;
  if A;
run;

Of course this could be done in sql as a left join of a with b on a.var1=b.var1 and a.var2=b.var2 ... and a.var5=b.var5.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

asgee · Posted 02-26-2021 09:16 PM

Hi @mkeintz ! Thanks for the reply. I see what you mean. I think I like your idea of doing the SQL commands instead since that's exactly what I was trying to go for (same values across each of the 5 vars).

Thanks, will try it out myself!

How to merge two datasets based on exact values across multiple variables?

Re: How to merge two datasets based on exact values across multiple variables?

Re: How to merge two datasets based on exact values across multiple variables?

Re: How to merge two datasets based on exact values across multiple variables?

How to merge two datasets based on exact values across multiple variables?

Re: How to merge two datasets based on exact values across multiple variables?

Re: How to merge two datasets based on exact values across multiple variables?

Re: How to merge two datasets based on exact values across multiple variables?

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away