Hi all,
I have this dataset here which is a subset from a much larger master dataset:
Dataset A:
ID |
Var1 |
Var2 |
Var3 |
Var4 |
Var5 |
1-2345 |
10 |
64.5 |
9.8 |
2.1 |
5 |
6-7890 |
10 |
5.4 |
7.5 |
10 |
6.4 |
9-8755 |
1.1 |
4.2 |
6.4 |
10.5 |
5.8 |
4-3210 |
10 |
15.4 |
3.1 |
11 |
13 |
I have new variables I want to add onto Dataset A based on this Dataset B:
Dataset B:
Var1 |
Var2 |
Var3 |
Var4 |
Var5 |
Var6 |
Var7 |
10 |
64.5 |
9.8 |
2.1 |
5 |
7.7 |
5.5 |
10 |
5.4 |
7.5 |
10 |
6.4 |
9.4 |
6.8 |
1.1 |
4.2 |
6.4 |
10.5 |
5.8 |
8.2 |
10 |
10 |
15.4 |
3.1 |
11 |
13 |
1.2 |
51 |
The problem as you can see is that I don't have an "ID" variable on Dataset B (it's complicated) and so I don't have an identifier variable I can use to match and merge the two datasets together.
I tried the code below but my understanding is that this prioritizes the variables I'm listing in order. I'm thinking this would be an issue since Var1 contains 3 rows of "10" which could confuse the merging since it won't know which ID to merge it to (?).
proc sort data = dataA;
by var1 var2 var3 var4 var5;
run;
proc sort data = dataB;
by var1 var2 var3 var4 var5;
run;
data new;
merge dataA (in=A) dataB (in=B);
if A;
run;
How would I merge it so that it is the combination of values across "var1" - "var5" that I am merging by?
Since Dataset A originally came from Dataset B, the combination of unique values across "Var1" - "Var5" in both datasets are essentially the same. The only thing new is the "Var6" & "Var7" variables that I'm trying to add from Dataset B to Dataset A.
Any help would be appreciated, thanks.
Let's say every record in A has a unique combination of var1/var2/var3/var4/var5. And the same for B. If so, it won't matter which is the major sort key. But you would have to change
data new;
merge dataA (in=A) dataB (in=B);
if A;
run;
to
data new;
merge dataA (in=A) dataB (in=B);
by var1 var2 var3 var4 var5;
if A;
run;
Of course this could be done in sql as a left join of a with b on a.var1=b.var1 and a.var2=b.var2 ... and a.var5=b.var5.
Let's say every record in A has a unique combination of var1/var2/var3/var4/var5. And the same for B. If so, it won't matter which is the major sort key. But you would have to change
data new;
merge dataA (in=A) dataB (in=B);
if A;
run;
to
data new;
merge dataA (in=A) dataB (in=B);
by var1 var2 var3 var4 var5;
if A;
run;
Of course this could be done in sql as a left join of a with b on a.var1=b.var1 and a.var2=b.var2 ... and a.var5=b.var5.
Hi @mkeintz ! Thanks for the reply. I see what you mean. I think I like your idea of doing the SQL commands instead since that's exactly what I was trying to go for (same values across each of the 5 vars).
Thanks, will try it out myself!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.