Hi all,
I have this dataset here which is a subset from a much larger master dataset:
Dataset A:
ID |
Var1 |
Var2 |
Var3 |
Var4 |
Var5 |
1-2345 |
10 |
64.5 |
9.8 |
2.1 |
5 |
6-7890 |
10 |
5.4 |
7.5 |
10 |
6.4 |
9-8755 |
1.1 |
4.2 |
6.4 |
10.5 |
5.8 |
4-3210 |
10 |
15.4 |
3.1 |
11 |
13 |
I have new variables I want to add onto Dataset A based on this Dataset B:
Dataset B:
Var1 |
Var2 |
Var3 |
Var4 |
Var5 |
Var6 |
Var7 |
10 |
64.5 |
9.8 |
2.1 |
5 |
7.7 |
5.5 |
10 |
5.4 |
7.5 |
10 |
6.4 |
9.4 |
6.8 |
1.1 |
4.2 |
6.4 |
10.5 |
5.8 |
8.2 |
10 |
10 |
15.4 |
3.1 |
11 |
13 |
1.2 |
51 |
The problem as you can see is that I don't have an "ID" variable on Dataset B (it's complicated) and so I don't have an identifier variable I can use to match and merge the two datasets together.
I tried the code below but my understanding is that this prioritizes the variables I'm listing in order. I'm thinking this would be an issue since Var1 contains 3 rows of "10" which could confuse the merging since it won't know which ID to merge it to (?).
proc sort data = dataA;
by var1 var2 var3 var4 var5;
run;
proc sort data = dataB;
by var1 var2 var3 var4 var5;
run;
data new;
merge dataA (in=A) dataB (in=B);
if A;
run;
How would I merge it so that it is the combination of values across "var1" - "var5" that I am merging by?
Since Dataset A originally came from Dataset B, the combination of unique values across "Var1" - "Var5" in both datasets are essentially the same. The only thing new is the "Var6" & "Var7" variables that I'm trying to add from Dataset B to Dataset A.
Any help would be appreciated, thanks.
Let's say every record in A has a unique combination of var1/var2/var3/var4/var5. And the same for B. If so, it won't matter which is the major sort key. But you would have to change
data new;
merge dataA (in=A) dataB (in=B);
if A;
run;
to
data new;
merge dataA (in=A) dataB (in=B);
by var1 var2 var3 var4 var5;
if A;
run;
Of course this could be done in sql as a left join of a with b on a.var1=b.var1 and a.var2=b.var2 ... and a.var5=b.var5.
Let's say every record in A has a unique combination of var1/var2/var3/var4/var5. And the same for B. If so, it won't matter which is the major sort key. But you would have to change
data new;
merge dataA (in=A) dataB (in=B);
if A;
run;
to
data new;
merge dataA (in=A) dataB (in=B);
by var1 var2 var3 var4 var5;
if A;
run;
Of course this could be done in sql as a left join of a with b on a.var1=b.var1 and a.var2=b.var2 ... and a.var5=b.var5.
Hi @mkeintz ! Thanks for the reply. I see what you mean. I think I like your idea of doing the SQL commands instead since that's exactly what I was trying to go for (same values across each of the 5 vars).
Thanks, will try it out myself!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.