BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
asgee
Obsidian | Level 7

Hi all,

 

I have this dataset here which is a subset from a much larger master dataset:

Dataset A:

ID

Var1

Var2

Var3

Var4

Var5

1-2345

10

64.5

9.8

2.1

5

6-7890

10

5.4

7.5

10

6.4

9-8755

1.1

4.2

6.4

10.5

5.8

4-3210

10

15.4

3.1

11

13

 

I have new variables I want to add onto Dataset A based on this Dataset B: 

Dataset B:

Var1

Var2

Var3

Var4

Var5

Var6

Var7

10

64.5

9.8

2.1

5

7.7

5.5

10

5.4

7.5

10

6.4

9.4

6.8

1.1

4.2

6.4

10.5

5.8

8.2

10

10

15.4

3.1

11

13

1.2

51

 

The problem as you can see is that I don't have an "ID" variable on Dataset B (it's complicated) and so I don't have an identifier variable I can use to match and merge the two datasets together. 

 

I tried the code below but my understanding is that this prioritizes the variables I'm listing in order. I'm thinking this would be an issue since Var1 contains 3 rows of "10" which could confuse the merging since it won't know which ID to merge it to (?). 

proc sort data = dataA;
by var1 var2 var3 var4 var5;
run;

proc sort data = dataB;
by var1 var2 var3 var4 var5;
run;

data new;
merge dataA (in=A) dataB (in=B);
if A;
run;

How would I merge it so that it is the combination of values across "var1" - "var5" that I am merging by?

 

Since Dataset A originally came from Dataset B, the combination of unique values across "Var1" - "Var5" in both datasets are essentially the same. The only thing new is the "Var6" & "Var7" variables that I'm trying to add from Dataset B to Dataset A. 

 

Any help would be appreciated, thanks.

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Let's say every record in A has a unique combination of var1/var2/var3/var4/var5.  And the same for B.  If so, it won't matter which is the major sort key.  But you would have to change

 

data new;
merge dataA (in=A) dataB (in=B);
if A;
run;

to

data new;
  merge dataA (in=A) dataB (in=B);
  by var1 var2 var3 var4 var5;
  if A;
run;

Of course this could be done in sql as a left join of a with b   on a.var1=b.var1 and a.var2=b.var2 ... and a.var5=b.var5.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
mkeintz
PROC Star

Let's say every record in A has a unique combination of var1/var2/var3/var4/var5.  And the same for B.  If so, it won't matter which is the major sort key.  But you would have to change

 

data new;
merge dataA (in=A) dataB (in=B);
if A;
run;

to

data new;
  merge dataA (in=A) dataB (in=B);
  by var1 var2 var3 var4 var5;
  if A;
run;

Of course this could be done in sql as a left join of a with b   on a.var1=b.var1 and a.var2=b.var2 ... and a.var5=b.var5.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
asgee
Obsidian | Level 7

Hi @mkeintz ! Thanks for the reply. I see what you mean. I think I like your idea of doing the SQL commands instead since that's exactly what I was trying to go for (same values across each of the 5 vars).

Thanks, will try it out myself! 

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1563 views
  • 1 like
  • 2 in conversation