deleted_user
Not applicable
Hi friends,

I am in the process of merging two datasets as follows:

Dataset A:
A1-------A2-------A3
211-----211------ABC
211-----836------ABC
211-----837------ABC
212-----212------ABC
212-----836------ABC
212-----837------ABC

Dataset B:
A2------A3
836-----CDE
837-----FGH

After the merge, my intended output dataset C should look like this:
Dataset C:
A1-------A2-------A3
211-----211------ABC
211-----836------CDE
211-----837------FGH
212-----212------ABC
212-----836------CDE
212-----837------FGH

The code I am using is quite simple and is as follows:


Proc Sort Data=A;By A2;Run;
Proc Sort Data=B;By A2;Run;
Data C;Merge A(In=A) B(In=B);By A2;If A;Run;


The output that I am getting is like this:
Dataset C:
A1-------A2-------A3
211-----211------ABC
211-----836------CDE
211-----837------FGH
212-----212------ABC
212-----836------ABC
212-----837------ABC

I was counting on the fact that, in a merge, values from the first (left) dataset are overlaid by values from the second (right) dataset for the variables the two datasets have in common.
That is what happens for the first set (A1=211), but not for the duplicates (A1=212).

In my example I have shown only one variable, A3, but in reality there are 21 variables, so renaming them and substituting conditionally (e.g., renaming A3 to B3 in dataset B and coding If B Then A3 = B3;) would be tedious to code. I was looking for a way to do the same thing directly through the MERGE statement.

Please post your views/suggestions.

Thanks in advance.
1162
Calcite | Level 5
I would have agreed with you about which table's values are used by default, especially since it seemed to work for the first three rows. Maybe you have to be more explicit in telling SAS what to do. Here's a possible solution:

Proc Sort Data=A;By A2;Run;
Proc Sort Data=B;By A2;Run;
Data C;Merge A(In=A) B(In=B rename=(A3=A4));By A2;If A;If B then A3=A4;Drop A4;Run;
1162
Calcite | Level 5
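Ah, now I see the problem. The merge did what you expected for the first three rows. The reason it didn't work for the second instance of 836 (row 5 of table A) is that there is only one record in B for A2=836, and it is read only once, together with row 2 of table A. In a one-to-many DATA step merge, SAS reads the next record from A for every output row, but once B has no more records in the BY group it stops reading B and simply retains the values it last read. Because A3 exists in both tables, reading row 5 from A overwrites the retained CDE with ABC, and nothing from B is read afterwards to put CDE back. So if there is one record in B and 5 matching records in A, only the first of them ends up with B's A3; the other four keep their own A3 from table A.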
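A small, self-contained sketch makes this BY-group behaviour visible. It rebuilds A and B from the sample data in the question (A3 gets SAS's default character length of 8) and drops A3 from A with the DROP= data set option, so only B ever supplies A3. Both 836 rows then come out as CDE: B's single record is read once, and its value is retained for the rest of the BY group. The dataset name CHECK and the IN= names inA and inB are illustrative choices, not part of the original code.

```sas
/* Sample data from the question; A3 gets the default character length of 8. */
data A;
  input A1 A2 A3 $;
  datalines;
211 211 ABC
211 836 ABC
211 837 ABC
212 212 ABC
212 836 ABC
212 837 ABC
;
run;

data B;
  input A2 A3 $;
  datalines;
836 CDE
837 FGH
;
run;

proc sort data=A; by A2; run;
proc sort data=B; by A2; run;

/* A3 is dropped from A, so only B ever writes to it. B's single 836 record
   is read once, yet its value is retained for both 836 rows from A.
   Rows whose A2 has no match in B (211, 212) get a missing A3. */
data check;
  merge A(in=inA drop=A3) B(in=inB);
  by A2;
  if inA;
run;

proc print data=check;
run;
```

Put back the A3 from A (remove DROP=) and the second row of each BY group reverts to ABC, which is exactly the output reported in the question.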

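I like to use PROC SQL in these cases (partly because you don't have to sort first). A SQL join works differently from a DATA step merge: each row of A is matched against every row of B with the same A2 value, so the single 836 record in B is joined to both 836 records in A. Here's an example. In the COALESCE function, you have to list the column from table B first because you want the value from table B to take priority; COALESCE returns its first nonmissing argument, so rows with no match in B keep A3 from table A.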
[pre]
proc sql;
create table C as
select A.A1, A.A2, coalesce(B.A3, A.A3) as A3
from A left join B
on A.A2 = B.A2
order by A.A1, A.A2;
quit;
[/pre]

If you want to stick with the DATA step, try renaming the A3 variable in table B to a new name. Because the new variable comes only from B, its value is retained across every A record in the BY group instead of being overwritten when the next A record is read. You can then use an IF statement to assign it back to A3. Here's an example:

[pre]
data C;
merge A (in=A) B (in=B rename=(A3=A4));
by A2;
if A;
if B then A3=A4;
drop A4;
run;
[/pre]
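With 21 overlapping variables, renaming each one gets tedious. A different technique, a DATA step hash object lookup, avoids the renaming: B is loaded into memory keyed on A2, and for each record of A a successful FIND() copies all of B's variables over the values just read from A. This is a sketch under two assumptions: B has at most one record per A2 value, and every variable in B also exists in A (a variable found only in B would carry its last looked-up value into later non-matching rows). No sorting is needed, and C keeps A's original order. The hash object name lookup and the variable _rc are illustrative.

```sas
/* Sample data from the question (A3 has the default character length of 8). */
data A;
  input A1 A2 A3 $;
  datalines;
211 211 ABC
211 836 ABC
211 837 ABC
212 212 ABC
212 836 ABC
212 837 ABC
;
run;

data B;
  input A2 A3 $;
  datalines;
836 CDE
837 FGH
;
run;

data C;
  /* Never executes; at compile time it adds B's variables to the PDV so the
     hash object can map its data items to them. */
  if 0 then set B;
  /* Load B into memory once, keyed on A2, with every B variable as data. */
  if _n_ = 1 then do;
    declare hash lookup(dataset: "B");
    lookup.defineKey("A2");
    lookup.defineData(all: "yes");
    lookup.defineDone();
  end;
  set A;
  /* On a match, FIND() returns 0 and overwrites A3 (and any other B
     variable) with B's values; on no match the values read from A stay. */
  _rc = lookup.find();
  drop _rc;
run;

proc print data=C;
run;
```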
deleted_user
Not applicable
Thanks a lot for the help. I think I will use the Proc SQL route.

If dataset B had the same number of records for, say, A2=836 as dataset A does (i.e. 2), would the merge in my first post work?
1162
Calcite | Level 5
Well, I'd have to run some tests to be sure, but I think it will work only if, for every A2 value that appears in B, B has the same number of records as A. In that case the DATA step pairs the records one-to-one within each BY group. Another possibility would be to merge by both A1 and A2 (which requires B to contain A1 as well), but by that point you almost have your final dataset anyway. I wouldn't take this approach unless the original dataset already started out in that form.
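Here is a quick sketch of that test. It assumes B is rebuilt so that it has exactly as many records as A for each A2 value it contains (two each for 836 and 837); the dataset name B2 and the IN= names inA and inB are hypothetical. With equal counts, the DATA step reads one record from each table per output row, so every A record is followed by its own B2 record and B2's A3 overwrites A's.

```sas
/* Sample data from the question. */
data A;
  input A1 A2 A3 $;
  datalines;
211 211 ABC
211 836 ABC
211 837 ABC
212 212 ABC
212 836 ABC
212 837 ABC
;
run;

/* Hypothetical B2: one record per matching A record in each BY group. */
data B2;
  input A2 A3 $;
  datalines;
836 CDE
836 CDE
837 FGH
837 FGH
;
run;

proc sort data=A; by A2; run;
proc sort data=B2; by A2; run;

/* Same merge as in the original post; records are paired one-to-one. */
data C2;
  merge A(in=inA) B2(in=inB);
  by A2;
  if inA;
run;

proc print data=C2;
run;
```

Because both tables now have repeated BY values, SAS writes a NOTE to the log that the MERGE statement has more than one data set with repeats of BY values; the output still matches the intended dataset C, only sorted by A2.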
