My intended output dataset C after the merge should look like this: Dataset C:
The code I am using is quite simple and is as follows:
The output that I am getting is like this: Dataset C:
I was counting on the fact that, in a merge, the data in the first (left) dataset is supposed to be overlaid by the data in the second (right) dataset.
That is what happens in the first set (A1=211), but for the duplicates (A1=212) it does not happen.
In the example that I have put, I have only shown 1 variable A3, but in actuality there are 21 variables, so renaming and substituting conditionally (i.e. If B Then A.A3 = B.B3;) would be tedious to code. I was looking for some way of doing the same directly through the merge statement.
I would have agreed with you about which table is used by default, especially since it seemed to work for the first three. Maybe you have to be more explicit in telling SAS what to do. Here's a possible solution:
Proc Sort Data=A;By A2;Run;
Proc Sort Data=B;By A2;Run;
Data C;
  Merge A(In=A) B(In=B Rename=(A3=A4)); By A2;
  If A;                      /* keep only rows that exist in A */
  If B Then A3=A4; Drop A4;  /* overlay B's value, then drop the helper variable */
Run;
Ah, now I see the problem. The merge did what you expected for the first three rows. The second instance of 836 (row 5 of table A) was not overlaid because there is only one record in B for A2=836, and that record was read when merging with row 2 of table A. A DATA step merge reads each record in B only once; it does not re-read it for every matching record in A. So if there is one record in B and five matching records in A, a variable that exists in both tables (like A3) gets B's value on the first record only; on the other four records the value read from A overwrites it. Variables that exist only in B are retained for the whole BY group.
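To see this for yourself, here is a minimal sketch you could run. The rows below are made up for illustration (they are not the datasets from the question), and B4 is a hypothetical extra variable that exists only in B, added to show the difference between shared and B-only variables.

```sas
/* Hypothetical test data: two rows in A share A2=836, B has one row for it */
data A;
  input A1 A2 A3;
  datalines;
211 836 1
212 836 2
213 900 3
;
run;

data B;
  input A2 A3 B4;   /* B4 is a made-up variable that exists only in B */
  datalines;
836 99 7
;
run;

proc sort data=A; by A2; run;
proc sort data=B; by A2; run;

/* Plain one-to-many match-merge, no rename */
data C_plain;
  merge A B;
  by A2;
run;

/* Expected rows in C_plain:
   A1=211 A2=836 A3=99 B4=7   <- B's A3 overlays A's on the first match
   A1=212 A2=836 A3=2  B4=7   <- A3 re-read from A; B4 retained from B
   A1=213 A2=900 A3=3  B4=.   <- no match in B                          */
proc print data=C_plain; run;
```

The second row is the one that surprises people: B is not read again, so the shared variable A3 keeps whatever A supplies, while the B-only variable B4 carries over.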
I like to use PROC SQL in these cases, partly because you don't have to sort first. A join works differently from the DATA step merge: every matching row in A is paired with the row from B. Here's an example. In the COALESCE function, you have to list the column from Table B first because you want the value from Table B to take priority; the value from Table A is used only when B's value is missing.
proc sql;
create table C as
select A.A1, A.A2, coalesce(B.A3, A.A3) as A3
from A left join B
on A.A2 = B.A2
order by A.A1, A.A2;
quit;
If you want to stick with the DATA step, try renaming the A3 variable in Table B to a new name. Then, when you merge, the B values are "retained" in the new variable. You can use an IF statement to assign these values back to A3. Here's an example:
data C (drop=A4);
merge A (in=A) B (in=B rename=(A3=A4)); by A2;
if B then A3=A4;
run;
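Since you mentioned having 21 variables, one way to avoid writing 21 IF statements is to pair the variables up in two arrays and loop over them. This is only a sketch under assumptions: the test rows are made up, A3-A5 stand in for your 21 variables, and the B_ prefix and the array names a_vars and b_vars are names I picked for illustration. It also assumes all of the variables are numeric.

```sas
/* Hypothetical test data; A3-A5 stand in for the 21 real variables */
data A;
  input A1 A2 A3 A4 A5;
  datalines;
211 836 1 10 100
212 836 2 20 200
213 900 3 30 300
;
run;

data B;
  input A2 A3 A4 A5;
  datalines;
836 91 92 93
;
run;

proc sort data=A; by A2; run;
proc sort data=B; by A2; run;

data C;
  /* Rename B's copies so they are retained across the BY group */
  merge A (in=inA)
        B (in=inB rename=(A3=B_A3 A4=B_A4 A5=B_A5));
  by A2;
  array a_vars{*} A3 A4 A5;        /* variables to be overlaid        */
  array b_vars{*} B_A3 B_A4 B_A5;  /* renamed copies coming from B    */
  if inA;                          /* keep only rows that exist in A  */
  if inB then do i = 1 to dim(a_vars);
    a_vars{i} = b_vars{i};         /* overlay each value from B       */
  end;
  drop i B_:;                      /* drop loop index and helper vars */
run;

proc print data=C; run;
```

You still have to write the RENAME= list once, but the overlay logic stays three lines no matter how many variables there are. Note that this copies B's value even when it is missing, unlike the COALESCE approach, so pick whichever matches what you want.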
Well, I'd have to run some tests to be sure, but I think it will work only if you have the same number of each level in B. In other words, you'd have to have the same numbers of '211's, '836's, etc. for column A2. Another possibility would be to merge by both A1 and A2, but by this point you almost have your final dataset anyway. I wouldn't take this approach unless the original dataset started out in this form.