BookmarkSubscribeRSS Feed
Username11
Calcite | Level 5

So, I'm familiar with merges in SAS, and haven't had issues before, but I noticed an issue today that has never been an issue before.
For some reason the actual merging of observations is working properly in more complex data sets, however it only lists the variable values from one of the data sets (e.g. it doesn't overwrite missing values).

For instance, I wrote up this simple program:

    data dataset1;
    input id var1 var2 var3 var4;
    cards;
    1 . . 2 2
    2 . . 2 2
    3 . . 2 2
    4 . . 2 2
    5 . . 2 2
    6 . . 2 2
    7 . . 2 2
    8 . . 2 2
    9 . 2 . 2
    10 1 . . .
    ;
    data dataset2;
    input id var1 var2 var3 var4;
    cards;
    1  2 2 . .
    2  2 2 . .
    3  2 2 . .
    4  2 2 . .
    5  2 2 . .
    6  2 2 . .
    7  2 2 . .
    8  2 2 . .
    10 . 1 . .
    ;
    
    
    data dataset3;
    merge dataset1 dataset2;
    by id;
    run;

This should yield the following:

    id var1 var2 var3 var4
    1   2    2    2    2
    2   2    2    2    2
    3   2    2    2    2
    4   2    2    2    2
    5   2    2    2    2
    6   2    2    2    2
    7   2    2    2    2
    8   2    2    2    2
    9   .    2    .    2
    10  1    1    .    .
   

but instead, I get this:
 

    id var1 var2 var3 var4
        1   2    2    .    .
        2   2    2    .    .
        3   2    2    .    .
        4   2    2    .    .
        5   2    2    .    .
        6   2    2    .    .
        7   2    2    .    .
        8   2    2    .    .
        9   .    2    .    2
        10  .    1    .    .

So, it's as if the merge is merging the observations and then just displaying the second data set's values.

I've tried to figure out the issue (I have a feeling it's something very basic I've just looked over), but I've no idea what's happening, since I've never come across the issue before.

Anyone know what's going wrong?

Thanks for any help.

4 REPLIES 4
Astounding
PROC Star

Sorry to say, but MERGE is working the way it is supposed to.  Whatever value it reads in replaces what was there before.  Even if a missing value is read in, it replaces what was there before.

 

Since you have only one observation per ID in both data sets, try experimenting with it this way:

 

data dataset3;

update dataset1 dataset2;

by id;

run;

 

UPDATE has certain restrictions as to when it can be used, but it ignores missing values.

TMiles
Quartz | Level 8

Obs 1-8 in both datasets are duplicate records,  you will always get quirky results.  You need unique records on your by statement to get a clean merge.  The left most datasets will update the matching variables when the by statement records match, and create new rows when they don't.

 

 

Kurt_Bremser
Super User

Since you have a 1:1 relationship, and identically named variables in both datasets, the dataset named last in the merge statement will be the one that fills the PDV before the output.

To achieve what you want, you might also consider using proc sql and the SQL coalesce function.

mnjtrana
Pyrite | Level 9

In your code above, SAS is working as expected. As both the merging dataset has common variables and you have not put in any condition for merging(Alias) then SAS would overwrite the values already in PDS(came from the left dataset).


Cheers from India!

Manjeet

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1223 views
  • 0 likes
  • 5 in conversation