Solved: Merging with a BY statement

mbennet96 · Posted 02-27-2020 04:36 PM

Hello,

I'm on SAS Studio and have just started learning SAS. I'm trying to merge two data sets with a BY statement on the variable 'Size', but I'm not sure I'm doing it correctly. The two sets have unequal numbers of observations per size (for example, 2 LARGE in set One but 3 LARGE in set Two), so the merge reuses the last observation from the set with fewer (for set One LARGE it shows 19.0, 16.5, and 16.5). Is there something else I should be adding, or something I'm doing incorrectly?

Tom · Posted 02-27-2020 04:53 PM

Sounds like you are trying to do a MANY to MANY merge. When you have a BY group with N observations from one dataset and M observations from the other you will get MAX(N,M) observations out. The variables that only exist in the dataset with the fewer number of observations for the group will retain the value from the last observation contributed by that dataset, since SAS does not have anything more to read from that dataset to change the values.
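To see the MAX(N,M) behavior in isolation, here is a minimal sketch. The data are made up for illustration: only the variable names and the costs 19.0 and 16.5 come from this thread, and the company names, Two's costs, and the output name demo are assumptions.

```sas
/* Hypothetical sample data: 2 LARGE rows in One, 3 LARGE rows in Two. */
data One;
  input Size $ Company $ Cost;
  datalines;
LARGE A 19.0
LARGE B 16.5
;
run;

data Two;
  input Size $ Company $ Cost;
  datalines;
LARGE X 21.0
LARGE Y 18.0
LARGE Z 17.5
;
run;

/* Plain BY merge: the LARGE group yields MAX(2,3) = 3 observations. */
data demo;
  merge One (RENAME=(Company=Company1 Cost=Cost1))
        Two (RENAME=(Company=Company2 Cost=Cost2))
  ;
  by Size;
run;

proc print data=demo;
run;
```

On the third observation One has nothing left to contribute, so Company1 and Cost1 keep the values from its second row (B and 16.5), which matches the repeated 16.5 described in the question.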

Note that the same thing happens in a one to many merge, but there it is more likely that you actually want the values of the variables that only exist in the dataset with one observation in the group copied onto every resulting observation.

What output do you want to get?

Is there another variable you can add to the BY statement so that you no longer have a many to many merge, so that your merge is either one to one or one to many?

If you want to get N x M observations output instead then use PROC SQL.
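A rough sketch of the PROC SQL route, assuming both datasets have the variables Size, Company, and Cost (as the RENAME= options in this thread imply). The output table name crossed and the aliases a and b are placeholders.

```sas
proc sql;
  /* Every One row is paired with every Two row of the same Size, */
  /* so a group with N and M rows produces N x M rows.            */
  create table crossed as
  select a.Size,
         a.Company as Company1,
         a.Cost    as Cost1,
         b.Company as Company2,
         b.Cost    as Cost2
  from One as a
       inner join Two as b
       on a.Size = b.Size;
quit;
```

Note that an inner join drops any Size that appears in only one of the two datasets. If those rows matter, a full join would keep them, but then Size has to be taken from whichever side is not missing, for example with coalesce(a.Size, b.Size).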

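If you want the shorter dataset to stop contributing data, you can add a couple of statements to your data step: write the observation explicitly, then set every variable to missing so nothing carries over into the next observation.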

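data mergedd;
  /* Rename the shared variables so Two's values do not overwrite One's */
  merge One (RENAME=(Company=Company1 Cost=Cost1))
        Two (RENAME=(Company=Company2 Cost=Cost2))
  ;
  by Size;
  /* Write the observation, then clear all variables before the next read */
  output;
  call missing(of _all_);
run;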
