BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mbennet96
Calcite | Level 5

Hello,

 

I'm on SAS Studio; just started learning SAS. I'm trying to Merge 2 data sets with a BY statement on the variable 'Size', however I'm not sure if I'm doing it correctly. There are uneven observations between the 2 (such as 2 LARGE for set One but 3 LARGE for set Two) so it ends up reusing the previous observation for the one with less (so for Set One Large it shows 19.0, 16.5, and 16.5. Is there something else I should be adding/something I'm doing incorrectly? 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Sounds like you are trying to do a MANY to MANY merge.  When you have a BY group with N observations from one dataset and M observations from the other you will get MAX(N,M) observations out.  The variables that only exist in the dataset with the fewer number of observations for the group will retain the value from the last observation contributed by that dataset, since SAS does not have anything more to read from that dataset to change the values.

 

Note that the same thing happens in a one to many merge, but it is more likely that actually want the values of the variables that only exist in the dataset with only one observation in the group copied onto every resulting observation.

 

What output do you want to get? 

Is there another variable to BY statement so that you no longer have a many to many merge?  So that your merge is either one to one or one to many. 

 

If you want to get N x M observations output instead then use PROC SQL.  

If you the short dataset to stop contributing data you can add a couple of statements to your data step.

data mergedd;
  merge One (RENAME=(Company=Company1 Cost=Cost1))
        Two (RENAME=(Company=Company2 Cost=Cost2))
  ;
  by Size;
  output;
  call missing(of _all_);
run;

View solution in original post

1 REPLY 1
Tom
Super User Tom
Super User

Sounds like you are trying to do a MANY to MANY merge.  When you have a BY group with N observations from one dataset and M observations from the other you will get MAX(N,M) observations out.  The variables that only exist in the dataset with the fewer number of observations for the group will retain the value from the last observation contributed by that dataset, since SAS does not have anything more to read from that dataset to change the values.

 

Note that the same thing happens in a one to many merge, but it is more likely that actually want the values of the variables that only exist in the dataset with only one observation in the group copied onto every resulting observation.

 

What output do you want to get? 

Is there another variable to BY statement so that you no longer have a many to many merge?  So that your merge is either one to one or one to many. 

 

If you want to get N x M observations output instead then use PROC SQL.  

If you the short dataset to stop contributing data you can add a couple of statements to your data step.

data mergedd;
  merge One (RENAME=(Company=Company1 Cost=Cost1))
        Two (RENAME=(Company=Company2 Cost=Cost2))
  ;
  by Size;
  output;
  call missing(of _all_);
run;

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 1 reply
  • 545 views
  • 1 like
  • 2 in conversation