BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
robertrao
Quartz | Level 8

Hi,

I have two datasets like shown

i want to merge them by code

i have the code which produces the desired output

can someone explain to me why the code has output prior to the call missing?????and why is the output statement at all necessary???

why is the data from the second dataset not repeated for all values of code from the first???

Thanks

TEMP2

ID   CODE     LAB

102  45.77    LAB1

102  45.77   LAB2

102  45.77   LAB3

103  45.88   LAB1

103  45.88  LAB2

WANT1

CODE  COUNT  VALUE1   VALUE2

45.77       1            84            45.39

45.88       2            19            50.00   

WANT2

ID   CODE     LAB     COUNT  VALUE1   VALUE2

102 45.77    LAB1        1                 84            45.39

102  45.77   LAB2        .                   .               .

102  45.77   LAB3       .                    .               .

103  45.88   LAB1         2               19            50.00  

103  45.88  LAB2           .                    .               .

    

CODE

data want2;

merge temp2 want1;

  by CODE;

output;

call missing(of _all_);

run;

1 ACCEPTED SOLUTION

Accepted Solutions
snoopy369
Barite | Level 11

I couldn't explain why something was there without knowing who put it there and why Smiley Happy

I would guess it's because the merge was specifically written to have the desired effect, namely, to merge the right-hand dataset on only once even if there are duplicates of CODE.  Take away the call missing and the output, and you get it merged to all same-values of CODE, if that's your desire.

The reason it might be there is related to the way MERGE works.  MERGE identifies the first matching observation on the left with the observation on the right, and appends the right dataset's values.  Then it goes on through both datasets until it finds the next matching set of observations - it doesn't keep merging on the right-side dataset's values for the same value of CODE.  The reason the effect is (normally) to do so, is a function of how variables that come in on a MERGE/SET/UPDATE statement work.  They are automatically not set to missing (ie, RETAINed).  So the MERGE basically just outputs all of the left-side dataset rows, and they all keep the right-side dataset's values.  Nothing is set to missing until you pass a by group boundary (where, if a right-hand observation is missing, missing values are appended). 

View solution in original post

8 REPLIES 8
LinusH
Tourmaline | Level 20

Thwe simplest way is to try running the data step without the output, and then without the call missing.

What you will see is that because of repeated values of Code in temp2, the input from want1 will stay in the PDV, and merged into the repeated rows of temp2.

Data never sleeps
robertrao
Quartz | Level 8

Hi,

Thanks for the reply..

I dint undertand it well enough

Regards

snoopy369
Barite | Level 11

I couldn't explain why something was there without knowing who put it there and why Smiley Happy

I would guess it's because the merge was specifically written to have the desired effect, namely, to merge the right-hand dataset on only once even if there are duplicates of CODE.  Take away the call missing and the output, and you get it merged to all same-values of CODE, if that's your desire.

The reason it might be there is related to the way MERGE works.  MERGE identifies the first matching observation on the left with the observation on the right, and appends the right dataset's values.  Then it goes on through both datasets until it finds the next matching set of observations - it doesn't keep merging on the right-side dataset's values for the same value of CODE.  The reason the effect is (normally) to do so, is a function of how variables that come in on a MERGE/SET/UPDATE statement work.  They are automatically not set to missing (ie, RETAINed).  So the MERGE basically just outputs all of the left-side dataset rows, and they all keep the right-side dataset's values.  Nothing is set to missing until you pass a by group boundary (where, if a right-hand observation is missing, missing values are appended). 

robertrao
Quartz | Level 8

My desire is to have it the way I had it in WANT2

But i dint understand the code??

Thanks

robertrao
Quartz | Level 8

Hi,

My question is also the same:

Which lines of code is merging the right-hand dataset on only once even if there are duplicates of CODE on the left????

Thanks

snoopy369
Barite | Level 11

That was the second half of my answer.  In brief, the CALL MISSING is setting the values back to missing, and since MERGE doesn't actually read in the values from the right-hand dataset again, it won't retain the values if they're set to missing.  Read a detailed explanation of the PDV and merge statement if you want to understand better, or turn on the data step debugger and run it both ways.

robertrao
Quartz | Level 8

OK.

If i understand the merge correctly

in a many to one merge the right side values are RETAINEd for all the records on the left by the variable we are merging with????

Thanks

snoopy369
Barite | Level 11

That's the gist of it, yes.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 3579 views
  • 3 likes
  • 3 in conversation