- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I have two datasets like shown
i want to merge them by code
i have the code which produces the desired output
can someone explain to me why the code has output prior to the call missing?????and why is the output statement at all necessary???
why is the data from the second dataset not repeated for all values of code from the first???
Thanks
TEMP2
ID CODE LAB
102 45.77 LAB1
102 45.77 LAB2
102 45.77 LAB3
103 45.88 LAB1
103 45.88 LAB2
WANT1
CODE COUNT VALUE1 VALUE2
45.77 1 84 45.39
45.88 2 19 50.00
WANT2
ID CODE LAB COUNT VALUE1 VALUE2
102 45.77 LAB1 1 84 45.39
102 45.77 LAB2 . . .
102 45.77 LAB3 . . .
103 45.88 LAB1 2 19 50.00
103 45.88 LAB2 . . .
CODE
data want2;
merge temp2 want1;
by CODE;
output;
call missing(of _all_);
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I couldn't explain why something was there without knowing who put it there and why
I would guess it's because the merge was specifically written to have the desired effect, namely, to merge the right-hand dataset on only once even if there are duplicates of CODE. Take away the call missing and the output, and you get it merged to all same-values of CODE, if that's your desire.
The reason it might be there is related to the way MERGE works. MERGE identifies the first matching observation on the left with the observation on the right, and appends the right dataset's values. Then it goes on through both datasets until it finds the next matching set of observations - it doesn't keep merging on the right-side dataset's values for the same value of CODE. The reason the effect is (normally) to do so, is a function of how variables that come in on a MERGE/SET/UPDATE statement work. They are automatically not set to missing (ie, RETAINed). So the MERGE basically just outputs all of the left-side dataset rows, and they all keep the right-side dataset's values. Nothing is set to missing until you pass a by group boundary (where, if a right-hand observation is missing, missing values are appended).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thwe simplest way is to try running the data step without the output, and then without the call missing.
What you will see is that because of repeated values of Code in temp2, the input from want1 will stay in the PDV, and merged into the repeated rows of temp2.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Thanks for the reply..
I dint undertand it well enough
Regards
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I couldn't explain why something was there without knowing who put it there and why
I would guess it's because the merge was specifically written to have the desired effect, namely, to merge the right-hand dataset on only once even if there are duplicates of CODE. Take away the call missing and the output, and you get it merged to all same-values of CODE, if that's your desire.
The reason it might be there is related to the way MERGE works. MERGE identifies the first matching observation on the left with the observation on the right, and appends the right dataset's values. Then it goes on through both datasets until it finds the next matching set of observations - it doesn't keep merging on the right-side dataset's values for the same value of CODE. The reason the effect is (normally) to do so, is a function of how variables that come in on a MERGE/SET/UPDATE statement work. They are automatically not set to missing (ie, RETAINed). So the MERGE basically just outputs all of the left-side dataset rows, and they all keep the right-side dataset's values. Nothing is set to missing until you pass a by group boundary (where, if a right-hand observation is missing, missing values are appended).
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My desire is to have it the way I had it in WANT2
But i dint understand the code??
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
My question is also the same:
Which lines of code is merging the right-hand dataset on only once even if there are duplicates of CODE on the left????
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That was the second half of my answer. In brief, the CALL MISSING is setting the values back to missing, and since MERGE doesn't actually read in the values from the right-hand dataset again, it won't retain the values if they're set to missing. Read a detailed explanation of the PDV and merge statement if you want to understand better, or turn on the data step debugger and run it both ways.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
OK.
If i understand the merge correctly
in a many to one merge the right side values are RETAINEd for all the records on the left by the variable we are merging with????
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
That's the gist of it, yes.