Solved: call missing

robertrao · Posted 11-12-2013 04:01 PM

Hi,

I have two datasets like shown

i want to merge them by code

i have the code which produces the desired output

can someone explain to me why the code has output prior to the call missing?????and why is the output statement at all necessary???

why is the data from the second dataset not repeated for all values of code from the first???

Thanks

TEMP2

ID CODE LAB

102 45.77 LAB1

102 45.77 LAB2

102 45.77 LAB3

103 45.88 LAB1

103 45.88 LAB2

WANT1

CODE COUNT VALUE1 VALUE2

45.77 1 84 45.39

45.88 2 19 50.00

WANT2

ID CODE LAB COUNT VALUE1 VALUE2

102 45.77 LAB1 1 84 45.39

102 45.77 LAB2 . . .

102 45.77 LAB3 . . .

103 45.88 LAB1 2 19 50.00

103 45.88 LAB2 . . .

CODE

data want2;

merge temp2 want1;

by CODE;

output;

call missing(of _all_);

run;

snoopy369 · Posted 11-12-2013 04:13 PM

I couldn't explain why something was there without knowing who put it there and why

I would guess it's because the merge was specifically written to have the desired effect, namely, to merge the right-hand dataset on only once even if there are duplicates of CODE. Take away the call missing and the output, and you get it merged to all same-values of CODE, if that's your desire.

The reason it might be there is related to the way MERGE works. MERGE identifies the first matching observation on the left with the observation on the right, and appends the right dataset's values. Then it goes on through both datasets until it finds the next matching set of observations - it doesn't keep merging on the right-side dataset's values for the same value of CODE. The reason the effect is (normally) to do so, is a function of how variables that come in on a MERGE/SET/UPDATE statement work. They are automatically not set to missing (ie, RETAINed). So the MERGE basically just outputs all of the left-side dataset rows, and they all keep the right-side dataset's values. Nothing is set to missing until you pass a by group boundary (where, if a right-hand observation is missing, missing values are appended).

View solution in original post

LinusH · Posted 11-12-2013 04:09 PM

Thwe simplest way is to try running the data step without the output, and then without the call missing.

What you will see is that because of repeated values of Code in temp2, the input from want1 will stay in the PDV, and merged into the repeated rows of temp2.

Data never sleeps

robertrao · Posted 11-12-2013 04:16 PM

Hi,

Thanks for the reply..

I dint undertand it well enough

Regards

snoopy369 · Posted 11-12-2013 04:13 PM

I couldn't explain why something was there without knowing who put it there and why

I would guess it's because the merge was specifically written to have the desired effect, namely, to merge the right-hand dataset on only once even if there are duplicates of CODE. Take away the call missing and the output, and you get it merged to all same-values of CODE, if that's your desire.

The reason it might be there is related to the way MERGE works. MERGE identifies the first matching observation on the left with the observation on the right, and appends the right dataset's values. Then it goes on through both datasets until it finds the next matching set of observations - it doesn't keep merging on the right-side dataset's values for the same value of CODE. The reason the effect is (normally) to do so, is a function of how variables that come in on a MERGE/SET/UPDATE statement work. They are automatically not set to missing (ie, RETAINed). So the MERGE basically just outputs all of the left-side dataset rows, and they all keep the right-side dataset's values. Nothing is set to missing until you pass a by group boundary (where, if a right-hand observation is missing, missing values are appended).

robertrao · Posted 11-12-2013 04:21 PM

My desire is to have it the way I had it in WANT2

But i dint understand the code??

Thanks

robertrao · Posted 11-12-2013 04:46 PM

Hi,

My question is also the same:

Which lines of code is merging the right-hand dataset on only once even if there are duplicates of CODE on the left????

Thanks

snoopy369 · Posted 11-12-2013 04:48 PM

That was the second half of my answer. In brief, the CALL MISSING is setting the values back to missing, and since MERGE doesn't actually read in the values from the right-hand dataset again, it won't retain the values if they're set to missing. Read a detailed explanation of the PDV and merge statement if you want to understand better, or turn on the data step debugger and run it both ways.

robertrao · Posted 11-12-2013 05:06 PM

OK.

If i understand the merge correctly

in a many to one merge the right side values are RETAINEd for all the records on the left by the variable we are merging with????

Thanks

snoopy369 · Posted 11-12-2013 05:12 PM

That's the gist of it, yes.

call missing

Re: call missing

Re: call missing

Re: call missing

Re: call missing

Re: call missing

Re: call missing

Re: call missing

Re: call missing

Re: call missing

Classroom Training Available!