data pd_score; merge pd_score(in=a) long_run_pd; by segment_id; if pd>long_run_pd then long_run_pd=pd; if a; run;
in the above, if segment_id is the same, pd is different in pd_score and long_run_pd, which pd do we get in the resulting table in the merging process?
also the if then statement, does it matter if it is after the statement if a or before it ...
I was also looking at the following merge:
data long3;
merge long_run_pd pd_score;
by segment_id sampdate;
run;
here I am sure if segment_id, sampdate same, it will take pd from sampdate which we can think of as b
but after this I cant figure out how it works for the other merge with if A anymore...
Hi to help you better would recommend you go through the following link:
Merge with Caution by Joshua M. Horstman
However
data pd_score; merge pd_score(in=a) long_run_pd; by segment_id; if pd>long_run_pd then long_run_pd=pd; if a; run;
In this merge, if segment_id is the same but pd is different in pd_score and long_run_pd, the pd value from pd_score will be used in the resulting table. This is because pd_score is listed first in the merge statement, so its values have precedence.
The if pd>long_run_pd then long_run_pd=pd; statement is updating long_run_pd to be equal to pd if pd is greater than long_run_pd.
The if a; statement is a subsetting if statement. It means that only the observations that come from the pd_score dataset (where a is true) will be output to the resulting dataset.
I hope this helps!
The answer is actually more complex than you would think. It depends in part on whether you have a one-to-one match or a many-to-one match. If you are interested in reading and learning, here is a link to an old paper than explains the process:
https://www.lexjansen.com/nesug/nesug99/ad/ad155.pdf
@HeatherNewton wrote:
If one to one, which one does it take? I thought would be b
Please read some of the very good papers already shared with you that explain all of this.
See for example 6. Overlapping variables from Merge with Caution
@HeatherNewton wrote:
If one to one, which one does it take? I thought would be b
No. It value will reflect the last observation read in.
So in a 1 to 1 match if both A and B contribute the the value from B is read last.
But if there that value of KEY variables do not exist in B then the value from A stays since nothing was read to overwrite it.
Creating small sample programs to test what's happening often help to clarify things.
data a;
id=1;var='a1';output;
id=2;var='a2';output;
run;
data b;
id=1;var='b1';output;
id=1;var='b1';output;
id=3;var='b3';output;
run;
data test;
merge a b;
by id;
run;
proc print data=test;
run;
You could do something like this, which would make it very clear which PD is which
data pd_score;
merge pd_score(in=a rename=(pd=pd_from_pd_score)) long_run_pd(rename=(pd=pd_from_long_run));
by segment_id;
/* More data step commands as necessary */
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.