Hi,
Can someone explain why SAS is doing what it is doing here:
data df1; input var1 $ var2; datalines; A 4 A 5 B 4 B 5 ; run; data df2; input var1 $ var3; datalines; A 10 B 20 ; run; data df3; merge df1 df2; by var1; if var2 = 4 then var3 = 30; run;
To get what I want I need to split the merge and if statements:
data df4; merge df1 df2; by var1; run; data df4; set df4; if var2 = 4 then var3 = 30; run;
I'm just not getting how SAS is operating when calculating df3.
Thanks,
Can you explain what you are trying to do?
Your first two steps create datasets. (although they both have an extra RUN statement after them that will do nothing.)
Your first MERGE is combining the observations by VAR1. Since DF2 ("df" is that short for "dataframe"? Note that in SAS these are called datasets.) only has one observation per value of VAR1 they are read into the merge step only once. If you change the value of VAR3 in the data step then there is no way for the value to be changed back.
Run that step checking for VAR2=5 instead.
data ds1;
input var1 $ var2;
datalines;
A 4
A 5
A 6
B 4
B 5
B 6
;
data ds2;
input var1 $ var3;
datalines;
A 10
B 20
;
data ds3;
merge ds1 ds2;
by var1;
if var2 = 5 then var3 = 30;
run;
Result
OBS var1 var2 var3 1 A 4 10 2 A 5 30 3 A 6 30 4 B 4 20 5 B 5 30 6 B 6 30
If you want to remember the old value of VAR3 then make a new variable instead. You can always add a RENAME statement so the names used when writing the DS3 will be different.
data ds3;
merge ds1 ds2;
by var1;
if var2 = 5 then newvar3 = 30;
else newvar3=var3;
rename var3=oldvar3 newvar3=var3;
run;
Result
OBS var1 var2 oldvar3 var3 1 A 4 10 10 2 A 5 10 30 3 A 6 10 10 4 B 4 20 20 5 B 5 20 30 6 B 6 20 20
Can you explain what you are trying to do?
Your first two steps create datasets. (although they both have an extra RUN statement after them that will do nothing.)
Your first MERGE is combining the observations by VAR1. Since DF2 ("df" is that short for "dataframe"? Note that in SAS these are called datasets.) only has one observation per value of VAR1 they are read into the merge step only once. If you change the value of VAR3 in the data step then there is no way for the value to be changed back.
Run that step checking for VAR2=5 instead.
data ds1;
input var1 $ var2;
datalines;
A 4
A 5
A 6
B 4
B 5
B 6
;
data ds2;
input var1 $ var3;
datalines;
A 10
B 20
;
data ds3;
merge ds1 ds2;
by var1;
if var2 = 5 then var3 = 30;
run;
Result
OBS var1 var2 var3 1 A 4 10 2 A 5 30 3 A 6 30 4 B 4 20 5 B 5 30 6 B 6 30
If you want to remember the old value of VAR3 then make a new variable instead. You can always add a RENAME statement so the names used when writing the DS3 will be different.
data ds3;
merge ds1 ds2;
by var1;
if var2 = 5 then newvar3 = 30;
else newvar3=var3;
rename var3=oldvar3 newvar3=var3;
run;
Result
OBS var1 var2 oldvar3 var3 1 A 4 10 10 2 A 5 10 30 3 A 6 10 10 4 B 4 20 20 5 B 5 20 30 6 B 6 20 20
Sorry about the code quality, I rarely create datasets from hard coded inputs and I've just been conditioned that a "data" step needs a "run". And yes, I guess my tables should be named dtx 😛
I wanted to merge datasets, then have the if statement apply to the post-merged datasets. SAS (as you point out) does not complete the merge until the 'run' line, and over writes the merged values affected by if for all subsequent rows until the merge value changes.
But thank you for the example & explanation. I get how SAS got the result it got.
Thanks again.
@python_user wrote:
SAS (as you point out) does not complete the merge until the 'run' line, and over writes the merged values affected by if for all subsequent rows until the merge value changes.
Actually, the "overwriting" takes place only in the observation where the conditional assignment statement is executed. In subsequent observations of the same BY group the value is just retained (i.e. repeated). Overwriting is also what makes your final DATA step (using the SET statement) work.
The key fact is that
"The variables read using the MERGE statement are retained in the PDV." (documentation of the MERGE statement)
In the DATA step creating dataset DF3 the MERGE statement reads var3=10 from dataset DF2 while processing the first observation of the BY group var1='A'. Dataset DF3, having only one observation with var1='A', does not contribute a new value of var3 for the second observation of that BY group. Due to the implied RETAIN, the value of var3 from the previous observation is repeated. While this is the desired behavior in the absence of the IF-THEN statement, the repetition of value 30 introduced by the conditional assignment statement seems to contradict the IF condition. But the issue is that the original value 10 has been overwritten by the assignment statement var3=30 and hence is no longer available. (Analogous behavior in the second BY group.)
This is even similar in the DATA step updating dataset DF4. The SET statement also implies a RETAIN for var1, var2 and var3, but this time the retained value var3=30 from the first observation with var1='A' is overwritten by the value var3=10 read from the second observation with var1='A' in dataset DF4.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.