I'm facing strange behavior of SET statement when reading more than one SAS dataset with different column names. As per documentation, in execution phase variables values are set to missing in PDV. However, it seems that is not a case in the scenario provided below. Would some one able to explain more detailed what is happening?
data M(drop=Height) F(drop=Weight);
set sashelp.class;
if sex='M' then output M;
else if sex='F' then output F;
run;
data result;
set M F;
if age < 13 then do;
Height = 1;
Weight = 1;
end;
run;
I expect to have Height and Weights values =1 only where age < 13. However, Height fills to all rows from M table after first condition was correct and same with Weight from F table. I understand that I could explicitly call missing those columns after SET statement, but why it's not done automatically?
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/basess/n053a58fwk57v7n14h8x7y7u34y4.htm
Hello @karolis_b,
See the more detailed description in section "The Execution Phase" of "How the DATA Step Processes Data" and also sections Default DATA Step Behavior and Redundancy in the documentation of the RETAIN statement.
Since HEIGHT and WEIGHT are read with the SET statement, they are retained. If the dataset currently being read does not supply values for one of these variables, the value assigned by the most recent assignment statement is not overwritten, so it is repeated. There is one detail, though, that I think is missing in the documentation linked above: When SET starts reading from a new dataset (in your example: dataset F), values from the previous dataset (here: dataset M) are not retained (edit: more precisely: they are reset to missing [the actual "retaining" from one iteration of the DATA step to the next does occur; it just occurs before the SET statement is executed]). Otherwise, in your example, William's weight would be carried over into Alice's observation, but this is not the case.
I expect to have Height and Weights values =1 only where age < 13
Why do you expect that? Data set M every observation has values for Weight. Every observation in F has values for Height.
Except for ages less than 13 then the observation will have the value it had from either M or F.
When you use a single SET statement with 2 or more data sets then you have told SAS to "stack" or append the second (and third or fourth or ...) observations after the previous set's values are brought in. The automatic "missing" values are only applied to the observations that did not have the variable in the source set. Otherwise it keeps the value of the variable.
It might help if showed exactly what you expected to see as the output. Then we could show code to replicate that.
Note: multiple SET statements can be even more confusing.
I expect having Height=1 only where age < 13.
At the moment I get as provided below. As I've mentioned, how to achieve the desired result I do know. I'm more interested how come that the values are retained.
Hello @karolis_b,
See the more detailed description in section "The Execution Phase" of "How the DATA Step Processes Data" and also sections Default DATA Step Behavior and Redundancy in the documentation of the RETAIN statement.
Since HEIGHT and WEIGHT are read with the SET statement, they are retained. If the dataset currently being read does not supply values for one of these variables, the value assigned by the most recent assignment statement is not overwritten, so it is repeated. There is one detail, though, that I think is missing in the documentation linked above: When SET starts reading from a new dataset (in your example: dataset F), values from the previous dataset (here: dataset M) are not retained (edit: more precisely: they are reset to missing [the actual "retaining" from one iteration of the DATA step to the next does occur; it just occurs before the SET statement is executed]). Otherwise, in your example, William's weight would be carried over into Alice's observation, but this is not the case.
Thank you, I believe I got confused with this, where I thought all variables in PDV are set to missing before reading the next observation
How the DATA Step Works: A Basic Introduction
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.