BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
karolis_b
Fluorite | Level 6

I'm facing strange behavior of SET statement when reading more than one SAS dataset with different column names. As per documentation, in execution phase variables values are set to missing in PDV. However, it seems that is not a case in the scenario provided below. Would some one able to explain more detailed what is happening?

data M(drop=Height) F(drop=Weight);
set sashelp.class;
if sex='M' then output M;
else if sex='F' then output F;
run;

data result;
set M F;
if age < 13 then do;
   Height = 1;
   Weight = 1;
end;
run;

I expect to have Height and Weights values =1 only where age < 13. However, Height fills to all rows from M table after first condition was correct and same with Weight from F table. I understand that I could explicitly call missing those columns after SET statement, but why it's not done automatically?

karolis_b_0-1712304286268.png

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/basess/n053a58fwk57v7n14h8x7y7u34y4.htm

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @karolis_b,

 

See the more detailed description in section "The Execution Phase" of "How the DATA Step Processes Data" and also sections Default DATA Step Behavior and Redundancy in the documentation of the RETAIN statement.

 

Since HEIGHT and WEIGHT are read with the SET statement, they are retained. If the dataset currently being read does not supply values for one of these variables, the value assigned by the most recent assignment statement is not overwritten, so it is repeated. There is one detail, though, that I think is missing in the documentation linked above: When SET starts reading from a new dataset (in your example: dataset F), values from the previous dataset (here: dataset M) are not retained (edit: more precisely: they are reset to missing [the actual "retaining" from one iteration of the DATA step to the next does occur; it just occurs before the SET statement is executed]). Otherwise, in your example, William's weight would be carried over into Alice's observation, but this is not the case.

View solution in original post

6 REPLIES 6
karolis_b
Fluorite | Level 6
it seems that PDV is created with all variables from SET statement datasets, but only those are set missing, which are available at the pointer. So if the program reads observation from table M, only variables from M table are set to missing.
Ksharp
Super User
ANY variable from dataset would be retained .

althought 'height' not in M dataset, but in F dataset, therefore 'height' would be retained, and you would see "Height fills to all rows from M table after first condition was correct".

Same thing happen to variable 'weight'.
ballardw
Super User

I expect to have Height and Weights values =1 only where age < 13

 

Why  do you expect that? Data set M every observation has values for Weight. Every observation in F has values for Height.

Except for ages less than 13 then the observation will have the value it had from either M or F.

When you use a single SET statement with 2 or more data sets then you have told SAS to "stack" or append the second (and third or fourth or ...) observations after the previous set's values are brought in. The automatic "missing" values are only applied to the observations that did not have the variable in the source set. Otherwise it keeps the value of the variable.

 

It might help if showed exactly what you expected to see as the output. Then we could show code to replicate that.

 

Note: multiple SET statements can be even more confusing.

 

karolis_b
Fluorite | Level 6

I expect having Height=1 only where age < 13. 

At the moment I get as provided below. As I've mentioned, how to achieve the desired result I do know. I'm more interested how come that the values are retained.

karolis_b_0-1712307844604.png

 

FreelanceReinh
Jade | Level 19

Hello @karolis_b,

 

See the more detailed description in section "The Execution Phase" of "How the DATA Step Processes Data" and also sections Default DATA Step Behavior and Redundancy in the documentation of the RETAIN statement.

 

Since HEIGHT and WEIGHT are read with the SET statement, they are retained. If the dataset currently being read does not supply values for one of these variables, the value assigned by the most recent assignment statement is not overwritten, so it is repeated. There is one detail, though, that I think is missing in the documentation linked above: When SET starts reading from a new dataset (in your example: dataset F), values from the previous dataset (here: dataset M) are not retained (edit: more precisely: they are reset to missing [the actual "retaining" from one iteration of the DATA step to the next does occur; it just occurs before the SET statement is executed]). Otherwise, in your example, William's weight would be carried over into Alice's observation, but this is not the case.

karolis_b
Fluorite | Level 6

Thank you, I believe I got confused with this, where I thought all variables in PDV are set to missing before reading the next observation

How the DATA Step Works: A Basic Introduction 

karolis_b_0-1712308452477.png

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1429 views
  • 2 likes
  • 4 in conversation