Importance of statement order in a step

jaliu · Posted 09-02-2021 09:23 AM

Hello, something I'm having trouble understanding is when statement order matters in a step. Please help me verify or correct my understanding below, or add any feedback.

WHERE: Not sure about this one. It seems we can't use WHERE on newly created variables, so in that sense placement doesn't matter, because variables get flagged and dropped during execution? Or does order make the processing more efficient?

FORMAT: It does not matter where the statement is used, because it does not alter the data itself, only the way it's presented? But what if we start summarizing or printing reports on formatted data? Whether it's something as simple as rounding or adding a dollar sign, or something more transformational like grouping ranges of values into categories, does this matter? Or is this why DATA steps are separated from PROC steps?

DROP/KEEP: Order does not matter, because variables are simply flagged and dropped upon execution. But if we use these as data set options on a SET statement, then it could have an effect.
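A minimal sketch of the difference between the DROP statement and the DROP= data set option on SET. The data set names `have`, `out_stmt`, and `out_opt` are made up for illustration. With the statement, B is still read and usable in the step (the statement is deliberately placed after the assignment to show its position does not matter); with the option, B is never read at all.

```sas
/* Hypothetical input data set */
data have;
  a = 1;
  b = 2;
run;

/* DROP statement: B is read into the PDV and can be used, it is just not written out */
data out_stmt;
  set have;
  c = b;            /* c = 2 */
  drop b;           /* non-executable, so placing it last changes nothing */
run;

/* DROP= option on SET: B is never read, so this B is a new, uninitialized numeric variable */
data out_opt;
  set have(drop=b);
  c = b;            /* c is missing; the log notes that B is uninitialized */
run;
```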

For LENGTH, I get that we must set the variable's length before the variable is first encountered, to prevent longer values from being truncated. What I don't understand is why, if we add the LENGTH statement after the assignment, the step still runs in some cases but not in others. For example, if a LENGTH statement is added after a character variable is assigned, a numeric specification causes an error, but a character specification causes no error and, of course, nothing happens.
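A minimal sketch of the LENGTH behavior described in the question; the data set and variable names (`demo_len`, `late`, `status`) are hypothetical. The first reference to each variable fixes its length at compile time, so a later LENGTH statement does not change it, and a value longer than the first one seen is truncated.

```sas
data demo_len;
  late = 'AB';           /* first reference to LATE: character, length 2 */
  length late $ 10;      /* too late: LATE keeps length 2 (SAS writes a warning to the log) */
  do i = 1 to 2;
    if i = 1 then status = 'No';   /* first reference to STATUS: length 2 */
    else status = 'Yes';           /* stored as 'Ye' (truncated) */
    output;
  end;
run;

/* Fix: declare the length before the first reference */
data demo_len_fixed;
  length status $ 3;
  do i = 1 to 2;
    if i = 1 then status = 'No';
    else status = 'Yes';           /* stored in full */
    output;
  end;
run;
```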

Any other tips? Thank you.

Tom · Posted 09-02-2021 10:07 AM

In general it looks like you understand.

The main distinction is between executable and non-executable statements, that is, whether a statement takes an action when execution of the running data step reaches that point. In general you can place non-executable statements anywhere.
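A minimal sketch contrasting a non-executable WHERE statement with an executable subsetting IF; the data set and variable names (`scores`, `passed`, `pct`) are hypothetical. WHERE filters observations as SET reads them, so it can only reference variables in the input data set and its position in the step does not matter. The subsetting IF runs at its place in the step, so it can test PCT only because PCT has already been computed; writing `where pct >= 0.8;` instead would fail because PCT is not on the input data set.

```sas
/* Hypothetical input data set */
data scores;
  input id score;
  datalines;
1 40
2 75
3 90
;
run;

data passed;
  set scores;
  pct = score / 100;     /* PCT exists only in this step's PDV */
  where score >= 50;     /* non-executable: filters rows as SET reads them, placement is irrelevant */
  if pct >= 0.8;         /* executable: must come after PCT is assigned */
run;
```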

But the point you mention about LENGTH also applies to FORMAT/INFORMAT, and really to almost any statement that references a variable (assignment, IF, etc.). When SAS has to define the type and length of a variable, it does so based on the information it has seen so far while compiling the data step. So if the first place you reference the variable is an assignment statement, SAS will set the type and length based on how the variable is used in that statement. The same goes for FORMAT or INFORMAT: SAS will guess the type from the type of format or informat being attached, and for character variables it will guess the length from the width in the format or informat specification.
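A minimal sketch of a FORMAT statement being the first reference to a variable, as described above; the names `demo_fmt`, `code`, and `amount` are hypothetical. The character format makes CODE a character variable whose length comes from the format width, so the longer value assigned afterwards is truncated; the numeric format makes AMOUNT numeric.

```sas
data demo_fmt;
  format code $5.;           /* first reference: CODE becomes character, length 5 */
  format amount dollar10.2;  /* first reference: AMOUNT becomes numeric */
  code = 'ABCDEFGH';         /* stored as 'ABCDE' */
  amount = 1234.5;
run;
```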

Another place where order can have an impact is which variables are found when you use variable lists (first--last, prefix:, _all_, _numeric_, _character_, etc.). So while the location of DROP/KEEP does not matter if you list the variables explicitly, it could matter if you use a variable list, since the list is expanded to the variables the compiler currently knows about. If later statements reference new variables that would have matched the list, the compiler will not go back and re-evaluate the earlier list.
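The point above is about DROP/KEEP, but the same compile-time expansion of a variable list is easiest to see with an ARRAY statement, because DIM() reports how many variables the list picked up. A minimal sketch; the names `demo_list`, `early`, and `late` are hypothetical.

```sas
data demo_list;
  a = 1;
  b = 2;
  array early {*} _numeric_;   /* expanded here, to the numeric variables known so far: A B */
  c = 3;
  array late {*} _numeric_;    /* expanded here: A B C */
  n_early = dim(early);        /* 2: C did not exist yet when EARLY was compiled */
  n_late  = dim(late);         /* 3 */
run;
```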
