Hi everyone,
I have a question that has confused me for a while. As I understand it, there are two types of DATA step statements: compile-time statements and execution-time statements. Compile-time statements are processed before execution-time statements, and the KEEP and DROP statements are compile-time statements. Please see the example below. Let's say I have a data set containing a PT column. If the DROP statement runs first, in the compile phase, then the PT column would be dropped first and the assignment statement could not be executed. So why does the DATA step below run without error?
data want;
  set have;
  drop PT;
  PT2 = PT;
run;
Thank you in advance.
Accepted Solutions
I think of it this way: the variables in the DROP statement are flagged for dropping, but they are dropped only after data processing, when the output data set is written.
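A minimal sketch of that behaviour, assuming a small hypothetical HAVE data set (the ID column and all values are made up for illustration; only PT comes from the question). The PUT statement writes PT to the log on every iteration, which shows that PT is still available during execution even though DROP has already flagged it:

```sas
/* Hypothetical input data; ID and the values are illustrative only */
data have;
  input ID PT;
  datalines;
1 10
2 20
3 30
;
run;

data want;
  set have;
  drop PT;          /* compile time: flag PT so it is not written to WANT */
  PT2 = PT;         /* execution time: PT can still be read                */
  put PT= PT2=;     /* writes both values to the log on each iteration     */
run;

/* List the variables that actually reached WANT */
proc contents data=want;
run;
```

If the flagging model is right, the log lines should show matching PT and PT2 values, while PROC CONTENTS lists ID and PT2 but not PT.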
The documentation makes this very clear:
"The DROP statement applies to all the SAS data sets that are created within the same DATA step and can appear anywhere in the step. The variables in the DROP statement are available for processing in the DATA step."
Paige Miller
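To illustrate both halves of that quote, here is a small sketch that writes two output data sets and places the DROP statement at the very end of the step. The data set names LOW and HIGH, the ID column, and the values are hypothetical and only serve the example:

```sas
/* Hypothetical input data; ID and the values are illustrative only */
data have;
  input ID PT;
  datalines;
1 10
2 20
3 30
;
run;

/* LOW and HIGH are hypothetical output data set names */
data low high;
  set have;
  PT2 = PT * 2;                 /* PT is used before DROP appears in the code */
  if PT < 20 then output low;
  else output high;
  drop PT;                      /* placed last, yet applies to LOW and HIGH   */
run;

proc contents data=low;
run;

proc contents data=high;
run;
```

Neither LOW nor HIGH should contain PT. By contrast, the DROP= data set option applies only to the data set it is attached to, so `data low (drop=PT) high;` would leave PT in HIGH.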
Thanks, @unison. If so, it makes sense to me. Many thanks.
@Amo89tw, the drop statement is used to instruct SAS which variables are not to be part of the output data set. It does not prevent the variables from being used within the data step if they have already been read in or created in the data step. This is why your variable PT can be used in an assignment and does not cause an error.
Kind regards,
Amir.
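The same applies to variables created inside the step rather than read in. As a sketch, assuming a hypothetical HAVE data set and a made-up derived variable PT_total, a loop counter can be used freely and then kept out of the output:

```sas
/* Hypothetical input data; ID and the values are illustrative only */
data have;
  input ID PT;
  datalines;
1 10
2 20
3 30
;
run;

data want;
  set have;
  PT_total = 0;                 /* hypothetical derived variable              */
  do i = 1 to 3;                /* i is created in this step, not read in     */
    PT_total = PT_total + PT;
  end;
  drop i;                       /* i is used above but not written to WANT    */
run;

proc print data=want;
run;
```

WANT should contain ID, PT and PT_total, with no column for the counter i.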
Thanks, @Amir. Do you mean the assignment statement is actually processed prior to the drop statement? I came across this article (please see the link below). “During this compilation phase, the compiler checks code syntax, sets up the PDV and executes certain statements like KEEP and DROP.” This is why I am curious about what happens behind the scenes. Thanks.
https://www.pharmasug.org/proceedings/2015/BB/PharmaSUG-2015-BB15.pdf
@Amo89tw, the assignment statement is not executed before the drop statement.
As the link you provided explains, the drop statement affects the structure of the output data set, i.e., it marks in the PDV which variables are not going to be output, but this doesn't prevent them from being used in the statements in the data step.
Dropping a variable doesn't mean it cannot be used in the data step. Dropping a variable just means it won't appear in the output data set.
Kind regards,
Amir.
Try these versions, with the DROP dataset option:
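/* DROP= on the input data set: PT is never read into the step */
data want;
  set have (drop=PT);
  PT2 = PT;
run;

/* DROP= on the output data set: PT is read, then left out of WANT */
data want (drop=PT);
  set have;
  PT2 = PT;
run;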
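To see the difference between the two versions side by side, the sketch below runs them against a small hypothetical HAVE data set and writes to separate outputs. The names WANT_IN and WANT_OUT, the ID column, and the values are assumptions made for this example:

```sas
/* Hypothetical input data; ID and the values are illustrative only */
data have;
  input ID PT;
  datalines;
1 10
2 20
3 30
;
run;

/* DROP= on the input: PT is not read from HAVE */
data want_in;
  set have (drop=PT);
  PT2 = PT;         /* PT here is a new, uninitialized variable, so PT2 is missing */
run;

/* DROP= on the output: PT is read from HAVE and only omitted when writing */
data want_out (drop=PT);
  set have;
  PT2 = PT;         /* PT2 receives the value read from HAVE */
run;

proc print data=want_in;
run;

proc print data=want_out;
run;
```

In the first step the log should include a note that variable PT is uninitialized, and WANT_IN ends up with missing values for both PT and PT2. The second step behaves like the DROP statement in the original question: PT2 is populated and PT is absent from WANT_OUT.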