Can anyone explain why:
-This is OK:
data want;
set have;
if home=. then delete;
run;
-But this is NOT OK:
data want;
set have;
if home=1 then keep;
run;
-Yet this is OK:
data want;
set have;
if home=1;
run;
???
Different rules.
You may not be aware that KEEP is a specific statement, with a companion DROP, that relates to whether specific variables appear in a data set.
When you read the LOG it is actually fairly clear:
384  data want;
385  set have;
386  if home=1 then keep;
                    -----
                    180 415
ERROR 180-322: Statement is not valid or it is used out of proper order.
WARNING 415-185: No KEEP variables found, statement is ignored.
387  run;
The dashes in the log mark exactly what is not in a valid location: KEEP.
The WARNING tells you that KEEP expects a list of variables. Also, KEEP is not actually executable. If you read the documentation at https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/lestmtsref/n1nnrzzsw6rzrjn1p2jfky6pdv23.htm you will see that the Type of statement is Declarative.
Also from the documentation:
Executable and Declarative Statements
DATA step statements are executable or declarative statements that can appear in the DATA step. Executable statements result in some action during individual iterations of the DATA step; declarative statements supply information to SAS and take effect when the system compiles program statements.
The companion instruction to DELETE is OUTPUT, which writes the current record to the data set at that point in the step. However, once you use an OUTPUT statement anywhere in the step, only OUTPUT statements write to the data set(s). Normally there is an implied "output" at the end of the data step code, and that is the only time data is written.
The IF <condition>; statement is a special form of IF called a "subsetting IF". It means: only keep processing records meeting the condition. It does not itself write the record to the data set.
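To make the distinction concrete, here is a minimal sketch that puts the two ideas side by side: KEEP with a variable list (columns) and a conditional OUTPUT (rows). The variable `id` and the data set names `want_vars` and `want_rows` are assumptions for illustration and do not come from the original question.

```sas
/* KEEP is declarative: it takes a variable list and controls columns, not rows. */
data want_vars;
  set have;
  keep id home;   /* id is a hypothetical variable assumed to exist in HAVE */
run;

/* OUTPUT is executable, so it can follow IF ... THEN to select rows. */
data want_rows;
  set have;
  if home=1 then output;
run;
```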
I think I understand. I had forgotten for a moment there that keep was the opposite of drop (not of delete), and that keep & drop refer to variables (not values). Thanks for bringing both points to my acute attention.
From what I now understand of declarative vs. executable, declarative statements (like keep and drop) cannot be conditional, whereas executable statements (like if-then, do-while, etc.) can definitely be conditional.
I'm not quite sure what you mean about an output statement meaning all statements must be output only--does that mean you couldn't for example use a drop or keep statement (for a different purpose) afterwards?
@jjsingh04 wrote:
I'm not quite sure what you mean about an output statement meaning all statements must be output only--does that mean you couldn't for example use a drop or keep statement (for a different purpose) afterwards?
If no explicit OUTPUT statement is written into a data step, then the compiler inserts an implicit OUTPUT on its own at the end of the data step iteration. As soon as at least one explicit OUTPUT is present, the implicit one is omitted, and all output operations must be taken care of by the programmer.
Since a subsetting IF does an immediate jump to the "top" of the data step (if the condition is not met), it also prevents execution of an implicit OUTPUT.
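A small sketch of what "the implicit one is omitted" can mean in practice. The variable `flag` is hypothetical and assumed not to exist in HAVE.

```sas
data want;
  set have;
  if home=1 then output;  /* explicit OUTPUT: no implicit OUTPUT is added at the end */
  flag = 1;               /* hypothetical variable, assigned only after the record was written */
run;
```

Under these assumptions WANT still contains the variable flag, because the assignment is seen at compile time, but its value is missing in every observation: each record is written before the assignment runs, and nothing is written afterwards.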
Is a subsetting IF an explicit OUTPUT statement? I.e. in this case...
data want;
set have;
if home=1;
drop leverage;
run;
Would the resulting dataset no longer have the variable leverage--and only contain all observations where home=1? Or would there be a problem dropping the variable leverage?
An explicit OUTPUT statement has to be written by the programmer. Your step has the "usual" implicit output which the compiler creates on its own because no explicit OUTPUT was given.
For your other questions: create some fake "have" data and try it, then look at the result.
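One possible way to follow that suggestion, as a sketch only: the values below are made up purely so there is something to run the step against.

```sas
/* Fake HAVE data; the values are invented for illustration. */
data have;
  input home leverage;
  datalines;
1 0.5
. 0.7
0 0.2
1 0.9
;
run;

/* The step from the question, unchanged. */
data want;
  set have;
  if home=1;
  drop leverage;
run;

/* Inspect which observations and variables made it into WANT. */
proc print data=want;
run;
```

Checking the log notes for WANT (observation and variable counts) together with the PROC PRINT output should settle both questions.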
@Kurt_Bremser Thanks for the info
You are confusing statements that are executed for each observation in the data with statements that are executed only once.
Most statements are executed for each observation: assignment, logical IF, subsetting IF, ...
However, some statements in the DATA step are executed only one time. This includes LENGTH, ARRAY, KEEP, DROP, WHERE,...
@Rick_SAS Thanks for the info
@Rick_SAS wrote:
You are confusing statements that are executed for each observation in the data with statements that are executed only once.
Most statements are executed for each observation: assignment, logical IF, subsetting IF, ...
However, some statements in the DATA step are executed only one time. This includes LENGTH, ARRAY, KEEP, DROP, WHERE,...
That terminology is just going to confuse things.
Declarative statements are never executed. All of the statements are "evaluated" when the data step is first seen. If they are just declarative then they have no role while the data step actually runs. Only the actual executable statements have any potential for doing something while the data step runs.
Some statements, like SET, are really both. For the SET statement, the variables in the referenced dataset(s) determine which variables will exist, even if there is no way the executable aspect (reading in the actual data) can ever happen.
Consider this step:
data want;
  stop;
   set sashelp.class;
run;
How many times will the data step iterate?
How many observations will the WANT dataset have?
How many variables?
@Tom Thanks for the info. I tried running your code, and saw that it created the five variables from sashelp.class in a work.want dataset, but stopped short of populating any observations. The SET statement, combined with the STOP statement in this way, does indeed make for an illustrative example of your point here. With the SET statement, the variables are created (the declarative part)--and with the DATA statement, these variables are put into a new dataset--but with the STOP statement in the way, the observations from the source dataset cannot be recorded into the new dataset (the executable part).
