jjsingh04
Obsidian | Level 7

Can anyone explain why: 

 

-This is OK:

 

data want;
set have;
if home=. then delete;
run;

-But this is NOT OK:

data want;
set have;
if home=1 then keep;
run;

-Yet this is OK:

data want;
set have;
if home=1;
run;

 

???

Our lives are enriched by the people around us.

Accepted Solution
ballardw
Super User

Different rules.

You may not be aware that KEEP is a specific statement, with a companion DROP, that relates to whether specific variables appear in a data set.

 

When you read the LOG it is actually fairly clear:

384  data want;
385  set have;
386  if home=1 then keep;
                    -----
                    180 415
ERROR 180-322: Statement is not valid or it is used out of proper order.

WARNING 415-185: No KEEP variables found, statement is ignored.

387  run;

The underline in the log marks exactly what is not in a valid location: KEEP.

The WARNING tells you that KEEP expects a list of variables. KEEP is also not an executable statement, so it cannot follow THEN. If you read the documentation at https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/lestmtsref/n1nnrzzsw6rzrjn1p2jfky6pdv23.htm you will see that the Type of the statement is Declarative.

 

Also from the documentation:

Executable and Declarative Statements

DATA step statements are executable or declarative statements that can appear in the DATA step. Executable statements result in some action during individual iterations of the DATA step; declarative statements supply information to SAS and take effect when the system compiles program statements.

 

The companion statement to DELETE is OUTPUT, which writes the current record to the data set at that point in the step. However, once you use an explicit OUTPUT anywhere in the step, only OUTPUT statements write to the data set(s). Normally there is an implied "output" at the end of the data step code, and that is the only time data is written.

 

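The IF <condition>; statement is a special form of IF called a "subsetting IF". It means: continue processing only records that meet the condition. It does not write to the data set explicitly; records that pass simply continue on to the implied output at the end of the step.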
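
To make the row-versus-column distinction concrete, here is a minimal sketch. The HAVE data, the id variable, and the output data set names are made up for illustration; only home and leverage appear in this thread.

```sas
/* Hypothetical input data, invented for this example */
data have;
  input id home leverage;
  datalines;
1 1 0.5
2 . 0.7
3 0 0.2
4 1 0.9
;
run;

/* Row selection: the subsetting IF lets only observations with home=1 continue */
data want_rows;
  set have;
  if home=1;
run;

/* The same row selection written with DELETE (executable, so it may follow THEN) */
data want_rows2;
  set have;
  if home ne 1 then delete;
run;

/* Column selection: KEEP takes a variable list and applies to every observation */
data want_cols;
  set have;
  keep id home;
run;
```

With this data, WANT_ROWS and WANT_ROWS2 should each contain the observations with id 1 and 4 and all three variables, while WANT_COLS contains all four observations but only id and home.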

 

jjsingh04
Obsidian | Level 7

I think I understand. I had forgotten for a moment there that keep was the opposite of drop (not of delete), and that keep & drop refer to variables (not values). Thanks for bringing both points to my acute attention.

 

From what I now understand of declarative vs. executable, declarative statements (like keep and drop) cannot be conditional, whereas executable statements (like if-then, do-while, etc.) can definitely be conditional.

 

I'm not quite sure what you mean about an output statement meaning all statements must be output only--does that mean you couldn't for example use a drop or keep statement (for a different purpose) afterwards? 

Kurt_Bremser
Super User

@jjsingh04 wrote:

 

I'm not quite sure what you mean about an output statement meaning all statements must be output only--does that mean you couldn't for example use a drop or keep statement (for a different purpose) afterwards? 


If no explicit OUTPUT statement is written into a data step, then the compiler inserts an implicit OUTPUT on its own at the end of the data step iteration. As soon as at least one explicit OUTPUT is present, the implicit one is omitted, and all output operations must be taken care of by the programmer.

 

Since a subsetting IF does an immediate jump to the "top" of the data step (if the condition is not met), it also prevents execution of an implicit OUTPUT.
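
As a rough illustration of the difference, the sketch below uses an explicit OUTPUT. The HAVE data and the flag variable are invented for the example.

```sas
/* Hypothetical input data, invented for this example */
data have;
  input home leverage;
  datalines;
1 0.5
. 0.7
0 0.2
1 0.9
;
run;

/* An explicit OUTPUT is present, so no implicit OUTPUT is added at the end */
data want;
  set have;
  if home=1 then output;  /* only these observations are written */
  flag=1;                 /* runs after the write, so the written rows have flag missing */
run;
```

WANT gets the two observations where home=1. The variable flag exists in WANT, but its value is missing in both rows because the assignment happens after the record has already been written.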

jjsingh04
Obsidian | Level 7

@Kurt_Bremser,

Is a subsetting IF an explicit OUTPUT statement? I.e. in this case...

data want;
set have;
if home=1;
drop leverage;
run;

Would the resulting dataset no longer have the variable leverage--and only contain all observations where home=1? Or would there be a problem dropping the variable leverage? 

Kurt_Bremser
Super User

An explicit OUTPUT statement has to be written by the programmer. Your step has the "usual" implicit output which the compiler creates on its own because no explicit OUTPUT was given.

For your other questions: create some fake "have" data and try it, then look at the result.
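
A minimal sketch of such an experiment might look like this. The values and the income variable are made up; home and leverage are the names from the question.

```sas
/* Fake "have" data, invented for the experiment */
data have;
  input home leverage income;
  datalines;
1 0.5 100
. 0.7 200
0 0.2 300
1 0.9 400
;
run;

/* The step from the question: a subsetting IF plus a DROP statement */
data want;
  set have;
  if home=1;
  drop leverage;
run;

/* Inspect which observations and which variables ended up in WANT */
proc print data=want;
run;
```

The log notes for WANT report the number of observations and variables, and the PROC PRINT output shows whether leverage is still there.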

jjsingh04
Obsidian | Level 7

@Kurt_Bremser Thanks for the info

Rick_SAS
SAS Super FREQ

You are confusing statements that are executed for each observation in the data with statements that are executed only once.

 

Most statements are executed for each observation: assignment, logical IF, subsetting IF, ...

 

However, some statements in the DATA step are executed only one time. This includes LENGTH, ARRAY, KEEP, DROP, WHERE,...

jjsingh04
Obsidian | Level 7

@Rick_SAS Thanks for the info

Tom
Super User

@Rick_SAS wrote:

You are confusing statements that are executed for each observation in the data with statements that are executed only once.

 

Most statements are executed for each observation: assignment, logical IF, subsetting IF, ...

 

However, some statements in the DATA step are executed only one time. This includes LENGTH, ARRAY, KEEP, DROP, WHERE,...


That terminology is just going to confuse things.

Declarative statements are never executed.  All of the statements are "evaluated" when the data step is first seen.  If they are just declarative then they have no role while the data step actually runs.  Only the actual executable statements have any potential for doing something while the data step runs.

 

Some statements, like SET, are really both. For the SET statement, the variables in the referenced dataset(s) determine which variables will exist in the step, even if there is no way the executable aspect (reading in the actual data) can ever happen.

Consider this step:

data want;
  stop;
   set sashelp.class;
run;

How many times will the data step iterate?

How many observations will the WANT dataset have?

How many variables?

Spoiler
The step will execute one iteration. That iteration finishes immediately when it executes the STOP statement, so it never reaches the end of the step and the implicit OUTPUT statement (do not pass GO, do not collect $200). WANT therefore has zero observations.

But the dataset will have all of the same variables as SASHELP.CLASS, because when SAS was examining (some say compiling) the data step it saw that SASHELP.CLASS was an input dataset and added its variables.
jjsingh04
Obsidian | Level 7

@Tom Thanks for the info. I tried running your code, and saw that it created the five variables from sashelp.class in a work.want dataset, but stopped short of populating any observations. The SET statement, combined with the STOP statement in this way, does indeed make for an illustrative example of your point here. With the SET statement, the variables are created (the declarative part)--and with the DATA statement, these variables are put into a new dataset--but with the STOP statement in the way, the observations from the source dataset cannot be recorded into the new dataset (the executable part). 
