Zelazny7
Fluorite | Level 6

Is there a design reason that SAS does not create an implicit KEEP statement for procedures that specify variables? For example, the PROC FREQ TABLES statement lists all the variables used in the procedure. Yet there is a significant speedup when I add a KEEP statement specifying those variables. Why doesn't SAS create this KEEP statement implicitly?

LinusH
Tourmaline | Level 20

Great finding!

Never occurred to me that this could be the case.

If you are correct, this should be fairly "cheap" for SAS to implement.

Data never sleeps
data_null__
Jade | Level 19

I would like to see your test case(s). As they say, it didn't happen if there ain't no picture.

ChrisHemedinger
Community Manager

I'm not sure that this is the case in general, especially with the BASE engine. But for certain types of data sources (such as third-party databases), it's possible that there is a back-end cost you're seeing.

In recent versions of SAS (especially 9.3 and later), many procs optimize their table access to push work to the database. But some constructs can break or prevent that.

I agree with @data_null__: share a specific example of what you're seeing.

ballardw
Super User

@Zelazny7 wrote:

Is there a design reason that SAS does not create an implicit KEEP statement for procedures that specify variables? For example, the PROC FREQ TABLES statement lists all the variables used in the procedure. Yet there is a significant speedup when I add a KEEP statement specifying those variables. Why doesn't SAS create this KEEP statement implicitly?

PROC FREQ only uses all of the variables if you don't name them on a TABLES statement.

See the difference between

Proc freq data=sashelp.class;
run;

and

Proc freq data=sashelp.class;
   tables sex age;
run;

Steelers_In_DC
Barite | Level 11

Ballard, I think what they are saying is that SAS still reads in all of the variables, even if you put only one variable on the TABLES statement.
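A minimal sketch of the comparison being described, using the KEEP= data set option on the DATA= input. SASHELP.CLASS is used only so the code runs anywhere; with 19 rows and 5 variables it is far too small to show a timing difference, so the assumption is that you would substitute a large, wide table of your own. OPTIONS FULLSTIMER writes real time, CPU time, and memory for each step to the log, so the two steps can be compared there. Whether the BASE engine actually saves time this way is exactly what the thread is asking.

```sas
/* Write real time, CPU time, and memory usage for each step to the log */
options fullstimer;

/* Without KEEP=: all variables in the input data set are read */
proc freq data=sashelp.class;
   tables sex age;
run;

/* With KEEP=: only the variables named on the TABLES statement are read */
proc freq data=sashelp.class(keep=sex age);
   tables sex age;
run;
```

Both steps should produce identical frequency tables; only the resource figures in the log are expected to differ, and then only on a table with many more columns than the ones analyzed.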

Rick_SAS
SAS Super FREQ

One "design reason" is that many procedures have an OUTPUT statement that creates an output data set containing ALL the input variables AND the created variables. For example, in PROC REG, if you specify

OUTPUT out=MYOUT P=Pred R=Resid;

then the output data set contains all input variables in addition to the predicted and residual variables from the regression.

Because the output from one procedure is often used as the input to another procedure, this avoids the need for a separate MERGE between procedure calls.

Furthermore, PROC REG is an interactive procedure: you can specify a model and then submit the RUN statement. After the model has run, you can specify the OUTPUT statement to get the output data set. In other words, when the procedure encounters the first RUN statement, it cannot know whether an OUTPUT statement will appear later in the program.
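The interactive pattern can be sketched as follows, using SASHELP.CLASS. The model (Weight regressed on Height) is an illustrative choice, and MYOUT reuses the name from the OUTPUT statement quoted above. Because PROC REG stays active after the first RUN statement, the OUTPUT statement submitted afterward applies to the most recent MODEL statement; QUIT ends the step.

```sas
proc reg data=sashelp.class;
   model weight = height;
run;
   /* PROC REG is still active here, so an OUTPUT statement can
      still be submitted for the model fitted above */
   output out=myout p=Pred r=Resid;
run;
quit;
```

MYOUT should contain Name, Sex, and Age even though none of them appears on the MODEL statement. If the input were restricted with DATA=sashelp.class(keep=height weight), those columns would be missing from MYOUT, which is the trade-off an implicit KEEP would impose.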

 
