Is there a design reason that SAS does not create an implicit keep statement for procedures that specify variables? For example, the proc freq tables statement lists all the variables used in the procedure. Yet there is a significant speedup when I add a keep statement specifying those variables. Why doesn't SAS implicitly create this keep statement?
Great finding!
Never occurred to me that this could be the case.
If you are correct, this should be fairly "cheap" for SAS to implement.
I would like to see your test case(s). As they say it didn't happen if there ain't no picture.
I'm not sure that this is the case in general, especially with the BASE engine. But for certain types of data sources (such as from 3rd party databases) it's possible that there is a back-end cost you're seeing.
In recent versions of SAS (esp 9.3 and later), many procs will optimize their table access to push work to the database. But some constructs can break/prevent that.
I agree with @data_null__: share a specific example of what you're seeing.
@Zelazny7 wrote:
Is there a design reason that SAS does not create an implicit keep statement for procedures that specify variables? For example, the proc freq tables statement lists all the variables used in the procedure. Yet there is a significant speedup when I add a keep statement specifying those variables. Why doesn't SAS implicitly create this keep statement?
Proc Freq only lists all of the variables if you don't supply them on a TABLES statement.
See the difference between
Proc freq data=sashelp.class;
run;
and
Proc freq data=sashelp.class;
tables sex age;
run;
Ballard, I think what they are saying is that SAS still reads in all of the variables, even if you only put 1 variable in the TABLES statement.
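For anyone who wants to try the comparison being described, here is a minimal sketch. It assumes the BASE engine and uses sashelp.class only as a stand-in; on a table this small you would not expect a measurable difference, so substitute a wide table of your own. The FULLSTIMER option just writes more detailed timing to the log.

```sas
/* Write detailed resource usage for each step to the log */
options fullstimer;

/* Version 1: no KEEP=, so the step has access to every variable in the table */
proc freq data=sashelp.class;
  tables sex age;
run;

/* Version 2: the KEEP= data set option restricts the input to the
   variables that the TABLES statement actually uses */
proc freq data=sashelp.class(keep=sex age);
  tables sex age;
run;
```

Comparing the real time and CPU time reported for the two steps would show whether the KEEP= option makes a difference for a given data source.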
One "design reason" is that many procedures have an OUTPUT statement that enables you to create an output data set that contains ALL the input variables AND the created variables. For example, in PROC REG, if you say
OUTPUT out=MYOUT P=Pred R=Resid;
then the output data set contains all input variables in addition to the predicted and residual variables from the regression.
Because the output from one procedure is often used as the input to another procedure, this prevents doing a separate MERGE between procedure calls.
Furthermore, PROC REG is an interactive procedure, so you can specify a model, then execute the RUN statement. After the model has run, you can specify the OUTPUT statement to get the output data set. In other words, the procedure does not know when it encounters the first RUN statement whether there will be an OUTPUT statement later in the program.
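A sketch of that interactive sequence, under a few assumptions: sashelp.class is used as the input, and the model (weight on height) is only for illustration. The names MYOUT, Pred, and Resid are taken from the OUTPUT statement quoted above.

```sas
proc reg data=sashelp.class;
  /* Only Weight and Height are named in the model */
  model weight = height;
run;

  /* PROC REG is still active after RUN, so an OUTPUT statement
     can be submitted for the model that was just fit */
  output out=work.myout p=Pred r=Resid;
run;
quit;

/* List the variables in MYOUT: the input variables that the model
   never mentioned (Name, Sex, Age) should appear next to Pred and Resid */
proc contents data=work.myout;
run;
```

If the procedure had silently dropped every variable that the MODEL statement did not name, the later OUTPUT statement would have nothing to copy for those extra columns.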