Why doesn't SAS keep only the variables needed for certain procedures?

Zelazny7 · Posted 01-15-2016 09:03 AM

Is there a design reason that SAS does not apply an implicit KEEP for procedures that specify variables? For example, the PROC FREQ TABLES statement lists all the variables used in the procedure, yet there is a significant speedup when I add a KEEP= option specifying those variables. Why doesn't SAS create this KEEP implicitly?

LinusH · Posted 01-15-2016 09:40 AM

Great finding!

Never occurred to me that this could be the case.

If you are correct, this should be fairly "cheap" for SAS to implement.


data_null__ · Posted 01-15-2016 09:45 AM

I would like to see your test case(s). As they say, it didn't happen if there ain't no picture.
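A test case of the kind requested might look like the following minimal sketch. It assumes a synthetic wide data set, WORK.WIDE (a hypothetical name), built only for timing; the FULLSTIMER system option writes resource usage for each step to the log so the two PROC FREQ runs can be compared. No timings are claimed here; results will depend on the engine, data size, and platform.

```sas
/* Write detailed resource usage (real time, CPU, memory) to the log */
options fullstimer;

/* Hypothetical test data: 50,000 observations, 200 numeric      */
/* columns X1-X200, plus one low-cardinality variable GRP (0-4). */
data work.wide;
   array x{200} x1-x200;
   do id = 1 to 50000;
      do i = 1 to dim(x);
         x{i} = ranuni(12345);
      end;
      grp = mod(id, 5);
      output;
   end;
   drop i;
run;

/* Run 1: no KEEP= option; all variables are available to the procedure */
proc freq data=work.wide;
   tables grp;
run;

/* Run 2: the KEEP= data set option limits input to the one needed variable */
proc freq data=work.wide(keep=grp);
   tables grp;
run;
```

Comparing the "real time" and "cpu time" notes that the log shows for the two PROC FREQ steps is the comparison the question is about.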

ChrisHemedinger · Posted 01-15-2016 10:22 AM

I'm not sure that this is the case in general, especially with the BASE engine. But for certain types of data sources (such as third-party databases), the cost you're seeing may be on the back end.

In recent versions of SAS (especially 9.3 and later), many procs optimize their table access to push work to the database, but some constructs can prevent that.

I agree with @data_null__: share a specific example of what you're seeing.


ballardw · Posted 01-15-2016 10:54 AM


PROC FREQ produces tables for all of the variables only if you don't list any on a TABLES statement.

See the difference between

proc freq data=sashelp.class;
run;

and

proc freq data=sashelp.class;
   tables sex age;
run;

Steelers_In_DC · Posted 01-15-2016 01:20 PM

ballardw, I think the point is that SAS still reads in all of the variables, even if you put only one variable on the TABLES statement.

Rick_SAS · Posted 01-15-2016 01:40 PM

One "design reason" is that many procedures have an OUTPUT statement that creates an output data set containing ALL the input variables AND the newly created variables. For example, in PROC REG, if you say

OUTPUT out=MYOUT P=Pred R=Resid;

then the output data set contains all input variables in addition to the predicted and residual variables from the regression.
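Using SASHELP.CLASS, which ships with SAS, this behavior can be checked with a short sketch. The model WEIGHT = HEIGHT is only an illustrative choice, and PROC CONTENTS lists what ends up in MYOUT: per the description above, it should hold the five original variables (Name, Sex, Age, Height, Weight) plus Pred and Resid.

```sas
/* Illustrative model on a sample data set shipped with SAS */
proc reg data=sashelp.class;
   model weight = height;
   /* Save predicted values and residuals alongside the input variables */
   output out=myout p=Pred r=Resid;
run;
quit;

/* List the variables in the output data set, in creation order */
proc contents data=myout varnum;
run;
```

Because the character variables Name and Sex are carried into MYOUT even though the model never uses them, the procedure cannot simply drop unused input variables.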

Because the output from one procedure is often used as the input to another procedure, this avoids the need for a separate MERGE between procedure calls.

Furthermore, PROC REG is an interactive procedure, so you can specify a model and then execute the RUN statement. After the model has run, you can submit an OUTPUT statement to get the output data set. In other words, when the procedure encounters the first RUN statement, it does not know whether there will be an OUTPUT statement later in the program.
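The interactive pattern can be sketched as follows, again assuming SASHELP.CLASS and the same illustrative model. The first RUN fits the model while PROC REG stays active; the OUTPUT statement submitted afterward applies to the most recent MODEL, and QUIT ends the procedure.

```sas
proc reg data=sashelp.class;
   model weight = height;
run;                          /* model is fit here; PROC REG stays active */

   /* Submitted after the first RUN: applies to the most recent MODEL */
   output out=myout p=Pred r=Resid;
run;
quit;                         /* ends the interactive procedure */
```

At the first RUN, nothing in the submitted code indicates that an OUTPUT statement will follow, so the procedure has to keep all input variables available.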
