Help using Base SAS procedures

proc SQL

Reply
Occasional Contributor
Posts: 13

proc SQL

I have a question about efficiency. I am trying to select only one variable from a file. For purposes of illustration, say file1 has 100 variables and file 2 has 1. Would it considerably matter in terms of time whether I use a data statement in conjunction with PROC SQL which selects that variable from file2 or just to go ahead and use file1 anyway and just simply select my variable of choice ? In short, is there a any saving in terms of memory and time between the two:

version 1:
------------
data file2;
set file1(keep=var1);
run;

proc sql;
create table xxx as
select var1
from file2;
quit;

OR

version 2:
------------

proc sql;
create table xxx as
select var1
from file1;
quit;

Thanks
Trusted Advisor
Posts: 2,115

Re: proc SQL

Posted in reply to Elkridge_SAS
version 2 will often be quicker than version 1 as the data are only passed once.

Version 1 can be simplified to

DATA xxx;
SET file1(KEEP=var1);
RUN;

and would then be about the same as version 2.
Occasional Contributor
Posts: 13

Re: proc SQL

> version 2 will often be quicker than version 1 as the
> data are only passed once.
>
> Version 1 can be simplified to
>
> DATA xxx;
> SET file1(KEEP=var1);
> RUN;
>
> and would then be about the same as version 2.

Thanks for replying. It just occurred to me that I might have made my problem seem simpler than it really is. The intended proc sql in version 2 should not be as trivial as I made it look. It should involve a join of either file1 or file2 with say file3 & file4 so your simplification of version 1 above will certainly not work alone except with an accompanying merge. I think my question boils down to "Is it quicker in PROC SQL to select the same 2 variables from a file with 100 variables or from one with 5 variables or is the size immaterial ?"
Trusted Advisor
Posts: 2,115

Re: proc SQL

Posted in reply to Elkridge_SAS
The difference is measurable, but may not be noticeable. In both the data step and SQL, SAS has to read the entire program data vector (PDV) in order to pick off the variables of interest (search the documentation for "PDV"). So, with a hundred observations, one will hardly notice the difference between a dataset with 5 variables and one with 100. With a million, you definitely will notice the difference. When your data start getting "large", you also need to start looking at the LENGTH of each variable; that can also make a difference.

Search for on support.sas.com for a lot of SUGI/SGF articles that examine this question more formally.
Ask a Question
Discussion stats
  • 3 replies
  • 149 views
  • 0 likes
  • 2 in conversation