BookmarkSubscribeRSS Feed
Elkridge_SAS
Calcite | Level 5
I have a question about efficiency. I am trying to select only one variable from a file. For purposes of illustration, say file1 has 100 variables and file 2 has 1. Would it considerably matter in terms of time whether I use a data statement in conjunction with PROC SQL which selects that variable from file2 or just to go ahead and use file1 anyway and just simply select my variable of choice ? In short, is there a any saving in terms of memory and time between the two:

version 1:
------------
data file2;
set file1(keep=var1);
run;

proc sql;
create table xxx as
select var1
from file2;
quit;

OR

version 2:
------------

proc sql;
create table xxx as
select var1
from file1;
quit;

Thanks
3 REPLIES 3
Doc_Duke
Rhodochrosite | Level 12
version 2 will often be quicker than version 1 as the data are only passed once.

Version 1 can be simplified to

DATA xxx;
SET file1(KEEP=var1);
RUN;

and would then be about the same as version 2.
Elkridge_SAS
Calcite | Level 5
> version 2 will often be quicker than version 1 as the
> data are only passed once.
>
> Version 1 can be simplified to
>
> DATA xxx;
> SET file1(KEEP=var1);
> RUN;
>
> and would then be about the same as version 2.

Thanks for replying. It just occurred to me that I might have made my problem seem simpler than it really is. The intended proc sql in version 2 should not be as trivial as I made it look. It should involve a join of either file1 or file2 with say file3 & file4 so your simplification of version 1 above will certainly not work alone except with an accompanying merge. I think my question boils down to "Is it quicker in PROC SQL to select the same 2 variables from a file with 100 variables or from one with 5 variables or is the size immaterial ?"
Doc_Duke
Rhodochrosite | Level 12
The difference is measurable, but may not be noticeable. In both the data step and SQL, SAS has to read the entire program data vector (PDV) in order to pick off the variables of interest (search the documentation for "PDV"). So, with a hundred observations, one will hardly notice the difference between a dataset with 5 variables and one with 100. With a million, you definitely will notice the difference. When your data start getting "large", you also need to start looking at the LENGTH of each variable; that can also make a difference.

Search for on support.sas.com for a lot of SUGI/SGF articles that examine this question more formally.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1388 views
  • 0 likes
  • 2 in conversation