I noticed that I can use data step options inside a proc SQL for example
create table countries2 as select * from sql.countries(drop=UNDate);
Now in the case of the WHERE option I'm wondering what's the difference between this
PROC SQL; CREATE TABLE sql_where AS SELECT * FROM essentia.olympic WHERE year = 2012; QUIT;
and this
PROC SQL; CREATE TABLE sql_where (WHERE= (year=2012)) AS SELECT * FROM essentia.olympic; QUIT;
the resulting sets are the same so I'm thinking the difference is in efficency. Which form is better?
Functionally they may be the same, however I would recommend not using dataset options in SQL. There are various reasons, the two are separate languages, so using one in the other would confuse those people familiar with SQL. Also, the code no longer becomes portable - i.e. you cant copy th code out to a pure SQL processor. Also, if you use in conjunction with an actual database you may find it takes longer to process or compile. I can tell you straight off there is never a need to use dataset options in SQL. So this question is really mute. The only time I have seen it used usefully is when people use select *, which in itself is not recommended. You should know what data you want to pull from the table{s}, select * is both sloppy lazy programming, and could cause issues with your code.
So simply put, there is no benefit to using these, its not portable, and sloppy coding should be avoided anyways.
Functionally they may be the same, however I would recommend not using dataset options in SQL. There are various reasons, the two are separate languages, so using one in the other would confuse those people familiar with SQL. Also, the code no longer becomes portable - i.e. you cant copy th code out to a pure SQL processor. Also, if you use in conjunction with an actual database you may find it takes longer to process or compile. I can tell you straight off there is never a need to use dataset options in SQL. So this question is really mute. The only time I have seen it used usefully is when people use select *, which in itself is not recommended. You should know what data you want to pull from the table{s}, select * is both sloppy lazy programming, and could cause issues with your code.
So simply put, there is no benefit to using these, its not portable, and sloppy coding should be avoided anyways.
Thanks for the answer, I never used myself (didn't even know it was possible) but then I ended up maintaining some code that used this WHERE as a data option and asked myself if it was better than the other way. It is not apparently, good to know. Thanks again.
I need @RW9 input whether my understanding is correct that I have written down below.
PROC SQL;
CREATE TABLE sql_where AS
SELECT *
FROM essentia.olympic
WHERE year = 2012;
QUIT;
Above step would restrict rows in PDV. Only those row would come in PDV that has year=2012.
PROC SQL;
CREATE TABLE sql_where (WHERE= (year=2012)) AS
SELECT *
FROM essentia.olympic;
QUIT;
The second approach would read every row in PDV irrespective of year =2012 and later on it will restrict while writing output dataset. Therefore this approach would definately take much longer time
That sounds about right to me, its a bit like putting dataset options on the set statment rather than on the data statement:
E.g.
data want;
set have (keep= a b c);
if d=1 ... this will fail as d is not read into PDV for processing
-
data want (keep=a b c);
set have;
if d=1 ... this will not fail as all variables are read into the PDV and only a b c are written out to the dataset.
Do note, I haven't tested or anything, just my best guess
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.