Hi all I am a fairly experienced SAS programmer being given my first tastes of SAS VIYA and trying to get my head around it by having a play. can anyone explain to me why this works:
proc sql;
create table work.test_&monthname. as
select asatdate,IFRS9Stage, sum(Current_Balance) as balance
from public.ANON_ECL_&monthname.(where=(acct_scenario2=99))
group by 1, 2
;
quit;
and this dosent:
proc fedsql;
create table PUBLIC.test_&monthname. as
select asatdate,IFRS9Stage, sum(Current_Balance) as balance
from public.ANON_ECL_&monthname.(where=(acct_scenario2=99))
group by 1, 2
;
quit;
the error I receive is "ERROR: Syntax error at or near "where"" and I have spent 30 mins googling to no avail... help me SAS community, you are my only hope!
I use this syntax having moved the where clause from the right table parenthesis to the last 'on' statement.
Perhaps this helps.
proc cas;
source MPG_toyota;
create table MKT.FACT_OPE_FORM_OK_NODUP2{options replace=true} as
select a.*, b._frequency_
from MKT.FACT_OPE_FORM_OK a inner join MKT.FACT_OPE_FORM_OK_NODUP b
on a.codopera=b.codopera and a.codidoc=b.codidoc
and put(a._fecformo, ddmmyy10.)=put(b._fecformo, ddmmyy10.)
and b._frequency_ >= 2 ;
endsource;
fedSQL.execDirect / query=MPG_toyota;
quit;
Does not look like PROC FEDSQL supports dataset options.
269 proc fedsql ; 270 create table x as 271 select * 272 from sashelp.class(where=(sex='M')) 273 ; ERROR: Syntax error at or near "where" 274 quit; NOTE: PROCEDURE FEDSQL used (Total process time): real time 0.05 seconds cpu time 0.06 seconds 275 proc fedsql ; 276 create table x as 277 select * 278 from sashelp.class(drop=age) 279 ; ERROR: Syntax error at or near "drop" 280 quit; NOTE: PROCEDURE FEDSQL used (Total process time): real time 0.04 seconds cpu time 0.04 seconds
feels like a big oversight given the fact that CAS is all about speeding up processing, FEDSQL has to be used on CAS and dataset options are ususally used like this to limit input datasets and improve processing performance.
thanks for checking
I think that FEDSQL is actually something that SAS bought from someone. So it did not really know SAS syntax. Apparently they have not enhanced it to support SAS syntax. Perhaps that is a selling point as the code is then more likely to be portable (and can be pushed into a remote database).
I would be surprised if a WHERE = dataset option performs any differently to a WHERE clause. To me dataset options are more applicable to DATA steps and I prefer to avoid these in SQL unless there is a good reason for it.
@SASKiwi wrote:
I would be surprised if a WHERE = dataset option performs any differently to a WHERE clause. To me dataset options are more applicable to DATA steps and I prefer to avoid these in SQL unless there is a good reason for it.
I am totally ignorant on this topic. But are you convinced that the WHERE clause will always be applied at data input time, just as a WHERE dataset option would be?
I mean this. Let's say filtervariable exists only on dataset B.
If you run
proc sql;
create table new
as select *
from a
left join b (where=(filtervariable=filtervalue))
on a.id=b.id
;
quit;
the where clause is implemented at the input stage, so ineligible observations from b are not exposed to the join operation.
But is the where filter also implemented at the input stage (vs after joining) in this?
proc sql;
create table new
as select *
from a
left join b
on a.id=b.id
where filtervariable=filtervalue
;
quit;
I would like to think so, but have never seen any documentation to that effect.
@mkeintz wrote:
@SASKiwi wrote:
I would be surprised if a WHERE = dataset option performs any differently to a WHERE clause. To me dataset options are more applicable to DATA steps and I prefer to avoid these in SQL unless there is a good reason for it.
I am totally ignorant on this topic. But are you convinced that the WHERE clause will always be applied at data input time, just as a WHERE dataset option would be?
I mean this. Let's say filtervariable exists only on dataset B.
If you run
proc sql; create table new as select * from a left join b (where=(filtervariable=filtervalue)) on a.id=b.id ; quit;
the where clause is implemented at the input stage, so ineligible observations from b are not exposed to the join operation.
But is the where filter also implemented at the input stage (vs after joining) in this?
proc sql; create table new as select * from a left join b on a.id=b.id where filtervariable=filtervalue ; quit;
I would like to think so, but have never seen any documentation to that effect.
Your second example is going to give you a different result set as you are overriding the LEFT JOIN with a WHERE clause on B which will only allow rows obeying the filter to be kept.
Regarding the first example I would write it like this:
proc sql;
create table new
as select *
from a
left join
(select *
from b
where filtervariable=filtervalue
) b
on a.id=b.id
;
quit;
hey bud, it does process the data very differently actually, its much more efficient to drop all unecessary variables at compile stage than during the processing as the data will be read every time during processing
just wanted to give a quick example using a relatively small dataset:
NOTE: The data set WORK.TEMP has 453310 observations and 157 variables.
28 proc sql;
28 ! create table temp2 as select
29 *
30 from temp(where=(current_balance>5000));
NOTE: Table WORK.TEMP2 created, with 419030 rows and 157 columns.
31 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 2.39 seconds
cpu time 2.39 seconds
32
33 proc sql;
33 ! create table temp2 as select
34 *
35 from temp
36 where current_balance>5000;
NOTE: Table WORK.TEMP2 created, with 419030 rows and 157 columns.
37 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 3.20 seconds
cpu time 2.54 seconds
as you can see above the real time processing is almost a second quicker when processing at compile time and if you scale that up for big data, multiple users, multiple data sets it quickly becomes a big difference
I use this syntax having moved the where clause from the right table parenthesis to the last 'on' statement.
Perhaps this helps.
proc cas;
source MPG_toyota;
create table MKT.FACT_OPE_FORM_OK_NODUP2{options replace=true} as
select a.*, b._frequency_
from MKT.FACT_OPE_FORM_OK a inner join MKT.FACT_OPE_FORM_OK_NODUP b
on a.codopera=b.codopera and a.codidoc=b.codidoc
and put(a._fecformo, ddmmyy10.)=put(b._fecformo, ddmmyy10.)
and b._frequency_ >= 2 ;
endsource;
fedSQL.execDirect / query=MPG_toyota;
quit;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.