Solved: super simple proc sql vs. proc fedsql question

stewart_Jardine · Posted 10-17-2022 10:53 AM

Hi all I am a fairly experienced SAS programmer being given my first tastes of SAS VIYA and trying to get my head around it by having a play. can anyone explain to me why this works:

proc sql;
create table work.test_&monthname. as 
select asatdate,IFRS9Stage, sum(Current_Balance) as balance 
from public.ANON_ECL_&monthname.(where=(acct_scenario2=99)) 
group by 1, 2 
; 
quit;

and this dosent:

proc fedsql; 
create table PUBLIC.test_&monthname. as 
select asatdate,IFRS9Stage, sum(Current_Balance) as balance 
from public.ANON_ECL_&monthname.(where=(acct_scenario2=99)) 
group by 1, 2 
; 
quit;

the error I receive is "ERROR: Syntax error at or near "where"" and I have spent 30 mins googling to no avail... help me SAS community, you are my only hope!

acordes · Posted 10-17-2022 06:02 PM

I use this syntax having moved the where clause from the right table parenthesis to the last 'on' statement.

Perhaps this helps.

proc cas;
source MPG_toyota;
    create table MKT.FACT_OPE_FORM_OK_NODUP2{options replace=true} as 
      select a.*, b._frequency_
          from MKT.FACT_OPE_FORM_OK a inner join MKT.FACT_OPE_FORM_OK_NODUP b
      on a.codopera=b.codopera and a.codidoc=b.codidoc 
      and put(a._fecformo, ddmmyy10.)=put(b._fecformo, ddmmyy10.) 
      and b._frequency_ >= 2 ;
endsource;
fedSQL.execDirect / query=MPG_toyota;
quit;

View solution in original post

Tom · Posted 10-17-2022 11:02 AM

Does not look like PROC FEDSQL supports dataset options.

269  proc fedsql ;
270  create table x as
271  select *
272  from sashelp.class(where=(sex='M'))
273  ;
ERROR: Syntax error at or near "where"
274  quit;

NOTE: PROCEDURE FEDSQL used (Total process time):
      real time           0.05 seconds
      cpu time            0.06 seconds


275  proc fedsql ;
276  create table x as
277  select *
278  from sashelp.class(drop=age)
279  ;
ERROR: Syntax error at or near "drop"
280  quit;

NOTE: PROCEDURE FEDSQL used (Total process time):
      real time           0.04 seconds
      cpu time            0.04 seconds

stewart_Jardine · Posted 10-17-2022 11:05 AM

feels like a big oversight given the fact that CAS is all about speeding up processing, FEDSQL has to be used on CAS and dataset options are ususally used like this to limit input datasets and improve processing performance.

thanks for checking

Tom · Posted 10-17-2022 11:14 AM

I think that FEDSQL is actually something that SAS bought from someone. So it did not really know SAS syntax. Apparently they have not enhanced it to support SAS syntax. Perhaps that is a selling point as the code is then more likely to be portable (and can be pushed into a remote database).

SASKiwi · Posted 10-17-2022 03:59 PM

I would be surprised if a WHERE = dataset option performs any differently to a WHERE clause. To me dataset options are more applicable to DATA steps and I prefer to avoid these in SQL unless there is a good reason for it.

mkeintz · Posted 10-17-2022 04:17 PM

@SASKiwi wrote:

I would be surprised if a WHERE = dataset option performs any differently to a WHERE clause. To me dataset options are more applicable to DATA steps and I prefer to avoid these in SQL unless there is a good reason for it.

I am totally ignorant on this topic. But are you convinced that the WHERE clause will always be applied at data input time, just as a WHERE dataset option would be?

I mean this. Let's say filtervariable exists only on dataset B.

If you run

proc sql;
  create table new
  as select * 
  from a 
  left join b (where=(filtervariable=filtervalue))
  on a.id=b.id
  ;
quit;

the where clause is implemented at the input stage, so ineligible observations from b are not exposed to the join operation.

But is the where filter also implemented at the input stage (vs after joining) in this?

proc sql;
  create table new
  as select * 
  from a 
  left join b 
  on a.id=b.id
  where filtervariable=filtervalue
  ;
quit;

I would like to think so, but have never seen any documentation to that effect.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SASKiwi · Posted 10-17-2022 04:36 PM

@mkeintz wrote:

@SASKiwi wrote:

I would be surprised if a WHERE = dataset option performs any differently to a WHERE clause. To me dataset options are more applicable to DATA steps and I prefer to avoid these in SQL unless there is a good reason for it.

I am totally ignorant on this topic. But are you convinced that the WHERE clause will always be applied at data input time, just as a WHERE dataset option would be?

I mean this. Let's say filtervariable exists only on dataset B.

If you run
proc sql;
  create table new
  as select * 
  from a 
  left join b (where=(filtervariable=filtervalue))
  on a.id=b.id
  ;
quit;
the where clause is implemented at the input stage, so ineligible observations from b are not exposed to the join operation.

But is the where filter also implemented at the input stage (vs after joining) in this?
proc sql;
  create table new
  as select * 
  from a 
  left join b 
  on a.id=b.id
  where filtervariable=filtervalue
  ;
quit;
I would like to think so, but have never seen any documentation to that effect.

Your second example is going to give you a different result set as you are overriding the LEFT JOIN with a WHERE clause on B which will only allow rows obeying the filter to be kept.

Regarding the first example I would write it like this:

proc sql;
  create table new
  as select * 
  from a 
  left join 
  (select *
   from b 
   where filtervariable=filtervalue
 ) b
  on a.id=b.id
  ;
quit;

stewart_Jardine · Posted 10-18-2022 02:57 AM

hey bud, it does process the data very differently actually, its much more efficient to drop all unecessary variables at compile stage than during the processing as the data will be read every time during processing

stewart_Jardine · Posted 10-18-2022 03:28 AM

just wanted to give a quick example using a relatively small dataset:

NOTE: The data set WORK.TEMP has 453310 observations and 157 variables.

28 proc sql;
28 ! create table temp2 as select
29 *
30 from temp(where=(current_balance>5000));

NOTE: Table WORK.TEMP2 created, with 419030 rows and 157 columns.

31 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 2.39 seconds
cpu time 2.39 seconds

32
33 proc sql;
33 ! create table temp2 as select
34 *
35 from temp
36 where current_balance>5000;
NOTE: Table WORK.TEMP2 created, with 419030 rows and 157 columns.

37 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 3.20 seconds
cpu time 2.54 seconds

as you can see above the real time processing is almost a second quicker when processing at compile time and if you scale that up for big data, multiple users, multiple data sets it quickly becomes a big difference

acordes · Posted 10-17-2022 06:02 PM

I use this syntax having moved the where clause from the right table parenthesis to the last 'on' statement.

Perhaps this helps.

proc cas;
source MPG_toyota;
    create table MKT.FACT_OPE_FORM_OK_NODUP2{options replace=true} as 
      select a.*, b._frequency_
          from MKT.FACT_OPE_FORM_OK a inner join MKT.FACT_OPE_FORM_OK_NODUP b
      on a.codopera=b.codopera and a.codidoc=b.codidoc 
      and put(a._fecformo, ddmmyy10.)=put(b._fecformo, ddmmyy10.) 
      and b._frequency_ >= 2 ;
endsource;
fedSQL.execDirect / query=MPG_toyota;
quit;

super simple proc sql vs. proc fedsql question

Re: super simple proc sql vs. proc fedsql question

Re: super simple proc sql vs. proc fedsql question

Re: super simple proc sql vs. proc fedsql question

Re: super simple proc sql vs. proc fedsql question

Re: super simple proc sql vs. proc fedsql question

Re: super simple proc sql vs. proc fedsql question

Re: super simple proc sql vs. proc fedsql question

Re: super simple proc sql vs. proc fedsql question

Re: super simple proc sql vs. proc fedsql question

Re: super simple proc sql vs. proc fedsql question

SAS Innovate 2025: Call for Content

Classroom Training Available!