Solved: SQL select statement

Kiteulf · Posted 09-09-2021 04:52 AM

I want to select in proc sql with a where statement from a list

from							test.account(where =(delete_ind = 'N'
														and current_record_ind ='Y'
														and appid in(select id from test2)))

ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant,
a missing value, -.
ERROR 76-322: Syntax error, statement will be ignored.
ERROR: Syntax error while parsing WHERE clause.
ERROR: Invalid value for the WHERE option.

How should I do this?

Patrick · Posted 09-09-2021 07:22 AM

@Kiteulf What about an inner join?

from test.account r inner join test2 l
on r.delete_ind='N' and r.current_record_ind='Y' and r.appid=l.id

or implicit:

from test.account r,test2 l
where r.delete_ind='N' and r.current_record_ind='Y' and r.appid=l.id

View solution in original post

PeterClemmensen · Posted 09-09-2021 04:59 AM

You can't do that in the Data Set Options directly.

See the difference in the two PROC SQL steps below. Should give you the answer.

/* Test data */
data one;
input id x;
datalines;
1 10
2 20
3 30
;

data two;
input id;
datalines;
1
3
;

/* Error */
proc sql;
   create table want as 
   select one.*
   from one(where = (id in (select id from two)));
quit;

/* All good */
proc sql;
   create table want as
   select one.*
   from one
   where id in (select id from two);
quit;

The DATA to DATA Step Macro
Blog: SASnrd

Kiteulf · Posted 09-09-2021 05:08 AM

So then I will need to do;

from							test.account

where                                              delete_ind = 'N'
and                                                  current_record_ind ='Y'
and                                                  appid in(select id from test2)

The problem then is that the query bogs down. At the moment I have only 3 records from test2 but there will be more later.

If I am doing it manually and

from							test.account (where =(delete_ind = 'N'
														and current_record_ind ='Y'
														and appid in(3819532,3809511,3812779))) as t0

it is super fast

Patrick · Posted 09-09-2021 07:22 AM

@Kiteulf What about an inner join?

from test.account r inner join test2 l
on r.delete_ind='N' and r.current_record_ind='Y' and r.appid=l.id

or implicit:

from test.account r,test2 l
where r.delete_ind='N' and r.current_record_ind='Y' and r.appid=l.id

Kiteulf · Posted 09-09-2021 08:08 AM

It sort of works.

It is not painstakingly slow but also not fast. Valid input! I forgot about that possibility.

from test.account r,test2 l
where r.delete_ind='N' and r.current_record_ind='Y' and r.appid=l.id

Patrick · Posted 09-09-2021 08:19 AM

@Kiteulf Depending on the number of rows in your tables below eventually performs much better.

data want;
  set test.account(where=(delete_ind='N' and current_record_ind='Y'));
  if _n_=1 then
    do;
      if 0 then set test2(keep=id rename=(id=appid));
      dcl hash h1 (dataset:'test2(keep=id rename=(id=appid))');
      h1.defineKey('appid')
      h1.defineDone();
    end;
  if h1.check()=0;
run;

Kiteulf · Posted 09-09-2021 08:30 AM

I forgot to mention that this data is on a SQL server. So the hashing will not work very well towards that right...

Patrick · Posted 09-10-2021 06:47 AM

I forgot to mention that this data is on a SQL server. So the hashing will not work very well towards that right...

That's certainly not an insignificant detail. You could investigate if appid and/or id are indexed and then eventually have this condition first in your where clause. Not sure how smart the SQL Server compiler is so these things are always worth a try.

r.appid=l.id

If performance is really that important to you then you could also investigate or talk to a DBA if there are any SQL Server side tweaks possible - like writing code that executes in parallel. You then would use Postgres SQL code and pass it to the DB from SAS via explicit SQL.

Also make sure that you don't use a select * but list explicitly the columns you really need. Sending the data from the DB to the SAS Server is often the bottleneck.

A few other things that also can improve performance significantly - docu here:

DIRECT_SQL=Yes

READBUFF set to a value suitable for the environment

And last but not least: I assume BOTH your tables are in the DB. If not then please tell us as this will be highly likely the reason for bad performance - and there are ways how to code differently to improve things.

SQL select statement

Re: SQL select statement

Re: SQL select statement

Re: SQL select statement

Re: SQL select statement

Re: SQL select statement

Re: SQL select statement

Re: SQL select statement

Re: SQL select statement

Registration is open

SAS Training: Just a Click Away