BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kiteulf
Quartz | Level 8

 

 

I want to select in proc sql with a where statement from a list

 

from							test.account(where =(delete_ind = 'N'
														and current_record_ind ='Y'
														and appid in(select id from test2)))

ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant,
a missing value, -.
ERROR 76-322: Syntax error, statement will be ignored.
ERROR: Syntax error while parsing WHERE clause.
ERROR: Invalid value for the WHERE option.

 

How should I do this?

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@Kiteulf  What about an inner join?

from test.account r inner join test2 l
on r.delete_ind='N' and r.current_record_ind='Y' and r.appid=l.id

or implicit:

from test.account r,test2 l
where r.delete_ind='N' and r.current_record_ind='Y' and r.appid=l.id

 

View solution in original post

7 REPLIES 7
PeterClemmensen
Tourmaline | Level 20

You can't do that in the Data Set Options directly. 

 

See the difference in the two PROC SQL steps below. Should give you the answer.

 

/* Test data */
data one;
input id x;
datalines;
1 10
2 20
3 30
;

data two;
input id;
datalines;
1
3
;

/* Error */
proc sql;
   create table want as 
   select one.*
   from one(where = (id in (select id from two)));
quit;

/* All good */
proc sql;
   create table want as
   select one.*
   from one
   where id in (select id from two);
quit;
Kiteulf
Quartz | Level 8

So then I will need to do;

 

from							test.account

where                                              delete_ind = 'N'
and                                                  current_record_ind ='Y'
and                                                  appid in(select id from test2)

The problem then is that the query bogs down. At the moment I have only 3 records from test2 but there will be more later.

If I am doing it manually and 

from							test.account (where =(delete_ind = 'N'
														and current_record_ind ='Y'
														and appid in(3819532,3809511,3812779))) as t0

it is super fast

 

 

 

 

 

Patrick
Opal | Level 21

@Kiteulf  What about an inner join?

from test.account r inner join test2 l
on r.delete_ind='N' and r.current_record_ind='Y' and r.appid=l.id

or implicit:

from test.account r,test2 l
where r.delete_ind='N' and r.current_record_ind='Y' and r.appid=l.id

 

Kiteulf
Quartz | Level 8

It sort of works.

 

It is not painstakingly slow but also not fast. Valid input! I forgot about that possibility.

 

from test.account r,test2 l
where r.delete_ind='N' and r.current_record_ind='Y' and r.appid=l.id
Patrick
Opal | Level 21

@Kiteulf Depending on the number of rows in your tables below eventually performs much better.

data want;
  set test.account(where=(delete_ind='N' and current_record_ind='Y'));
  if _n_=1 then
    do;
      if 0 then set test2(keep=id rename=(id=appid));
      dcl hash h1 (dataset:'test2(keep=id rename=(id=appid))');
      h1.defineKey('appid')
      h1.defineDone();
    end;
  if h1.check()=0;
run; 

 

Kiteulf
Quartz | Level 8

 

 

I forgot to mention that this data is on a SQL server. So the hashing will not work very well towards that right...

Patrick
Opal | Level 21

I forgot to mention that this data is on a SQL server. So the hashing will not work very well towards that right...


That's certainly not an insignificant detail. You could investigate if appid and/or id are indexed and then eventually have this condition first in your where clause. Not sure how smart the SQL Server compiler is so these things are always worth a try. 

r.appid=l.id

 

If performance is really that important to you then you could also investigate or talk to a DBA if there are any SQL Server side tweaks possible - like writing code that executes in parallel. You then would use Postgres SQL code and pass it to the DB from SAS via explicit SQL.

 

Also make sure that you don't use a select * but list explicitly the columns you really need. Sending the data from the DB to the SAS Server is often the bottleneck.

 

A few other things that also can improve performance significantly - docu here:

DIRECT_SQL=Yes

READBUFF set to a value suitable for the environment

 

And last but not least: I assume BOTH your tables are in the DB. If not then please tell us as this will be highly likely the reason for bad performance - and there are ways how to code differently to improve things.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1773 views
  • 0 likes
  • 3 in conversation