Help using Base SAS procedures

Proc SQL is not efficient with OR operator in where clause?

Reply
N/A
Posts: 1

Proc SQL is not efficient with OR operator in where clause?

Hi,

I have a huge table (Main) which contains millions of records and so many variables like Account,Token and Trans_Type etc. There is an another table (Lookup ) which contains only two variables - Account and Tokens.

Now I have to make a final table which will be based on below condition-

1) Corresponding Token from Lookup where substr(Account,15,1)= "2" or Corresponding Token from Main table where Trans_Type = "O".

I wrote this code -

proc sql;

create table Final as

select a.*, b.*

from Main as a Lookup as b

where a.token = b.token | a.Trans_Type = "O" ;

quit;

Actually I need to join the table because going forward I have to remove the account number column and the corresponding Token values from Lookup Table will be added in the Token Column in Main table.

That's why I am first putting Where substr(Account,15,1)="2" in the Lookup table to get the corresponding Token from the Lookup table.

Now I have to join both the table where all the Tokens from Lookup table or those token where Trans_Type ="O" in the Main table. So this will like Full Join not left join because of OR condition. So if we execute the full join on huge table size will be increased. So need to search the alternate method. Please let me know any other way except Hash Join. Because demerit with Hash join is it takes lot of memory.

As Lookup table is index created on both the column and Main table is also indexed on Account, Token, Trans_Type because Main table is huge table. So this query eventually taking 5-6 hrs of time and also taking the huge memory. I need to reduce the time while extracting the data.

Super User
Posts: 5,516

Re: Proc SQL is not efficient with OR operator in where clause?

I'm not an SQL expert, but does this program give you the right result (no matter how long it takes to run)?  Doesn't this join every Trans_Type="O" record with every record from the lookup table?

Super User
Posts: 19,855

Re: Proc SQL is not efficient with OR operator in where clause?

1. You say you need a join but one isn't specified, so by default its a cross join which is inefficient.

2. You show a restriction that doesn't appear in the code: substr(Account,15,1)= "2"

3. the tables are not separated by a , so the code included would probably return an error (not 100% sure)

4. where are your tables? is the main in a server and lookup on your computer or are both in a server or your work library?

Contributor
Posts: 52

Re: Proc SQL is not efficient with OR operator in where clause?

My guess is that you are trying to join the tables on the account field and then set up some decision rules based on values of other fields. If my understanding of your need is correct, then I think this or something like this is what you want:

proc sql;

   create table Final as

      select

         a.*,

         b.token as lookup_token,

         case

               when substr(Account,15,1)= "2" then b.token

               when Trans_Type = "O" then a.token

               else .

            end as final_token

      from

         Main as a,

         Lookup as b

      where

         a.account = b.account;

quit;

Contributor
Posts: 52

Re: Proc SQL is not efficient with OR operator in where clause?

Regarding the efficiency aspect of your question, I work with data sets with hundreds of millions of records. For me, PROC SQL is much more time efficient than DATA step programming. I can usually cut through a file in a few minutes. One trick I apply is that I will use a keep= dataset option to limit the fields available on input. So, in the above example, 'd have "from Main (keep=account trans_type token) as a, Lookup as b". This speeds processing tremendously.

I hope that this note or my other comment helps.

Respected Advisor
Posts: 4,173

Re: Proc SQL is not efficient with OR operator in where clause?

Are these SAS tables or what is the database. If it's SAS tables then a simple data step with a hash object   (containing the lookup table) could be very efficient. How big is the lookup table (only needed rows and columns)?

I agree with tish that your SQL and your narrative don't add up.

Respected Advisor
Posts: 4,930

Re: Proc SQL is not efficient with OR operator in where clause?

Your query will not return the desired result because every observation in Lookup will match every record in Main where Trans_type="O". You can avoid the inefficient OR condition with a UNION query such as :

proc sql;

create table Final as

     select A.*, B.account

     from Main as A inner join Lookup as B on A.token = B.token

     where A.Trans_Type ne "O"

UNION ALL

     select A.*, B.account

     from Main as A left join Lookup as B on A.token = B.token

     where A.Trans_Type = "O";

quit;

This will pull out every Main record that matches a Lookup observation plus every Main record that has Trans_Type="O". That seems to be what you want.

PG


PG
Ask a Question
Discussion stats
  • 6 replies
  • 358 views
  • 0 likes
  • 6 in conversation