BookmarkSubscribeRSS Feed
SudhirU
Calcite | Level 5

Hi,

I have a huge table (Main) which contains millions of records and so many variables like Account,Token and Trans_Type etc. There is an another table (Lookup ) which contains only two variables - Account and Tokens.

Now I have to make a final table which will be based on below condition-

1) Corresponding Token from Lookup where substr(Account,15,1)= "2" or Corresponding Token from Main table where Trans_Type = "O".

I wrote this code -

proc sql;

create table Final as

select a.*, b.*

from Main as a Lookup as b

where a.token = b.token | a.Trans_Type = "O" ;

quit;

Actually I need to join the table because going forward I have to remove the account number column and the corresponding Token values from Lookup Table will be added in the Token Column in Main table.

That's why I am first putting Where substr(Account,15,1)="2" in the Lookup table to get the corresponding Token from the Lookup table.

Now I have to join both the table where all the Tokens from Lookup table or those token where Trans_Type ="O" in the Main table. So this will like Full Join not left join because of OR condition. So if we execute the full join on huge table size will be increased. So need to search the alternate method. Please let me know any other way except Hash Join. Because demerit with Hash join is it takes lot of memory.

As Lookup table is index created on both the column and Main table is also indexed on Account, Token, Trans_Type because Main table is huge table. So this query eventually taking 5-6 hrs of time and also taking the huge memory. I need to reduce the time while extracting the data.

6 REPLIES 6
Astounding
PROC Star

I'm not an SQL expert, but does this program give you the right result (no matter how long it takes to run)?  Doesn't this join every Trans_Type="O" record with every record from the lookup table?

Reeza
Super User

1. You say you need a join but one isn't specified, so by default its a cross join which is inefficient.

2. You show a restriction that doesn't appear in the code: substr(Account,15,1)= "2"

3. the tables are not separated by a , so the code included would probably return an error (not 100% sure)

4. where are your tables? is the main in a server and lookup on your computer or are both in a server or your work library?

tish
Calcite | Level 5

My guess is that you are trying to join the tables on the account field and then set up some decision rules based on values of other fields. If my understanding of your need is correct, then I think this or something like this is what you want:

proc sql;

   create table Final as

      select

         a.*,

         b.token as lookup_token,

         case

               when substr(Account,15,1)= "2" then b.token

               when Trans_Type = "O" then a.token

               else .

            end as final_token

      from

         Main as a,

         Lookup as b

      where

         a.account = b.account;

quit;

tish
Calcite | Level 5

Regarding the efficiency aspect of your question, I work with data sets with hundreds of millions of records. For me, PROC SQL is much more time efficient than DATA step programming. I can usually cut through a file in a few minutes. One trick I apply is that I will use a keep= dataset option to limit the fields available on input. So, in the above example, 'd have "from Main (keep=account trans_type token) as a, Lookup as b". This speeds processing tremendously.

I hope that this note or my other comment helps.

Patrick
Opal | Level 21

Are these SAS tables or what is the database. If it's SAS tables then a simple data step with a hash object   (containing the lookup table) could be very efficient. How big is the lookup table (only needed rows and columns)?

I agree with tish that your SQL and your narrative don't add up.

PGStats
Opal | Level 21

Your query will not return the desired result because every observation in Lookup will match every record in Main where Trans_type="O". You can avoid the inefficient OR condition with a UNION query such as :

proc sql;

create table Final as

     select A.*, B.account

     from Main as A inner join Lookup as B on A.token = B.token

     where A.Trans_Type ne "O"

UNION ALL

     select A.*, B.account

     from Main as A left join Lookup as B on A.token = B.token

     where A.Trans_Type = "O";

quit;

This will pull out every Main record that matches a Lookup observation plus every Main record that has Trans_Type="O". That seems to be what you want.

PG


PG

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 10679 views
  • 0 likes
  • 6 in conversation