Hi,
I have a huge table (Main) which contains millions of records and so many variables like Account,Token and Trans_Type etc. There is an another table (Lookup ) which contains only two variables - Account and Tokens.
Now I have to make a final table which will be based on below condition-
1) Corresponding Token from Lookup where substr(Account,15,1)= "2" or Corresponding Token from Main table where Trans_Type = "O".
I wrote this code -
proc sql;
create table Final as
select a.*, b.*
from Main as a Lookup as b
where a.token = b.token | a.Trans_Type = "O" ;
quit;
Actually I need to join the table because going forward I have to remove the account number column and the corresponding Token values from Lookup Table will be added in the Token Column in Main table.
That's why I am first putting Where substr(Account,15,1)="2" in the Lookup table to get the corresponding Token from the Lookup table.
Now I have to join both the table where all the Tokens from Lookup table or those token where Trans_Type ="O" in the Main table. So this will like Full Join not left join because of OR condition. So if we execute the full join on huge table size will be increased. So need to search the alternate method. Please let me know any other way except Hash Join. Because demerit with Hash join is it takes lot of memory.
As Lookup table is index created on both the column and Main table is also indexed on Account, Token, Trans_Type because Main table is huge table. So this query eventually taking 5-6 hrs of time and also taking the huge memory. I need to reduce the time while extracting the data.
I'm not an SQL expert, but does this program give you the right result (no matter how long it takes to run)? Doesn't this join every Trans_Type="O" record with every record from the lookup table?
1. You say you need a join but one isn't specified, so by default its a cross join which is inefficient.
2. You show a restriction that doesn't appear in the code: substr(Account,15,1)= "2"
3. the tables are not separated by a , so the code included would probably return an error (not 100% sure)
4. where are your tables? is the main in a server and lookup on your computer or are both in a server or your work library?
My guess is that you are trying to join the tables on the account field and then set up some decision rules based on values of other fields. If my understanding of your need is correct, then I think this or something like this is what you want:
proc sql;
create table Final as
select
a.*,
b.token as lookup_token,
case
when substr(Account,15,1)= "2" then b.token
when Trans_Type = "O" then a.token
else .
end as final_token
from
Main as a,
Lookup as b
where
a.account = b.account;
quit;
Regarding the efficiency aspect of your question, I work with data sets with hundreds of millions of records. For me, PROC SQL is much more time efficient than DATA step programming. I can usually cut through a file in a few minutes. One trick I apply is that I will use a keep= dataset option to limit the fields available on input. So, in the above example, 'd have "from Main (keep=account trans_type token) as a, Lookup as b". This speeds processing tremendously.
I hope that this note or my other comment helps.
Are these SAS tables or what is the database. If it's SAS tables then a simple data step with a hash object (containing the lookup table) could be very efficient. How big is the lookup table (only needed rows and columns)?
I agree with tish that your SQL and your narrative don't add up.
Your query will not return the desired result because every observation in Lookup will match every record in Main where Trans_type="O". You can avoid the inefficient OR condition with a UNION query such as :
proc sql;
create table Final as
select A.*, B.account
from Main as A inner join Lookup as B on A.token = B.token
where A.Trans_Type ne "O"
UNION ALL
select A.*, B.account
from Main as A left join Lookup as B on A.token = B.token
where A.Trans_Type = "O";
quit;
This will pull out every Main record that matches a Lookup observation plus every Main record that has Trans_Type="O". That seems to be what you want.
PG
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.