BookmarkSubscribeRSS Feed
aasdfafafsdfsaf
Calcite | Level 5

Hi everyone,

 

Im working with large datasets on a Oracle server and i was wondering what was the most efficient way of finding observations in one dataset [A] (100m rows,20columns) in another dataset column [B] (3m rows)

 

the code that i have right now is extremely slow and puts a heavy strain on the server, not to mention the file size limitation.

here is the code i have, that i want to make faster

 

proc sql;

create table want_table

as select * from table_a where

table_a_col in (select col_in_table_b);

quit;

 

sorry if it's vague. I'll try to be as clear as possible in the comments.

7 REPLIES 7
Reeza
Super User
Are both these tables on the Oracle Server or is one local?

And is your Oracle server indexed on table_a_col?

That query looks incomplete so showing actual code is likely useful. If you're using tables from different systems, ie Oracle and SAS or different Oracle DB then this will be slow.
aasdfafafsdfsaf
Calcite | Level 5

both are on Oracle and it is not indexed on the col_a.

there are many different columns that i would have to look for using this where col_a in (select col_b from table_b) so i dont know if can reindex on multiple columns.

sorry for the wrong code. still relatively new to SAS

ChrisHemedinger
Community Manager

If these are in fact two different databases, then SAS will be forced to move eligible records from each into the SAS session in order to complete the JOIN.

 

If these are two tables from the same Oracle database, or same server but different schemas, then might be better to use PROC SQL CONNECT to use explicit passthrough -- sending the commands directly to the database for processing -- rather than allowing SAS PROC SQL to try and figure out how to satisfy the query.

 

If that's not possible, then it might be better to bring the smaller table into SAS (or the subset you are likely to need), and upload it to a scratch area on the other database, then join those to get the final result.

Check out SAS Innovate on-demand content! Watch the main stage sessions, keynotes, and over 20 technical breakout sessions!
aasdfafafsdfsaf
Calcite | Level 5

they will be both in the same oracle server or one local(3m) and one on the oracle server(100M). is there anything specific i can google to get a more precise search result for this issue? thanks!

ChrisHemedinger
Community Manager

If the data is in the same Oracle server, then explicit passthrough will ensure all work is done in the database and will minimize the I/O back to the SAS session.

 

If you have one table local and one table remote, then that's a recipe for a slow query.  SAS is forced to bring ALL records into the SAS session to compare/filter for the join.  If you can upload the smaller table (or a subset) to the database then use PROC SQL to join there, that's best. If the join is on a small subset of key values, you might even be able to create a nested query filter (using the IN operator) with those literal values to subset the larger remote table. You would first build those literal values with a PROC SQL on the local table and SELECT INTO a macro variable, then use that as part of the SQL you send to the remote database.

Check out SAS Innovate on-demand content! Watch the main stage sessions, keynotes, and over 20 technical breakout sessions!
aasdfafafsdfsaf
Calcite | Level 5
Thank you!
I'll try to implement this first and get back
ChrisNZ
Tourmaline | Level 20

Also  in (select col_in_table_b) can be a lot more expensive than an inner join if the parser does not convert to a join for you, as the whole table B is read for each row in table A.

Submit a query with an inner join.

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 720 views
  • 2 likes
  • 4 in conversation