Advice on tuning a join/lookup against a very large PostgreSQL table?

EinarRoed · Posted 10-26-2021 02:51 AM

I want to identify all rows in the table SOURCE that don't exist in the table TARGET. This simple code does it:

proc sql;
	create table NEW as
	select src.primkey from SOURCE src
	left join TARGET tgt on (src.primkey = tgt.primkey)
	where tgt.primkey is missing;
quit;

SOURCE is a SAS table that contains around 3 million rows.

TARGET is a PostgreSQL table that contains around 750 million rows. It has an index on primkey.

Do you know if there's a more efficient way to perform this operation? What's the most optimized way to check if a source row exists in a very large target table?

Kurt_Bremser · Posted 10-26-2021 03:33 AM

Usually the best way is to upload the SAS table to a temporary table in the database and do the join there with explicit pass-through. This minimizes network traffic.
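
A minimal sketch of that approach, assuming SAS/ACCESS Interface to PostgreSQL is licensed. The libref pgtmp, the alias pg, the temporary table name src_keys, and all connection values (server, port, database, user, password) are placeholders, not details from this thread. It also assumes the table and column names resolve to lowercase target and primkey on the PostgreSQL side; add a schema prefix or quoted identifiers if your database differs.

```sas
/* Placeholder connection values: replace with your own.           */
/* DBMSTEMP=YES makes tables created through this libref temporary */
/* in PostgreSQL; CONNECTION=GLOBAL keeps a single shared          */
/* connection so the temp table is still visible in PROC SQL.      */
libname pgtmp postgres
    server="dbhost" port=5432 database="mydb"
    user="myuser" password="XXXXXXXX"
    dbmstemp=yes connection=global;

/* Upload only the key column of the 3 million source rows. */
data pgtmp.src_keys;
    set SOURCE(keep=primkey);
run;

proc sql;
    /* Reuse the libref's connection for explicit pass-through. */
    connect using pgtmp as pg;

    /* The anti-join runs entirely inside PostgreSQL, where it can */
    /* use the index on primkey; only the unmatched keys come back. */
    create table NEW as
    select * from connection to pg (
        select s.primkey
        from src_keys s
        where not exists (
            select 1 from target t where t.primkey = s.primkey
        )
    );

    disconnect from pg;
quit;

/* Clearing the libref closes the connection and drops the temp table. */
libname pgtmp clear;
```

With this layout only the 3 million keys travel to the database and only the non-matching keys travel back, so SAS never has to pull TARGET rows across the network. If the upload step turns out to be the slow part, the insert buffering or bulk-load options of your SAS/ACCESS release may be worth testing.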
