I want to identify all rows in the table SOURCE which don't exist in the table TARGET. This is easily done via this simple code:
proc sql;
create table NEW as
select src.primkey from SOURCE src
left join TARGET tgt on (src.primkey = tgt.primkey)
where tgt.primkey is missing;
quit;
SOURCE is a SAS-table that contains around 3 million rows.
TARGET is a PostgreSQL-table that contains around 750 million rows. It has an index on primkey.
Do you know if there's a more efficient way to perform this operation? What's the most optimized way to check if a source row exists in a very large target table?
Usually the best way is to upload the SAS table to a temporary table in the database and do the join there with explicit pass through. This minimizes network traffic.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.