It has worked in the past and now suddenly my code is broken. I'm performing a join between ~20 tables with around 1-50k rows each. All tables have an index and are sorted. The join is performed using PROC SQL. It's requiring hundreds of GB's before taking up all available resources and throwing an error once everything is exhausted. I'm trying to see if there's a cartesian product being mistakenly created and there are none. What are my options? Must I refactor this to be done with PROC SORT's and DATA steps?
Did you check some of the basic performance considerations for your query like WHERE clause, KEEP or DROP, indexed datasets.
How would I go about doing that? For this query there is no WHERE clause. And isn't KEEP and DROP specific to the DATA step and not PROC SQL?
You can use KEEP or DROP or SELECT Clause in your query for keeping only the columns that you need.
Keep drop can also be used in proc sql
proc sql;
select *
from sashelp.class(keep= Name Sex);
quit;
Also another consideration is avoid heterogeneous joins if possible. Where are your tables located? and share your sample query.
proc sql;
create table report as
select
a.id,
a.col1,
a.col2,
b.col4
case when
c.col5 is not null then 1
end as col5
...
z.col30
from
a
left join b on a.id = b.id
...
left join z on a.id = z.id;
quit;
The query looks something like that. Some tables are in work and some are in other libraries.
If you do not have a many-to-many relationship and therefore do not need to build a cartesian join, switch to sorting and merging in a data step.
OK, I figured it out, this was my mistake. It was forming a cartesian product because I forgot to throw out rows with a NULL variable that it was being joined on. Thanks for the help everyone.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.