I would be strongly tempted to replace code like
left join hmaodm.actv_fact af on cdf.case_id = af.case_id and af.cse_src_sys_cd in ('CPM')
with something like
left join (select * from hmaodm.actv_fact where cse_src_sys_cd in ('CPM')) af on cdf.case_id = af.case_id
so that fewer records are brought into the join. Many of your other joins offer the same opportunity.
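Whichever form turns out faster, the two should return the same rows: in a left join, a condition on the right-hand table in the ON clause only restricts which right-hand rows can match, which is what pre-filtering in an inline view does too. A minimal sketch to check that on toy data follows. The tables case_dim and actv_fact in WORK are hypothetical stand-ins for the real hmaodm tables, and the value 'XYZ' is made up.

```sas
/* Hypothetical toy tables standing in for the real case and activity tables */
data case_dim;
  do case_id = 1 to 5;
    output;
  end;
run;

data actv_fact;
  length cse_src_sys_cd $3;
  do case_id = 1 to 5;
    /* even case_ids come from 'CPM', odd ones from a made-up 'XYZ' source */
    cse_src_sys_cd = ifc(mod(case_id, 2) = 0, 'CPM', 'XYZ');
    output;
  end;
run;

proc sql;
  /* Form 1: filter on the right-hand table inside the ON clause */
  create table on_filter as
  select cdf.case_id, af.cse_src_sys_cd
  from case_dim cdf
  left join actv_fact af
    on cdf.case_id = af.case_id and af.cse_src_sys_cd in ('CPM')
  order by cdf.case_id;

  /* Form 2: pre-filter the right-hand table in an inline view */
  create table view_filter as
  select cdf.case_id, af.cse_src_sys_cd
  from case_dim cdf
  left join (select * from actv_fact where cse_src_sys_cd in ('CPM')) af
    on cdf.case_id = af.case_id
  order by cdf.case_id;
quit;

/* Compare the two results row by row */
proc compare base=on_filter compare=view_filter;
run;
```

If the two forms are equivalent, PROC COMPARE should find no unequal values. Moving the filter into a WHERE clause after the join would be a different query, because it would also drop the unmatched rows from the left table.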
Doesn't the SQL parser subset the right table when it sees
and af.cse_src_sys_cd in ('CPM')
?
Both SQL steps take 4.8s on my machine.
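/* Two identical 10-million-row tables, flagged as sorted by I */
data A(sortedby=I) B(sortedby=I);
  do I=1 to 1e7;
    output;
  end;
run;

/* Version 1: filter on B in the ON clause */
proc sql _method;
  create table T as
  select a.I, b.I as J
  from A left join B on a.I=b.I and b.I=1e7;
quit;

/* Version 2: filter on B with a WHERE= data set option */
proc sql _method;
  create table T as
  select a.I, b.I as J
  from A left join B(where=(I=1e7)) on a.I=b.I;
quit;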
Cached data sets?
I haven't tested the suggestion in a while, but when I had data spread across network drives, subsetting it before the join had a positive impact in my environment.
Interesting. Something to keep one's eyes on then.
It'd be disappointing if the SQL optimiser did not subset the table. That's such an obvious step.
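One way to look rather than guess is to have SAS log more detail for both versions. The sketch below reuses the A and B tables from the test above at a smaller size (1e6 rows, my choice, not the thread's). It assumes the FULLSTIMER system option and the PROC SQL options _METHOD and _TREE behave as usually described. _METHOD and _TREE are not fully documented, so read their output as a hint, not as proof.

```sas
/* Log real time, CPU time and memory for every step */
options fullstimer;

/* Smaller copy of the thread's test tables (1e6 rows instead of 1e7) */
data A(sortedby=I) B(sortedby=I);
  do I=1 to 1e6;
    output;
  end;
run;

/* Version 1: filter in the ON clause; _method and _tree print the plan to the log */
proc sql _method _tree;
  create table T_on as
  select a.I, b.I as J
  from A left join B on a.I=b.I and b.I=1e6;
quit;

/* Version 2: filter with a WHERE= data set option */
proc sql _method _tree;
  create table T_opt as
  select a.I, b.I as J
  from A left join B(where=(I=1e6)) on a.I=b.I;
quit;
```

A large gap between real time and CPU time in the FULLSTIMER output points to I/O wait, which is where caching or slow network drives would show up. Comparing the two plans in the log should show whether the filter on B is applied before the join in both versions.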