Hi All,
I need your help to understand whether too many left joins will impact performance. I am currently testing on sample data and my code is working fine, but I have a feeling that for larger volumes the joins will take more time.
Snippet of my code, where I am using 7 joins:
If you mean computing performance, something you might consider is that PROC SQL allows for multi-threaded processing. On my system SAS defaults to 4 CPUs, but if your computer/system has more processors, you can increase that number.
/* show the current CPUCOUNT value: the session default, or the last value set during this session */
proc options option=cpucount;
run; /* default CPU count = 4 */
/* count and use all CPUs available for multi-threaded procs */
options threads cpucount=actual;
proc options option=cpucount;
run; /* count and use max CPU count available; my comp = 20 */
/* set CPUs count to use manually for multi-threaded procs */
options threads cpucount=19;
proc options option=cpucount;
run; /* max CPU count (20) - 1 = 19 (as not to overload other programs/processing) */
The SAS SQL optimizer has its limits, and joins with a lot of tables can result in a sub-optimal execution path. The PROC SQL options _method and _tree write to the SAS log how SAS executes the joins.
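As a minimal sketch of how to look at the execution path: the tables work.base and work.lookup1 and the columns id, amount and region below are made up for illustration, since the original query is not shown. Adding _method and _tree to the PROC SQL statement should print the chosen plan to the log.

/* Hypothetical sample data; replace with your own tables */
data work.base;
  do id = 1 to 5;
    amount = id * 10;
    output;
  end;
run;

data work.lookup1;
  length region $8;
  id = 1; region = 'North'; output;
  id = 3; region = 'South'; output;
run;

/* _method lists the join methods chosen, _tree prints the query tree */
proc sql _method _tree;
  create table work.joined_sql as
  select b.id, b.amount, l.region
  from work.base as b
  left join work.lookup1 as l
    on b.id = l.id;
quit;

In the log, codes such as sqxjm (sort-merge join), sqxjhsh (hash join), sqxjndx (index join) and sqxsort (sort) indicate which method was picked for each step. With 7 joins, this is a quick way to check whether every join ends up as a sort-merge on large data.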
PROC SQL needs to implicitly sort the tables by the join key. In your case you join all tables on the same column, so a single sort per table will suffice. I would expect this join to work as efficiently as possible.
Multithreading only happens for the implicit sort operations, and the installation defaults are normally appropriate. You should only have to change this on rare occasions.
If your base table is the big table and the lookup tables for the left joins are the "small" ones, then another approach is to use SAS data step hash tables. That normally beats any other approach in terms of performance because it avoids the need for any sorting.
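A rough sketch of the hash approach, assuming a large table work.base and a small lookup table work.lookup1 that share a key column id (all names are hypothetical, as the original code is not shown):

/* Hypothetical sample data; replace with your own tables */
data work.base;
  do id = 1 to 5;
    amount = id * 10;
    output;
  end;
run;

data work.lookup1;
  length region $8;
  id = 1; region = 'North'; output;
  id = 3; region = 'South'; output;
run;

data work.joined_hash;
  /* never executes; only defines the lookup variables at compile time */
  if 0 then set work.lookup1;
  if _n_ = 1 then do;
    /* load the small lookup table into memory once */
    declare hash h1(dataset: 'work.lookup1');
    h1.defineKey('id');
    h1.defineData('region');
    h1.defineDone();
  end;
  set work.base;
  /* left-join semantics: keep every base row, blank the lookup value when there is no match */
  if h1.find() ne 0 then call missing(region);
run;

For 7 lookup tables you would declare one hash object per table in the same data step and call find() on each, so the big table is read only once and nothing has to be sorted. This assumes the lookup tables fit in memory and have unique keys; with duplicate keys the result would differ from the SQL left join.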