Help needed to find alternatives for joins (SAS Programming)

Aexor — Fri, 28 Jul 2023 14:29:00 GMT

Hi All,

I need your help to understand whether too many left joins will impact performance. I am currently testing on sample data and my code works fine, but I have a feeling that with larger data volumes the joins will take much longer.

Here is a snippet of my code, which joins 7 tables (6 left joins):

proc sql noprint;
  create table work.mandate_unapproved_res_tmp1 as
  select
    t1.order_id as plan_id,
    t1.sales_units_qty,
    t1.revenue_amt,
    t1.revenue_no_vat_amt,
    t1.margin_amt,
    t1.promotion_spend_amt,
    t1.cost_amt,
    t2.plan_nm as plan_nm,
    t2.plan_desc as plan_desc,
    t2.start_dt as start_dt,
    t2.end_dt as end_dt,
    t2.plan_status_no as plan_status_no,
    t2.plan_approval_flg as plan_approval_status_no,
    t3.price_type_no as price_type_no,
    t3.price_value_no as price_value_no,
    t3.price_value_rec_flg as price_value_rec_flg,
    t3.code as vehicle_cd,
    t4.num_prod_no as num_prod_no,
    t5.num_geo_no as num_geo_no,
    t6.num_prod_geo_no as num_prod_geo_no,
    t7.rule_val as obj_func
  from work.mandate_unapproved_res_t1 t1
  left join &gv_pricing_libname..main_plan t2
    on upcase(t1.order_id)=upcase(t2.order_id)
  left join work.pp_vehicle t3
    on upcase(t1.order_id)=upcase(t3.order_id)
  left join work.unapproved_prod_sku t4
    on upcase(t1.order_id)=upcase(t4.order_id)
  left join work.unapproved_geo_sku t5
    on upcase(t1.order_id)=upcase(t5.order_id)
  left join work.temp_prd_geo_cnt t6
    on upcase(t1.order_id)=upcase(t6.order_id)
  left join work.mandate_plan_rule t7
    on upcase(t1.order_id)=upcase(t7.order_id);
quit;

Any suggestion or help will be much appreciated.

Thanks!

Re: Help needed to find alternatives for joins

awesome_opossum — Fri, 28 Jul 2023 16:22:57 GMT

If you mean computing performance, one thing to consider is that PROC SQL supports multithreaded processing. SAS defaults to 4 processors, but if your computer/system has more processors, you can increase that number.

/* show the current CPUCOUNT value (the default in a new session, or the last value set in this session) */
proc options option=cpucount;
run; /* default CPU count = 4 */

/* detect and use all available CPUs for multithreaded procedures */
options threads cpucount=actual;
proc options option=cpucount;
run; /* uses the maximum CPU count available; my computer = 20 */

/* set the number of CPUs for multithreaded procedures manually */
options threads cpucount=19;
proc options option=cpucount;
run; /* max CPU count (20) - 1 = 19, so as not to overload other programs/processing */

Re: Help needed to find alternatives for joins

Patrick — Sat, 29 Jul 2023 00:34:52 GMT

The SAS SQL optimizer has its limits, and joins involving many tables can result in a sub-optimal execution path. The PROC SQL options _method and _tree write the execution plan to the SAS log, so you can see how SAS actually executes the joins.
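A minimal sketch of switching those options on, using two small hypothetical tables (work.base and work.lookup are made-up names, not tables from the original post). With the options set, the log shows the plan; method codes such as sqxjm (sort-merge join) or sqxjhsh (hash join) indicate which strategy PROC SQL picked for each join.

```sas
/* Hypothetical sample data: a base table and one lookup table */
data work.base;
  input order_id $ revenue_amt;
  datalines;
a1 100
B2 200
c3 300
;
run;

data work.lookup;
  input order_id $ plan_nm $;
  datalines;
A1 PlanA
b2 PlanB
;
run;

/* _METHOD writes the join method codes to the log,  */
/* _TREE writes the query tree of the execution plan */
proc sql _method _tree;
  create table work.joined as
  select t1.order_id, t1.revenue_amt, t2.plan_nm
  from work.base t1
  left join work.lookup t2
    on upcase(t1.order_id) = upcase(t2.order_id);
quit;
```

Running the original 7-table query the same way (adding _method _tree to its PROC SQL statement) shows whether every join uses the same strategy.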

PROC SQL needs to sort the tables implicitly by the join key. In your case you join all tables on the same column, so a single sort per table suffices. I would expect this join to run about as efficiently as it can.
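To illustrate what "a single sort per table" amounts to, here is the same idea written out by hand. This is a swapped-in sketch, not what PROC SQL does internally: normalize the key once, sort each table once by it, and let a DATA step MERGE with an IN= flag keep every base row, which behaves like a left join. The tables and the helper column key are hypothetical, and the sketch assumes each lookup table has at most one row per key, because MERGE does not reproduce SQL's many-to-many behaviour.

```sas
/* Hypothetical sample data; KEY holds the upper-cased order_id */
data work.base;
  input order_id $ revenue_amt;
  key = upcase(order_id);      /* normalise the join key once */
  datalines;
a1 100
B2 200
c3 300
;
run;

data work.lookup;
  input order_id $ plan_nm $;
  key = upcase(order_id);
  drop order_id;               /* keep the base table's order_id intact */
  datalines;
A1 PlanA
b2 PlanB
;
run;

/* Sort each table exactly once by the normalised key */
proc sort data=work.base;   by key; run;
proc sort data=work.lookup; by key; run;

/* Keep every base row, like a left join (assumes unique keys in lookup) */
data work.joined;
  merge work.base(in=in_base) work.lookup;
  by key;
  if in_base;
run;
```

With more lookup tables, each one would be sorted once by key and added to the same MERGE statement.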

Multithreading only happens for the implicit sort operations, and the installation defaults are normally appropriate. You should only have to change them on rare occasions.

If your base table is the big table and the lookup tables in the left joins are the "small" ones, then another approach is to use SAS DATA step hash tables (the lookup tables must fit in memory). That normally beats any other approach in terms of performance because it avoids sorting altogether.
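A minimal sketch of the hash-table approach for one lookup, again with hypothetical tables (work.base, work.lookup) and a hypothetical helper column key holding the upper-cased order_id. It assumes order_id is unique in the lookup table and that the lookup fits in memory. The base table is read once in its existing order; nothing is sorted.

```sas
/* Hypothetical sample data: a "big" base table and a small lookup */
data work.base;
  input order_id $ revenue_amt;
  datalines;
a1 100
B2 200
c3 300
;
run;

data work.lookup;
  input order_id $ plan_nm $;
  key = upcase(order_id);      /* hash key stored in upper case */
  datalines;
A1 PlanA
b2 PlanB
;
run;

data work.joined;
  /* Put KEY and PLAN_NM in the PDV without reading any rows */
  if 0 then set work.lookup(keep=key plan_nm);
  /* Load the lookup table into memory once */
  if _n_ = 1 then do;
    declare hash lk(dataset: 'work.lookup');
    lk.defineKey('key');
    lk.defineData('plan_nm');
    lk.defineDone();
  end;
  set work.base;               /* base table read in any order */
  key = upcase(order_id);
  /* Left-join semantics: clear the retained value when no match */
  if lk.find() ne 0 then call missing(plan_nm);
  drop key;
run;
```

For the real query, each of t2 to t7 would get its own DECLARE HASH object inside the same DATA step, and the base table would still be read only once.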