hi,
Would anyone have a tip/suggestion on whether the following query could be optimized by nesting the queries together?
The below code gets the jobs done, but am looking to learn nested joins if that would be more efficient. The below essenstially runs IDs from table A against data in table data_table 4 times, each time seaching for the presence of a different status code. It is a left join as need to retain all IDs. If u have a suggestion, pls include explanation. Also, if there is any rule/best practices to follow for nesting joins, let me know,
thanks in advance
proc sql; create table b as select a.*, b.stat_cd|| b.stat_rsn_cd AS CL_STATUS, b.eff_from_dt AS CL_FRM_DT From a as a left join data_table as b on a.id=b.id and b.stat_cd='CL' ; quit; proc sql; create table C as select a.*, b.stat_cd|| b.stat_rsn_cd AS CO_STATUS, b.eff_from_dt AS CO_FRM_DT From B as a left join data_table as b on a.id=b.id and b.stat_cd='CO' ; quit; proc sql; create table D as select a.*, b.stat_cd|| b.stat_rsn_cd AS CR_STATUS, b.eff_from_dt AS CR_FRM_DT From C as a left join data_table as b on a.id=b.id and b.stat_cd='CR' ; quit; proc sql; create table END as select a.*, b.stat_cd|| b.stat_rsn_cd AS FR_STATUS, b.eff_from_dt AS FR_FRM_DT From D as a left join data_table as b on a.id=b.id and b.stat_cd='FR' ; quit;
Just use the same code but give each new "right" table aa distinct alias.
proc sql;
create table WANT as
select a.*
, b.stat_cd|| b.stat_rsn_cd AS CL_STATUS
, b.eff_from_dt AS CL_FRM_DT
, c.stat_cd|| c.stat_rsn_cd AS CO_STATUS
, c.eff_from_dt AS CO_FRM_DT
...
from a as a
left join data_table as b
on a.id=b.id and b.stat_cd='CL'
left join data_table as c
on a.id=c.id and c.stat_cd='CO'
...
;
quit;
Yes, without some data its hard to provide anything. What I would say however is that you appear to be doing exactly the same code four times, this does not seem optimal. Why do you need to split it up into four as all it looks like your doing is merging each set of variables back onto itself. This is where the art of transposing data would come in. With your data you should be able to transpose up the concatenated status variable and the result variable and then merge the two together, this would have the number of joins.
You could use sub-queries like this:
proc sql; create table WANT as select A.*, (select cats(STAT_CD,STAT_RSN_CD) from DATA_TABLE where A.ID=ID and STAT_CD='CL') as CL_STATUS, (select cats(STAT_CD,STAT_RSN_CD) from DATA_TABLE where A.ID=ID and STAT_CD="CO") as CO_STATUS, ... from A as A; quit;
It may read a bit simpler, but not sure its going to be more effective than two transposes and a merge. If you post test data in the form of a datastep we could look at some code tomorrow.
To: RW9
hi- thanks so much for the tip... i'll have to look into your suggestion of transposing up the concatenated status variable! I learn a lot from you, thank you
Just use the same code but give each new "right" table aa distinct alias.
proc sql;
create table WANT as
select a.*
, b.stat_cd|| b.stat_rsn_cd AS CL_STATUS
, b.eff_from_dt AS CL_FRM_DT
, c.stat_cd|| c.stat_rsn_cd AS CO_STATUS
, c.eff_from_dt AS CO_FRM_DT
...
from a as a
left join data_table as b
on a.id=b.id and b.stat_cd='CL'
left join data_table as c
on a.id=c.id and c.stat_cd='CO'
...
;
quit;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.