Hello,
I have been given some code from SQL developer to put into SAS and the majority of it runs fine. I have one step which takes 10 minutes to run and i was hoping to improve its performance.
Does anything stand out from the code below which could be improved performance wise? The exists and not exists pieces seem to make the code take longer to run (it runs in seconds without them). These parts reference back to the original base datasets (which are only going to increase in size over time). I am not sure if there is enough information here but any advice would be appreciated!
PROC SQL;
CREATE TABLE DATASET1 AS
SELECT A.CUSTOMER,
A.CUSTOMER_REC,
A.APPNO,
A.TYPE,
A.DATE1,
A.STATUS,
A.STATUS1,
A.STATUS2,
A.DATE,
A.TDAY
FROM DATASET AS A
WHERE (A.STATUS = 'A'
OR (A.STATUS = 'D'
AND NOT EXISTS (SELECT 1
FROM BASE.BASE1 B,
BASE.BASE2 C
WHERE B.CUSTOMER_REC = A.CUSTOMER_REC
AND B.UNIQUEKEY = C.UNIQUEKEY
AND C.VAR = 2
AND C.VAR2 IN('41','42','43'))
AND EXISTS (SELECT 1
FROM BASE.BASE1 B,
BASE.BASE2 C
WHERE B.CUSTOMER_REC = A.CUSTOMER_REC
AND B.UNIQUEKEY = C.UNIQUEKEY
AND C.VAR = 2
AND C.VAR2 IN('41','44')) ) )
AND (A.STATUS IN ('A','D')
AND A.STATUS1 = 'C'
AND A.STATUS2 NOT IN ('A','B'));
QUIT;
Well, I would suggest creating a inter file first of the base1 and base2 datasets first, this can then be merged on directly to the main data, something like this (but note it is quite hard to ascertain what the best approach is without seeing any data - why do you have three datasets for example - what is the point of base1 and base2 - without seeing all the links its hard to re-structure):
proc sql; create table BASE as select CUSTOMER_REC, (select 1 from BASE.BASE2 A where A.CUSTOMER_REC=CUSTOMER_REC and A.VAR2 in ('41','42','43')) as FLAG1, (select 1 from BASE.BASE2 A where A.CUSTOMER_REC=CUSTOMER_REC and A.VAR2 in ('41','44')) as FLAG2 from BASE1; create table WANT as select A.CUSTOMER, A.CUSTOMER_REC, A.APPNO, A.TYPE, A.DATE1, A.STATUS, A.STATUS1, A.STATUS2, A.DATE, A.TDAY from DATASET A left join CUSTOMER_REC B on A.CUSTOMER_REC=B.CUSTOMER_REC where (A.STATUS="A" or (A.STATUS="D" and B.FLAG1=1 and B.FLAG2=1)) and (A.STATUS in ("A","D") and A.STATUS1="C" and A.STATUS2 not in ("A","B")); quit;
Well, I would suggest creating a inter file first of the base1 and base2 datasets first, this can then be merged on directly to the main data, something like this (but note it is quite hard to ascertain what the best approach is without seeing any data - why do you have three datasets for example - what is the point of base1 and base2 - without seeing all the links its hard to re-structure):
proc sql; create table BASE as select CUSTOMER_REC, (select 1 from BASE.BASE2 A where A.CUSTOMER_REC=CUSTOMER_REC and A.VAR2 in ('41','42','43')) as FLAG1, (select 1 from BASE.BASE2 A where A.CUSTOMER_REC=CUSTOMER_REC and A.VAR2 in ('41','44')) as FLAG2 from BASE1; create table WANT as select A.CUSTOMER, A.CUSTOMER_REC, A.APPNO, A.TYPE, A.DATE1, A.STATUS, A.STATUS1, A.STATUS2, A.DATE, A.TDAY from DATASET A left join CUSTOMER_REC B on A.CUSTOMER_REC=B.CUSTOMER_REC where (A.STATUS="A" or (A.STATUS="D" and B.FLAG1=1 and B.FLAG2=1)) and (A.STATUS in ("A","D") and A.STATUS1="C" and A.STATUS2 not in ("A","B")); quit;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.