Hello,
I have been given some code from SQL developer to put into SAS and the majority of it runs fine. I have one step which takes 10 minutes to run and i was hoping to improve its performance.
Does anything stand out from the code below which could be improved performance wise? The exists and not exists pieces seem to make the code take longer to run (it runs in seconds without them). These parts reference back to the original base datasets (which are only going to increase in size over time). I am not sure if there is enough information here but any advice would be appreciated!
PROC SQL;
CREATE TABLE DATASET1 AS
SELECT A.CUSTOMER,
A.CUSTOMER_REC,
A.APPNO,
A.TYPE,
A.DATE1,
A.STATUS,
A.STATUS1,
A.STATUS2,
A.DATE,
A.TDAY
FROM DATASET AS A
WHERE (A.STATUS = 'A'
OR (A.STATUS = 'D'
AND NOT EXISTS (SELECT 1
FROM BASE.BASE1 B,
BASE.BASE2 C
WHERE B.CUSTOMER_REC = A.CUSTOMER_REC
AND B.UNIQUEKEY = C.UNIQUEKEY
AND C.VAR = 2
AND C.VAR2 IN('41','42','43'))
AND EXISTS (SELECT 1
FROM BASE.BASE1 B,
BASE.BASE2 C
WHERE B.CUSTOMER_REC = A.CUSTOMER_REC
AND B.UNIQUEKEY = C.UNIQUEKEY
AND C.VAR = 2
AND C.VAR2 IN('41','44')) ) )
AND (A.STATUS IN ('A','D')
AND A.STATUS1 = 'C'
AND A.STATUS2 NOT IN ('A','B'));
QUIT;
Well, I would suggest creating a inter file first of the base1 and base2 datasets first, this can then be merged on directly to the main data, something like this (but note it is quite hard to ascertain what the best approach is without seeing any data - why do you have three datasets for example - what is the point of base1 and base2 - without seeing all the links its hard to re-structure):
proc sql;
create table BASE as
select CUSTOMER_REC,
(select 1 from BASE.BASE2 A where A.CUSTOMER_REC=CUSTOMER_REC and A.VAR2 in ('41','42','43')) as FLAG1,
(select 1 from BASE.BASE2 A where A.CUSTOMER_REC=CUSTOMER_REC and A.VAR2 in ('41','44')) as FLAG2
from BASE1;
create table WANT as
select A.CUSTOMER,
A.CUSTOMER_REC,
A.APPNO,
A.TYPE,
A.DATE1,
A.STATUS,
A.STATUS1,
A.STATUS2,
A.DATE,
A.TDAY
from DATASET A
left join CUSTOMER_REC B
on A.CUSTOMER_REC=B.CUSTOMER_REC
where (A.STATUS="A" or (A.STATUS="D" and B.FLAG1=1 and B.FLAG2=1))
and (A.STATUS in ("A","D") and A.STATUS1="C" and A.STATUS2 not in ("A","B"));
quit;
Well, I would suggest creating a inter file first of the base1 and base2 datasets first, this can then be merged on directly to the main data, something like this (but note it is quite hard to ascertain what the best approach is without seeing any data - why do you have three datasets for example - what is the point of base1 and base2 - without seeing all the links its hard to re-structure):
proc sql;
create table BASE as
select CUSTOMER_REC,
(select 1 from BASE.BASE2 A where A.CUSTOMER_REC=CUSTOMER_REC and A.VAR2 in ('41','42','43')) as FLAG1,
(select 1 from BASE.BASE2 A where A.CUSTOMER_REC=CUSTOMER_REC and A.VAR2 in ('41','44')) as FLAG2
from BASE1;
create table WANT as
select A.CUSTOMER,
A.CUSTOMER_REC,
A.APPNO,
A.TYPE,
A.DATE1,
A.STATUS,
A.STATUS1,
A.STATUS2,
A.DATE,
A.TDAY
from DATASET A
left join CUSTOMER_REC B
on A.CUSTOMER_REC=B.CUSTOMER_REC
where (A.STATUS="A" or (A.STATUS="D" and B.FLAG1=1 and B.FLAG2=1))
and (A.STATUS in ("A","D") and A.STATUS1="C" and A.STATUS2 not in ("A","B"));
quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.