Hello experts,
The code given is running too long and I couldn't find a way how to optimize it so could you please help me about it? I attached also sample dataset and ALLPRICE table has 161.403.360 records. There are 3 scenarios and and these scenarios could run parallel the first thing i came up but any other ideas would be great.
Thanks in advance,
Kindly regards
%let scenario_name=base adverse favorable;
%let scenario_short=BAS ADV FAV;
%PUT &scenario_name.;
%PUT &scenario_short.;
%include "/sasdata/sasinput/%left(&sst_current_project_id.)/ifrs9_00_init.sas";
%MACRO SCENARIO_FORMAT(SCENARIO,SHORT);
PROC SQL;
CREATE TABLE SENARYO_FILTRE AS
SELECT DISTINCT t1.InstID,
t1.SimulationTime,
t1._CORRECTED_PD_,
t1._EIR_ADJUSTED_,
t1._PD_RATE_,
t1._PDM_,
t1.EL_LIFETIME_PARTIAL,
t1.EL_LIFETIME_WITH_REDISC,
t1.EL_LIFETIME_WITHOUT_REDISC,
T1.EL_BDDK
FROM SAS_RSLT.ALLPRICE t1
WHERE t1.AnalysisName = "&SCENARIO.";
QUIT;
PROC SQL;
CREATE TABLE SENARYO_FILTRE_2 AS
SELECT DISTINCT t1.InstID,
t1.SimulationTime,
t1._CORRECTED_PD_,
t1._EIR_ADJUSTED_,
t1._PD_RATE_,
t1._PDM_,
t1.EL_LIFETIME_PARTIAL,
t1.EL_LIFETIME_WITH_REDISC,
t1.EL_LIFETIME_WITHOUT_REDISC,
t2.EL_1Y_TOTAL,
t2.IFRS_RATING,
T1.EL_BDDK,
t2.InstID AS key
FROM WORK.SENARYO_FILTRE t1
LEFT JOIN SAS_RSLT.ECL_ALLOCATION_DETAIL t2 ON (t1.InstID = t2.InstID)
ORDER BY t1.InstID,
t1.SimulationTime;
QUIT;
data SENARYO_FILTRE_3;
set SENARYO_FILTRE_2;
array PDM[10];
array CORRECTEDPD[10];
array DISCOUNT[10];
array PDRATE[10];
array EL[10];
array CUM_R_EL[10];
array CUM_EL[10];
if instid = key and SimulationTime= 1 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 2 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 3 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 4 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 5 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 6 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 7 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 8 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 9 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 10 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
run;
PROC SQL;
CREATE TABLE SENARYO_FILTRE_4 AS
SELECT DISTINCT t1.InstID,
/* PDM1 */
(SUM(t1.PDM1)) AS PDM1,
/* PDM2 */
(SUM(t1.PDM2)) AS PDM2,
/* PDM3 */
(SUM(t1.PDM3)) AS PDM3,
/* PDM4 */
(SUM(t1.PDM4)) AS PDM4,
/* PDM5 */
(SUM(t1.PDM5)) AS PDM5,
/* PDM6 */
(SUM(t1.PDM6)) AS PDM6,
/* PDM7 */
(SUM(t1.PDM7)) AS PDM7,
/* PDM8 */
(SUM(t1.PDM8)) AS PDM8,
/* PDM9 */
(SUM(t1.PDM9)) AS PDM9,
/* PDM10 */
(SUM(t1.PDM10)) AS PDM10,
/* CORRECTEDPD1 */
(SUM(t1.CORRECTEDPD1)) AS CORRECTEDPD1,
/* CORRECTEDPD2 */
(SUM(t1.CORRECTEDPD2)) AS CORRECTEDPD2,
/* CORRECTEDPD3 */
(SUM(t1.CORRECTEDPD3)) AS CORRECTEDPD3,
/* CORRECTEDPD4 */
(SUM(t1.CORRECTEDPD4)) AS CORRECTEDPD4,
/* CORRECTEDPD5 */
(SUM(t1.CORRECTEDPD5)) AS CORRECTEDPD5,
/* CORRECTEDPD6 */
(SUM(t1.CORRECTEDPD6)) AS CORRECTEDPD6,
/* CORRECTEDPD7 */
(SUM(t1.CORRECTEDPD7)) AS CORRECTEDPD7,
/* CORRECTEDPD8 */
(SUM(t1.CORRECTEDPD8)) AS CORRECTEDPD8,
/* CORRECTEDPD9 */
(SUM(t1.CORRECTEDPD9)) AS CORRECTEDPD9,
/* CORRECTEDPD10 */
(SUM(t1.CORRECTEDPD10)) AS CORRECTEDPD10,
/* DISCOUNT1 */
(SUM(t1.DISCOUNT1)) AS DISCOUNT1,
/* DISCOUNT2 */
(SUM(t1.DISCOUNT2)) AS DISCOUNT2,
/* DISCOUNT3 */
(SUM(t1.DISCOUNT3)) AS DISCOUNT3,
/* DISCOUNT4 */
(SUM(t1.DISCOUNT4)) AS DISCOUNT4,
/* DISCOUNT5 */
(SUM(t1.DISCOUNT5)) AS DISCOUNT5,
/* DISCOUNT6 */
(SUM(t1.DISCOUNT6)) AS DISCOUNT6,
/* DISCOUNT7 */
(SUM(t1.DISCOUNT7)) AS DISCOUNT7,
/* DISCOUNT8 */
(SUM(t1.DISCOUNT8)) AS DISCOUNT8,
/* DISCOUNT9 */
(SUM(t1.DISCOUNT9)) AS DISCOUNT9,
/* DISCOUNT10 */
(SUM(t1.DISCOUNT10)) AS DISCOUNT10,
/* PDRATE1 */
(SUM(t1.PDRATE1)) AS PDRATE1,
/* PDRATE2 */
(SUM(t1.PDRATE2)) AS PDRATE2,
/* PDRATE3 */
(SUM(t1.PDRATE3)) AS PDRATE3,
/* PDRATE4 */
(SUM(t1.PDRATE4)) AS PDRATE4,
/* PDRATE5 */
(SUM(t1.PDRATE5)) AS PDRATE5,
/* PDRATE6 */
(SUM(t1.PDRATE6)) AS PDRATE6,
/* PDRATE7 */
(SUM(t1.PDRATE7)) AS PDRATE7,
/* PDRATE8 */
(SUM(t1.PDRATE8)) AS PDRATE8,
/* PDRATE9 */
(SUM(t1.PDRATE9)) AS PDRATE9,
/* PDRATE10 */
(SUM(t1.PDRATE10)) AS PDRATE10,
/* EL1 */
(SUM(t1.EL1)) AS EL1,
/* EL2 */
(SUM(t1.EL2)) AS EL2,
/* EL3 */
(SUM(t1.EL3)) AS EL3,
/* EL4 */
(SUM(t1.EL4)) AS EL4,
/* EL5 */
(SUM(t1.EL5)) AS EL5,
/* EL6 */
(SUM(t1.EL6)) AS EL6,
/* EL7 */
(SUM(t1.EL7)) AS EL7,
/* EL8 */
(SUM(t1.EL8)) AS EL8,
/* EL9 */
(SUM(t1.EL9)) AS EL9,
/* EL10 */
(SUM(t1.EL10)) AS EL10,
/* CUM_R_EL1 */
(SUM(t1.CUM_R_EL1)) AS CUM_R_EL1,
/* CUM_R_EL2 */
(SUM(t1.CUM_R_EL2)) AS CUM_R_EL2,
/* CUM_R_EL3 */
(SUM(t1.CUM_R_EL3)) AS CUM_R_EL3,
/* CUM_R_EL4 */
(SUM(t1.CUM_R_EL4)) AS CUM_R_EL4,
/* CUM_R_EL5 */
(SUM(t1.CUM_R_EL5)) AS CUM_R_EL5,
/* CUM_R_EL6 */
(SUM(t1.CUM_R_EL6)) AS CUM_R_EL6,
/* CUM_R_EL7 */
(SUM(t1.CUM_R_EL7)) AS CUM_R_EL7,
/* CUM_R_EL8 */
(SUM(t1.CUM_R_EL8)) AS CUM_R_EL8,
/* CUM_R_EL9 */
(SUM(t1.CUM_R_EL9)) AS CUM_R_EL9,
/* CUM_R_EL10 */
(SUM(t1.CUM_R_EL10)) AS CUM_R_EL10,
/* CUM_EL1 */
(SUM(t1.CUM_EL1)) AS CUM_EL1,
/* CUM_EL2 */
(SUM(t1.CUM_EL2)) AS CUM_EL2,
/* CUM_EL3 */
(SUM(t1.CUM_EL3)) AS CUM_EL3,
/* CUM_EL4 */
(SUM(t1.CUM_EL4)) AS CUM_EL4,
/* CUM_EL5 */
(SUM(t1.CUM_EL5)) AS CUM_EL5,
/* CUM_EL6 */
(SUM(t1.CUM_EL6)) AS CUM_EL6,
/* CUM_EL7 */
(SUM(t1.CUM_EL7)) AS CUM_EL7,
/* CUM_EL8 */
(SUM(t1.CUM_EL8)) AS CUM_EL8,
/* CUM_EL9 */
(SUM(t1.CUM_EL9)) AS CUM_EL9,
/* CUM_EL10 */
(SUM(t1.CUM_EL10)) AS CUM_EL10,
EL_BDDK
/* EL_LIFETIME_PARTIAL*/
FROM SENARYO_FILTRE_3 t1
GROUP BY t1.InstID;
QUIT;
PROC SQL;
CREATE TABLE SAS_PORT.SENARYOFLTRFEB_&SCENARIO. AS
SELECT t2.INSTID,
"&SHORT" AS PD_F,
t2.INSTTYPE,
t2.DATA_DATE,
t2.MASTID,
t2.MATURITY_DATE,
t2.BALANCE_TL,
t2.BALANCE_ORG,
t2.CURRENCY_CODE,
t2.RECONSTRUCTION_FLAG,
t2.SHORT_PORTFOLIO,
t2.PORTFOLIO_ADJUSTED,
t2.BNPP_PORTFOLIO,
t2.RETAIL_TYPE,
t2.RETAIL_FLAG,
t2.COMMITMENT_TYPE,
t2.CREDIT_TYPE,
t2.ACCOUNT_NUMBER,
t2.BANK_GROUP_CODE,
t2.BANK_GROUP_NAME,
t2.BANK_NAME,
t2.BANKNO,
t2.BLOCKING_DEGREE,
t2.BRANCH_ID,
t2.COUNTERPARTY_NAME,
t2.COUNTERPARTY_TYPE,
t2.CREDIT_TYPE_NAME,
t2.ACCRUED_INTEREST_AMT_TL,
t2.AVG_BNP_RATING,
t2.BUSINESS_LINE_CODE,
t2.CURRENT_IG_NG,
t2.COUNTERPARTY_ID,
t2.COUNTERPARTY_VKN,
t2.DEK_ADJ,
t2.DEK_BAKIYE,
t2.DEK_CURRENCY_CODE,
t2.CRITEMNO,
t2.CURR_INTEREST_RATE,
t2.FAAL_ORAN,
t2.CURRENT_BNP_RATING,
t2.CURRENT_BUCKET,
t2.CURRENT_EXCHANGE_RATE,
t2.CURRENT_RATING,
t2.CURRENT_RATING_POINT,
t2.CURRENT_SCORE,
t2.DELAY_DATE,
t2.EFFECTIVE_INTEREST_RATE,
t2.GENERAL_LOAN_LIMIT,
t2.GENERAL_LOAN_MATURITY,
t2.GL_ACCOUNT_NUM,
t2.GROUP_CODE,
t2.GROUP_GENERAL_LOAN_LIMIT,
t2.GROUP_DESC,
t2.IFRS_RATING,
t2.INCEPTION_DATE,
t2.INCEPTION_EXCHANGE_RATE,
t2.INITIAL_BUCKET,
t2.INITIAL_IG_NG,
t2.INITIAL_RATING,
t2.INITIAL_RATING_DATE,
t2.INITIAL_RATING_POINT,
t2.INTEREST_CALC_TYP_ORJ,
t2.KTRNO,
t2.INITIAL_SCORE,
t2.LOAN_CUSTOMER_STATUS,
t2.LOAN_TP_ID,
t2.LGD_RATE,
t2.MAIN_BRANCH_ID,
t2.MIS_PRODUCTMAPNO,
t2.NACE_CODE,
t2.NACE_NAME,
t2.OPEN_EXCH_RATE,
t2.PROV_DELAY,
t2.RATING_ANALYSIS_NO,
t2.RATING_CLASS,
t2.RATING_DNM_GRP,
t2.RATING_DATE,
t2.RATING_GRP,
t2.REDISCOUNT_TL,
t2.RMPM_ID,
t2.RR_DISCOUNTED,
t2.SCORE_CARD_ID,
t2.STANDART_CCF,
t2.TREEFAMILYNO,
t2.TREELINENO,
t2.TREESINGLENO,
t3.EAD_STAGE1_TOTAL,
t3.EL_1Y_BALANCE,
t3.EL_1Y_REDISC,
t3.EL_1Y_TOTAL,
t3.EL_LIFETIME_REDISC,
t3.EL_LIFETIME_WITH_REDISC,
t3.EL_LIFETIME_WITHOUT_REDISC,
t3.REMAINING_YEAR,
t2.IFRS_RATING,
t2.AVG_RATING,
t2.SCORE_PROV_DELAY,
t3.BASE_1Y_BALANCE,
t3.FAV_1Y_BALANCE,
t3.ADV_1Y_BALANCE,
t3.BASE_1Y_REDISC,
t3.FAV_1Y_REDISC ,
t3.ADV_1Y_REDISC,
t3.BASE_1Y_TOTAL,
t3.FAV_1Y_TOTAL,
t3.ADV_1Y_TOTAL,
t2.POINT_DIFF,
t2.FORBORNE_FLAG,
t2.RR_ASSESSMENT_TYPE,
t2.STAGING,
t2.STAGING_DETAIL,
t2.STAGING_REASON,
t1.CORRECTEDPD1,
t1.CORRECTEDPD2,
t1.CORRECTEDPD3,
t1.CORRECTEDPD4,
t1.CORRECTEDPD5,
t1.CORRECTEDPD6,
t1.CORRECTEDPD7,
t1.CORRECTEDPD8,
t1.CORRECTEDPD9,
t1.CORRECTEDPD10,
t1.DISCOUNT1,
t1.DISCOUNT2,
t1.DISCOUNT3,
t1.DISCOUNT4,
t1.DISCOUNT5,
t1.DISCOUNT6,
t1.DISCOUNT7,
t1.DISCOUNT8,
t1.DISCOUNT9,
t1.DISCOUNT10,
t1.PDRATE1,
t1.PDRATE2,
t1.PDRATE3,
t1.PDRATE4,
t1.PDRATE5,
t1.PDRATE6,
t1.PDRATE7,
t1.PDRATE8,
t1.PDRATE9,
t1.PDRATE10,
t1.PDM1,
t1.PDM2,
t1.PDM3,
t1.PDM4,
t1.PDM5,
t1.PDM6,
t1.PDM7,
t1.PDM8,
t1.PDM9,
t1.PDM10,
t1.EL1,
t1.EL2,
t1.EL3,
t1.EL4,
t1.EL5,
t1.EL6,
t1.EL7,
t1.EL8,
t1.EL9,
t1.EL10,
t1.CUM_R_EL1,
t1.CUM_R_EL2,
t1.CUM_R_EL3,
t1.CUM_R_EL4,
t1.CUM_R_EL5,
t1.CUM_R_EL6,
t1.CUM_R_EL7,
t1.CUM_R_EL8,
t1.CUM_R_EL9,
t1.CUM_R_EL10,
t1.CUM_EL1,
t1.CUM_EL2,
t1.CUM_EL3,
t1.CUM_EL4,
t1.CUM_EL5,
t1.CUM_EL6,
t1.CUM_EL7,
t1.CUM_EL8,
t1.CUM_EL9,
t1.CUM_EL10,
T1.EL_BDDK
/*,*/
/* t1.EL_LIFETIME_PARTIAL*/
/* t1. IFRS9_FINAL_STAGE ,*/
/* t1.IFRS9_FINAL_EL*/
FROM SENARYO_FILTRE_4 t1
LEFT JOIN sas_port.Portfolio_%left(&mcr_part_id._v2) t2 ON (t1.InstID = t2.INSTID)
/* LEFT JOIN SAS_PORT.PORTFOLIO_201905_V2 t2 ON (t1.InstID = t2.INSTID)*/
LEFT JOIN SAS_RSLT.ECL_ALLOCATION_DETAIL t3 ON (t1.InstID = t3.InstID);
QUIT;
%MEND SCENARIO_FORMAT;
%MACRO LOOP_SCENARIO_FORMAT;
%LET I=1;
%DO %WHILE(%SCAN(&scenario_name.,&I.,%STR( ))~=);
%SCENARIO_FORMAT(%SCAN(&scenario_name.,&I.,%STR( )),%SCAN(&scenario_short.,&I.,%STR( )));
%LET I=&I.+1;
%END;
%MEND LOOP_SCENARIO_FORMAT; ;
%LOOP_SCENARIO_FORMAT;
Start by making it readable.
I started with the first two SQL steps, they should be formatted decently, something like
PROC SQL;
CREATE TABLE SENARYO_FILTRE AS SELECT DISTINCT
t1.InstID,
t1.SimulationTime,
t1._CORRECTED_PD_,
t1._EIR_ADJUSTED_,
t1._PD_RATE_,
t1._PDM_,
t1.EL_LIFETIME_PARTIAL,
t1.EL_LIFETIME_WITH_REDISC,
t1.EL_LIFETIME_WITHOUT_REDISC,
T1.EL_BDDK
FROM
SAS_RSLT.ALLPRICE t1
WHERE t1.AnalysisName = "&SCENARIO.";
QUIT;
PROC SQL;
CREATE TABLE SENARYO_FILTRE_2 AS SELECT DISTINCT
t1.InstID,
t1.SimulationTime,
t1._CORRECTED_PD_,
t1._EIR_ADJUSTED_,
t1._PD_RATE_,
t1._PDM_,
t1.EL_LIFETIME_PARTIAL,
t1.EL_LIFETIME_WITH_REDISC,
t1.EL_LIFETIME_WITHOUT_REDISC,
t2.EL_1Y_TOTAL,
t2.IFRS_RATING,
T1.EL_BDDK,
t2.InstID AS key
FROM
WORK.SENARYO_FILTRE t1 LEFT JOIN
SAS_RSLT.ECL_ALLOCATION_DETAIL t2 ON (t1.InstID = t2.InstID)
ORDER BY
t1.InstID,
t1.SimulationTime
;
QUIT;
Now that these two steps can actually be read, it is obvious that they could be made into a single step:
PROC SQL;
CREATE TABLE SENARYO_FILTRE_2 AS SELECT DISTINCT
AllPrice.InstID,
AllPrice.SimulationTime,
AllPrice._CORRECTED_PD_,
AllPrice._EIR_ADJUSTED_,
AllPrice._PD_RATE_,
AllPrice._PDM_,
AllPrice.EL_LIFETIME_PARTIAL,
AllPrice.EL_LIFETIME_WITH_REDISC,
AllPrice.EL_LIFETIME_WITHOUT_REDISC,
Details.EL_1Y_TOTAL,
Details.IFRS_RATING,
AllPrice.EL_BDDK,
Details.InstID AS key
FROM
SAS_RSLT.ALLPRICE AllPrice LEFT JOIN
SAS_RSLT.ECL_ALLOCATION_DETAIL Details ON (AllPrice.InstID = Details.InstID)
where
AllPrice.AnalysisName = "&SCENARIO."
ORDER BY
AllPrice.InstID,
AllPrice.SimulationTime
;
QUIT;
Note that I replaced the original meaningless aliases (t1 and t2) with more meaningful names. In this case, it is almost unnecessary, as the code is relatively short, but if it gets a little bit more complicated, you cannot see the whole query on one page, and you have to scroll up and down to get the meaning of the different aliases (I have seen queries of several hundred lines, where aliases like a,b,c...,j were used, and then table "i" was read more than once, so the second version got the alias "ii" - completely unreadable).
The data step should also be formatted, but while formatting it I noticed that the same code was repeated over and over again. So what the actual data step boils down to is this:
data SENARYO_FILTRE_3;
set SENARYO_FILTRE_2;
array PDM[10];
array CORRECTEDPD[10];
array DISCOUNT[10];
array PDRATE[10];
array EL[10];
array CUM_R_EL[10];
array CUM_EL[10];
if instid = key and SimulationTime=in(1,2,3,4,5,6,7,8,9,10) then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
run;
Which leaves the interesting question: what to do when SimulationTime is not one of the values 1 to 10? It seems that the record is output, but without any calculated values. As all the variables are summed later, it "does not matter" - except for performance, which appears to be what you are looking to improve. So you may put in an "ELSE DELETE" statement here, or just change the IF statement to a subsetting IF - but wait, you could also change that to a WHERE clause, even faster:
data SENARYO_FILTRE_3;
set SENARYO_FILTRE_2;
where instid = key and SimulationTime in(1,2,3,4,5,6,7,8,9,10);
array PDM[10];
array CORRECTEDPD[10];
array DISCOUNT[10];
array PDRATE[10];
array EL[10];
array CUM_R_EL[10];
array CUM_EL[10];
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
run;
Actually, it would have been even smarter to put that as a condition in the previous step. Or rather, don't use LEFT JOIN in the first step, as you carefully get rid of the data where there is no INSTID for the KEY.
The next step is a mess: very meticulous, but not formatted at all. The comments are seriously nonsensical and makes the code even harder to read.
It should be rewritten as a PROC SUMMARY, much shorter, and you can now skim the whole code at a single glance:
proc summary data=SENARYO_FILTRE_3 nway;
class instid EL_BDDK;
var PDM1-PMD10
CORRECTEDPD1-CORRECTEDPD10
DISCOUNT1-DISCOUNT10
PDRATE1-PDRATE10
EL1-EL10
CUM_R_EL1-CUM_R_EL10
CUM_EL1-CUM_EL10
;
output out=SAS_PORT.SENARYOFLTRFEB_&SCENARIO(drop=_:) sum=;
run;
When looking at your original code, a question springs to mind: what is the EL_BDDK variable doing? It is not summed, and it is not used for grouping! I assume that your original code went well because EL_BBDK is uniquely dependent on the key, meaning that it could have been included in the GROUP variables. So I put it in as a CLASS variable in PROC SUMMARY.
The third step then again merges the output from the last step with a couple of other datasets, and creates a distinct table for each scenario.
And then you loop through the whole shebang again and again, once for each scenario. Why? I think it would be a lot faster to go through all the scenarios at once, using an IN clause in the first SQL step, and then use the scenario as a grouping variable at the end.
When it comes to performance, there is one question: How large are the other tables that you merge with in the end? if not too large, you should sort them and use a data step with a MERGE statement at the end. If necessary, it would, macro-wise, be very easy to split the output to several tables in a data step. This is probably where you would get a real performance saving.
What is the performance now? What do you need?
What is the purpose of the code?
Why are you using SQL instead of a SAS procedure?
Hi again,
Actually it is a step of SAS IFRS9 and we're trying to reduce run time. It is all changeable so we can use the SAS procedure instead of SQL. Do you have any suggestions?
Next code is not tested. I'm not sure if it will run faster but obviously the code is shorter and indented to be easily read. You may try it and should compare results to the results of original run.
I suggest to compare run time of each of the steps and decide weather to use the sas step/proc or the original sql.
%let scenario_name=base adverse favorable;
%let scenario_short=BAS ADV FAV;
%PUT &scenario_name.;
%PUT &scenario_short.;
%include "/sasdata/sasinput/%left(&sst_current_project_id.)/ifrs9_00_init.sas";
%MACRO SCENARIO_FORMAT(SCENARIO,SHORT);
proc sort data=SAS_RSLT.ALLPRICE(
keep= InstID SimulationTime _CORRECTED_PD_ _EIR_ADJUSTED_
_PD_RATE_ _PDM_ EL_LIFETIME_PARTIAL EL_LIFETIME_WITH_REDISC
EL_LIFETIME_WITHOUT_REDISC EL_BDDK
where=(AnalysisName = "&SCENARIO."))
out=SENARYO_FILTRE NODUPKEY;
by InstID SimulationTime;
run;
proc sort data= SAS_RSLT.ECL_ALLOCATION_DETAIL;
by InstID SimulationTime;
run;
data SENARYO_FILTRE_2;
merge SENARYO_FILTRE (in=in1 keep=_all_)
SAS_RSLT.ECL_ALLOCATION_DETAIL(in=in2 keep=EL_1Y_TOTAL IFRS_RATING );
by InstId SimulationTime;
if in2 then key = instid;
run;
data SENARYO_FILTRE_3;
set SENARYO_FILTRE_2;
array PDM[10];
array CORRECTEDPD[10];
array DISCOUNT[10];
array PDRATE[10];
array EL[10];
array CUM_R_EL[10];
array CUM_EL[10];
if instid = key and (1 le SimulationTime le 10) then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
run;
proc summary data= SENARYO_FILTRE_3 nway noprint;
by InstId;
ID EL_BDDK;
var PDM1-PDM10 CORRECTEDPD1-CORRECTEDPD10 DISCOUNT1-DISCOUNT10
PDRATE1-PDRATE10 EL1-EL10 CUM_R_EL1-CUM_R_EL10 CUM_EL1-CUM_EL10
;
output out=SENARYO_FILTRE_4 sum=;
end;
data
merge SENARYO_FILTRE_4(in=in1)
sas_port.Portfolio_%left(&mcr_part_id._v2)
SAS_RSLT.ECL_ALLOCATION_DETAIL t3
;
by InstID;
if in1;
run;
%MEND SCENARIO_FORMAT;
%MACRO LOOP_SCENARIO_FORMAT;
%LET I=1;
%DO %WHILE(%SCAN(&scenario_name.,&I.,%STR( ))~=);
%SCENARIO_FORMAT(%SCAN(&scenario_name.,&I.,%STR( )),%SCAN(&scenario_short.,&I.,%STR( )));
%LET I=&I.+1;
%END;
%MEND LOOP_SCENARIO_FORMAT; ;
%LOOP_SCENARIO_FORMAT;
@FerhatD - you should post a SAS log of your program including the run times for each step. That will tell us which program steps are slow and need tuning.
Start by making it readable.
I started with the first two SQL steps, they should be formatted decently, something like
PROC SQL;
CREATE TABLE SENARYO_FILTRE AS SELECT DISTINCT
t1.InstID,
t1.SimulationTime,
t1._CORRECTED_PD_,
t1._EIR_ADJUSTED_,
t1._PD_RATE_,
t1._PDM_,
t1.EL_LIFETIME_PARTIAL,
t1.EL_LIFETIME_WITH_REDISC,
t1.EL_LIFETIME_WITHOUT_REDISC,
T1.EL_BDDK
FROM
SAS_RSLT.ALLPRICE t1
WHERE t1.AnalysisName = "&SCENARIO.";
QUIT;
PROC SQL;
CREATE TABLE SENARYO_FILTRE_2 AS SELECT DISTINCT
t1.InstID,
t1.SimulationTime,
t1._CORRECTED_PD_,
t1._EIR_ADJUSTED_,
t1._PD_RATE_,
t1._PDM_,
t1.EL_LIFETIME_PARTIAL,
t1.EL_LIFETIME_WITH_REDISC,
t1.EL_LIFETIME_WITHOUT_REDISC,
t2.EL_1Y_TOTAL,
t2.IFRS_RATING,
T1.EL_BDDK,
t2.InstID AS key
FROM
WORK.SENARYO_FILTRE t1 LEFT JOIN
SAS_RSLT.ECL_ALLOCATION_DETAIL t2 ON (t1.InstID = t2.InstID)
ORDER BY
t1.InstID,
t1.SimulationTime
;
QUIT;
Now that these two steps can actually be read, it is obvious that they could be made into a single step:
PROC SQL;
CREATE TABLE SENARYO_FILTRE_2 AS SELECT DISTINCT
AllPrice.InstID,
AllPrice.SimulationTime,
AllPrice._CORRECTED_PD_,
AllPrice._EIR_ADJUSTED_,
AllPrice._PD_RATE_,
AllPrice._PDM_,
AllPrice.EL_LIFETIME_PARTIAL,
AllPrice.EL_LIFETIME_WITH_REDISC,
AllPrice.EL_LIFETIME_WITHOUT_REDISC,
Details.EL_1Y_TOTAL,
Details.IFRS_RATING,
AllPrice.EL_BDDK,
Details.InstID AS key
FROM
SAS_RSLT.ALLPRICE AllPrice LEFT JOIN
SAS_RSLT.ECL_ALLOCATION_DETAIL Details ON (AllPrice.InstID = Details.InstID)
where
AllPrice.AnalysisName = "&SCENARIO."
ORDER BY
AllPrice.InstID,
AllPrice.SimulationTime
;
QUIT;
Note that I replaced the original meaningless aliases (t1 and t2) with more meaningful names. In this case, it is almost unnecessary, as the code is relatively short, but if it gets a little bit more complicated, you cannot see the whole query on one page, and you have to scroll up and down to get the meaning of the different aliases (I have seen queries of several hundred lines, where aliases like a,b,c...,j were used, and then table "i" was read more than once, so the second version got the alias "ii" - completely unreadable).
The data step should also be formatted, but while formatting it I noticed that the same code was repeated over and over again. So what the actual data step boils down to is this:
data SENARYO_FILTRE_3;
set SENARYO_FILTRE_2;
array PDM[10];
array CORRECTEDPD[10];
array DISCOUNT[10];
array PDRATE[10];
array EL[10];
array CUM_R_EL[10];
array CUM_EL[10];
if instid = key and SimulationTime=in(1,2,3,4,5,6,7,8,9,10) then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
run;
Which leaves the interesting question: what to do when SimulationTime is not one of the values 1 to 10? It seems that the record is output, but without any calculated values. As all the variables are summed later, it "does not matter" - except for performance, which appears to be what you are looking to improve. So you may put in an "ELSE DELETE" statement here, or just change the IF statement to a subsetting IF - but wait, you could also change that to a WHERE clause, even faster:
data SENARYO_FILTRE_3;
set SENARYO_FILTRE_2;
where instid = key and SimulationTime in(1,2,3,4,5,6,7,8,9,10);
array PDM[10];
array CORRECTEDPD[10];
array DISCOUNT[10];
array PDRATE[10];
array EL[10];
array CUM_R_EL[10];
array CUM_EL[10];
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
run;
Actually, it would have been even smarter to put that as a condition in the previous step. Or rather, don't use LEFT JOIN in the first step, as you carefully get rid of the data where there is no INSTID for the KEY.
The next step is a mess: very meticulous, but not formatted at all. The comments are seriously nonsensical and makes the code even harder to read.
It should be rewritten as a PROC SUMMARY, much shorter, and you can now skim the whole code at a single glance:
proc summary data=SENARYO_FILTRE_3 nway;
class instid EL_BDDK;
var PDM1-PMD10
CORRECTEDPD1-CORRECTEDPD10
DISCOUNT1-DISCOUNT10
PDRATE1-PDRATE10
EL1-EL10
CUM_R_EL1-CUM_R_EL10
CUM_EL1-CUM_EL10
;
output out=SAS_PORT.SENARYOFLTRFEB_&SCENARIO(drop=_:) sum=;
run;
When looking at your original code, a question springs to mind: what is the EL_BDDK variable doing? It is not summed, and it is not used for grouping! I assume that your original code went well because EL_BBDK is uniquely dependent on the key, meaning that it could have been included in the GROUP variables. So I put it in as a CLASS variable in PROC SUMMARY.
The third step then again merges the output from the last step with a couple of other datasets, and creates a distinct table for each scenario.
And then you loop through the whole shebang again and again, once for each scenario. Why? I think it would be a lot faster to go through all the scenarios at once, using an IN clause in the first SQL step, and then use the scenario as a grouping variable at the end.
When it comes to performance, there is one question: How large are the other tables that you merge with in the end? if not too large, you should sort them and use a data step with a MERGE statement at the end. If necessary, it would, macro-wise, be very easy to split the output to several tables in a data step. This is probably where you would get a real performance saving.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.