BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
FerhatD
Calcite | Level 5

Hello experts,

 

The code given is running too long and I couldn't find a way how to optimize it so could you please help me about it? I attached also sample dataset and ALLPRICE table has 161.403.360 records. There are 3 scenarios and and these scenarios could run parallel the first thing i came up but any other ideas would be great.

 

Thanks in advance,

Kindly regards

 

%let scenario_name=base adverse favorable;
%let scenario_short=BAS ADV FAV;
%PUT &scenario_name.;
%PUT &scenario_short.;

%include "/sasdata/sasinput/%left(&sst_current_project_id.)/ifrs9_00_init.sas";

%MACRO SCENARIO_FORMAT(SCENARIO,SHORT);

PROC SQL;
CREATE TABLE SENARYO_FILTRE AS
SELECT DISTINCT t1.InstID,
t1.SimulationTime,
t1._CORRECTED_PD_,
t1._EIR_ADJUSTED_,
t1._PD_RATE_,
t1._PDM_,
t1.EL_LIFETIME_PARTIAL,
t1.EL_LIFETIME_WITH_REDISC,
t1.EL_LIFETIME_WITHOUT_REDISC,
T1.EL_BDDK
FROM SAS_RSLT.ALLPRICE t1
WHERE t1.AnalysisName = "&SCENARIO.";
QUIT;

PROC SQL;
CREATE TABLE SENARYO_FILTRE_2 AS
SELECT DISTINCT t1.InstID,
t1.SimulationTime,
t1._CORRECTED_PD_,
t1._EIR_ADJUSTED_,
t1._PD_RATE_,
t1._PDM_,
t1.EL_LIFETIME_PARTIAL,
t1.EL_LIFETIME_WITH_REDISC,
t1.EL_LIFETIME_WITHOUT_REDISC,
t2.EL_1Y_TOTAL,
t2.IFRS_RATING,
T1.EL_BDDK,
t2.InstID AS key
FROM WORK.SENARYO_FILTRE t1
LEFT JOIN SAS_RSLT.ECL_ALLOCATION_DETAIL t2 ON (t1.InstID = t2.InstID)
ORDER BY t1.InstID,
t1.SimulationTime;
QUIT;

data SENARYO_FILTRE_3;
set SENARYO_FILTRE_2;
array PDM[10];
array CORRECTEDPD[10];
array DISCOUNT[10];
array PDRATE[10];
array EL[10];
array CUM_R_EL[10];
array CUM_EL[10];
if instid = key and SimulationTime= 1 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 2 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 3 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 4 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 5 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 6 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 7 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 8 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 9 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
else if instid = key and SimulationTime= 10 then do;
CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
PDRATE[SimulationTime]= _PD_RATE_;
CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
EL[SimulationTime]= EL_LIFETIME_PARTIAL;
DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
PDM[SimulationTime]= _PDM_;
CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
end;
run;

PROC SQL;
CREATE TABLE SENARYO_FILTRE_4 AS
SELECT DISTINCT t1.InstID,
/* PDM1 */
(SUM(t1.PDM1)) AS PDM1,
/* PDM2 */
(SUM(t1.PDM2)) AS PDM2,
/* PDM3 */
(SUM(t1.PDM3)) AS PDM3,
/* PDM4 */
(SUM(t1.PDM4)) AS PDM4,
/* PDM5 */
(SUM(t1.PDM5)) AS PDM5,
/* PDM6 */
(SUM(t1.PDM6)) AS PDM6,
/* PDM7 */
(SUM(t1.PDM7)) AS PDM7,
/* PDM8 */
(SUM(t1.PDM8)) AS PDM8,
/* PDM9 */
(SUM(t1.PDM9)) AS PDM9,
/* PDM10 */
(SUM(t1.PDM10)) AS PDM10,
/* CORRECTEDPD1 */
(SUM(t1.CORRECTEDPD1)) AS CORRECTEDPD1,
/* CORRECTEDPD2 */
(SUM(t1.CORRECTEDPD2)) AS CORRECTEDPD2,
/* CORRECTEDPD3 */
(SUM(t1.CORRECTEDPD3)) AS CORRECTEDPD3,
/* CORRECTEDPD4 */
(SUM(t1.CORRECTEDPD4)) AS CORRECTEDPD4,
/* CORRECTEDPD5 */
(SUM(t1.CORRECTEDPD5)) AS CORRECTEDPD5,
/* CORRECTEDPD6 */
(SUM(t1.CORRECTEDPD6)) AS CORRECTEDPD6,
/* CORRECTEDPD7 */
(SUM(t1.CORRECTEDPD7)) AS CORRECTEDPD7,
/* CORRECTEDPD8 */
(SUM(t1.CORRECTEDPD8)) AS CORRECTEDPD8,
/* CORRECTEDPD9 */
(SUM(t1.CORRECTEDPD9)) AS CORRECTEDPD9,
/* CORRECTEDPD10 */
(SUM(t1.CORRECTEDPD10)) AS CORRECTEDPD10,
/* DISCOUNT1 */
(SUM(t1.DISCOUNT1)) AS DISCOUNT1,
/* DISCOUNT2 */
(SUM(t1.DISCOUNT2)) AS DISCOUNT2,
/* DISCOUNT3 */
(SUM(t1.DISCOUNT3)) AS DISCOUNT3,
/* DISCOUNT4 */
(SUM(t1.DISCOUNT4)) AS DISCOUNT4,
/* DISCOUNT5 */
(SUM(t1.DISCOUNT5)) AS DISCOUNT5,
/* DISCOUNT6 */
(SUM(t1.DISCOUNT6)) AS DISCOUNT6,
/* DISCOUNT7 */
(SUM(t1.DISCOUNT7)) AS DISCOUNT7,
/* DISCOUNT8 */
(SUM(t1.DISCOUNT8)) AS DISCOUNT8,
/* DISCOUNT9 */
(SUM(t1.DISCOUNT9)) AS DISCOUNT9,
/* DISCOUNT10 */
(SUM(t1.DISCOUNT10)) AS DISCOUNT10,
/* PDRATE1 */
(SUM(t1.PDRATE1)) AS PDRATE1,
/* PDRATE2 */
(SUM(t1.PDRATE2)) AS PDRATE2,
/* PDRATE3 */
(SUM(t1.PDRATE3)) AS PDRATE3,
/* PDRATE4 */
(SUM(t1.PDRATE4)) AS PDRATE4,
/* PDRATE5 */
(SUM(t1.PDRATE5)) AS PDRATE5,
/* PDRATE6 */
(SUM(t1.PDRATE6)) AS PDRATE6,
/* PDRATE7 */
(SUM(t1.PDRATE7)) AS PDRATE7,
/* PDRATE8 */
(SUM(t1.PDRATE8)) AS PDRATE8,
/* PDRATE9 */
(SUM(t1.PDRATE9)) AS PDRATE9,
/* PDRATE10 */
(SUM(t1.PDRATE10)) AS PDRATE10,
/* EL1 */
(SUM(t1.EL1)) AS EL1,
/* EL2 */
(SUM(t1.EL2)) AS EL2,
/* EL3 */
(SUM(t1.EL3)) AS EL3,
/* EL4 */
(SUM(t1.EL4)) AS EL4,
/* EL5 */
(SUM(t1.EL5)) AS EL5,
/* EL6 */
(SUM(t1.EL6)) AS EL6,
/* EL7 */
(SUM(t1.EL7)) AS EL7,
/* EL8 */
(SUM(t1.EL8)) AS EL8,
/* EL9 */
(SUM(t1.EL9)) AS EL9,
/* EL10 */
(SUM(t1.EL10)) AS EL10,
/* CUM_R_EL1 */
(SUM(t1.CUM_R_EL1)) AS CUM_R_EL1,
/* CUM_R_EL2 */
(SUM(t1.CUM_R_EL2)) AS CUM_R_EL2,
/* CUM_R_EL3 */
(SUM(t1.CUM_R_EL3)) AS CUM_R_EL3,
/* CUM_R_EL4 */
(SUM(t1.CUM_R_EL4)) AS CUM_R_EL4,
/* CUM_R_EL5 */
(SUM(t1.CUM_R_EL5)) AS CUM_R_EL5,
/* CUM_R_EL6 */
(SUM(t1.CUM_R_EL6)) AS CUM_R_EL6,
/* CUM_R_EL7 */
(SUM(t1.CUM_R_EL7)) AS CUM_R_EL7,
/* CUM_R_EL8 */
(SUM(t1.CUM_R_EL8)) AS CUM_R_EL8,
/* CUM_R_EL9 */
(SUM(t1.CUM_R_EL9)) AS CUM_R_EL9,
/* CUM_R_EL10 */
(SUM(t1.CUM_R_EL10)) AS CUM_R_EL10,
/* CUM_EL1 */
(SUM(t1.CUM_EL1)) AS CUM_EL1,
/* CUM_EL2 */
(SUM(t1.CUM_EL2)) AS CUM_EL2,
/* CUM_EL3 */
(SUM(t1.CUM_EL3)) AS CUM_EL3,
/* CUM_EL4 */
(SUM(t1.CUM_EL4)) AS CUM_EL4,
/* CUM_EL5 */
(SUM(t1.CUM_EL5)) AS CUM_EL5,
/* CUM_EL6 */
(SUM(t1.CUM_EL6)) AS CUM_EL6,
/* CUM_EL7 */
(SUM(t1.CUM_EL7)) AS CUM_EL7,
/* CUM_EL8 */
(SUM(t1.CUM_EL8)) AS CUM_EL8,
/* CUM_EL9 */
(SUM(t1.CUM_EL9)) AS CUM_EL9,
/* CUM_EL10 */
(SUM(t1.CUM_EL10)) AS CUM_EL10,
EL_BDDK
/* EL_LIFETIME_PARTIAL*/
FROM SENARYO_FILTRE_3 t1
GROUP BY t1.InstID;
QUIT;

PROC SQL;
CREATE TABLE SAS_PORT.SENARYOFLTRFEB_&SCENARIO. AS
SELECT t2.INSTID,
"&SHORT" AS PD_F,
t2.INSTTYPE,
t2.DATA_DATE,
t2.MASTID,
t2.MATURITY_DATE,
t2.BALANCE_TL,
t2.BALANCE_ORG,
t2.CURRENCY_CODE,
t2.RECONSTRUCTION_FLAG,
t2.SHORT_PORTFOLIO,
t2.PORTFOLIO_ADJUSTED,
t2.BNPP_PORTFOLIO,
t2.RETAIL_TYPE,
t2.RETAIL_FLAG,
t2.COMMITMENT_TYPE,
t2.CREDIT_TYPE,
t2.ACCOUNT_NUMBER,
t2.BANK_GROUP_CODE,
t2.BANK_GROUP_NAME,
t2.BANK_NAME,
t2.BANKNO,
t2.BLOCKING_DEGREE,
t2.BRANCH_ID,
t2.COUNTERPARTY_NAME,
t2.COUNTERPARTY_TYPE,
t2.CREDIT_TYPE_NAME,
t2.ACCRUED_INTEREST_AMT_TL,
t2.AVG_BNP_RATING,
t2.BUSINESS_LINE_CODE,
t2.CURRENT_IG_NG,
t2.COUNTERPARTY_ID,
t2.COUNTERPARTY_VKN,
t2.DEK_ADJ,
t2.DEK_BAKIYE,
t2.DEK_CURRENCY_CODE,
t2.CRITEMNO,
t2.CURR_INTEREST_RATE,
t2.FAAL_ORAN,
t2.CURRENT_BNP_RATING,
t2.CURRENT_BUCKET,
t2.CURRENT_EXCHANGE_RATE,
t2.CURRENT_RATING,
t2.CURRENT_RATING_POINT,
t2.CURRENT_SCORE,
t2.DELAY_DATE,
t2.EFFECTIVE_INTEREST_RATE,
t2.GENERAL_LOAN_LIMIT,
t2.GENERAL_LOAN_MATURITY,
t2.GL_ACCOUNT_NUM,
t2.GROUP_CODE,
t2.GROUP_GENERAL_LOAN_LIMIT,
t2.GROUP_DESC,
t2.IFRS_RATING,
t2.INCEPTION_DATE,
t2.INCEPTION_EXCHANGE_RATE,
t2.INITIAL_BUCKET,
t2.INITIAL_IG_NG,
t2.INITIAL_RATING,
t2.INITIAL_RATING_DATE,
t2.INITIAL_RATING_POINT,
t2.INTEREST_CALC_TYP_ORJ,
t2.KTRNO,
t2.INITIAL_SCORE,
t2.LOAN_CUSTOMER_STATUS,
t2.LOAN_TP_ID,
t2.LGD_RATE,
t2.MAIN_BRANCH_ID,
t2.MIS_PRODUCTMAPNO,
t2.NACE_CODE,
t2.NACE_NAME,
t2.OPEN_EXCH_RATE,
t2.PROV_DELAY,
t2.RATING_ANALYSIS_NO,
t2.RATING_CLASS,
t2.RATING_DNM_GRP,
t2.RATING_DATE,
t2.RATING_GRP,
t2.REDISCOUNT_TL,
t2.RMPM_ID,
t2.RR_DISCOUNTED,
t2.SCORE_CARD_ID,
t2.STANDART_CCF,
t2.TREEFAMILYNO,
t2.TREELINENO,
t2.TREESINGLENO,
t3.EAD_STAGE1_TOTAL,
t3.EL_1Y_BALANCE,
t3.EL_1Y_REDISC,
t3.EL_1Y_TOTAL,
t3.EL_LIFETIME_REDISC,
t3.EL_LIFETIME_WITH_REDISC,
t3.EL_LIFETIME_WITHOUT_REDISC,
t3.REMAINING_YEAR,
t2.IFRS_RATING,
t2.AVG_RATING,
t2.SCORE_PROV_DELAY,
t3.BASE_1Y_BALANCE,
t3.FAV_1Y_BALANCE,
t3.ADV_1Y_BALANCE,
t3.BASE_1Y_REDISC,
t3.FAV_1Y_REDISC ,
t3.ADV_1Y_REDISC,
t3.BASE_1Y_TOTAL,
t3.FAV_1Y_TOTAL,
t3.ADV_1Y_TOTAL,
t2.POINT_DIFF,
t2.FORBORNE_FLAG,
t2.RR_ASSESSMENT_TYPE,
t2.STAGING,
t2.STAGING_DETAIL,
t2.STAGING_REASON,
t1.CORRECTEDPD1,
t1.CORRECTEDPD2,
t1.CORRECTEDPD3,
t1.CORRECTEDPD4,
t1.CORRECTEDPD5,
t1.CORRECTEDPD6,
t1.CORRECTEDPD7,
t1.CORRECTEDPD8,
t1.CORRECTEDPD9,
t1.CORRECTEDPD10,
t1.DISCOUNT1,
t1.DISCOUNT2,
t1.DISCOUNT3,
t1.DISCOUNT4,
t1.DISCOUNT5,
t1.DISCOUNT6,
t1.DISCOUNT7,
t1.DISCOUNT8,
t1.DISCOUNT9,
t1.DISCOUNT10,
t1.PDRATE1,
t1.PDRATE2,
t1.PDRATE3,
t1.PDRATE4,
t1.PDRATE5,
t1.PDRATE6,
t1.PDRATE7,
t1.PDRATE8,
t1.PDRATE9,
t1.PDRATE10,
t1.PDM1,
t1.PDM2,
t1.PDM3,
t1.PDM4,
t1.PDM5,
t1.PDM6,
t1.PDM7,
t1.PDM8,
t1.PDM9,
t1.PDM10,
t1.EL1,
t1.EL2,
t1.EL3,
t1.EL4,
t1.EL5,
t1.EL6,
t1.EL7,
t1.EL8,
t1.EL9,
t1.EL10,
t1.CUM_R_EL1,
t1.CUM_R_EL2,
t1.CUM_R_EL3,
t1.CUM_R_EL4,
t1.CUM_R_EL5,
t1.CUM_R_EL6,
t1.CUM_R_EL7,
t1.CUM_R_EL8,
t1.CUM_R_EL9,
t1.CUM_R_EL10,
t1.CUM_EL1,
t1.CUM_EL2,
t1.CUM_EL3,
t1.CUM_EL4,
t1.CUM_EL5,
t1.CUM_EL6,
t1.CUM_EL7,
t1.CUM_EL8,
t1.CUM_EL9,
t1.CUM_EL10,
T1.EL_BDDK
/*,*/
/* t1.EL_LIFETIME_PARTIAL*/
/* t1. IFRS9_FINAL_STAGE ,*/
/* t1.IFRS9_FINAL_EL*/
FROM SENARYO_FILTRE_4 t1
LEFT JOIN sas_port.Portfolio_%left(&mcr_part_id._v2) t2 ON (t1.InstID = t2.INSTID)
/* LEFT JOIN SAS_PORT.PORTFOLIO_201905_V2 t2 ON (t1.InstID = t2.INSTID)*/
LEFT JOIN SAS_RSLT.ECL_ALLOCATION_DETAIL t3 ON (t1.InstID = t3.InstID);
QUIT;
%MEND SCENARIO_FORMAT;

%MACRO LOOP_SCENARIO_FORMAT;
%LET I=1;
%DO %WHILE(%SCAN(&scenario_name.,&I.,%STR( ))~=);
%SCENARIO_FORMAT(%SCAN(&scenario_name.,&I.,%STR( )),%SCAN(&scenario_short.,&I.,%STR( )));
%LET I=&I.+1;
%END;
%MEND LOOP_SCENARIO_FORMAT; ;
%LOOP_SCENARIO_FORMAT;

 

1 ACCEPTED SOLUTION

Accepted Solutions
s_lassen
Meteorite | Level 14

Start by making it readable. 

 

I started with the first two SQL steps,  they should be formatted decently, something like

 

PROC SQL;
  CREATE TABLE SENARYO_FILTRE AS SELECT DISTINCT
    t1.InstID,
    t1.SimulationTime,
    t1._CORRECTED_PD_,
    t1._EIR_ADJUSTED_,
    t1._PD_RATE_,
    t1._PDM_,
    t1.EL_LIFETIME_PARTIAL,
    t1.EL_LIFETIME_WITH_REDISC,
    t1.EL_LIFETIME_WITHOUT_REDISC,
    T1.EL_BDDK
  FROM 
    SAS_RSLT.ALLPRICE t1
  WHERE t1.AnalysisName = "&SCENARIO.";
QUIT;

PROC SQL;
  CREATE TABLE SENARYO_FILTRE_2 AS SELECT DISTINCT 
    t1.InstID,
    t1.SimulationTime,
    t1._CORRECTED_PD_,
    t1._EIR_ADJUSTED_,
    t1._PD_RATE_,
    t1._PDM_,
    t1.EL_LIFETIME_PARTIAL,
    t1.EL_LIFETIME_WITH_REDISC,
    t1.EL_LIFETIME_WITHOUT_REDISC,
    t2.EL_1Y_TOTAL,
    t2.IFRS_RATING,
    T1.EL_BDDK,
    t2.InstID AS key
  FROM  
    WORK.SENARYO_FILTRE t1 LEFT JOIN 
    SAS_RSLT.ECL_ALLOCATION_DETAIL t2 ON (t1.InstID = t2.InstID)
  ORDER BY 
    t1.InstID,
    t1.SimulationTime
  ;
QUIT;

Now that these two steps can actually be read, it is obvious that they could be made into a single step:

 

 

PROC SQL;
  CREATE TABLE SENARYO_FILTRE_2 AS SELECT DISTINCT 
    AllPrice.InstID,
    AllPrice.SimulationTime,
    AllPrice._CORRECTED_PD_,
    AllPrice._EIR_ADJUSTED_,
    AllPrice._PD_RATE_,
    AllPrice._PDM_,
    AllPrice.EL_LIFETIME_PARTIAL,
    AllPrice.EL_LIFETIME_WITH_REDISC,
    AllPrice.EL_LIFETIME_WITHOUT_REDISC,
    Details.EL_1Y_TOTAL,
    Details.IFRS_RATING,
    AllPrice.EL_BDDK,
    Details.InstID AS key
  FROM  
    SAS_RSLT.ALLPRICE AllPrice LEFT JOIN 
    SAS_RSLT.ECL_ALLOCATION_DETAIL Details ON (AllPrice.InstID = Details.InstID)
  where
    AllPrice.AnalysisName = "&SCENARIO."
  ORDER BY 
    AllPrice.InstID,
    AllPrice.SimulationTime
  ;
QUIT;

Note that I replaced the original meaningless aliases (t1 and t2) with more meaningful names. In this case, it is almost unnecessary, as the code is relatively short, but if it gets a little bit more complicated, you cannot see the whole query on one page, and you have to scroll up and down to get the meaning of the different aliases (I have seen queries of several hundred lines, where aliases like a,b,c...,j were used, and then table "i" was read more than once, so the second version got the alias "ii" - completely unreadable).

 

 

The data step should also be formatted, but while formatting it I noticed that the same code was repeated over and over again. So what the actual data step boils down to is this:

 

data SENARYO_FILTRE_3;
  set SENARYO_FILTRE_2;
  array PDM[10];
  array CORRECTEDPD[10];
  array DISCOUNT[10];
  array PDRATE[10];
  array EL[10];
  array CUM_R_EL[10];
  array CUM_EL[10];
  if instid = key and SimulationTime=in(1,2,3,4,5,6,7,8,9,10) then do;
    CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
    PDRATE[SimulationTime]= _PD_RATE_;
    CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
    EL[SimulationTime]= EL_LIFETIME_PARTIAL;
    DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
    PDM[SimulationTime]= _PDM_;
    CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
    end;
run;

Which leaves the interesting question: what to do when SimulationTime is not one of the values 1 to 10? It seems that the record is output, but without any calculated values. As all the variables are summed later, it "does not matter" - except for performance, which appears to be what you are looking to improve. So you may put in an "ELSE DELETE" statement here, or just change the IF statement to a subsetting IF - but wait, you could also change that to a WHERE clause, even faster:

 

 

data SENARYO_FILTRE_3;
  set SENARYO_FILTRE_2;
  where instid = key and SimulationTime in(1,2,3,4,5,6,7,8,9,10);
  array PDM[10];
  array CORRECTEDPD[10];
  array DISCOUNT[10];
  array PDRATE[10];
  array EL[10];
  array CUM_R_EL[10];
  array CUM_EL[10];
  CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
  PDRATE[SimulationTime]= _PD_RATE_;
  CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
  EL[SimulationTime]= EL_LIFETIME_PARTIAL;
  DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
  PDM[SimulationTime]= _PDM_;
  CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
run;

 

Actually, it would have been even smarter to put that as a condition in the previous step. Or rather, don't use LEFT JOIN in the first step, as you carefully get rid of the data where there is no INSTID for the KEY.

 

The next step is a mess: very meticulous, but not formatted at all. The comments are seriously nonsensical and makes the code even harder to read.

 

It should be rewritten as a PROC SUMMARY, much shorter, and you can now skim the whole code at a single glance:

 

proc summary data=SENARYO_FILTRE_3 nway;
  class instid EL_BDDK;
  var PDM1-PMD10 
      CORRECTEDPD1-CORRECTEDPD10
      DISCOUNT1-DISCOUNT10
      PDRATE1-PDRATE10
      EL1-EL10
      CUM_R_EL1-CUM_R_EL10
      CUM_EL1-CUM_EL10
      ;
   output out=SAS_PORT.SENARYOFLTRFEB_&SCENARIO(drop=_:) sum=;
run;

When looking at your original code, a question springs to mind: what is the EL_BDDK variable doing? It is not summed, and it is not used for grouping! I assume that your original code went well because EL_BBDK is uniquely dependent on the key, meaning that it could have been included in the GROUP variables. So I put it in  as a CLASS variable in PROC SUMMARY.

 

The third step then again merges the output from the last step with a couple of other datasets, and creates a distinct table for each scenario.

 

And then you loop through the whole shebang again and again, once for each scenario. Why? I think it would be a lot faster to go through all the scenarios at once, using an IN clause in the first SQL step, and then use the scenario as a grouping variable at the end.

 

When it comes to performance, there is one question: How large are the other tables that you merge with in the end? if not too large, you should sort them and use a data step with a MERGE statement at the end. If necessary, it would, macro-wise, be very easy to split the output to several tables in a data step. This is probably where you would get a real performance saving.

 

 

View solution in original post

7 REPLIES 7
Tom
Super User Tom
Super User

What is the performance now? What do you need?

What is the purpose of the code?  

Why are you using SQL instead of a SAS procedure?

FerhatD
Calcite | Level 5

Hi again,
Actually it is a step of SAS IFRS9 and we're trying to reduce run time. It is all changeable so we can use the SAS procedure instead of SQL. Do you have any suggestions?

FerhatD
Calcite | Level 5
Hi again,

Actually it is a step of SAS IFRS9 and we're trying to reduce run time. It
is all changeable so we can use the SAS procedure instead of SQL. Do you
have any suggestions?
Shmuel
Garnet | Level 18

Next code is not tested. I'm not sure if it will run faster but obviously the code is shorter and indented to be easily read. You may try it and should compare results to the results of original run.

I suggest to compare run time of each of the steps and decide weather to use the sas step/proc or the original sql.

%let scenario_name=base adverse favorable;
%let scenario_short=BAS ADV FAV;
%PUT &scenario_name.;
%PUT &scenario_short.;

%include "/sasdata/sasinput/%left(&sst_current_project_id.)/ifrs9_00_init.sas";

%MACRO SCENARIO_FORMAT(SCENARIO,SHORT);
  proc sort data=SAS_RSLT.ALLPRICE(
             keep= InstID SimulationTime _CORRECTED_PD_ _EIR_ADJUSTED_
                   _PD_RATE_  _PDM_  EL_LIFETIME_PARTIAL  EL_LIFETIME_WITH_REDISC
				   EL_LIFETIME_WITHOUT_REDISC EL_BDDK 
             where=(AnalysisName = "&SCENARIO."))
			 out=SENARYO_FILTRE  NODUPKEY;
		 by InstID SimulationTime;
  run;
  
  proc sort data= SAS_RSLT.ECL_ALLOCATION_DETAIL;
        by InstID SimulationTime;
  run;
  
  data SENARYO_FILTRE_2;
  merge SENARYO_FILTRE (in=in1 keep=_all_)
        SAS_RSLT.ECL_ALLOCATION_DETAIL(in=in2 keep=EL_1Y_TOTAL IFRS_RATING );
    by InstId SimulationTime;
	   if in2 then key = instid;
   run;

  data SENARYO_FILTRE_3;
   set SENARYO_FILTRE_2;
		array PDM[10];
		array CORRECTEDPD[10];
		array DISCOUNT[10];
		array PDRATE[10];
		array EL[10];
		array CUM_R_EL[10];
		array CUM_EL[10];
		
	if instid = key and (1 le SimulationTime le 10) then do;
		CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
		PDRATE[SimulationTime]= _PD_RATE_;
		CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
		EL[SimulationTime]= EL_LIFETIME_PARTIAL;
		DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
		PDM[SimulationTime]= _PDM_;
		CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
	end;	
  run;

  proc summary data= SENARYO_FILTRE_3 nway noprint;
       by InstId;
	   ID EL_BDDK;
       var  PDM1-PDM10  CORRECTEDPD1-CORRECTEDPD10  DISCOUNT1-DISCOUNT10
            PDRATE1-PDRATE10  EL1-EL10  CUM_R_EL1-CUM_R_EL10 CUM_EL1-CUM_EL10
		;
	output out=SENARYO_FILTRE_4  sum=;
  end;
  
  data
  merge SENARYO_FILTRE_4(in=in1)
        sas_port.Portfolio_%left(&mcr_part_id._v2)
		SAS_RSLT.ECL_ALLOCATION_DETAIL t3
	;
	by  InstID;
	    if in1;
  run;
%MEND SCENARIO_FORMAT;

%MACRO LOOP_SCENARIO_FORMAT;
	%LET I=1;
	%DO %WHILE(%SCAN(&scenario_name.,&I.,%STR( ))~=);
		%SCENARIO_FORMAT(%SCAN(&scenario_name.,&I.,%STR( )),%SCAN(&scenario_short.,&I.,%STR( )));
		%LET I=&I.+1;
	%END;
%MEND LOOP_SCENARIO_FORMAT; ;
%LOOP_SCENARIO_FORMAT;

   
        
            			
	   

 

SASKiwi
PROC Star

@FerhatD  - you should post a SAS log of your program including the run times for each step. That will tell us which program steps are slow and need tuning.

s_lassen
Meteorite | Level 14

Start by making it readable. 

 

I started with the first two SQL steps,  they should be formatted decently, something like

 

PROC SQL;
  CREATE TABLE SENARYO_FILTRE AS SELECT DISTINCT
    t1.InstID,
    t1.SimulationTime,
    t1._CORRECTED_PD_,
    t1._EIR_ADJUSTED_,
    t1._PD_RATE_,
    t1._PDM_,
    t1.EL_LIFETIME_PARTIAL,
    t1.EL_LIFETIME_WITH_REDISC,
    t1.EL_LIFETIME_WITHOUT_REDISC,
    T1.EL_BDDK
  FROM 
    SAS_RSLT.ALLPRICE t1
  WHERE t1.AnalysisName = "&SCENARIO.";
QUIT;

PROC SQL;
  CREATE TABLE SENARYO_FILTRE_2 AS SELECT DISTINCT 
    t1.InstID,
    t1.SimulationTime,
    t1._CORRECTED_PD_,
    t1._EIR_ADJUSTED_,
    t1._PD_RATE_,
    t1._PDM_,
    t1.EL_LIFETIME_PARTIAL,
    t1.EL_LIFETIME_WITH_REDISC,
    t1.EL_LIFETIME_WITHOUT_REDISC,
    t2.EL_1Y_TOTAL,
    t2.IFRS_RATING,
    T1.EL_BDDK,
    t2.InstID AS key
  FROM  
    WORK.SENARYO_FILTRE t1 LEFT JOIN 
    SAS_RSLT.ECL_ALLOCATION_DETAIL t2 ON (t1.InstID = t2.InstID)
  ORDER BY 
    t1.InstID,
    t1.SimulationTime
  ;
QUIT;

Now that these two steps can actually be read, it is obvious that they could be made into a single step:

 

 

PROC SQL;
  CREATE TABLE SENARYO_FILTRE_2 AS SELECT DISTINCT 
    AllPrice.InstID,
    AllPrice.SimulationTime,
    AllPrice._CORRECTED_PD_,
    AllPrice._EIR_ADJUSTED_,
    AllPrice._PD_RATE_,
    AllPrice._PDM_,
    AllPrice.EL_LIFETIME_PARTIAL,
    AllPrice.EL_LIFETIME_WITH_REDISC,
    AllPrice.EL_LIFETIME_WITHOUT_REDISC,
    Details.EL_1Y_TOTAL,
    Details.IFRS_RATING,
    AllPrice.EL_BDDK,
    Details.InstID AS key
  FROM  
    SAS_RSLT.ALLPRICE AllPrice LEFT JOIN 
    SAS_RSLT.ECL_ALLOCATION_DETAIL Details ON (AllPrice.InstID = Details.InstID)
  where
    AllPrice.AnalysisName = "&SCENARIO."
  ORDER BY 
    AllPrice.InstID,
    AllPrice.SimulationTime
  ;
QUIT;

Note that I replaced the original meaningless aliases (t1 and t2) with more meaningful names. In this case, it is almost unnecessary, as the code is relatively short, but if it gets a little bit more complicated, you cannot see the whole query on one page, and you have to scroll up and down to get the meaning of the different aliases (I have seen queries of several hundred lines, where aliases like a,b,c...,j were used, and then table "i" was read more than once, so the second version got the alias "ii" - completely unreadable).

 

 

The data step should also be formatted, but while formatting it I noticed that the same code was repeated over and over again. So what the actual data step boils down to is this:

 

data SENARYO_FILTRE_3;
  set SENARYO_FILTRE_2;
  array PDM[10];
  array CORRECTEDPD[10];
  array DISCOUNT[10];
  array PDRATE[10];
  array EL[10];
  array CUM_R_EL[10];
  array CUM_EL[10];
  if instid = key and SimulationTime=in(1,2,3,4,5,6,7,8,9,10) then do;
    CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
    PDRATE[SimulationTime]= _PD_RATE_;
    CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
    EL[SimulationTime]= EL_LIFETIME_PARTIAL;
    DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
    PDM[SimulationTime]= _PDM_;
    CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
    end;
run;

Which leaves the interesting question: what to do when SimulationTime is not one of the values 1 to 10? It seems that the record is output, but without any calculated values. As all the variables are summed later, it "does not matter" - except for performance, which appears to be what you are looking to improve. So you may put in an "ELSE DELETE" statement here, or just change the IF statement to a subsetting IF - but wait, you could also change that to a WHERE clause, even faster:

 

 

data SENARYO_FILTRE_3;
  set SENARYO_FILTRE_2;
  where instid = key and SimulationTime in(1,2,3,4,5,6,7,8,9,10);
  array PDM[10];
  array CORRECTEDPD[10];
  array DISCOUNT[10];
  array PDRATE[10];
  array EL[10];
  array CUM_R_EL[10];
  array CUM_EL[10];
  CUM_R_EL[SimulationTime]= EL_LIFETIME_WITH_REDISC;
  PDRATE[SimulationTime]= _PD_RATE_;
  CUM_EL[SimulationTime]= EL_LIFETIME_WITHOUT_REDISC;
  EL[SimulationTime]= EL_LIFETIME_PARTIAL;
  DISCOUNT[SimulationTime]= _EIR_ADJUSTED_;
  PDM[SimulationTime]= _PDM_;
  CORRECTEDPD[SimulationTime]= _CORRECTED_PD_;
run;

 

Actually, it would have been even smarter to put that as a condition in the previous step. Or rather, don't use LEFT JOIN in the first step, as you carefully get rid of the data where there is no INSTID for the KEY.

 

The next step is a mess: very meticulous, but not formatted at all. The comments are seriously nonsensical and makes the code even harder to read.

 

It should be rewritten as a PROC SUMMARY, much shorter, and you can now skim the whole code at a single glance:

 

proc summary data=SENARYO_FILTRE_3 nway;
  class instid EL_BDDK;
  var PDM1-PMD10 
      CORRECTEDPD1-CORRECTEDPD10
      DISCOUNT1-DISCOUNT10
      PDRATE1-PDRATE10
      EL1-EL10
      CUM_R_EL1-CUM_R_EL10
      CUM_EL1-CUM_EL10
      ;
   output out=SAS_PORT.SENARYOFLTRFEB_&SCENARIO(drop=_:) sum=;
run;

When looking at your original code, a question springs to mind: what is the EL_BDDK variable doing? It is not summed, and it is not used for grouping! I assume that your original code went well because EL_BBDK is uniquely dependent on the key, meaning that it could have been included in the GROUP variables. So I put it in  as a CLASS variable in PROC SUMMARY.

 

The third step then again merges the output from the last step with a couple of other datasets, and creates a distinct table for each scenario.

 

And then you loop through the whole shebang again and again, once for each scenario. Why? I think it would be a lot faster to go through all the scenarios at once, using an IN clause in the first SQL step, and then use the scenario as a grouping variable at the end.

 

When it comes to performance, there is one question: How large are the other tables that you merge with in the end? if not too large, you should sort them and use a data step with a MERGE statement at the end. If necessary, it would, macro-wise, be very easy to split the output to several tables in a data step. This is probably where you would get a real performance saving.

 

 

FerhatD
Calcite | Level 5
Thanks for your detailed answer. I changed the code as you advised and it helped me to understand the logic and code runs in half the time. Thanks a lot.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 891 views
  • 2 likes
  • 5 in conversation