Good morning all
I am not sure if this can be solved without data, but I have been tasked with trying to reduce the run time of the attached SAS code (highlighted in yellow). As you can see in the attached log, all of the coding processes quickly accept the yellow highlighted portion, which takes 45 mins to run. Is there anyway I can cut down the process time on that section of the coding? Any assistance would be greatly appreciated.
Thanks
William
Looking at your log there is a big gap between real time and CPU time. This could be because there is a lot happening on the SQL Server side and SAS is just waiting or because a lot of time gets spent transferring the data from SQL Server to SAS.
Now looking at the code you've posted there are function in it which SAS can't push to the database and for this reason a lot of data will get transferred to SAS for processing ( functions intnx() and input() being the main "problem").
The code also combines Left Join On Clauses with Where Clauses. The Where Clause is likely going to post filter the result set from the Join. It would likely be more efficient to first filter the rows in the source tables to only use the rows for joining which are relevant.
Here my stab at your code. I had to do this "blind" so not very convinced that this will just work (syntax and logic) - but it hopefully will give you some ideas how you could performance tweak the code.
I'm still doing the Group By statement on the SAS side. This because there are variables in the Select clause which are not used for either grouping or aggregation - that's something SAS allows you to do but SQL Server would throw an error. Ideally you don't have such additional variables and can do everything on the SQL Server side.
OPTIONS DEBUG=DBMS_SELECT SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX FULLSTIMER;
/* INITIALIZE GLOBAL VARIABLESS */
%LET begDt= 01MAY2019; /* reporting begin date; FORMAT= DDMMMYYYY*/
%LET endDt= %sysfunc(intnx(month, "&begDt"D , +11, e),date9.); /* reporting end date: FORMAT= DDMMMYYYY*/
%LET begDt_prev=%sysfunc(intnx(month, "&begDt"D , -12, b),date9.);
%LET endDt_prev=%sysfunc(intnx(month, "&endDt"D , -12, e),date9.);
%let sqlsrv_begDt=%unquote(%nrbquote(')&begDt%nrbquote('));
%let sqlsrv_endDt=%unquote(%nrbquote(')&endDt%nrbquote('));
%let sqlsrv_begDt_prev=%unquote(%nrbquote(')&begDt_prev%nrbquote('));
%let sqlsrv_endDt_prev=%unquote(%nrbquote(')&endDt_prev%nrbquote('));
%put &=begDt;
%put &=endDt;
%put &=begDt_prev;
%put &=endDt_prev;
%put &=sqlsrv_begDt;
%put &=sqlsrv_endDt;
%put &=sqlsrv_begDt_prev;
%put &=sqlsrv_endDt_prev;
LIBNAME sqldb odbc complete="driver=SQL Server; database=Epic; server=PHSSQL2195" schema=Finance readbuff=2000 read_lock_type=nolock;
PROC SQL;
connect using sqldb;
CREATE TABLE work.Datapull_epic AS
select distinct *
from connection to sqldb
(
SELECT
H.PatientMRN as MRN
, H.HospitalAccountID as enc
, H.PrimaryPayorID as PayorID
, C.Subscriberid
, C.PlanID
, H.DISCHARGEDTS
, T.HospitalAccountClassDSC
, PT.PatientFirstNM
, PT.PatientLastNM
, PT.BIRTHDTS
, P.PayorNM
, T.flag_exam /** any exam**/
, T.flag_infu /** infusion**/
, T.flag_chemo /** chemotherapy**/
, T.flag_radOnc
, T.flag_immuno /** immunotherapy */
, H.period
, SUM(T.Units) AS units
, SUM(H.TotalChargeAMT) AS charges
(
select
PatientMRN as MRN
, HospitalAccountID
, PrimaryPayorID
, DISCHARGEDTS
, TotalChargeAMT
, CASE
WHEN cast(dischargedts as date)
BETWEEN &sqlsrv_begDt_prev. AND &sqlsrv_endDt_prev. THEN 0 /* Claims data for year prior to report start date */
WHEN cast(dischargedts as date) BETWEEN &sqlsrv_begDt. AND &sqlsrv_endDt. THEN 1
ELSE 2
END
AS period
FROM Fin.HospitalAccount
where
cast(dischargedts as date)
between &sqlsrv_begDt_prev. and &sqlsrv_endDt_prev. /* extract claims from 1 year prior to report start date until report end date */
AND PatientMRN IN
(SELECT DISTINCT A.PatientMRN
FROM Fin.HospitalAccount A
WHERE cast(dischargedts as date) BETWEEN &sqlsrv_begDt. AND &sqlsrv_endDt.
) /* for mrn in cohort 1 period*/
) H
LEFT JOIN Fin.Coverage C
ON H.CoverageID = C.CoverageID
LEFT JOIN Ref.Payor P
ON P.PayorID = C.PayorID
LEFT JOIN
( select
T.HospitalAccountClassDSC
, CASE
WHEN T.hcpcs IN (&examCPT) THEN 1
ELSE 0
END
AS flag_exam /** any exam**/
, CASE WHEN T.hcpcs IN (&infuCPT) OR T.CPT IN (&infuCPT) THEN 1 ELSE 0 END AS flag_infu /** infusion**/
, CASE WHEN T.hcpcs IN (&chemoCPT) OR T.CPT IN (&chemoCPT) THEN 1 ELSE 0 END AS flag_chemo /** chemotherapy**/
, CASE
WHEN T.hcpcs IN (&radOncCPT) THEN 1
ELSE 0
END
AS flag_radOnc
, CASE
WHEN T.hcpcs IN ('C9027','J9271','C9453','J9299','C9284','J9228','C9399','C9483','J9022','C9491','J9023','C9492') THEN 1
ELSE 0
END
AS flag_immuno /** immunotherapy */
, CASE
WHEN T.ChargeModifierListTXT LIKE '%DFP%' THEN 0
ELSE INT(T.TransactionCNT)
END
AS units
from Fin.HospitalTransaction T
where
T.HospitalAccountClassDSC = 'Outpatient' /* outpatient only */
AND T.hcpcs NE ''
) T
ON H.HospitalAccountID = T.HospitalAccountID
LEFT JOIN Pext.PatientExtension PT
ON H.PATIENTMRN = PT.DFCIMRN
)
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
ORDER BY H.PatientMRN , period DESC
;
QUIT;
That's a complicated PROC SQL statement: using DISTINCT for a long list of variables, multiple CASE WHEN statements, four LEFT JOINs with datasets from different SQL servers, and a nested SELECT within the WHERE clause.
I don't know the size or scope of the datasets you're using, but here are a couple suggestions to try:
I've attached a (rough) attempt at that. It likely needs some editing, but hopefully the idea makes sense.
Also, see @Kurt_Bremser's reply to a related question: https://communities.sas.com/t5/SAS-Programming/SAS-SQL-Code-Efficiency-Left-Join/m-p/399490#M96766
I hope that helps!
1. There are four left joins and left join is not one of the most ideal way to do joins. See if you can filter the data from the biggest join and then use that reduced set to do the other joins. You can also create four separate sets first and then do the joins
2. There are lot of case when statements so see if you can split that step.
3. In the end there is a group by and an order by
All the above is causing the data to run for a long time.
You can replace CASE WHEN ELSE for 0/1 coding such as you show with
CASE WHENT.hcpcs IN(&radOncCPT) THEN1ELSE0END ASflag_radOnc
(the stupid spacing is because of copy from Docx with another viewer, Code belongs in code boxes on the forum opened with the </> or "running man" icon and paste from the editor or log directly. DOCX and similar files will sometimes change characters such as quotes to non-programming versions)
with
( t.hcpcs in(&radOncCPT) ) as Flag_radOnc
SAS will return 1 for true and 0 for false for logical comparisons.
Please post logs by copy/pasting into a window opened with this button:
for code, use the "little running man" right next to it.
Only post the step in question, this should suffice.
Is the DISTINCT in the SQL really necessary? And have you checked that the GROUP BY is complete, because you have an automatic remerge?
In standard SQL, you usually cannot do this:
select name, sex, avg(weight)
from class
group by sex
because all columns must either be part of the GROUP BY or the result of a summary function. SAS SQL, OTOH, allows this; it calculates the summary functions and then merges the result back into the whole dataset.
Since this is often not what was intended, a NOTE about remerging is usually an alarm signal that the GROUP BY statement is somehow wrong, e.g. someone added a variable to the SELECT and forgot to add it to the GROUP BY. Or someone used position numbers in the GROUP BY (as is the case with your code), and moving or adding a column shifted everything.
It seems like your query is using SAS functions which may not be recognized by the SQL Server!!
, CASE WHEN datepart(input(dischargedts,anydtdtm.)) BETWEEN intnx('year', "&begDt"D , -1, 'same') AND intnx('day', "&begDt"D , -1, 'same') THEN 0 /* Claims data for year prior to report start date */
WHEN datepart(input(dischargedts,anydtdtm.)) BETWEEN "&begDt"D AND "&endDt"D THEN 1
ELSE 2 END AS period
DATEPART, INTNX are SAS functions!! This causes the data to transferred to SAS before it is processed!!!
Use the following option to see what's going on
OPTIONS DEBUG=DBMS_SELECT SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX;
Hope this helps you
Ahmed
Try these statements and query, and see if helps you
OPTIONS DEBUG=DBMS_SELECT SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX FULLSTIMER;
/* Calculate required date macros */
%let priorYrDt = %SYSFUNC(PUTN(%SYSFUNC(INTNX(year, "&begDt"d ,-1, same)),date9.));
%let priorDayDt = %SYSFUNC(PUTN(%SYSFUNC(INTNX(day, "&begDt"D ,-1, same)),date9.));
%put &=priorYrDt &=priorDayDt;
PROC SQL;
CREATE TABLE work.Datapull_epic AS
SELECT DISTINCT
H.PatientMRN as MRN
, H.HospitalAccountID as enc
, H.PrimaryPayorID as PayorID
, C.Subscriberid
, C.PlanID
/*
, DATEPART(H.DischargeDTS)
, MAX(H.DISCHARGEDTS) AS LAST_AGENT_DT
*/
, H.DISCHARGEDTS
, T.HospitalAccountClassDSC
/*
, PT.PatientNM */ /* Used PatientFirstNM and PatientLastNM from SharedLocal YM0214
*/
, PT.PatientFirstNM
, PT.PatientLastNM
, PT.BIRTHDTS
, P.PayorNM
, (T.hcpcs IN (&examCPT)) AS flag_exam /* any exam*/
, (T.hcpcs IN (&infuCPT) OR T.CPT IN (&infuCPT)) AS flag_infu /* infusion*/
, (T.hcpcs IN (&chemoCPT) OR T.CPT IN (&chemoCPT)) AS flag_chemo /* chemotherapy*/
, (T.hcpcs IN (&radOncCPT)) AS flag_radOnc
, (T.hcpcs IN ('C9027','J9271','C9453','J9299','C9284','J9228','C9399','C9483','J9022','C9491','J9023','C9492')) AS flag_immuno /* immunotherapy */
, CASE WHEN dischargedts BETWEEN "&priorYrDt"D AND "&priorDayDt"D THEN 0 /* Claims data for year prior to report start date */
WHEN dischargedts BETWEEN "&begDt"D AND "&endDt"D THEN 1
ELSE 2
END AS period
, SUM(CASE WHEN T.ChargeModifierListTXT LIKE '%DFP%' THEN 0 ELSE INT(T.TransactionCNT) END) AS units
, SUM(H.TotalChargeAMT) AS charges
FROM Fin.HospitalAccount H
LEFT JOIN Fin.Coverage C ON H.CoverageID = C.CoverageID
LEFT JOIN Ref.Payor P ON P.PayorID = C.PayorID
LEFT JOIN Fin.HospitalTransaction T ON H.HospitalAccountID = T.HospitalAccountID
/*INNER JOIN Pat.PATIENT PT ON PT.MRN=H.PATIENTMRN*/ /* YM0214 - changed to left join SharedLocal.Person.PatientExtension below */
LEFT JOIN Pext.PatientExtension PT ON H.PATIENTMRN = PT.DFCIMRN
WHERE
dischargedts BETWEEN "&priorYrDt"D AND "&endDt"D /* extract claims from 1 year prior to report start date until report end date */
AND H.PatientMRN IN
(SELECT DISTINCT A.PatientMRN
FROM Fin.HospitalAccount A
WHERE dischargedts BETWEEN "&begDt"D AND "&endDt"D
AND A.PatientMRN NE '') /* for mrn in cohort 1 period*/
AND T.HospitalAccountClassDSC = 'Outpatient' /* outpatient only */
AND ((T.hcpcs NE '') OR H.PatientMRN NE '')
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
ORDER BY H.PatientMRN, period DESC;
QUIT;
Good luck,
Ahmed
Looking at your log there is a big gap between real time and CPU time. This could be because there is a lot happening on the SQL Server side and SAS is just waiting or because a lot of time gets spent transferring the data from SQL Server to SAS.
Now looking at the code you've posted there are function in it which SAS can't push to the database and for this reason a lot of data will get transferred to SAS for processing ( functions intnx() and input() being the main "problem").
The code also combines Left Join On Clauses with Where Clauses. The Where Clause is likely going to post filter the result set from the Join. It would likely be more efficient to first filter the rows in the source tables to only use the rows for joining which are relevant.
Here my stab at your code. I had to do this "blind" so not very convinced that this will just work (syntax and logic) - but it hopefully will give you some ideas how you could performance tweak the code.
I'm still doing the Group By statement on the SAS side. This because there are variables in the Select clause which are not used for either grouping or aggregation - that's something SAS allows you to do but SQL Server would throw an error. Ideally you don't have such additional variables and can do everything on the SQL Server side.
OPTIONS DEBUG=DBMS_SELECT SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX FULLSTIMER;
/* INITIALIZE GLOBAL VARIABLESS */
%LET begDt= 01MAY2019; /* reporting begin date; FORMAT= DDMMMYYYY*/
%LET endDt= %sysfunc(intnx(month, "&begDt"D , +11, e),date9.); /* reporting end date: FORMAT= DDMMMYYYY*/
%LET begDt_prev=%sysfunc(intnx(month, "&begDt"D , -12, b),date9.);
%LET endDt_prev=%sysfunc(intnx(month, "&endDt"D , -12, e),date9.);
%let sqlsrv_begDt=%unquote(%nrbquote(')&begDt%nrbquote('));
%let sqlsrv_endDt=%unquote(%nrbquote(')&endDt%nrbquote('));
%let sqlsrv_begDt_prev=%unquote(%nrbquote(')&begDt_prev%nrbquote('));
%let sqlsrv_endDt_prev=%unquote(%nrbquote(')&endDt_prev%nrbquote('));
%put &=begDt;
%put &=endDt;
%put &=begDt_prev;
%put &=endDt_prev;
%put &=sqlsrv_begDt;
%put &=sqlsrv_endDt;
%put &=sqlsrv_begDt_prev;
%put &=sqlsrv_endDt_prev;
LIBNAME sqldb odbc complete="driver=SQL Server; database=Epic; server=PHSSQL2195" schema=Finance readbuff=2000 read_lock_type=nolock;
PROC SQL;
connect using sqldb;
CREATE TABLE work.Datapull_epic AS
select distinct *
from connection to sqldb
(
SELECT
H.PatientMRN as MRN
, H.HospitalAccountID as enc
, H.PrimaryPayorID as PayorID
, C.Subscriberid
, C.PlanID
, H.DISCHARGEDTS
, T.HospitalAccountClassDSC
, PT.PatientFirstNM
, PT.PatientLastNM
, PT.BIRTHDTS
, P.PayorNM
, T.flag_exam /** any exam**/
, T.flag_infu /** infusion**/
, T.flag_chemo /** chemotherapy**/
, T.flag_radOnc
, T.flag_immuno /** immunotherapy */
, H.period
, SUM(T.Units) AS units
, SUM(H.TotalChargeAMT) AS charges
(
select
PatientMRN as MRN
, HospitalAccountID
, PrimaryPayorID
, DISCHARGEDTS
, TotalChargeAMT
, CASE
WHEN cast(dischargedts as date)
BETWEEN &sqlsrv_begDt_prev. AND &sqlsrv_endDt_prev. THEN 0 /* Claims data for year prior to report start date */
WHEN cast(dischargedts as date) BETWEEN &sqlsrv_begDt. AND &sqlsrv_endDt. THEN 1
ELSE 2
END
AS period
FROM Fin.HospitalAccount
where
cast(dischargedts as date)
between &sqlsrv_begDt_prev. and &sqlsrv_endDt_prev. /* extract claims from 1 year prior to report start date until report end date */
AND PatientMRN IN
(SELECT DISTINCT A.PatientMRN
FROM Fin.HospitalAccount A
WHERE cast(dischargedts as date) BETWEEN &sqlsrv_begDt. AND &sqlsrv_endDt.
) /* for mrn in cohort 1 period*/
) H
LEFT JOIN Fin.Coverage C
ON H.CoverageID = C.CoverageID
LEFT JOIN Ref.Payor P
ON P.PayorID = C.PayorID
LEFT JOIN
( select
T.HospitalAccountClassDSC
, CASE
WHEN T.hcpcs IN (&examCPT) THEN 1
ELSE 0
END
AS flag_exam /** any exam**/
, CASE WHEN T.hcpcs IN (&infuCPT) OR T.CPT IN (&infuCPT) THEN 1 ELSE 0 END AS flag_infu /** infusion**/
, CASE WHEN T.hcpcs IN (&chemoCPT) OR T.CPT IN (&chemoCPT) THEN 1 ELSE 0 END AS flag_chemo /** chemotherapy**/
, CASE
WHEN T.hcpcs IN (&radOncCPT) THEN 1
ELSE 0
END
AS flag_radOnc
, CASE
WHEN T.hcpcs IN ('C9027','J9271','C9453','J9299','C9284','J9228','C9399','C9483','J9022','C9491','J9023','C9492') THEN 1
ELSE 0
END
AS flag_immuno /** immunotherapy */
, CASE
WHEN T.ChargeModifierListTXT LIKE '%DFP%' THEN 0
ELSE INT(T.TransactionCNT)
END
AS units
from Fin.HospitalTransaction T
where
T.HospitalAccountClassDSC = 'Outpatient' /* outpatient only */
AND T.hcpcs NE ''
) T
ON H.HospitalAccountID = T.HospitalAccountID
LEFT JOIN Pext.PatientExtension PT
ON H.PATIENTMRN = PT.DFCIMRN
)
GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
ORDER BY H.PatientMRN , period DESC
;
QUIT;
The SAS Users Group for Administrators (SUGA) is open to all SAS administrators and architects who install, update, manage or maintain a SAS deployment.
SAS technical trainer Erin Winters shows you how to explore assets, create new data discovery agents, schedule data discovery agents, and much more.
Find more tutorials on the SAS Users YouTube channel.