BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wheddingsjr
Pyrite | Level 9

Good morning all

 

I am not sure if this can be solved without data, but I have been tasked with trying to reduce the run time of the attached SAS code (highlighted in yellow). As you can see in the attached log, all of the coding processes quickly accept the yellow highlighted portion, which takes 45 mins to run. Is there anyway I can cut down the process time on that section of the coding? Any assistance would be greatly appreciated.

 

Thanks

 

William

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Looking at your log there is a big gap between real time and CPU time. This could be because there is a lot happening on the SQL Server side and SAS is just waiting or because a lot of time gets spent transferring the data from SQL Server to SAS.

 

Now looking at the code you've posted there are function in it which SAS can't push to the database and for this reason a lot of data will get transferred to SAS for processing ( functions intnx() and input() being the main "problem"). 

The code also combines Left Join On Clauses with Where Clauses. The Where Clause is likely going to post filter the result set from the Join. It would likely be more efficient to first filter the rows in the source tables to only use the rows for joining which are relevant.

 

Here my stab at your code. I had to do this "blind" so not very convinced that this will just work (syntax and logic) - but it hopefully will give you some ideas how you could performance tweak the code.

I'm still doing the Group By statement on the SAS side. This because there are variables in the Select clause which are not used for either grouping or aggregation - that's something SAS allows you to do but SQL Server would throw an error. Ideally you don't have such additional variables and can do everything on the SQL Server side.

OPTIONS DEBUG=DBMS_SELECT SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX FULLSTIMER;

/* INITIALIZE GLOBAL VARIABLESS */
%LET begDt= 01MAY2019;  /* reporting begin date; FORMAT= DDMMMYYYY*/
%LET endDt= %sysfunc(intnx(month, "&begDt"D , +11, e),date9.);  /* reporting end date: FORMAT= DDMMMYYYY*/

%LET begDt_prev=%sysfunc(intnx(month, "&begDt"D , -12, b),date9.);
%LET endDt_prev=%sysfunc(intnx(month, "&endDt"D , -12, e),date9.);

%let sqlsrv_begDt=%unquote(%nrbquote(')&begDt%nrbquote('));
%let sqlsrv_endDt=%unquote(%nrbquote(')&endDt%nrbquote('));
%let sqlsrv_begDt_prev=%unquote(%nrbquote(')&begDt_prev%nrbquote('));
%let sqlsrv_endDt_prev=%unquote(%nrbquote(')&endDt_prev%nrbquote('));

%put &=begDt;
%put &=endDt;
%put &=begDt_prev;
%put &=endDt_prev;

%put &=sqlsrv_begDt;
%put &=sqlsrv_endDt;
%put &=sqlsrv_begDt_prev;
%put &=sqlsrv_endDt_prev;

LIBNAME sqldb odbc complete="driver=SQL Server; database=Epic; server=PHSSQL2195" schema=Finance readbuff=2000 read_lock_type=nolock;

PROC SQL;
  connect using sqldb;
  CREATE TABLE work.Datapull_epic AS
  select distinct * 
  from connection to sqldb
  (
    SELECT
      H.PatientMRN as MRN
      , H.HospitalAccountID as enc
      , H.PrimaryPayorID as PayorID
      , C.Subscriberid
      , C.PlanID
      , H.DISCHARGEDTS 
      , T.HospitalAccountClassDSC
      , PT.PatientFirstNM
      , PT.PatientLastNM
      , PT.BIRTHDTS
      , P.PayorNM
      , T.flag_exam /** any exam**/
      , T.flag_infu /** infusion**/
      , T.flag_chemo /** chemotherapy**/
      , T.flag_radOnc
      , T.flag_immuno /** immunotherapy */
      , H.period
      , SUM(T.Units) AS units
      , SUM(H.TotalChargeAMT) AS charges

    (
      select 
        PatientMRN as MRN
        , HospitalAccountID 
        , PrimaryPayorID 
        , DISCHARGEDTS
        , TotalChargeAMT
        , CASE 
          WHEN cast(dischargedts as date) 
            BETWEEN &sqlsrv_begDt_prev. AND &sqlsrv_endDt_prev. THEN 0 /* Claims data for year prior to report start date */
          WHEN cast(dischargedts as date) BETWEEN &sqlsrv_begDt. AND &sqlsrv_endDt. THEN 1
          ELSE 2 
          END 
          AS period
      FROM Fin.HospitalAccount 
      where 
        cast(dischargedts as date)  
          between &sqlsrv_begDt_prev. and &sqlsrv_endDt_prev. /* extract claims from 1 year prior to report start date until report end date */
        AND PatientMRN IN 
              (SELECT DISTINCT A.PatientMRN
               FROM Fin.HospitalAccount A
               WHERE cast(dischargedts as date) BETWEEN &sqlsrv_begDt. AND &sqlsrv_endDt. 
              ) /* for mrn in cohort 1 period*/
    ) H

    LEFT JOIN Fin.Coverage C 
      ON H.CoverageID = C.CoverageID

    LEFT JOIN Ref.Payor P 
      ON P.PayorID = C.PayorID

    LEFT JOIN 
      ( select
          T.HospitalAccountClassDSC
          , CASE 
              WHEN T.hcpcs IN (&examCPT) THEN 1 
              ELSE 0 
            END 
            AS flag_exam /** any exam**/
          , CASE WHEN T.hcpcs IN (&infuCPT) OR T.CPT IN (&infuCPT) THEN 1 ELSE 0 END AS flag_infu /** infusion**/
          , CASE WHEN T.hcpcs IN (&chemoCPT) OR T.CPT IN (&chemoCPT) THEN 1 ELSE 0 END AS flag_chemo /** chemotherapy**/
          , CASE 
              WHEN T.hcpcs IN (&radOncCPT) THEN 1 
              ELSE 0 
            END 
            AS flag_radOnc
          , CASE 
              WHEN T.hcpcs IN ('C9027','J9271','C9453','J9299','C9284','J9228','C9399','C9483','J9022','C9491','J9023','C9492') THEN 1 
              ELSE 0 
            END 
            AS flag_immuno /** immunotherapy */
          , CASE 
                WHEN T.ChargeModifierListTXT LIKE '%DFP%' THEN 0 
                ELSE INT(T.TransactionCNT) 
              END
              AS units
          from Fin.HospitalTransaction T 
          where 
            T.HospitalAccountClassDSC = 'Outpatient'  /* outpatient only */
            AND T.hcpcs NE ''
        ) T
      ON H.HospitalAccountID = T.HospitalAccountID

    LEFT JOIN Pext.PatientExtension PT 
      ON H.PATIENTMRN = PT.DFCIMRN
  )
  GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
  ORDER BY H.PatientMRN , period DESC
  ;
QUIT;

 

View solution in original post

14 REPLIES 14
mklangley
Lapis Lazuli | Level 10

That's a complicated PROC SQL statement: using DISTINCT for a long list of variables, multiple CASE WHEN statements, four LEFT JOINs with datasets from different SQL servers, and a nested SELECT within the WHERE clause.

 

I don't know the size or scope of the datasets you're using, but here are a couple suggestions to try:

  1. First, insource the needed data into WORK before doing the LEFT JOINs. Then use these in the LEFT JOINs. That should perform better.
  2. Determine whether you need DISTINCTs or not. See if these could be added to the insourcing in #1.

I've attached a (rough) attempt at that. It likely needs some editing, but hopefully the idea makes sense.

 

Also, see @Kurt_Bremser's reply to a related question: https://communities.sas.com/t5/SAS-Programming/SAS-SQL-Code-Efficiency-Left-Join/m-p/399490#M96766

 

I hope that helps!

wheddingsjr
Pyrite | Level 9
Thanks MK, sorry for just responding but was away for a few days. I will give that a try and see how it works.
smantha
Lapis Lazuli | Level 10

1. There are four left joins and left join is not one of the most ideal way to do joins. See if you can filter the data from the biggest join and then use that reduced set to do the other joins. You can also create four separate sets first and then do the joins

2. There are lot of case when statements so see if you can split that step.

3. In the end there is a group by and an order by

 

All the above is causing the data to run for a long time.

 

 

wheddingsjr
Pyrite | Level 9
Thx Smantha. I inherited this code. Someone else wrote and used it for years from what I understand. I get hired and one of the first things I get tasked with is trying to streamline it (as if no one else could do it before I was brought onboard).
ballardw
Super User

You can replace CASE WHEN ELSE for 0/1 coding such as you show with

 CASE WHENT.hcpcs IN(&radOncCPT) THEN1ELSE0END ASflag_radOnc

(the stupid spacing is because of copy from Docx with another viewer, Code belongs in code boxes on the forum opened with the </> or "running man" icon and paste from the editor or log directly. DOCX and similar files will sometimes change characters such as quotes to non-programming versions)

with

( t.hcpcs in(&radOncCPT) ) as Flag_radOnc

SAS will return 1 for true and 0 for false for logical comparisons.

wheddingsjr
Pyrite | Level 9
Thanks Ballard, I will try this as well as some of the other suggestions.
Kurt_Bremser
Super User

Please post logs by copy/pasting into a window opened with this button:

Bildschirmfoto 2020-04-07 um 08.32.59.jpg

for code, use the "little running man" right next to it.

Only post the step in question, this should suffice.

 

Is the DISTINCT in the SQL really necessary? And have you checked that the GROUP BY is complete, because you have an automatic remerge?

wheddingsjr
Pyrite | Level 9
Thanks Kurt for the heads up about posting code. I started to just post the particular part in question but wanted to show exactly what it was doing. I asked abut the "DISTINCT" and was told it was necessary and I have absolutely no idea what a automatic remerge is LOL. (I am a novice at this).
Kurt_Bremser
Super User

In standard SQL, you usually cannot do this:

select name, sex, avg(weight)
from class
group by sex

because all columns must either be part of the GROUP BY or the result of a summary function. SAS SQL, OTOH, allows this; it calculates the summary functions and then merges the result back into the whole dataset.

Since this is often not what was intended, a NOTE about remerging is usually an alarm signal that the GROUP BY statement is somehow wrong, e.g. someone added a variable to the SELECT and forgot to add it to the GROUP BY. Or someone used position numbers in the GROUP BY (as is the case with your code), and moving or adding a column shifted everything.

AhmedAl_Attar
Rhodochrosite | Level 12

@wheddingsjr 

It seems like your query is using SAS functions which may not be recognized by the SQL Server!!

	, CASE WHEN datepart(input(dischargedts,anydtdtm.)) BETWEEN intnx('year', "&begDt"D , -1, 'same') AND intnx('day', "&begDt"D , -1, 'same') THEN 0 /* Claims data for year prior to report start date */
     	WHEN datepart(input(dischargedts,anydtdtm.)) BETWEEN "&begDt"D AND "&endDt"D THEN 1
	 	ELSE 2 END AS period

DATEPART, INTNX are SAS functions!! This causes the data to transferred to SAS before it is processed!!! 

Use the following option to see what's going on

OPTIONS DEBUG=DBMS_SELECT SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX;

Hope this helps you

Ahmed

 

AhmedAl_Attar
Rhodochrosite | Level 12

@wheddingsjr 

 

Try these statements and query, and see if helps you


OPTIONS DEBUG=DBMS_SELECT SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX FULLSTIMER;

/* Calculate required date macros */
%let priorYrDt  = %SYSFUNC(PUTN(%SYSFUNC(INTNX(year, "&begDt"d ,-1, same)),date9.));
%let priorDayDt = %SYSFUNC(PUTN(%SYSFUNC(INTNX(day,  "&begDt"D ,-1, same)),date9.));
%put &=priorYrDt &=priorDayDt;

PROC SQL;
	CREATE TABLE work.Datapull_epic AS
	SELECT DISTINCT
	  H.PatientMRN as MRN
	, H.HospitalAccountID as enc
	, H.PrimaryPayorID as PayorID
	, C.Subscriberid
	, C.PlanID
	/*
	, DATEPART(H.DischargeDTS)
	, MAX(H.DISCHARGEDTS) AS LAST_AGENT_DT
	*/
	, H.DISCHARGEDTS 
	, T.HospitalAccountClassDSC
	/*
	, PT.PatientNM */ /* Used PatientFirstNM and PatientLastNM from SharedLocal YM0214 
	*/
	, PT.PatientFirstNM
	, PT.PatientLastNM
	, PT.BIRTHDTS
	, P.PayorNM
	, (T.hcpcs IN (&examCPT)) AS flag_exam /* any exam*/
	, (T.hcpcs IN (&infuCPT) OR T.CPT IN (&infuCPT)) AS flag_infu /* infusion*/
	, (T.hcpcs IN (&chemoCPT) OR T.CPT IN (&chemoCPT)) AS flag_chemo /* chemotherapy*/
	, (T.hcpcs IN (&radOncCPT)) AS flag_radOnc
	, (T.hcpcs IN ('C9027','J9271','C9453','J9299','C9284','J9228','C9399','C9483','J9022','C9491','J9023','C9492')) AS flag_immuno /* immunotherapy */
	, CASE	WHEN dischargedts BETWEEN "&priorYrDt"D AND "&priorDayDt"D THEN 0 /* Claims data for year prior to report start date */
			WHEN dischargedts BETWEEN "&begDt"D AND "&endDt"D THEN 1
			ELSE 2 
	  END AS period
	, SUM(CASE WHEN T.ChargeModifierListTXT LIKE '%DFP%' THEN 0 ELSE INT(T.TransactionCNT) END) AS units
	, SUM(H.TotalChargeAMT) AS charges

	FROM Fin.HospitalAccount H
	LEFT JOIN Fin.Coverage C ON H.CoverageID = C.CoverageID
	LEFT JOIN Ref.Payor P ON P.PayorID = C.PayorID
	LEFT JOIN Fin.HospitalTransaction T ON H.HospitalAccountID = T.HospitalAccountID
	/*INNER JOIN Pat.PATIENT PT ON PT.MRN=H.PATIENTMRN*/ /* YM0214 - changed to left join SharedLocal.Person.PatientExtension below */
	LEFT JOIN Pext.PatientExtension PT ON H.PATIENTMRN = PT.DFCIMRN

	WHERE  
	dischargedts BETWEEN "&priorYrDt"D AND "&endDt"D /* extract claims from 1 year prior to report start date until report end date */
	AND H.PatientMRN IN 
		(SELECT DISTINCT A.PatientMRN
			FROM Fin.HospitalAccount A
			WHERE dischargedts BETWEEN "&begDt"D AND "&endDt"D 
			AND A.PatientMRN NE '') /* for mrn in cohort 1 period*/
	AND T.HospitalAccountClassDSC = 'Outpatient'	/* outpatient only */
	AND ((T.hcpcs NE '') OR H.PatientMRN NE '')

	GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
	ORDER BY H.PatientMRN, period DESC;
QUIT;

Good luck,

Ahmed

 

wheddingsjr
Pyrite | Level 9
Thanks Ahmed, I will give it a shout.
Patrick
Opal | Level 21

Looking at your log there is a big gap between real time and CPU time. This could be because there is a lot happening on the SQL Server side and SAS is just waiting or because a lot of time gets spent transferring the data from SQL Server to SAS.

 

Now looking at the code you've posted there are function in it which SAS can't push to the database and for this reason a lot of data will get transferred to SAS for processing ( functions intnx() and input() being the main "problem"). 

The code also combines Left Join On Clauses with Where Clauses. The Where Clause is likely going to post filter the result set from the Join. It would likely be more efficient to first filter the rows in the source tables to only use the rows for joining which are relevant.

 

Here my stab at your code. I had to do this "blind" so not very convinced that this will just work (syntax and logic) - but it hopefully will give you some ideas how you could performance tweak the code.

I'm still doing the Group By statement on the SAS side. This because there are variables in the Select clause which are not used for either grouping or aggregation - that's something SAS allows you to do but SQL Server would throw an error. Ideally you don't have such additional variables and can do everything on the SQL Server side.

OPTIONS DEBUG=DBMS_SELECT SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX FULLSTIMER;

/* INITIALIZE GLOBAL VARIABLESS */
%LET begDt= 01MAY2019;  /* reporting begin date; FORMAT= DDMMMYYYY*/
%LET endDt= %sysfunc(intnx(month, "&begDt"D , +11, e),date9.);  /* reporting end date: FORMAT= DDMMMYYYY*/

%LET begDt_prev=%sysfunc(intnx(month, "&begDt"D , -12, b),date9.);
%LET endDt_prev=%sysfunc(intnx(month, "&endDt"D , -12, e),date9.);

%let sqlsrv_begDt=%unquote(%nrbquote(')&begDt%nrbquote('));
%let sqlsrv_endDt=%unquote(%nrbquote(')&endDt%nrbquote('));
%let sqlsrv_begDt_prev=%unquote(%nrbquote(')&begDt_prev%nrbquote('));
%let sqlsrv_endDt_prev=%unquote(%nrbquote(')&endDt_prev%nrbquote('));

%put &=begDt;
%put &=endDt;
%put &=begDt_prev;
%put &=endDt_prev;

%put &=sqlsrv_begDt;
%put &=sqlsrv_endDt;
%put &=sqlsrv_begDt_prev;
%put &=sqlsrv_endDt_prev;

LIBNAME sqldb odbc complete="driver=SQL Server; database=Epic; server=PHSSQL2195" schema=Finance readbuff=2000 read_lock_type=nolock;

PROC SQL;
  connect using sqldb;
  CREATE TABLE work.Datapull_epic AS
  select distinct * 
  from connection to sqldb
  (
    SELECT
      H.PatientMRN as MRN
      , H.HospitalAccountID as enc
      , H.PrimaryPayorID as PayorID
      , C.Subscriberid
      , C.PlanID
      , H.DISCHARGEDTS 
      , T.HospitalAccountClassDSC
      , PT.PatientFirstNM
      , PT.PatientLastNM
      , PT.BIRTHDTS
      , P.PayorNM
      , T.flag_exam /** any exam**/
      , T.flag_infu /** infusion**/
      , T.flag_chemo /** chemotherapy**/
      , T.flag_radOnc
      , T.flag_immuno /** immunotherapy */
      , H.period
      , SUM(T.Units) AS units
      , SUM(H.TotalChargeAMT) AS charges

    (
      select 
        PatientMRN as MRN
        , HospitalAccountID 
        , PrimaryPayorID 
        , DISCHARGEDTS
        , TotalChargeAMT
        , CASE 
          WHEN cast(dischargedts as date) 
            BETWEEN &sqlsrv_begDt_prev. AND &sqlsrv_endDt_prev. THEN 0 /* Claims data for year prior to report start date */
          WHEN cast(dischargedts as date) BETWEEN &sqlsrv_begDt. AND &sqlsrv_endDt. THEN 1
          ELSE 2 
          END 
          AS period
      FROM Fin.HospitalAccount 
      where 
        cast(dischargedts as date)  
          between &sqlsrv_begDt_prev. and &sqlsrv_endDt_prev. /* extract claims from 1 year prior to report start date until report end date */
        AND PatientMRN IN 
              (SELECT DISTINCT A.PatientMRN
               FROM Fin.HospitalAccount A
               WHERE cast(dischargedts as date) BETWEEN &sqlsrv_begDt. AND &sqlsrv_endDt. 
              ) /* for mrn in cohort 1 period*/
    ) H

    LEFT JOIN Fin.Coverage C 
      ON H.CoverageID = C.CoverageID

    LEFT JOIN Ref.Payor P 
      ON P.PayorID = C.PayorID

    LEFT JOIN 
      ( select
          T.HospitalAccountClassDSC
          , CASE 
              WHEN T.hcpcs IN (&examCPT) THEN 1 
              ELSE 0 
            END 
            AS flag_exam /** any exam**/
          , CASE WHEN T.hcpcs IN (&infuCPT) OR T.CPT IN (&infuCPT) THEN 1 ELSE 0 END AS flag_infu /** infusion**/
          , CASE WHEN T.hcpcs IN (&chemoCPT) OR T.CPT IN (&chemoCPT) THEN 1 ELSE 0 END AS flag_chemo /** chemotherapy**/
          , CASE 
              WHEN T.hcpcs IN (&radOncCPT) THEN 1 
              ELSE 0 
            END 
            AS flag_radOnc
          , CASE 
              WHEN T.hcpcs IN ('C9027','J9271','C9453','J9299','C9284','J9228','C9399','C9483','J9022','C9491','J9023','C9492') THEN 1 
              ELSE 0 
            END 
            AS flag_immuno /** immunotherapy */
          , CASE 
                WHEN T.ChargeModifierListTXT LIKE '%DFP%' THEN 0 
                ELSE INT(T.TransactionCNT) 
              END
              AS units
          from Fin.HospitalTransaction T 
          where 
            T.HospitalAccountClassDSC = 'Outpatient'  /* outpatient only */
            AND T.hcpcs NE ''
        ) T
      ON H.HospitalAccountID = T.HospitalAccountID

    LEFT JOIN Pext.PatientExtension PT 
      ON H.PATIENTMRN = PT.DFCIMRN
  )
  GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
  ORDER BY H.PatientMRN , period DESC
  ;
QUIT;

 

suga badge.PNGThe SAS Users Group for Administrators (SUGA) is open to all SAS administrators and architects who install, update, manage or maintain a SAS deployment. 

Join SUGA 

Get Started with SAS Information Catalog in SAS Viya

SAS technical trainer Erin Winters shows you how to explore assets, create new data discovery agents, schedule data discovery agents, and much more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 2190 views
  • 3 likes
  • 7 in conversation