BookmarkSubscribeRSS Feed
Batman
Quartz | Level 8

I'm trying to get a unique patient count based on the presence of certain Diagnosis codes in multiple diagnosis code columns.   It has to be done in an sql query.  This code works, but I'm wondering if there is a more efficient method.

 

PROC SQL;

   SELECT (COUNT(distinct t1.GEO_BENE_SK)) AS COUNT_of_GEO_BENE_SK

      FROM MDCRD_AL.V2_FCT_CLM t1  /*CLAIMS FILE*/

           INNER JOIN WORK.ICD10 t2 ON  /* LIST OF DIAGNOSIS CODES*/

                                             (

                                             t1.CLM_DGNS_1_CD = t2.F1

                                             or t1.CLM_DGNS_2_CD = t2.F1

                                             or t1.CLM_DGNS_3_CD = t2.F1

                                             or t1.CLM_DGNS_4_CD = t2.F1

                                             )

      WHERE t1.CLM_THRU_DT BETWEEN '1Jan2016'd AND '31Dec2016'd

               AND t1.CLM_TYPE_CD = 60 AND t1.CLM_FINL_ACTN_IND = 'Y';

QUIT;

4 REPLIES 4
s_lassen
Meteorite | Level 14

You may get some performance enhancement by using an EXISTS query (as the inner join may be returning more than one row from the diagnosis table):

PROC SQL;
   SELECT (COUNT(distinct t1.GEO_BENE_SK)) AS COUNT_of_GEO_BENE_SK
      FROM MDCRD_AL.V2_FCT_CLM t1  /*CLAIMS FILE*/
      WHERE t1.CLM_THRU_DT BETWEEN '1Jan2016'd AND '31Dec2016'd
          AND t1.CLM_TYPE_CD = 60 AND t1.CLM_FINL_ACTN_IND = 'Y'
          and exists(select * from WORK.ICD10 t2 ON  /* LIST OF DIAGNOSIS CODES*/ where 
                              t1.CLM_DGNS_1_CD = t2.F1
                              or t1.CLM_DGNS_2_CD = t2.F1
                              or t1.CLM_DGNS_3_CD = t2.F1
                              or t1.CLM_DGNS_4_CD = t2.F1
                                             );

 QUIT;

With an EXISTS query, the program should return as soon as it finds the first matching row, and that may speed things up (with the inner join, SQL will insist on fetching all matching rows).

Batman
Quartz | Level 8

Either, but mostly I'm hoping to speed things up, so I guess better performance is the priority.

Kurt_Bremser
Super User

@Batman wrote:

Either, but mostly I'm hoping to speed things up, so I guess better performance is the priority.


Then you must look at non-SQL solutions. For the lookup, loading the wanted diagnosis codes into a hash object will outperform any SQL join.

Depending on the cardinality of geo_bene_sk, even the COUNT DISTINCT might be doable in a hash, without any previous sorting.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 565 views
  • 0 likes
  • 3 in conversation