BookmarkSubscribeRSS Feed
Batman
Quartz | Level 8

I'm trying to get a unique patient count based on the presence of certain Diagnosis codes in multiple diagnosis code columns.   It has to be done in an sql query.  This code works, but I'm wondering if there is a more efficient method.

 

PROC SQL;

   SELECT (COUNT(distinct t1.GEO_BENE_SK)) AS COUNT_of_GEO_BENE_SK

      FROM MDCRD_AL.V2_FCT_CLM t1  /*CLAIMS FILE*/

           INNER JOIN WORK.ICD10 t2 ON  /* LIST OF DIAGNOSIS CODES*/

                                             (

                                             t1.CLM_DGNS_1_CD = t2.F1

                                             or t1.CLM_DGNS_2_CD = t2.F1

                                             or t1.CLM_DGNS_3_CD = t2.F1

                                             or t1.CLM_DGNS_4_CD = t2.F1

                                             )

      WHERE t1.CLM_THRU_DT BETWEEN '1Jan2016'd AND '31Dec2016'd

               AND t1.CLM_TYPE_CD = 60 AND t1.CLM_FINL_ACTN_IND = 'Y';

QUIT;

4 REPLIES 4
s_lassen
Meteorite | Level 14

You may get some performance enhancement by using an EXISTS query (as the inner join may be returning more than one row from the diagnosis table):

PROC SQL;
   SELECT (COUNT(distinct t1.GEO_BENE_SK)) AS COUNT_of_GEO_BENE_SK
      FROM MDCRD_AL.V2_FCT_CLM t1  /*CLAIMS FILE*/
      WHERE t1.CLM_THRU_DT BETWEEN '1Jan2016'd AND '31Dec2016'd
          AND t1.CLM_TYPE_CD = 60 AND t1.CLM_FINL_ACTN_IND = 'Y'
          and exists(select * from WORK.ICD10 t2 ON  /* LIST OF DIAGNOSIS CODES*/ where 
                              t1.CLM_DGNS_1_CD = t2.F1
                              or t1.CLM_DGNS_2_CD = t2.F1
                              or t1.CLM_DGNS_3_CD = t2.F1
                              or t1.CLM_DGNS_4_CD = t2.F1
                                             );

 QUIT;

With an EXISTS query, the program should return as soon as it finds the first matching row, and that may speed things up (with the inner join, SQL will insist on fetching all matching rows).

Batman
Quartz | Level 8

Either, but mostly I'm hoping to speed things up, so I guess better performance is the priority.

Kurt_Bremser
Super User

@Batman wrote:

Either, but mostly I'm hoping to speed things up, so I guess better performance is the priority.


Then you must look at non-SQL solutions. For the lookup, loading the wanted diagnosis codes into a hash object will outperform any SQL join.

Depending on the cardinality of geo_bene_sk, even the COUNT DISTINCT might be doable in a hash, without any previous sorting.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 614 views
  • 0 likes
  • 3 in conversation