I'm trying to get a unique patient count based on the presence of certain Diagnosis codes in multiple diagnosis code columns. It has to be done in an sql query. This code works, but I'm wondering if there is a more efficient method.
PROC SQL;
SELECT (COUNT(distinct t1.GEO_BENE_SK)) AS COUNT_of_GEO_BENE_SK
FROM MDCRD_AL.V2_FCT_CLM t1 /*CLAIMS FILE*/
INNER JOIN WORK.ICD10 t2 ON /* LIST OF DIAGNOSIS CODES*/
(
t1.CLM_DGNS_1_CD = t2.F1
or t1.CLM_DGNS_2_CD = t2.F1
or t1.CLM_DGNS_3_CD = t2.F1
or t1.CLM_DGNS_4_CD = t2.F1
)
WHERE t1.CLM_THRU_DT BETWEEN '1Jan2016'd AND '31Dec2016'd
AND t1.CLM_TYPE_CD = 60 AND t1.CLM_FINL_ACTN_IND = 'Y';
QUIT;
You may get some performance enhancement by using an EXISTS query (as the inner join may be returning more than one row from the diagnosis table):
PROC SQL;
SELECT (COUNT(distinct t1.GEO_BENE_SK)) AS COUNT_of_GEO_BENE_SK
FROM MDCRD_AL.V2_FCT_CLM t1 /*CLAIMS FILE*/
WHERE t1.CLM_THRU_DT BETWEEN '1Jan2016'd AND '31Dec2016'd
AND t1.CLM_TYPE_CD = 60 AND t1.CLM_FINL_ACTN_IND = 'Y'
and exists(select * from WORK.ICD10 t2 ON /* LIST OF DIAGNOSIS CODES*/ where
t1.CLM_DGNS_1_CD = t2.F1
or t1.CLM_DGNS_2_CD = t2.F1
or t1.CLM_DGNS_3_CD = t2.F1
or t1.CLM_DGNS_4_CD = t2.F1
);
QUIT;
With an EXISTS query, the program should return as soon as it finds the first matching row, and that may speed things up (with the inner join, SQL will insist on fetching all matching rows).
"Efficient" in terms of simpler code or better performance?
Either, but mostly I'm hoping to speed things up, so I guess better performance is the priority.
@Batman wrote:
Either, but mostly I'm hoping to speed things up, so I guess better performance is the priority.
Then you must look at non-SQL solutions. For the lookup, loading the wanted diagnosis codes into a hash object will outperform any SQL join.
Depending on the cardinality of geo_bene_sk, even the COUNT DISTINCT might be doable in a hash, without any previous sorting.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.