I'm trying to get a unique patient count based on the presence of certain Diagnosis codes in multiple diagnosis code columns. It has to be done in an sql query. This code works, but I'm wondering if there is a more efficient method.
PROC SQL;
SELECT (COUNT(distinct t1.GEO_BENE_SK)) AS COUNT_of_GEO_BENE_SK
FROM MDCRD_AL.V2_FCT_CLM t1 /*CLAIMS FILE*/
INNER JOIN WORK.ICD10 t2 ON /* LIST OF DIAGNOSIS CODES*/
(
t1.CLM_DGNS_1_CD = t2.F1
or t1.CLM_DGNS_2_CD = t2.F1
or t1.CLM_DGNS_3_CD = t2.F1
or t1.CLM_DGNS_4_CD = t2.F1
)
WHERE t1.CLM_THRU_DT BETWEEN '1Jan2016'd AND '31Dec2016'd
AND t1.CLM_TYPE_CD = 60 AND t1.CLM_FINL_ACTN_IND = 'Y';
QUIT;
You may get some performance enhancement by using an EXISTS query (as the inner join may be returning more than one row from the diagnosis table):
PROC SQL;
SELECT (COUNT(distinct t1.GEO_BENE_SK)) AS COUNT_of_GEO_BENE_SK
FROM MDCRD_AL.V2_FCT_CLM t1 /*CLAIMS FILE*/
WHERE t1.CLM_THRU_DT BETWEEN '1Jan2016'd AND '31Dec2016'd
AND t1.CLM_TYPE_CD = 60 AND t1.CLM_FINL_ACTN_IND = 'Y'
and exists(select * from WORK.ICD10 t2 ON /* LIST OF DIAGNOSIS CODES*/ where
t1.CLM_DGNS_1_CD = t2.F1
or t1.CLM_DGNS_2_CD = t2.F1
or t1.CLM_DGNS_3_CD = t2.F1
or t1.CLM_DGNS_4_CD = t2.F1
);
QUIT;
With an EXISTS query, the program should return as soon as it finds the first matching row, and that may speed things up (with the inner join, SQL will insist on fetching all matching rows).
"Efficient" in terms of simpler code or better performance?
Either, but mostly I'm hoping to speed things up, so I guess better performance is the priority.
@Batman wrote:
Either, but mostly I'm hoping to speed things up, so I guess better performance is the priority.
Then you must look at non-SQL solutions. For the lookup, loading the wanted diagnosis codes into a hash object will outperform any SQL join.
Depending on the cardinality of geo_bene_sk, even the COUNT DISTINCT might be doable in a hash, without any previous sorting.
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Select SAS Training centers are offering in-person courses. View upcoming courses for: