DATA Step, Macro, Functions and more

Categorize a variable by ranges of values

Reply
Super Contributor
Posts: 322

Categorize a variable by ranges of values

I have ICD9 procedure codes to categorize by generic ranges. Is there more efficient way to do this work not necessarily spelling them out by each value in the range, like i shown for icd9_EYE below? It can be so tedious since I have to deal with cpt and icd-10 codes where I need to deal with ranges in similar fashion too.

 

if proc_cd in ('08', '09', '10','11','12','13','14','15','16') then flag='icd9_EYE';

 

if proc_cd in ('00') then   flag='icd9_NOS';
if proc_cd in ('01-05) then flag='icd9_NERVOUS' ;
if proc_cd in ('06-07) then flag='icd9_ENDOCRINE';
if proc_cd in ('08-16) then flag='icd9_EYE';
if proc_cd in ('17) then flag='icd9_MISC';
if proc_cd in ('18-20) then flag='icd9_EAR'
if proc_cd in ('21-29) then flag='icd9_NOSE_MOUTH_PHARYNX';
if proc_cd in ('30-34) then flag='icd9_RESPIRATORY';
if proc_cd in ('35-39) then flag='icd9_CVD';
if proc_cd in ('40-41) then flag='icd9_HEMIC_LYMPH';
if proc_cd in ('42-54) then flag='icd9_DIGEST';
if proc_cd in ('55-59) then flag='icd9_URINARY';
if proc_cd in ('60-64) then flag='icd9_MALE_GEN';
if proc_cd in ('65-71) then flag='icd9_FEMALE_GEN';
if proc_cd in ('72-75) then flag='icd9_OBSTET';
if proc_cd in ('76-84) then flag='icd9_MUS_SKEL';
if proc_cd in ('85-86) then flag='icd9_INTEGUMENTARY';
if proc_cd in ('87-99) then flag='icd9_MISC';

 

Respected Advisor
Posts: 2,802

Re: Categorize a variable by ranges of values

Super User
Posts: 23,237

Re: Categorize a variable by ranges of values

That’s not going to work. Where do you groupings live? In your head, or in an Excel file? Can it be made into a data set? I strongly second the PROC FORMAT option, ideally driven by a data set from Excel or creating a table using cards/datalines.

 

If you have issues post your code and we’ll help you modify it.

Super Contributor
Posts: 322

Re: Categorize a variable by ranges of values

Bigger picture might not be relevant to this particular problem. However, just throwing it out there where it comes from, I have a very dirty insurance claim data. The working column 'proc_cd' for a procedure code is a melting pot of CPT, ICD-9 procedure and some suspicious codes looking even just like ICD-10 procedure code. I'm starting from the obvious to me which is not necessarily to be true or correct. I suspect cross section of flag="other" and width=4 to be ICD-9 procedure code. Because CPT takes more than 5 digits. Since I have it suspected for ICD-9 procedure code then I looked up ICD-9 codebook and trying to categorize it by its generic ranges. Categorizing it helps to weed out non ICD9 procedure code values since being 4 digit only doesn't define it be ICD-9 procedure code alone. Also available online.

 

http://www.icd9data.com/2012/Volume3/default.htm

 


data x; set uniq_codes;
width=length(proc_cd); 
length flag $20; 
if prxMatch("/^\d{5}\s*$/o",omm_proc_cd) then flag = "CPT1";
else if prxMatch("/^\d{4}F\s*$/o",proc_cd) then flag = "CPT2";
else if prxMatch("/^\d{4}T\s*$/o",proc_cd) then flag = "CPT3";
else if prxMatch("/^V\d{4}\s*$/o",proc_cd) then flag = "VCODE";
else if prxMatch("/^E\d{4}\s*$/o",proc_cd) then flag = "ECODE";
else if prxMatch("/^D\d{4}\s*$/o",proc_cd) then flag = "DCODE";
else if prxMatch("/^Z\d{4}\s*$/o",proc_cd) then flag = "ZCODE";
else if prxMatch("/^T\d{4}\s*$/o",proc_cd) then flag = "TCODE";
else if prxMatch("/^\D{5}\s*$/o",proc_cd) then flag = "NON_DIG";
else flag="other";
if flag="other" and width=4 then flag="POTENTIAL_ICD9"; 
run; 

data x1; set x;
/*ICD-9 procedure code*/
if flag="other" and width=4 then output;
run;

proc freq data=x;
tables width*flag/norow nocol nocum nopercent missing;
run; 
Super Contributor
Posts: 322

Re: Categorize a variable by ranges of values

just added:
else if prxMatch("/^\d{4}\s*$/o",proc_cd) then flag = "ICD9";
else flag="other";
Frequent Contributor
Posts: 80

Re: Categorize a variable by ranges of values

assuming i understand what youre asking, unfortunately i don't think there is a simpler way. If you look at sas macros that have been released for handling icd9 and icd10 codes they are incredibly longwinded and tedious eg see 'PMCA Version 2 Statistical Analysis System (SAS) Programming Code' here: http://www.seattlechildrens.org/research/child-health-behavior-and-development/mangione-smith-lab/me...

--------------
blog: papersandprograms.com
Ask a Question
Discussion stats
  • 5 replies
  • 136 views
  • 3 likes
  • 4 in conversation