BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sks521
Quartz | Level 8

Hi folks,

 

I have a variable on ICD-10 codes in the following form and there are more than 70000 observations. I want to group for e.g. A01 to A09 in one category without having to write all the listed codes. Can someone suggest how do I go about it please? ta

A010
A011
A020
A029
A033
A039
A044
A045
A047
A048
A059
A060
A071
A072
A080
A081
A082
A083
A084
A090
A099
A150
A153
A157
A162
A163
A165
1 ACCEPTED SOLUTION

Accepted Solutions
JeffMaggio
Obsidian | Level 7

If the groupings aren't as simple as the first 2 characters or similarly trivial, you can use proc format. You would only have to specify the endpoints, not every value.


data have;
   input icd $char4.;
   datalines;
A010
A011
A020
A029
A033
A039
A044
A045
A047
A048
A059
A060
A071
A072
A080
A081
A082
A083
A084
A090
A099
A150
A153
A157
A162
A163
A165
;
run;

proc format; 
value $ICD10_Groups
'A010' - 'A099' = 'A010-A099'
'A100' - 'A150' = 'A100 - A150'
other = 'Ungrouped'
;
run;


data want;
   set have;
   new_col = icd;
   format new_col ICD10_Groups.;
run;

Depending on how you want it to handle values that aren't specifically listed, you can use 'other=' as in the example, or alternatively, you can exclude this and it would simply return the icd value. 

This relies on the values all being 4 characters, as I believe characters are ordered in lexicographical order. 

View solution in original post

6 REPLIES 6
Astounding
PROC Star

Since you didn't describe the steps that use the grouping, let's do it in the most flexible way.  Create a new variable (sample value would be "A0"):

 

data want;
   set have;
   length grouping_var $ 2;
   grouping_var = icd10;
run;

Use that new variable in your analyses.

Amir
PROC Star

Does the following give you what you want?

 

data have;
   input icd $char4.;
   datalines;
A010
A011
A020
A029
A033
A039
A044
A045
A047
A048
A059
A060
A071
A072
A080
A081
A082
A083
A084
A090
A099
A150
A153
A157
A162
A163
A165
;


data want;
   set have;
   icd_group = substr(icd,2,2);
run;

If not, then please show what results you want for the input you have provided.

 

 

Kind regards,

Amir.

JeffMaggio
Obsidian | Level 7

If the groupings aren't as simple as the first 2 characters or similarly trivial, you can use proc format. You would only have to specify the endpoints, not every value.


data have;
   input icd $char4.;
   datalines;
A010
A011
A020
A029
A033
A039
A044
A045
A047
A048
A059
A060
A071
A072
A080
A081
A082
A083
A084
A090
A099
A150
A153
A157
A162
A163
A165
;
run;

proc format; 
value $ICD10_Groups
'A010' - 'A099' = 'A010-A099'
'A100' - 'A150' = 'A100 - A150'
other = 'Ungrouped'
;
run;


data want;
   set have;
   new_col = icd;
   format new_col ICD10_Groups.;
run;

Depending on how you want it to handle values that aren't specifically listed, you can use 'other=' as in the example, or alternatively, you can exclude this and it would simply return the icd value. 

This relies on the values all being 4 characters, as I believe characters are ordered in lexicographical order. 

ballardw
Super User

@JeffMaggio wrote:

If the groupings aren't as simple as the first 2 characters or similarly trivial, you can use proc format. You would only have to specify the endpoints, not every value.


data have;
   input icd $char4.;
   datalines;
A010
A011
A020
A029
A033
A039
A044
A045
A047
A048
A059
A060
A071
A072
A080
A081
A082
A083
A084
A090
A099
A150
A153
A157
A162
A163
A165
;
run;

proc format; 
value $ICD10_Groups
'A010' - 'A099' = 'A010-A099'
'A100' - 'A150' = 'A100 - A150'
other = 'Ungrouped'
;
run;


data want;
   set have;
   new_col = icd;
   format new_col ICD10_Groups.;
run;

Depending on how you want it to handle values that aren't specifically listed, you can use 'other=' as in the example, or alternatively, you can exclude this and it would simply return the icd value. 

This relies on the values all being 4 characters, as I believe characters are ordered in lexicographical order. 


Unfortunately range comparisons for Character values often yield unexpected results:

proc format library=work; 
value $ICD10_Groups
'A010' - 'A099' = 'A010-A099'
'A100' - 'A150' = 'A100 - A150'
other = 'Ungrouped'
;
run;

data example;
   x='A14';
   put x= $icd10_groups.;
run;

Places a value of A14 into the A100 - A150 range. Likely not the desired result. The specific values needed may not have this concern.

Shorter values are only compared up to last character of either the value range definition or the value using the format so extreme care needs to be used with character formats.

 

sks521
Quartz | Level 8

Thanks for this code. But can you have a look at my syntax and let me know why variable 'Regroup' prints out blank.

 

Ta

proc format;
value $ICD
'A000'-'A099'='Intestinal_infection'
'A150'-'A199'='TB'
'A200'-'A280'='Zoonotic'
'A300'-'A499'='Other_bacterial'
'A500'-'A640'='STI'
'A650'-'A699'='Spirochaetal'
'A800'-'A890'='Viral_CNS'
'A900'-'A990'='Viral-fever'
'B000'-'B099'='Viral_skin'
'B150'-'B199'='Viral_hep'
'B200'-'B240'='HIV'
'B250'-'B349'='Other_viral'
'B350'-'B490'='Mycoses'
'B500'-'B640'='Protozoa'
'B650'-'B839'='Helminthiases'
'B850'-'B890'='Other_infestation'
'B900'-'B949'='Sequelae'
'B950'-'B990'= 'Other'
'C000'-'C970'= 'Neoplasms'
'D000'-'D489'='Other_neoplasms'
'D500'-'D899'='Blood_disorder'
'E000'-'E899'='Endocrine'
'F000'-'F990'='Mental_disorder'
'G000'-'G998'='Nervous'
'H000'-'H599'='Eye_adnexa'
'H600'-'H959'='Ear_mastoid'
'1000'-'1990'='Circulatory'
'J000'-'J998'='Respiratory'
'K000'-'K938'='Digestive'
'L000'-'L998'='Subcutanous'
'M000'-'M999'='MSK'
'N000'-'N999'='Genitourinary'
'O000'-'O998'='Pregnancy'
'P000'-'P969'='Perinatal'
'Q000'-'Q999'='Congenital'
'R000'-'R999'='Clinical_laboratory'
'S0000'-'S9999'='Injuries'
'T0000'-'T983'='Multiple_injuries'
'V010'-'V990'='Transport_accident'
'W000'-'W990'='Other_accidents'
'X000'-'X840'='Other_injuries'
'Y100'-'Y989'='Sequelae'
'Z000'-'Z999'='Potential_hazard'
other='Ungrouped'
;
run;
data ICD10;
set specialty;
New_code=diagnosisprimaryicd;
format New_code ICD.;
run;
*Regrouping ICD codes into main groups on WHO website here https://icd.who.int/browse10/2016/en;
Data recode_ICD;
set ICD10;
if New_code=' ' then Regroup=' ';
else if New_code in ('Intestinal_infection','TB','Zoonotic','Other_bacterial','STI','Spirochaetal','Viral_CNS','Viral-fever','Viral_skin','Viral_hep','HIV','Other_viral',
'Mycoses','Protozoa','Helminthiases','Other_infestation','Sequelae', 'Other') then Regroup='Infections';
else if New_code in ('Neoplasms','Other_neoplasms') then Regroup='Neoplasm';
else if New_code in ('Blood_disorder') then Regroup='Blood_dis';
else if New_code in ('Endocrine') then Regroup='Endocrine';
else if New_code in ('Mental_disorder') then Regroup='Mental_dis';
else if New_code in ('Nervous') then Regroup='Nervous';
else if New_code in ('Eye_adnexa') then Regroup='Adnexa';
else if New_code in ('Ear_mastoid') then Regroup='Ear_mast';
else if New_code in ('Circulatory') then Regroup='Circulatory';
else if New_code in ('Respiratory') then Regroup='Respiration';
else if New_code in ('Digestive') then Regroup='Digestive';
else if New_code in ('Subcutanous') then Regroup='Subcutanous';
else if New_code in ('MSK') then Regroup='MSK';
else if New_code in ('Genitourinary') then Regroup='Genitourinary';
else if New_code in ('Pregnancy') then Regroup='Pregnancy';
else if New_code in ('Perinatal') then Regroup='Perinatal';
else if New_code in ('Congenital') then Regroup='Congenital';
else if New_code in ('Clinical_laboratory') then Regroup='Clinical_laboratory';
else if New_code in ('Injuries','Multiple_injuries','Transport_accident','Other_accidents','Other_injuries','Sequelae') then Regroup='Injuries';
else if New_code in ('Potential_hazard','Ungrouped') then Regroup='Other';
run;
JeffMaggio
Obsidian | Level 7

The value to compare to is still the ICD10 code. If you want that second grouping, the way I would do it is to add another format to your proc format step.

proc format;
value $ICD
'A000'-'A099'='Intestinal_infection'
'A150'-'A199'='TB'
'A200'-'A280'='Zoonotic'
'A300'-'A499'='Other_bacterial'
'A500'-'A640'='STI'
'A650'-'A699'='Spirochaetal'
'A800'-'A890'='Viral_CNS'
'A900'-'A990'='Viral-fever'
'B000'-'B099'='Viral_skin'
'B150'-'B199'='Viral_hep'
'B200'-'B240'='HIV'
'B250'-'B349'='Other_viral'
'B350'-'B490'='Mycoses'
'B500'-'B640'='Protozoa'
'B650'-'B839'='Helminthiases'
'B850'-'B890'='Other_infestation'
'B900'-'B949'='Sequelae'
'B950'-'B990'= 'Other'
'C000'-'C970'= 'Neoplasms'
'D000'-'D489'='Other_neoplasms'
'D500'-'D899'='Blood_disorder'
'E000'-'E899'='Endocrine'
'F000'-'F990'='Mental_disorder'
'G000'-'G998'='Nervous'
'H000'-'H599'='Eye_adnexa'
'H600'-'H959'='Ear_mastoid'
'1000'-'1990'='Circulatory'
'J000'-'J998'='Respiratory'
'K000'-'K938'='Digestive'
'L000'-'L998'='Subcutanous'
'M000'-'M999'='MSK'
'N000'-'N999'='Genitourinary'
'O000'-'O998'='Pregnancy'
'P000'-'P969'='Perinatal'
'Q000'-'Q999'='Congenital'
'R000'-'R999'='Clinical_laboratory'
'S0000'-'S9999'='Injuries'
'T0000'-'T983'='Multiple_injuries'
'V010'-'V990'='Transport_accident'
'W000'-'W990'='Other_accidents'
'X000'-'X840'='Other_injuries'
'Y100'-'Y989'='Sequelae'
'Z000'-'Z999'='Potential_hazard'
other='Ungrouped'
;
value $regroup
'A000'-'B990'='Infections'
'C000'-'D489'='Neoplasm'
'D500'-'D899'='Blood_dis'
'E000'-'E899'='Endocrine'
'F000'-'F990'='Mental_dis'
'G000'-'G998'='Nervous'
'H000'-'H599'='Eye_adnexa'
'H600'-'H959'='Ear_mast'
'1000'-'1990'='Circulatory'
'J000'-'J998'='Respiration'
'K000'-'K938'='Digestive'
'L000'-'L998'='Subcutanous'
'M000'-'M999'='MSK'
'N000'-'N999'='Genitourinary'
'O000'-'O998'='Pregnancy'
'P000'-'P969'='Perinatal'
'Q000'-'Q999'='Congenital'
'R000'-'R999'='Clinical_laboratory'
'S0000'-'Y989'='Injuries'
'Z000'-'Z999'='Other'
other='Other'
;


data ICD10;
   set have;
   New_code = icd;
   Regroup = icd;
   format new_code ICD. Regroup regroup.;

run;

 

You can also get your code to work by using 

 New_code = put(icd,ICD.);

to change the value of New_code before hitting it with your else if code. 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1204 views
  • 1 like
  • 5 in conversation