Hi folks,
I have a variable on ICD-10 codes in the following form and there are more than 70000 observations. I want to group for e.g. A01 to A09 in one category without having to write all the listed codes. Can someone suggest how do I go about it please? ta
A010
A011
A020
A029
A033
A039
A044
A045
A047
A048
A059
A060
A071
A072
A080
A081
A082
A083
A084
A090
A099
A150
A153
A157
A162
A163
A165
If the groupings aren't as simple as the first 2 characters or similarly trivial, you can use proc format. You would only have to specify the endpoints, not every value.
data have;
input icd $char4.;
datalines;
A010
A011
A020
A029
A033
A039
A044
A045
A047
A048
A059
A060
A071
A072
A080
A081
A082
A083
A084
A090
A099
A150
A153
A157
A162
A163
A165
;
run;
proc format;
value $ICD10_Groups
'A010' - 'A099' = 'A010-A099'
'A100' - 'A150' = 'A100 - A150'
other = 'Ungrouped'
;
run;
data want;
set have;
new_col = icd;
format new_col ICD10_Groups.;
run;
Depending on how you want it to handle values that aren't specifically listed, you can use 'other=' as in the example, or alternatively, you can exclude this and it would simply return the icd value.
This relies on the values all being 4 characters, as I believe characters are ordered in lexicographical order.
Since you didn't describe the steps that use the grouping, let's do it in the most flexible way. Create a new variable (sample value would be "A0"):
data want;
set have;
length grouping_var $ 2;
grouping_var = icd10;
run;
Use that new variable in your analyses.
Does the following give you what you want?
data have;
input icd $char4.;
datalines;
A010
A011
A020
A029
A033
A039
A044
A045
A047
A048
A059
A060
A071
A072
A080
A081
A082
A083
A084
A090
A099
A150
A153
A157
A162
A163
A165
;
data want;
set have;
icd_group = substr(icd,2,2);
run;
If not, then please show what results you want for the input you have provided.
Kind regards,
Amir.
If the groupings aren't as simple as the first 2 characters or similarly trivial, you can use proc format. You would only have to specify the endpoints, not every value.
data have;
input icd $char4.;
datalines;
A010
A011
A020
A029
A033
A039
A044
A045
A047
A048
A059
A060
A071
A072
A080
A081
A082
A083
A084
A090
A099
A150
A153
A157
A162
A163
A165
;
run;
proc format;
value $ICD10_Groups
'A010' - 'A099' = 'A010-A099'
'A100' - 'A150' = 'A100 - A150'
other = 'Ungrouped'
;
run;
data want;
set have;
new_col = icd;
format new_col ICD10_Groups.;
run;
Depending on how you want it to handle values that aren't specifically listed, you can use 'other=' as in the example, or alternatively, you can exclude this and it would simply return the icd value.
This relies on the values all being 4 characters, as I believe characters are ordered in lexicographical order.
@JeffMaggio wrote:
If the groupings aren't as simple as the first 2 characters or similarly trivial, you can use proc format. You would only have to specify the endpoints, not every value.
data have; input icd $char4.; datalines; A010 A011 A020 A029 A033 A039 A044 A045 A047 A048 A059 A060 A071 A072 A080 A081 A082 A083 A084 A090 A099 A150 A153 A157 A162 A163 A165 ; run; proc format; value $ICD10_Groups 'A010' - 'A099' = 'A010-A099' 'A100' - 'A150' = 'A100 - A150' other = 'Ungrouped' ; run; data want; set have; new_col = icd; format new_col ICD10_Groups.; run;
Depending on how you want it to handle values that aren't specifically listed, you can use 'other=' as in the example, or alternatively, you can exclude this and it would simply return the icd value.
This relies on the values all being 4 characters, as I believe characters are ordered in lexicographical order.
Unfortunately range comparisons for Character values often yield unexpected results:
proc format library=work; value $ICD10_Groups 'A010' - 'A099' = 'A010-A099' 'A100' - 'A150' = 'A100 - A150' other = 'Ungrouped' ; run; data example; x='A14'; put x= $icd10_groups.; run;
Places a value of A14 into the A100 - A150 range. Likely not the desired result. The specific values needed may not have this concern.
Shorter values are only compared up to last character of either the value range definition or the value using the format so extreme care needs to be used with character formats.
Thanks for this code. But can you have a look at my syntax and let me know why variable 'Regroup' prints out blank.
Ta
proc format;
value $ICD
'A000'-'A099'='Intestinal_infection'
'A150'-'A199'='TB'
'A200'-'A280'='Zoonotic'
'A300'-'A499'='Other_bacterial'
'A500'-'A640'='STI'
'A650'-'A699'='Spirochaetal'
'A800'-'A890'='Viral_CNS'
'A900'-'A990'='Viral-fever'
'B000'-'B099'='Viral_skin'
'B150'-'B199'='Viral_hep'
'B200'-'B240'='HIV'
'B250'-'B349'='Other_viral'
'B350'-'B490'='Mycoses'
'B500'-'B640'='Protozoa'
'B650'-'B839'='Helminthiases'
'B850'-'B890'='Other_infestation'
'B900'-'B949'='Sequelae'
'B950'-'B990'= 'Other'
'C000'-'C970'= 'Neoplasms'
'D000'-'D489'='Other_neoplasms'
'D500'-'D899'='Blood_disorder'
'E000'-'E899'='Endocrine'
'F000'-'F990'='Mental_disorder'
'G000'-'G998'='Nervous'
'H000'-'H599'='Eye_adnexa'
'H600'-'H959'='Ear_mastoid'
'1000'-'1990'='Circulatory'
'J000'-'J998'='Respiratory'
'K000'-'K938'='Digestive'
'L000'-'L998'='Subcutanous'
'M000'-'M999'='MSK'
'N000'-'N999'='Genitourinary'
'O000'-'O998'='Pregnancy'
'P000'-'P969'='Perinatal'
'Q000'-'Q999'='Congenital'
'R000'-'R999'='Clinical_laboratory'
'S0000'-'S9999'='Injuries'
'T0000'-'T983'='Multiple_injuries'
'V010'-'V990'='Transport_accident'
'W000'-'W990'='Other_accidents'
'X000'-'X840'='Other_injuries'
'Y100'-'Y989'='Sequelae'
'Z000'-'Z999'='Potential_hazard'
other='Ungrouped'
;
run;
data ICD10;
set specialty;
New_code=diagnosisprimaryicd;
format New_code ICD.;
run;
*Regrouping ICD codes into main groups on WHO website here https://icd.who.int/browse10/2016/en;
Data recode_ICD;
set ICD10;
if New_code=' ' then Regroup=' ';
else if New_code in ('Intestinal_infection','TB','Zoonotic','Other_bacterial','STI','Spirochaetal','Viral_CNS','Viral-fever','Viral_skin','Viral_hep','HIV','Other_viral',
'Mycoses','Protozoa','Helminthiases','Other_infestation','Sequelae', 'Other') then Regroup='Infections';
else if New_code in ('Neoplasms','Other_neoplasms') then Regroup='Neoplasm';
else if New_code in ('Blood_disorder') then Regroup='Blood_dis';
else if New_code in ('Endocrine') then Regroup='Endocrine';
else if New_code in ('Mental_disorder') then Regroup='Mental_dis';
else if New_code in ('Nervous') then Regroup='Nervous';
else if New_code in ('Eye_adnexa') then Regroup='Adnexa';
else if New_code in ('Ear_mastoid') then Regroup='Ear_mast';
else if New_code in ('Circulatory') then Regroup='Circulatory';
else if New_code in ('Respiratory') then Regroup='Respiration';
else if New_code in ('Digestive') then Regroup='Digestive';
else if New_code in ('Subcutanous') then Regroup='Subcutanous';
else if New_code in ('MSK') then Regroup='MSK';
else if New_code in ('Genitourinary') then Regroup='Genitourinary';
else if New_code in ('Pregnancy') then Regroup='Pregnancy';
else if New_code in ('Perinatal') then Regroup='Perinatal';
else if New_code in ('Congenital') then Regroup='Congenital';
else if New_code in ('Clinical_laboratory') then Regroup='Clinical_laboratory';
else if New_code in ('Injuries','Multiple_injuries','Transport_accident','Other_accidents','Other_injuries','Sequelae') then Regroup='Injuries';
else if New_code in ('Potential_hazard','Ungrouped') then Regroup='Other';
run;
The value to compare to is still the ICD10 code. If you want that second grouping, the way I would do it is to add another format to your proc format step.
proc format;
value $ICD
'A000'-'A099'='Intestinal_infection'
'A150'-'A199'='TB'
'A200'-'A280'='Zoonotic'
'A300'-'A499'='Other_bacterial'
'A500'-'A640'='STI'
'A650'-'A699'='Spirochaetal'
'A800'-'A890'='Viral_CNS'
'A900'-'A990'='Viral-fever'
'B000'-'B099'='Viral_skin'
'B150'-'B199'='Viral_hep'
'B200'-'B240'='HIV'
'B250'-'B349'='Other_viral'
'B350'-'B490'='Mycoses'
'B500'-'B640'='Protozoa'
'B650'-'B839'='Helminthiases'
'B850'-'B890'='Other_infestation'
'B900'-'B949'='Sequelae'
'B950'-'B990'= 'Other'
'C000'-'C970'= 'Neoplasms'
'D000'-'D489'='Other_neoplasms'
'D500'-'D899'='Blood_disorder'
'E000'-'E899'='Endocrine'
'F000'-'F990'='Mental_disorder'
'G000'-'G998'='Nervous'
'H000'-'H599'='Eye_adnexa'
'H600'-'H959'='Ear_mastoid'
'1000'-'1990'='Circulatory'
'J000'-'J998'='Respiratory'
'K000'-'K938'='Digestive'
'L000'-'L998'='Subcutanous'
'M000'-'M999'='MSK'
'N000'-'N999'='Genitourinary'
'O000'-'O998'='Pregnancy'
'P000'-'P969'='Perinatal'
'Q000'-'Q999'='Congenital'
'R000'-'R999'='Clinical_laboratory'
'S0000'-'S9999'='Injuries'
'T0000'-'T983'='Multiple_injuries'
'V010'-'V990'='Transport_accident'
'W000'-'W990'='Other_accidents'
'X000'-'X840'='Other_injuries'
'Y100'-'Y989'='Sequelae'
'Z000'-'Z999'='Potential_hazard'
other='Ungrouped'
;
value $regroup
'A000'-'B990'='Infections'
'C000'-'D489'='Neoplasm'
'D500'-'D899'='Blood_dis'
'E000'-'E899'='Endocrine'
'F000'-'F990'='Mental_dis'
'G000'-'G998'='Nervous'
'H000'-'H599'='Eye_adnexa'
'H600'-'H959'='Ear_mast'
'1000'-'1990'='Circulatory'
'J000'-'J998'='Respiration'
'K000'-'K938'='Digestive'
'L000'-'L998'='Subcutanous'
'M000'-'M999'='MSK'
'N000'-'N999'='Genitourinary'
'O000'-'O998'='Pregnancy'
'P000'-'P969'='Perinatal'
'Q000'-'Q999'='Congenital'
'R000'-'R999'='Clinical_laboratory'
'S0000'-'Y989'='Injuries'
'Z000'-'Z999'='Other'
other='Other'
;
data ICD10;
set have;
New_code = icd;
Regroup = icd;
format new_code ICD. Regroup regroup.;
run;
You can also get your code to work by using
New_code = put(icd,ICD.);
to change the value of New_code before hitting it with your else if code.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.