- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi folks,
I have variable with ICD-10 codes. I want to group them as below;
'A000'-'A099'='Intestinal_infection'
'A150'-'A199'='TB'
'A200'-'A280'='Zoonotic'
'A300'-'A499'='Other_bacterial'
'A500'-'A640'='STI'
'A650'-'A699'='Spirochaetal'
'A800'-'A890'='Viral_CNS'
'A900'-'A990'='Viral-fever'
'B000'-'B099'='Viral_skin'
'B150'-'B199'='Viral_hep'
'B200'-'B240'='HIV'
'B250'-'B349'='Other_viral'
'B350'-'B490'='Mycoses'
'B500'-'B640'='Protozoa'
'B650'-'B839'='Helminthiases'
'B850'-'B890'='Other_infestation'
'B900'-'B949'='Sequelae'
'B950'-'B990'= 'Other'
'C000'-'C970'= 'Neoplasms'
'D000'-'D489'='Other_neoplasms'
'D500'-'D899'='Blood_disorder'
'E000'-'E899'='Endocrine'
'F000'-'F990'='Mental_disorder'
'G000'-'G998'='Nervous'
'H000'-'H599'='Eye_adnexa'
'H600'-'H959'='Ear_mastoid'
'1000'-'1990'='Circulatory'
'J000'-'J998'='Respiratory'
'K000'-'K938'='Digestive'
'L000'-'L998'='Subcutanous'
'M000'-'M999'='MSK'
'N000'-'N999'='Genitourinary'
'O000'-'O998'='Pregnancy'
'P000'-'P969'='Perinatal'
'Q000'-'Q999'='Congenital'
'R000'-'R999'='Clinical_laboratory'
'S0000'-'S9999'='Injuries'
'T0000'-'T983'='Multiple_injuries'
'V010'-'V990'='Transport_accident'
'W000'-'W990'='Other_accidents'
'X000'-'X840'='Other_injuries'
'Y100'-'Y989'='Sequelae'
'Z000'-'Z999'='Potential_hazard'
other='Ungrouped'
What's the best way of indexing them in these groups? I have used 'Format' syntax but this is not how I want it to be.
Ta
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Just to be clear, you do not want to do this with formats?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
No, because when I do it with formats, I can't create new variables using these categories.
For e.g. I used this syntax and tried to create another variable 'Regroup' from old variable 'New_code' in the next data step, 'Regroup prints out blank.
*indexing IDC-10 codes/grouping them in main categories;
proc format;
value $ICD
'A000'-'A099'='Intestinal_infection'
'A150'-'A199'='TB'
'A200'-'A280'='Zoonotic'
'A300'-'A499'='Other_bacterial'
'A500'-'A640'='STI'
'A650'-'A699'='Spirochaetal'
'A800'-'A890'='Viral_CNS'
'A900'-'A990'='Viral-fever'
'B000'-'B099'='Viral_skin'
'B150'-'B199'='Viral_hep'
'B200'-'B240'='HIV'
'B250'-'B349'='Other_viral'
'B350'-'B490'='Mycoses'
'B500'-'B640'='Protozoa'
'B650'-'B839'='Helminthiases'
'B850'-'B890'='Other_infestation'
'B900'-'B949'='Sequelae'
'B950'-'B990'= 'Other'
'C000'-'C970'= 'Neoplasms'
'D000'-'D489'='Other_neoplasms'
'D500'-'D899'='Blood_disorder'
'E000'-'E899'='Endocrine'
'F000'-'F990'='Mental_disorder'
'G000'-'G998'='Nervous'
'H000'-'H599'='Eye_adnexa'
'H600'-'H959'='Ear_mastoid'
'1000'-'1990'='Circulatory'
'J000'-'J998'='Respiratory'
'K000'-'K938'='Digestive'
'L000'-'L998'='Subcutanous'
'M000'-'M999'='MSK'
'N000'-'N999'='Genitourinary'
'O000'-'O998'='Pregnancy'
'P000'-'P969'='Perinatal'
'Q000'-'Q999'='Congenital'
'R000'-'R999'='Clinical_laboratory'
'S0000'-'S9999'='Injuries'
'T0000'-'T983'='Multiple_injuries'
'V010'-'V990'='Transport_accident'
'W000'-'W990'='Other_accidents'
'X000'-'X840'='Other_injuries'
'Y100'-'Y989'='Sequelae'
'Z000'-'Z999'='Potential_hazard'
other='Ungrouped'
;
run;
data ICD10;
set specialty;
New_code=diagnosisprimaryicd;
format New_code ICD.;
run;
*Regrouping ICD codes into main groups on WHO website here https://icd.who.int/browse10/2016/en;
Data recode_ICD;
set ICD10;
if New_code=' ' then Regroup=' ';
else if New_code in ('Intestinal_infection','TB','Zoonotic','Other_bacterial','STI','Spirochaetal','Viral_CNS','Viral-fever','Viral_skin','Viral_hep','HIV','Other_viral',
'Mycoses','Protozoa','Helminthiases','Other_infestation','Sequelae', 'Other') then Regroup='Infections';
else if New_code in ('Neoplasms','Other_neoplasms') then Regroup='Neoplasm';
else if New_code in ('Blood_disorder') then Regroup='Blood_dis';
else if New_code in ('Endocrine') then Regroup='Endocrine';
else if New_code in ('Mental_disorder') then Regroup='Mental_dis';
else if New_code in ('Nervous') then Regroup='Nervous';
else if New_code in ('Eye_adnexa') then Regroup='Adnexa';
else if New_code in ('Ear_mastoid') then Regroup='Ear_mast';
else if New_code in ('Circulatory') then Regroup='Circulatory';
else if New_code in ('Respiratory') then Regroup='Respiration';
else if New_code in ('Digestive') then Regroup='Digestive';
else if New_code in ('Subcutanous') then Regroup='Subcutanous';
else if New_code in ('MSK') then Regroup='MSK';
else if New_code in ('Genitourinary') then Regroup='Genitourinary';
else if New_code in ('Pregnancy') then Regroup='Pregnancy';
else if New_code in ('Perinatal') then Regroup='Perinatal';
else if New_code in ('Congenital') then Regroup='Congenital';
else if New_code in ('Clinical_laboratory') then Regroup='Clinical_laboratory';
else if New_code in ('Injuries','Multiple_injuries','Transport_accident','Other_accidents','Other_injuries','Sequelae') then Regroup='Injuries';
else if New_code in ('Potential_hazard','Ungrouped') then Regroup='Other';
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Formatting new_code will just change the appearance of diagnosisprimaryicd but the underlying value is still the same. So when you try to regroup, nothing happens because the underlying value is still there. Instead, you can define new_code in the following way and your logic should work.
Try
new_code=put(diagnosisprimaryicd,$icd.);
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Whereabouts in the code do I put this please?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data ICD10;
set specialty;
New_code=put(diagnosisprimaryicd, $icd.);
if New_code=' ' then
Regroup=' ';
else if New_code in ('Intestinal_infection', 'TB', 'Zoonotic',
'Other_bacterial', 'STI', 'Spirochaetal', 'Viral_CNS', 'Viral-fever',
'Viral_skin', 'Viral_hep', 'HIV', 'Other_viral', 'Mycoses', 'Protozoa',
'Helminthiases', 'Other_infestation', 'Sequelae', 'Other') then
Regroup='Infections';
else if New_code in ('Neoplasms', 'Other_neoplasms') then
Regroup='Neoplasm';
else if New_code in ('Blood_disorder') then
Regroup='Blood_dis';
else if New_code in ('Endocrine') then
Regroup='Endocrine';
else if New_code in ('Mental_disorder') then
Regroup='Mental_dis';
else if New_code in ('Nervous') then
Regroup='Nervous';
else if New_code in ('Eye_adnexa') then
Regroup='Adnexa';
else if New_code in ('Ear_mastoid') then
Regroup='Ear_mast';
else if New_code in ('Circulatory') then
Regroup='Circulatory';
else if New_code in ('Respiratory') then
Regroup='Respiration';
else if New_code in ('Digestive') then
Regroup='Digestive';
else if New_code in ('Subcutanous') then
Regroup='Subcutanous';
else if New_code in ('MSK') then
Regroup='MSK';
else if New_code in ('Genitourinary') then
Regroup='Genitourinary';
else if New_code in ('Pregnancy') then
Regroup='Pregnancy';
else if New_code in ('Perinatal') then
Regroup='Perinatal';
else if New_code in ('Congenital') then
Regroup='Congenital';
else if New_code in ('Clinical_laboratory') then
Regroup='Clinical_laboratory';
else if New_code in ('Injuries', 'Multiple_injuries', 'Transport_accident',
'Other_accidents', 'Other_injuries', 'Sequelae') then
Regroup='Injuries';
else if New_code in ('Potential_hazard', 'Ungrouped') then
Regroup='Other';
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Still prints out blank!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Without more info about what you're data looks like, I can't really help you. The reasons it could be printing blank are uncountable. What I've done here works. See if you can adapt to how you need.
proc format cntlout=icdgrpfmt(keep=fmtname start end label hlo
where=(fmtname='ICDGRP'));
value $icdgrp 'A000'-'A099'='Intestinal_infection' 'A150'-'A199'='TB'
'A200'-'A280'='Zoonotic' 'A300'-'A499'='Other_bacterial' 'A500'-'A640'='STI'
'A650'-'A699'='Spirochaetal' 'A800'-'A890'='Viral_CNS'
'A900'-'A990'='Viral-fever' 'B000'-'B099'='Viral_skin'
'B150'-'B199'='Viral_hep' 'B200'-'B240'='HIV' 'B250'-'B349'='Other_viral'
'B350'-'B490'='Mycoses' 'B500'-'B640'='Protozoa'
'B650'-'B839'='Helminthiases' 'B850'-'B890'='Other_infestation'
'B900'-'B949', 'Y100'-'Y989'='Sequelae' 'B950'-'B990'='Other'
'C000'-'C970'='Neoplasms' 'D000'-'D489'='Other_neoplasms'
'D500'-'D899'='Blood_disorder' 'E000'-'E899'='Endocrine'
'F000'-'F990'='Mental_disorder' 'G000'-'G998'='Nervous'
'H000'-'H599'='Eye_adnexa' 'H600'-'H959'='Ear_mastoid'
'1000'-'1990'='Circulatory' 'J000'-'J998'='Respiratory'
'K000'-'K938'='Digestive' 'L000'-'L998'='Subcutanous' 'M000'-'M999'='MSK'
'N000'-'N999'='Genitourinary' 'O000'-'O998'='Pregnancy'
'P000'-'P969'='Perinatal' 'Q000'-'Q999'='Congenital'
'R000'-'R999'='Clinical_laboratory' 'S0000'-'S9999'='Injuries'
'T0000'-'T983'='Multiple_injuries' 'V010'-'V990'='Transport_accident'
'W000'-'W990'='Other_accidents' 'X000'-'X840'='Other_injuries'
'Z000'-'Z999'='Potential_hazard' other='Ungrouped';
run;
data icdregrp;
retain fmtname '$rgrp' type 'c';
set icdgrpfmt(drop=start end fmtname rename=label=old_label);
start=old_label;
length label $50.;
if start=' ' then
label=' ';
else if start in ('Intestinal_infection', 'TB', 'Zoonotic', 'Other_bacterial',
'STI', 'Spirochaetal', 'Viral_CNS', 'Viral-fever', 'Viral_skin', 'Viral_hep',
'HIV', 'Other_viral', 'Mycoses', 'Protozoa', 'Helminthiases',
'Other_infestation', 'Sequelae', 'Other') then
label='Infections';
else if start in ('Neoplasms', 'Other_neoplasms') then
label='Neoplasm';
else if start in ('Blood_disorder') then
label='Blood_dis';
else if start in ('Endocrine') then
label='Endocrine';
else if start in ('Mental_disorder') then
label='Mental_dis';
else if start in ('Nervous') then
label='Nervous';
else if start in ('Eye_adnexa') then
label='Adnexa';
else if start in ('Ear_mastoid') then
label='Ear_mast';
else if start in ('Circulatory') then
label='Circulatory';
else if start in ('Respiratory') then
label='Respiration';
else if start in ('Digestive') then
label='Digestive';
else if start in ('Subcutanous') then
label='Subcutanous';
else if start in ('MSK') then
label='MSK';
else if start in ('Genitourinary') then
label='Genitourinary';
else if start in ('Pregnancy') then
label='Pregnancy';
else if start in ('Perinatal') then
label='Perinatal';
else if start in ('Congenital') then
label='Congenital';
else if start in ('Clinical_laboratory') then
label='Clinical_laboratory';
else if start in ('Injuries', 'Multiple_injuries', 'Transport_accident',
'Other_accidents', 'Other_injuries') then
label='Injuries';
else if start in ('Potential_hazard', 'Ungrouped') then
label='Other';
drop old_label;
run;
proc sort data=icdregrp nodupkey out=regrp;
by start;
run;
proc format cntlin=regrp;
run;
data have;
input icd_cd $;
datalines;
A096
B955
;
run;
data want;
set have;
icd_grp_fmt=put(icd_cd, $icdgrp.);
icd_regrp_fmt=put(icd_grp_fmt, $rgrp.);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@sks521 wrote:
No, because when I do it with formats, I can't create new variables using these categories.
Incorrect:
Newvar = put(variable, formatname.);
will create a new variable assuming the existence of the variable and the format.
Note: Comparisons with character values and ranges is an extremely problematic activity in general.
Example:
data example; x = 'A12'; if 'A100' le x le 'A150' then put x= " is between A100 and A150"; run;
Run the code, read the log and decide if that is the desired result. The character comparisons stop at the end the shortest variable
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@sks521 wrote:
No, because when I do it with formats, I can't create new variables using these categories.
For e.g. I used this syntax and tried to create another variable 'Regroup' from old variable 'New_code' in the next data step, 'Regroup prints out blank.
You're not using the formats correctly.
To apply a format to a new variable and access the formatted value you need to use PUT().
New_ICD_Code = put(diagnosisPrimaryICD, $ICD.);
Then use your IF/THEN statements from your last step. Or create a second format that does the groupings as well and then you have:
Regroup = put(New_ICD_Code, $regroup_fmt.);
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It's not clear what you mean by 'this is not how I want it to be'..
Proc format is definitely the way to go. Now if you'd rather have an associated index (like 2) instead or an associated character label like 'Zoonotic', then you can try the following:
proc format cntlout=icdgrpfmt(keep=start end label hlo);
value $icdgrp 'A000'-'A099'='Intestinal_infection' 'A150'-'A199'='TB'
'A200'-'A280'='Zoonotic' 'A300'-'A499'='Other_bacterial' 'A500'-'A640'='STI'
'A650'-'A699'='Spirochaetal' 'A800'-'A890'='Viral_CNS'
'A900'-'A990'='Viral-fever' 'B000'-'B099'='Viral_skin'
'B150'-'B199'='Viral_hep' 'B200'-'B240'='HIV' 'B250'-'B349'='Other_viral'
'B350'-'B490'='Mycoses' 'B500'-'B640'='Protozoa'
'B650'-'B839'='Helminthiases' 'B850'-'B890'='Other_infestation'
'B900'-'B949'='Sequelae' 'B950'-'B990'='Other' 'C000'-'C970'='Neoplasms'
'D000'-'D489'='Other_neoplasms' 'D500'-'D899'='Blood_disorder'
'E000'-'E899'='Endocrine' 'F000'-'F990'='Mental_disorder'
'G000'-'G998'='Nervous' 'H000'-'H599'='Eye_adnexa'
'H600'-'H959'='Ear_mastoid' '1000'-'1990'='Circulatory'
'J000'-'J998'='Respiratory' 'K000'-'K938'='Digestive'
'L000'-'L998'='Subcutanous' 'M000'-'M999'='MSK' 'N000'-'N999'='Genitourinary'
'O000'-'O998'='Pregnancy' 'P000'-'P969'='Perinatal'
'Q000'-'Q999'='Congenital' 'R000'-'R999'='Clinical_laboratory'
'S0000'-'S9999'='Injuries' 'T0000'-'T983'='Multiple_injuries'
'V010'-'V990'='Transport_accident' 'W000'-'W990'='Other_accidents'
'X000'-'X840'='Other_injuries' 'Y100'-'Y989'='Sequelae'
'Z000'-'Z999'='Potential_hazard' other='Ungrouped';
run;
data icdgrpidx;
retain fmtname 'icdidx' type 'i';
set icdgrpfmt;
label=_n_;
run;
proc format cntlin=icdgrpidx;
run;
data have;
input icd_cd $;
datalines;
A096
B955
;
run;
data want;
set have;
icd_grp_fmt=put(icd_cd, $icdgrp.);
icd_idx_fmt=input(icd_cd, icdidx.);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Because when I try to create another variable for example if I want to combine all those as one category 'Infections', I am not able to do that or do you have any suggestion
?
'A000'-'A099'='Intestinal_infection' 'A150'-'A199'='TB'
'A200'-'A280'='Zoonotic' 'A300'-'A499'='Other_bacterial' 'A500'-'A640'='STI'
'A650'-'A699'='Spirochaetal' 'A800'-'A890'='Viral_CNS'
'A900'-'A990'='Viral-fever' 'B000'-'B099'='Viral_skin'
'B150'-'B199'='Viral_hep' 'B200'-'B240'='HIV' 'B250'-'B349'='Other_viral'
'B350'-'B490'='Mycoses' 'B500'-'B640'='Protozoa'
'B650'-'B839'='Helminthiases' 'B850'-'B890'='Other_infestation'
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@sks521 wrote:
Because when I try to create another variable for example if
I want to combine all those as one category 'Infections', I am not able to do that or do you have any suggestion
?'A000'-'A099'='Intestinal_infection' 'A150'-'A199'='TB' 'A200'-'A280'='Zoonotic' 'A300'-'A499'='Other_bacterial' 'A500'-'A640'='STI' 'A650'-'A699'='Spirochaetal' 'A800'-'A890'='Viral_CNS' 'A900'-'A990'='Viral-fever' 'B000'-'B099'='Viral_skin' 'B150'-'B199'='Viral_hep' 'B200'-'B240'='HIV' 'B250'-'B349'='Other_viral' 'B350'-'B490'='Mycoses' 'B500'-'B640'='Protozoa' 'B650'-'B839'='Helminthiases' 'B850'-'B890'='Other_infestation'
You really need to provide some example data and the desired result.
If you want a result of INFECTIONS then you want a different format.
The same values can be mapped to more than value using multiple formats.
For example I have formats that assign ZIP code to various service regions besides the more typical City, State or County. Change the format to the one I want for a specific purpose.
And for some reports using the procedures that support MULTILABEL Formats I have formats that will use a Site identifier to produce summaries by Site as well as group of sites (region, agency or what have you).