I have a dataset that contains several ICD-10 codes along with their explanations. I would like to generate new variables based on the ICD-10 codes in the dataset. New variables will be like ICD_var1,ICD_var2,... on the table. I would like to improve my abilities for this kind of variables. I could not find similar examples. Could you please help me? Thank you in advance.(The screenshot is what I want to have)
/***My data creating code***/
data patient_icd;
input Patient_ID ICD_CODES $100.;
ICD_VAR1 = scan(ICD_CODES, 1, '|');
ICD_VAR2 = scan(ICD_CODES, 2, '|');
ICD_VAR3 = scan(ICD_CODES, 3, '|');
ICD_VAR4 = scan(ICD_CODES, 4, '|');
ICD_VAR5 = scan(ICD_CODES, 5, '|');
ICD_VAR6 = scan(ICD_CODES, 6, '|');
ICD_VAR7 = scan(ICD_CODES, 7, '|');
ICD_VAR8 = scan(ICD_CODES, 8, '|');
ICD_VAR9 = scan(ICD_CODES, 9, '|');
ICD_VAR10 = scan(ICD_CODES, 10, '|');
ICD_VAR11 = scan(ICD_CODES, 11, '|');
ICD_VAR12 = scan(ICD_CODES, 12, '|');
ICD_VAR13 = scan(ICD_CODES, 13, '|');
datalines;
234423 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Anxiety disorder, unspecified (F41.9) [POA=] | (#4) Osteonecrosis, unspecified (M87.9) [POA=] | (#5) Other symptoms and signs involving emotional state (R45.89) [POA=] | (#6) Other specified health status (Z78.9) [POA=] | (#7) Other specified counseling (Z71.89) [POA=] | (#8) Other long term (current) drug therapy (Z79.899) [POA=] | (#9) Encounter for therapeutic drug level monitoring (Z51.81) [POA=] | (#10) Counseling, unspecified (Z71.9) [POA=]
8723492 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Osteonecrosis, unspecified (M87.9) [POA=] |
How exactly do you expect to use the resulting data set?
For a great many purposes you would be better off making a single ICD_10 variable and one observation per value. The question is what rules can be used to identify just the ICD-10 values.
At first pass you might think that looking for stuff inside () will suffice. But you have this "(current)" appear in the #8 block of the first patient.
Thank you for your help. I am trying to see each patients different diagnosis separately as a icd_var1, icd_var2,...
As @ballardw has already suggested, separating the diagonoses into rows rather than variables makes it easier for analysis:
data patient_icd;
keep Patient_ID ICD_VAR;
input @1 Patient_ID $7. @8 ICD_CODES $600.;
do i = 1 to 13;
ICD_VAR = scan(ICD_CODES, i, '|');
if ICD_VAR = '' then leave;
output;
end;
datalines;
234423 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Anxiety disorder, unspecified (F41.9) [POA=] | (#4) Osteonecrosis, unspecified (M87.9) [POA=] | (#5) Other symptoms and signs involving emotional state (R45.89) [POA=] | (#6) Other specified health status (Z78.9) [POA=] | (#7) Other specified counseling (Z71.89) [POA=] | (#8) Other long term (current) drug therapy (Z79.899) [POA=] | (#9) Encounter for therapeutic drug level monitoring (Z51.81) [POA=] | (#10) Counseling, unspecified (Z71.9) [POA=]
;
run;
Your code worked and I wrote the following codes and I got what I wanted so for. I have a questions. How can I adjust your code for using a sas dataset instead of datalines statement for creating patient_icd dataset. I tried but I could not do it. Can you help me?
data patient_icd_2;
set patient_icd;
icd_codes = scan(icd_var, -2, '()');
run;
proc transpose data=patient_icd_2 out=patient_icd_3;
by patient_id;
var icd_codes;
run;
/* Rename the variable and remove the prefix 'COL' */
data patient_icd_4;
set patient_icd_3;
rename col1=icd_var1 col2=icd_var2 col3=icd_var3 col4=icd_var4 col5=icd_var5
col6=icd_var6 col7=icd_var7 col8=icd_var8 col9=icd_var9 col10=icd_var10;
drop _name_;
run;
You just need to replace the INPUT statement with a SET statement naming your input dataset:
data patient_icd;
keep Patient_ID ICD_VAR;
set have;
do i = 1 to 13;
ICD_VAR = scan(ICD_CODES, i, '|');
if ICD_VAR = '' then leave;
output;
end;
run;
If you really need a single patient_id with multiple variables for each ICD value then you can do that in a single data step:
/* Set this to the maximum number of icdVar values in your icd_codes variable */
/* Note you may need to set this dynamically depending on your data */
%let icdVarCnt=13 ;
data patient_icd;
/* Keep the variables you want in the output dataset */
keep patient_id icd_var1-icd_var&icdVarCnt ;
/* Create an array to hold the individual icd_var values */
array icd_var{&icdVarCnt} $100 ;
/* Read test data */
input @1 Patient_ID $7. @8 ICD_CODES $600.;
/* Loop through the icd_vsr variable extracting individual icd values */
do i=1 to &icdVarCnt ;
/* extract individual icd values */
ICD_VAR(i) = scan(ICD_CODES, i, '|') ;
/* Drop out off the loop if there are no more individual icd values */
if ICD_VAR(i) = '' then leave;
end;
/* write observation to output dataset */
output work.patient_icd ;
datalines;
234423 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Anxiety disorder, unspecified (F41.9) [POA=] | (#4) Osteonecrosis, unspecified (M87.9) [POA=] | (#5) Other symptoms and signs involving emotional state (R45.89) [POA=] | (#6) Other specified health status (Z78.9) [POA=] | (#7) Other specified counseling (Z71.89) [POA=] | (#8) Other long term (current) drug therapy (Z79.899) [POA=] | (#9) Encounter for therapeutic drug level monitoring (Z51.81) [POA=] | (#10) Counseling, unspecified (Z71.9) [POA=]
;
run;
Like others who have replied, I would not recommend this approach but only you know what your intended outcome is.
data have;
infile datalines truncover;
input Patient_ID ICD_CODES $600.;
datalines;
234423 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Anxiety disorder, unspecified (F41.9) [POA=] | (#4) Osteonecrosis, unspecified (M87.9) [POA=] | (#5) Other symptoms and signs involving emotional state (R45.89) [POA=] | (#6) Other specified health status (Z78.9) [POA=] | (#7) Other specified counseling (Z71.89) [POA=] | (#8) Other long term (current) drug therapy (Z79.899) [POA=] | (#9) Encounter for therapeutic drug level monitoring (Z51.81) [POA=] | (#10) Counseling, unspecified (Z71.9) [POA=]
8723492 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Osteonecrosis, unspecified (M87.9) [POA=] |
;
run;
data temp;
set have;
pid=prxparse('/\(.+?\)/');
s=1;e=length(ICD_CODES);
call prxnext(pid,s,e,ICD_CODES,p,l);
do while(p>0);
value=compress(substr(ICD_CODES,p,l),'()');
if value ne: '#' then output;
call prxnext(pid,s,e,ICD_CODES,p,l);
end;
keep Patient_ID ICD_CODES value;
run;
proc transpose data=temp out=want(drop=_NAME_) prefix=ICD_VAR;
by Patient_ID ICD_CODES ;
var value;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.