BookmarkSubscribeRSS Feed
OJohn_StaT
Obsidian | Level 7

I have a dataset that contains several ICD-10 codes along with their explanations. I would like to generate new variables based on the ICD-10 codes in the dataset. New variables will be like ICD_var1,ICD_var2,... on the table. I would like to improve my abilities for this kind of variables. I could not find similar examples. Could you please help me? Thank you in advance.(The screenshot is what I want to have)

 

 

 

 

OJohn_StaT_0-1682022532239.png

/***My data creating code***/
data patient_icd;
   input Patient_ID ICD_CODES $100.;
   ICD_VAR1 = scan(ICD_CODES, 1, '|');
   ICD_VAR2 = scan(ICD_CODES, 2, '|');
   ICD_VAR3 = scan(ICD_CODES, 3, '|');
   ICD_VAR4 = scan(ICD_CODES, 4, '|');
   ICD_VAR5 = scan(ICD_CODES, 5, '|');
   ICD_VAR6 = scan(ICD_CODES, 6, '|');
   ICD_VAR7 = scan(ICD_CODES, 7, '|');
   ICD_VAR8 = scan(ICD_CODES, 8, '|');
   ICD_VAR9 = scan(ICD_CODES, 9, '|');
   ICD_VAR10 = scan(ICD_CODES, 10, '|');
   ICD_VAR11 = scan(ICD_CODES, 11, '|');
   ICD_VAR12 = scan(ICD_CODES, 12, '|');
   ICD_VAR13 = scan(ICD_CODES, 13, '|');
   datalines;
234423 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Anxiety disorder, unspecified (F41.9) [POA=] | (#4) Osteonecrosis, unspecified (M87.9) [POA=] | (#5) Other symptoms and signs involving emotional state (R45.89) [POA=] | (#6) Other specified health status (Z78.9) [POA=] | (#7) Other specified counseling (Z71.89) [POA=] | (#8) Other long term (current) drug therapy (Z79.899) [POA=] | (#9) Encounter for therapeutic drug level monitoring (Z51.81) [POA=] | (#10) Counseling, unspecified (Z71.9) [POA=]
8723492 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Osteonecrosis, unspecified (M87.9) [POA=] |
7 REPLIES 7
ballardw
Super User

How exactly do you expect to use the resulting data set?

For a great many purposes you would be better off making a single ICD_10 variable and one observation per value. The question is what rules can be used to identify just the ICD-10 values.

At first pass you might think that looking for stuff inside () will suffice. But you have this "(current)" appear in the #8 block of the first patient.

OJohn_StaT
Obsidian | Level 7

Thank you for your help. I am trying to see each patients different diagnosis separately as a icd_var1, icd_var2,... 

SASKiwi
PROC Star

As @ballardw has already suggested, separating the diagonoses into rows rather than variables makes it easier for analysis:

data patient_icd;
  keep Patient_ID ICD_VAR;
   input @1 Patient_ID $7. @8 ICD_CODES $600.;
   do i = 1 to 13;
     ICD_VAR = scan(ICD_CODES, i, '|');
	 if ICD_VAR = '' then leave;
	 output;
   end;
   datalines;
234423 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Anxiety disorder, unspecified (F41.9) [POA=] | (#4) Osteonecrosis, unspecified (M87.9) [POA=] | (#5) Other symptoms and signs involving emotional state (R45.89) [POA=] | (#6) Other specified health status (Z78.9) [POA=] | (#7) Other specified counseling (Z71.89) [POA=] | (#8) Other long term (current) drug therapy (Z79.899) [POA=] | (#9) Encounter for therapeutic drug level monitoring (Z51.81) [POA=] | (#10) Counseling, unspecified (Z71.9) [POA=]
;
run;

 

OJohn_StaT
Obsidian | Level 7

Your code worked  and I wrote the following codes and I got what I wanted so for. I have a questions. How can I adjust your code for using a sas dataset instead of datalines statement for creating  patient_icd dataset. I tried but I could not do it. Can you help me?

 

data patient_icd_2;
set patient_icd;
icd_codes = scan(icd_var, -2, '()');
run;


proc transpose data=patient_icd_2 out=patient_icd_3;
by patient_id;
var icd_codes;
run;

/* Rename the variable and remove the prefix 'COL' */
data patient_icd_4;
set patient_icd_3;
rename col1=icd_var1 col2=icd_var2 col3=icd_var3 col4=icd_var4 col5=icd_var5
col6=icd_var6 col7=icd_var7 col8=icd_var8 col9=icd_var9 col10=icd_var10;
drop _name_;
run;

SASKiwi
PROC Star

You just need to replace the INPUT statement with a SET statement naming your input dataset:

data patient_icd;
  keep Patient_ID ICD_VAR;
  set have;
   do i = 1 to 13;
     ICD_VAR = scan(ICD_CODES, i, '|');
	 if ICD_VAR = '' then leave;
	 output;
   end;
run;
AMSAS
SAS Super FREQ

If you really need a single patient_id with multiple variables for each ICD value then you can do that in a single data step: 

/* Set this to the maximum number of icdVar values in your icd_codes variable */
/* Note you may need to set this dynamically depending on your data           */
%let icdVarCnt=13 ;

data patient_icd;
	/* Keep the variables you want in the output dataset */
	keep patient_id icd_var1-icd_var&icdVarCnt ;
	/* Create an array to hold the individual icd_var values */
	array icd_var{&icdVarCnt} $100 ;
	/* Read test data */
	input @1 Patient_ID $7. @8 ICD_CODES $600.;
	/* Loop through the icd_vsr variable extracting individual icd values */
	do i=1 to &icdVarCnt ; 
		/* extract individual icd values */ 
		ICD_VAR(i) = scan(ICD_CODES, i, '|') ;
		/* Drop out off the loop if there are no more individual icd values */
		if ICD_VAR(i) = '' then leave;
	end;
	/* write observation to output dataset */
	output work.patient_icd ;
datalines;
234423 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Anxiety disorder, unspecified (F41.9) [POA=] | (#4) Osteonecrosis, unspecified (M87.9) [POA=] | (#5) Other symptoms and signs involving emotional state (R45.89) [POA=] | (#6) Other specified health status (Z78.9) [POA=] | (#7) Other specified counseling (Z71.89) [POA=] | (#8) Other long term (current) drug therapy (Z79.899) [POA=] | (#9) Encounter for therapeutic drug level monitoring (Z51.81) [POA=] | (#10) Counseling, unspecified (Z71.9) [POA=]
;
run;

Like others who have replied, I would not recommend this approach but only you know what your intended outcome is. 

Ksharp
Super User
data have;
infile datalines truncover;
input Patient_ID ICD_CODES $600.;
datalines;
234423 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Anxiety disorder, unspecified (F41.9) [POA=] | (#4) Osteonecrosis, unspecified (M87.9) [POA=] | (#5) Other symptoms and signs involving emotional state (R45.89) [POA=] | (#6) Other specified health status (Z78.9) [POA=] | (#7) Other specified counseling (Z71.89) [POA=] | (#8) Other long term (current) drug therapy (Z79.899) [POA=] | (#9) Encounter for therapeutic drug level monitoring (Z51.81) [POA=] | (#10) Counseling, unspecified (Z71.9) [POA=]
8723492 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Osteonecrosis, unspecified (M87.9) [POA=] |
;
run;

data temp;
 set have;
 pid=prxparse('/\(.+?\)/');
 s=1;e=length(ICD_CODES);
 call prxnext(pid,s,e,ICD_CODES,p,l);
 do while(p>0);
  value=compress(substr(ICD_CODES,p,l),'()');
  if value ne: '#' then output;
  call prxnext(pid,s,e,ICD_CODES,p,l);
 end;
 keep Patient_ID ICD_CODES value;
run;
proc transpose data=temp out=want(drop=_NAME_) prefix=ICD_VAR;
by Patient_ID ICD_CODES ;
var value;
run;