BookmarkSubscribeRSS Feed
OJohn_StaT
Obsidian | Level 7

I have a dataset that contains several ICD-10 codes along with their explanations. I would like to generate new variables based on the ICD-10 codes in the dataset. New variables will be like ICD_var1,ICD_var2,... on the table. I would like to improve my abilities for this kind of variables. I could not find similar examples. Could you please help me? Thank you in advance.(The screenshot is what I want to have)

 

 

 

 

OJohn_StaT_0-1682022532239.png

/***My data creating code***/
data patient_icd;
   input Patient_ID ICD_CODES $100.;
   ICD_VAR1 = scan(ICD_CODES, 1, '|');
   ICD_VAR2 = scan(ICD_CODES, 2, '|');
   ICD_VAR3 = scan(ICD_CODES, 3, '|');
   ICD_VAR4 = scan(ICD_CODES, 4, '|');
   ICD_VAR5 = scan(ICD_CODES, 5, '|');
   ICD_VAR6 = scan(ICD_CODES, 6, '|');
   ICD_VAR7 = scan(ICD_CODES, 7, '|');
   ICD_VAR8 = scan(ICD_CODES, 8, '|');
   ICD_VAR9 = scan(ICD_CODES, 9, '|');
   ICD_VAR10 = scan(ICD_CODES, 10, '|');
   ICD_VAR11 = scan(ICD_CODES, 11, '|');
   ICD_VAR12 = scan(ICD_CODES, 12, '|');
   ICD_VAR13 = scan(ICD_CODES, 13, '|');
   datalines;
234423 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Anxiety disorder, unspecified (F41.9) [POA=] | (#4) Osteonecrosis, unspecified (M87.9) [POA=] | (#5) Other symptoms and signs involving emotional state (R45.89) [POA=] | (#6) Other specified health status (Z78.9) [POA=] | (#7) Other specified counseling (Z71.89) [POA=] | (#8) Other long term (current) drug therapy (Z79.899) [POA=] | (#9) Encounter for therapeutic drug level monitoring (Z51.81) [POA=] | (#10) Counseling, unspecified (Z71.9) [POA=]
8723492 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Osteonecrosis, unspecified (M87.9) [POA=] |
7 REPLIES 7
ballardw
Super User

How exactly do you expect to use the resulting data set?

For a great many purposes you would be better off making a single ICD_10 variable and one observation per value. The question is what rules can be used to identify just the ICD-10 values.

At first pass you might think that looking for stuff inside () will suffice. But you have this "(current)" appear in the #8 block of the first patient.

OJohn_StaT
Obsidian | Level 7

Thank you for your help. I am trying to see each patients different diagnosis separately as a icd_var1, icd_var2,... 

SASKiwi
PROC Star

As @ballardw has already suggested, separating the diagonoses into rows rather than variables makes it easier for analysis:

data patient_icd;
  keep Patient_ID ICD_VAR;
   input @1 Patient_ID $7. @8 ICD_CODES $600.;
   do i = 1 to 13;
     ICD_VAR = scan(ICD_CODES, i, '|');
	 if ICD_VAR = '' then leave;
	 output;
   end;
   datalines;
234423 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Anxiety disorder, unspecified (F41.9) [POA=] | (#4) Osteonecrosis, unspecified (M87.9) [POA=] | (#5) Other symptoms and signs involving emotional state (R45.89) [POA=] | (#6) Other specified health status (Z78.9) [POA=] | (#7) Other specified counseling (Z71.89) [POA=] | (#8) Other long term (current) drug therapy (Z79.899) [POA=] | (#9) Encounter for therapeutic drug level monitoring (Z51.81) [POA=] | (#10) Counseling, unspecified (Z71.9) [POA=]
;
run;

 

OJohn_StaT
Obsidian | Level 7

Your code worked  and I wrote the following codes and I got what I wanted so for. I have a questions. How can I adjust your code for using a sas dataset instead of datalines statement for creating  patient_icd dataset. I tried but I could not do it. Can you help me?

 

data patient_icd_2;
set patient_icd;
icd_codes = scan(icd_var, -2, '()');
run;


proc transpose data=patient_icd_2 out=patient_icd_3;
by patient_id;
var icd_codes;
run;

/* Rename the variable and remove the prefix 'COL' */
data patient_icd_4;
set patient_icd_3;
rename col1=icd_var1 col2=icd_var2 col3=icd_var3 col4=icd_var4 col5=icd_var5
col6=icd_var6 col7=icd_var7 col8=icd_var8 col9=icd_var9 col10=icd_var10;
drop _name_;
run;

SASKiwi
PROC Star

You just need to replace the INPUT statement with a SET statement naming your input dataset:

data patient_icd;
  keep Patient_ID ICD_VAR;
  set have;
   do i = 1 to 13;
     ICD_VAR = scan(ICD_CODES, i, '|');
	 if ICD_VAR = '' then leave;
	 output;
   end;
run;
AMSAS
SAS Super FREQ

If you really need a single patient_id with multiple variables for each ICD value then you can do that in a single data step: 

/* Set this to the maximum number of icdVar values in your icd_codes variable */
/* Note you may need to set this dynamically depending on your data           */
%let icdVarCnt=13 ;

data patient_icd;
	/* Keep the variables you want in the output dataset */
	keep patient_id icd_var1-icd_var&icdVarCnt ;
	/* Create an array to hold the individual icd_var values */
	array icd_var{&icdVarCnt} $100 ;
	/* Read test data */
	input @1 Patient_ID $7. @8 ICD_CODES $600.;
	/* Loop through the icd_vsr variable extracting individual icd values */
	do i=1 to &icdVarCnt ; 
		/* extract individual icd values */ 
		ICD_VAR(i) = scan(ICD_CODES, i, '|') ;
		/* Drop out off the loop if there are no more individual icd values */
		if ICD_VAR(i) = '' then leave;
	end;
	/* write observation to output dataset */
	output work.patient_icd ;
datalines;
234423 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Anxiety disorder, unspecified (F41.9) [POA=] | (#4) Osteonecrosis, unspecified (M87.9) [POA=] | (#5) Other symptoms and signs involving emotional state (R45.89) [POA=] | (#6) Other specified health status (Z78.9) [POA=] | (#7) Other specified counseling (Z71.89) [POA=] | (#8) Other long term (current) drug therapy (Z79.899) [POA=] | (#9) Encounter for therapeutic drug level monitoring (Z51.81) [POA=] | (#10) Counseling, unspecified (Z71.9) [POA=]
;
run;

Like others who have replied, I would not recommend this approach but only you know what your intended outcome is. 

Ksharp
Super User
data have;
infile datalines truncover;
input Patient_ID ICD_CODES $600.;
datalines;
234423 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Anxiety disorder, unspecified (F41.9) [POA=] | (#4) Osteonecrosis, unspecified (M87.9) [POA=] | (#5) Other symptoms and signs involving emotional state (R45.89) [POA=] | (#6) Other specified health status (Z78.9) [POA=] | (#7) Other specified counseling (Z71.89) [POA=] | (#8) Other long term (current) drug therapy (Z79.899) [POA=] | (#9) Encounter for therapeutic drug level monitoring (Z51.81) [POA=] | (#10) Counseling, unspecified (Z71.9) [POA=]
8723492 (#1) Sickle-cell disease without crisis (D57.1) [POA=] | (#2) Idiopathic aseptic necrosis of unspecified bone (M87.00) [POA=] | (#3) Osteonecrosis, unspecified (M87.9) [POA=] |
;
run;

data temp;
 set have;
 pid=prxparse('/\(.+?\)/');
 s=1;e=length(ICD_CODES);
 call prxnext(pid,s,e,ICD_CODES,p,l);
 do while(p>0);
  value=compress(substr(ICD_CODES,p,l),'()');
  if value ne: '#' then output;
  call prxnext(pid,s,e,ICD_CODES,p,l);
 end;
 keep Patient_ID ICD_CODES value;
run;
proc transpose data=temp out=want(drop=_NAME_) prefix=ICD_VAR;
by Patient_ID ICD_CODES ;
var value;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1197 views
  • 0 likes
  • 5 in conversation