Re: Problem finding ICD-10 codes using arrays

LuisMijares

Hello, I have a question about something I'm not sure how to do in SAS. I want to find instances of ICD-10 medical codes in the variables DGNS_1_CD-DGNS_25_CD.

I made the following arrays (not exhaustive). I want to accomplish three things: 1. Create an indicator variable for each disease (depression_indicator, nonalzhimers_indicator, etc.). 2. Make sure the codes for the other diseases are assigned 0, so in depression_indicator the codes for non-Alzheimer's dementia, Alzheimer's, and pneumonia will be 0. 3. Assign -1 to codes that I don't know, since there might be codes for diseases that are not listed in the arrays. I was trying to use the following code; any ideas would be helpful.

array DGNS $ DGNS_1_CD -- DGNS_25_CD;
DEPRESSION_MEDPAR = .;
NONALZH_DEMEN_MEDPAR = .;
ALZH_MEDPAR = .;
PNEUMO_MEDPAR = .;
hf_medpar = .;

array depression_codes[50] $8 _temporary_

array nonalzhimers_codes[84] $8 _temporary_

array alzhimers_codes[4] $8 _temporary_

array pneumonia_codes[93] $8 _temporary_(

do i = 1 to dim(DGNS);
if strip(DGNS[i]) in depression_codes then DEPRESSION_MEDPAR=1;
* this would find the other codes and set equal to 0;

if (strip(DGNS[i]) in nonalzhimers_codes or strip(DGNS[i]) in pneumonia_codes or strip(DGNS[i]) in hf_codes
or strip(DGNS[i]) in prknsn_codes or strip(DGNS[i]) in stroke_codes
or strip(DGNS[i]) in stroke_exclusion_codes or strip(DGNS[i]) in anxiety_codes
or strip(DGNS[i]) in bipolar_codes or strip(DGNS[i]) in TBI_codes
or strip(DGNS[i]) in DRUG_USE_CODES or strip(DGNS[i]) in PSYCH_CODES
or strip(DGNS[i]) in OUD_CODES) then DEPRESSION_MEDPAR = 0;

Tom

I do not understand what you want. It kind of sounds like you want something impossible, though. It sounds like you want to allow only one of the disease indicators to be true for a patient. But it is not unreasonable for a single patient to have ICD codes indicating multiple different diseases.

Do you have some reason for wanting presence of depression to hide the fact that they also had a stroke? Perhaps you have some ranking of conditions and you only want the most serious?

LuisMijares

For example, I want Depression_indicator to equal 1 if there is a code for depression (the codes are in the arrays above). Furthermore, I want it to equal 0 when there are codes for other illnesses, so if there is a code for Alzheimer's, I want Depression_indicator to equal 0. The opposite holds for the Alzheimer's indicator: I need it to equal 1 if there are Alzheimer's codes and 0 if there are depression codes.

Tom

Forget the arrays for now. Assume you have already checked and this particular observation has some DX codes that indicate DEPRESSION and some DX codes that indicate ANXIETY. How do you want that result coded? Do you want DEPRESSION=1 and ANXIETY=0, or do you want DEPRESSION=0 and ANXIETY=1? Why?

LuisMijares

ID	dx1	dx2	Depression_indicator	Anxiety_indicator
1	F0631	F064	1	0
2	F0631	F064	1	0
3	F064	F067	0	-1
4	f064	F0631	0	1

Let's assume that F0631 is a code for depression, F064 is a code for anxiety, and F067 is an unknown code (we don't know whether it's depression or anxiety). I want the variables Depression_indicator and Anxiety_indicator to look like the above.

quickbluefish

Sounds like you are trying to classify "reason for hospitalization" (in the US, more realistically, the justification for the bill) based on the primary diagnosis (dx1)? But you only want to make that determination if you can classify all the other codes into one of those bins (as you've defined them in the arrays) and otherwise basically indicate a possible diagnosis with a -1?

LuisMijares

Yes. Currently I have written this code; if you have any advice on how to fix it or implement what I need, that would be appreciated.

data have;
set have;

array DGNS $ DGNS_1_CD -- DGNS_25_CD;

DEPRESSION_MEDPAR = .;
do i = 1 to dim(DGNS);
if strip(DGNS[i]) in depression_codes then DEPRESSION_MEDPAR=1;
* this would find the other codes and set equal to 0;

if (strip(DGNS[i]) in nonalzhimers_codes or strip(DGNS[i]) in pneumonia_codes or strip(DGNS[i]) in hf_codes
or strip(DGNS[i]) in prknsn_codes or strip(DGNS[i]) in stroke_codes
or strip(DGNS[i]) in stroke_exclusion_codes or strip(DGNS[i]) in anxiety_codes
or strip(DGNS[i]) in bipolar_codes or strip(DGNS[i]) in TBI_codes
or strip(DGNS[i]) in DRUG_USE_CODES or strip(DGNS[i]) in PSYCH_CODES
or strip(DGNS[i]) in OUD_CODES) then DEPRESSION_MEDPAR = 0;
if missing(DGNS[i]) then DEPRESSION_MEDPAR = -9;
end;
run;

quickbluefish

I think you need to re-think the way you approach this - data structures like MedPAR always present this kind of annoyance.

I would first start by getting rid of the temporary arrays (the ones that hold all the codes) and instead put them all in a single format. The left side of the format is the dx code, and the right side is the condition, i.e.,

proc format;
value $fdxlab
  'F062', 
  'F0823', 
  'F0671' = 'DEPR'
  'X089',
  'X211' = 'ANX'
   ....
other='UNKNOWN'
   ;
run;
* now another format that just assigns a sequential number to each of the unique conditions - you can make these formats programmatically - I'm just typing them out manually ;

proc format;
value $fcondnum
  'UNKNOWN'=1
  'DEPR'=2
  'ANX'=3
  'ALZ'=4
  'CHF'=5
   ...
;
run;
** now with your medpar data, create new dummy variables for each of those conditions -- same order as above - again, ideally, don't do this manually ;
data mp;
set mp;
length dx_UNKNOWN dx_DEPR dx_ANX dx_ALZ dx_CHF 3;
array dx {*} dx_:;  * your new dummy variables ;
array dg {*} dgns_1_cd -- dgns_25_cd;  * your existing diagnosis codes ;
** now classify each of the 25 diagnosis codes for this hospitalization ;
do i=1 to dim(dx);
    dx[i]=0;
end;
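do i=1 to dim(dg);
    * skip empty diagnosis slots so blanks are not counted as UNKNOWN ;
    * upcase the code so lowercase entries still match the format ;
    if not missing(dg[i]) then
        dx[input(put(put(upcase(dg[i]),$fdxlab.),$fcondnum.),8.)]=1;
end;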
drop i;
run;

I think that will get you pretty close to what you're looking for. If you really care about what's in the primary position (DGNS_1_CD), you'd need to add a step that specifically classifies that variable, but that would be simple.
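On making the formats programmatically: one common route is a CNTLIN data set passed to PROC FORMAT. The following is only a rough sketch. The lookup table dx_lookup and the data set cntlin_dx are made-up names, and the two codes in it are just the hypothetical depression and anxiety codes from the example earlier in the thread, not a real code list.

```sas
/* Hypothetical lookup table: one row per ICD-10 code and its condition. */
/* The two rows are the assumed example codes from this thread.          */
data dx_lookup;
  length code $8 cond $8;
  input code $ cond $;
  datalines;
F0631 DEPR
F064 ANX
;
run;

/* Build a CNTLIN data set that describes the character format $fdxlab. */
data cntlin_dx;
  length fmtname $32 type $1 start $8 label $8 hlo $1;
  retain fmtname 'fdxlab' type 'C';   /* type 'C' = character format */
  set dx_lookup end=last;
  start = upcase(code);
  label = cond;
  hlo   = ' ';
  output;
  if last then do;
    /* catch-all row, equivalent to other='UNKNOWN' in a VALUE statement */
    start = ' ';
    label = 'UNKNOWN';
    hlo   = 'O';
    output;
  end;
  keep fmtname type start label hlo;
run;

/* Create the format from the control data set. */
proc format cntlin=cntlin_dx;
run;
```

The same pattern should work for $fcondnum if the lookup table also carries the sequence number for each condition.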

mkeintz

@LuisMijares wrote:

ID dx1 dx2 Depression_indicator Anxiety_indicator
1 F0631 F064 1 0
2 F0631 F064 1 0
3 F064 F067 0 -1
4 f064 F0631 0 1

Let's assume that F0631 is a code for depression, F064 is a code for anxiety, and F067 is an unknown code (we don't know whether it's depression or anxiety). I want the variables Depression_indicator and Anxiety_indicator to look like the above.

Is the understanding below correct?

When DX1 is a known code, the corresponding dummy variable is either 1 or -1 (all other dummies are zero). The dummy is 1 when DX2 is some other known code and is a -1 if DX2 is unknown.

Question: what if DX1 is an unknown code, and DX2 is a known code?

And what if both DX1 and DX2 are unknown codes?
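If that reading is right, a minimal sketch of the rule might look like the code below. It assumes the two example codes from the quoted post, uses a made-up format name ($icdgrp), and deliberately leaves the two open questions unanswered: when DX1 is unknown, both dummies simply stay 0 here. It also assumes a DX2 from the same known group as DX1 counts as known, which the thread does not specify.

```sas
/* Hypothetical grouping format built from the example codes in the thread. */
proc format;
  value $icdgrp
    'F0631' = 'DEPR'
    'F064'  = 'ANX'
    other   = 'UNKNOWN';
run;

data have;
  input ID $ dx1 $ dx2 $;
  datalines;
1 F0631 F064
2 F0631 F064
3 F064 F067
4 f064 F0631
;
run;

data want;
  set have;
  length grp1 grp $7;
  array dx {*} dx1-dx2;
  Depression_indicator = 0;
  Anxiety_indicator    = 0;
  /* classify the first-listed code; upcase so 'f064' matches 'F064' */
  grp1 = put(upcase(dx[1]), $icdgrp.);
  /* flag turns to -1 if any later non-missing code is unknown */
  flag = 1;
  do i = 2 to dim(dx);
    if not missing(dx[i]) then do;
      grp = put(upcase(dx[i]), $icdgrp.);
      if grp = 'UNKNOWN' then flag = -1;
    end;
  end;
  /* only the dummy for the first-listed condition gets the flag */
  if grp1 = 'DEPR' then Depression_indicator = flag;
  else if grp1 = 'ANX' then Anxiety_indicator = flag;
  drop i grp1 grp flag;
run;
```

With the four sample rows this should reproduce the indicator values in the quoted table, but the answers to the two questions above would change how the unknown-DX1 cases are coded.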


Patrick

Most of the time a long data structure beats a wide one. The code below is not exactly what you asked for, just some sample code to illustrate what I'm talking about.

data have;
  input ID $ dx1 $ dx2 $ Depression_indicator Anxiety_indicator;
  datalines;
2 F0631 F064 1 0
3 F064 F067 0 -1
4 f064 F0631 0 1
1 F0631 F064 1 0
;
run;

proc format;
  value $icd_groups
    'F0631' ='Depression'
    'F064'  ='Anxiety'
    other   ='unknown'
    ;
run;

proc transpose data=have out=long(rename=(_name_=dx col1=icd));
  by id notsorted;
  var dx:;
run;

data want;
  set long;
  icd_group=put(icd,$icd_groups.);
run;

proc print data=want;
run;

proc report data=long;
  columns id icd, dx;
  define id / group;
  define dx / across;
  define icd / group;
  format icd $icd_groups.;
  label dx=' ';
run;
