Solved: Using two layers of arrays in DO-loops

Malthe · Posted 11-06-2020 05:46 AM

Hi forum!

I have to create a medical score depending on several categories; i.e. cerebrovascular disease, cardiac disease, neurological disease. Each of these diseases have several ICD10-codes connected to them. So for instance 'Cerebrovascular Disease' could have 'DG45', 'DG46' etc.

I have a huge look-up table and i would like to look this through using two arrays. One array containing the categories and several arrays containing the ICD-10 codes.

I'm having trouble using this two-layered array-approach, can you help me? This is a much simplified version of my dataset and categories.

data have;
	input id disease $;
	cards;
	1 DG451
	2 DG313
	3 DG462
	4 DG461A
	5 DI14
	6 DI13B
	7 DI69
	8 DI141
;
run;

data lookup;
	set have;
	array diseasecategory1 {2} $4 _temporary_ ('DG45' 'DG46');
	array diseasecategory2 {1} $4 _temporary_ ('DI14');
    /* IN THE REAL WORLD I HAVE CATEGORIES UP TO 13 AND THEY CAN EACH CONTAIN UP TO 30 VALUES */


	array categoryindex {2} $25 _temporary_ ('diseasecategory1' 'diseasecategory2'); /*THIS WOULD THEN NEED TO GO ON TO 13 */ 

	do i = 1 to dim(categoryindex);
		do c = 1 to dim(categoryindex{i});
			if find(disease,strip(cat(categoryindex{i},'{',c,'}'))) > 0 then counter +1;
		end;
	end;
run;

The problems are as follows:
- In the second DO-loop i can't use the DIM-function on the value of the categoryindex-array

- In the FIND-function i can't combine the value of categoryindex-array and the counting variable of the do loop.

I hope you can help me 🙂

***EDIT***

Thanks for replying Paige Miller, it seems i need to elaborate a bit more. I'll try my best.

So with my real world data i have 13 disease categories. Each category can have up to 30 different values. So that would be the equivalent of:

array diseasecategory1...13 {4...30} $12 _temporary_ ('DGX1' 'DGX2' 'DIX4' ..... DZ30');

And then i would use the DO-loop to add 1 to a counting variable. So for instance if a patient had one disease in three diseasecategories, their final score would be 3.

*** EDIT 2 ***

What i want:
So I want to calculate a comorbidity score based on the diagnoses the patient has in the history. So for each patient i look through their entire history of diagnoses, if they match one in one of the categories, the score goes up by 1. I thought it would confuse more than clarify:

data patientlist;
	input patientid;
	cards;
	1
	2
	3
	4
	5
;
run; 

data diagnosishistory;
	input patientid icdcode $;
	cards;
	1 DG451
	1 DG313
	1 DG462
	1 DG461A
	2 DG45
	2 DI13B
	2 DI49
	3 DI151
;
run;

data lookup;
	if _N_ = 1 then do;
		if 0 then set diagnosishistory;
			dcl hash pat (dataset:'diagnosishistory',multidata:'y');
			pat.definekey('patientid');
			pat.definedata('ICDCode');
			pat.definedone();
		end;

	set patientlist;
	by patientid;
	retain SCORE lock; 

	if first.patientid then do;
		call missing(score, lock);
	end; 

	array diseasecategory1 {2} $4 _temporary_ ('DG45' 'DG46');
	array diseasecategory2 {2} $4 _temporary_ ('DI14' 'DI49');

	array categoryindex {2} $25 _temporary_ ('diseasecategory1' 'diseasecategory2');

	do _iorc_ = pat.find() by 0 while (_iorc_ = 0);
		do i = 1 to dim(categoryindex);
			do c = 1 to dim(categoryindex{i});
				if find(disease,strip(cat(categoryindex{i},'{',c,'}'))) > 0 then SCORE + 1;
			end;
		end;

		_iorc_ = pat.find_next();
	end;
run;

So in this example, my desired output would be like:
Patient 1, score = 3 (Because he has one diagnosis with DG45 and two with DG46 in his history)

Patient 2, score = 2 (Because he has one DG45 and one DI49 in his history)
Patient 3, score = 0 (Because she hasn't got any of the diagnoses in history)

Tom · Posted 11-06-2020 10:37 AM

Not sure this is the right way to handle the actual business problem, but here is a technical answer for how to do something like this. Just use a 2-D array.

data have;
	input patientid icdcode $;
cards;
1 DG451
1 DG313
1 DG462
1 DG461A
2 DG45
2 DI13B
2 DI14xxxx
2 DI49
3 DI151
;


%let ncats=2;

data lookup;
do until(last.patientid);
	set have;
  by patientid;

  /* IN THE REAL WORLD I HAVE CATEGORIES UP TO 13 AND THEY CAN EACH CONTAIN UP TO 30 VALUES */
  array catcodes [&ncats,30] $4 _temporary_ 
    ('DG45' 'DG46' 28*' '
     'DI14' 29*' '
    );
  array category [&ncats] ;
	do cat = 1 to dim(category);
		do i = 1 to dim(catcodes,2) while(catcodes[cat,i] ne ' ');
			if catcodes[cat,i]=: icdcode then category[cat]=1;
		end;
	end;
end;
  drop icdcode cat i
run;

Obs    patientid    category1    category2

 1         1            1            .
 2         2            1            1
 3         3            .            .

View solution in original post

PaigeMiller · Posted 11-06-2020 05:55 AM

dim(categoryindex{i})

I don't know what you are trying to do here, or why you have programmed it this way. The DIM function works on arrays, it doesn't work on the i-th element of an array. Even if this syntax would work, the dimension of the i-th element of an array is always 1. What did you intend to have happen from this part of the code?

--
Paige Miller

Malthe · Posted 11-06-2020 06:15 AM

Thanks for answering! I think an example is best for me to try and explain it:

The way I intended for it to work:
1) Value of Categoryindex{1} = 'diseasecategory1' (which is the name of another array)
2) The dim would then be a function of 'diseasecategory1'. DIM(Diseasecategory1)
3) Then i would have the number of elements in Diseasecategory1. In this instance 2 ('DG45' 'DG46'). So my new counting variable C would then go from 1 to 2.
4) I would then be able to use these values in the findstring.
So i start by using step 1 | Categoryindex{1} = 'diseasecategory1'
Then i would use my new C-variable in the do loop | C = 1
I could then put these together in the cats-function so i ended up having the value | diseasecategory1{1}
5) I would then be able to do a look-up of the first value in diseasecategory1{1} = 'DG45' in my original dataset.

Then i would be able to repeat the process for diseasecategory1{2} (with C = 2). After this was done i would then in the loop go to diseasecategory2 .. and so o

PaigeMiller · Posted 11-06-2020 06:25 AM

For the case where you have only two values in categoryindex, you could simply use IF statements.

if c=1 then z=dim(diseasecategory1);
else if c=2 then z=dim(diseasecategory2);

If the real-world problem actually has many (instead of 2) disease categories, then using the IF statements as above isn't viable, and you'd have to find a different approach. Off the top of my head, I'm not sure I have a good recommendation at this time — that does not mean there is not a good approach, it means I can't think of one now.

--
Paige Miller

PaigeMiller · Posted 11-06-2020 06:28 AM

Also, please show us the desired output from this data.

--
Paige Miller

Malthe · Posted 11-06-2020 06:40 AM

Unfortunately with my real-world data I have a lot more diseasecategories. I have elaborated in the original post 🙂 I hope this helps!

PaigeMiller · Posted 11-06-2020 07:09 AM

Looking back where I said

if c=1 then z=dim(diseasecategory1);
else if c=2 then z=dim(diseasecategory2);

and knowing that you have lots of diseasecategory levels instead of just 2, this seems to me to be a case where a macro loop would be a good solution, unless someone can think of a way to do this without a macro loop. But as I said, you need to show us the desired output from this DATA step on your sample data.

--
Paige Miller

andreas_lds · Posted 11-06-2020 06:56 AM

Why is it necessary to have an array for each category?

From the data you have posted: what do you expect as result?

ballardw · Posted 11-06-2020 07:17 AM

SAS provides a very efficient "look up" method called a FORMAT.

A very brief example:

proc format;
value $dxcode
'DG462','DG461A' = 'DX group 1'
'DI14','DI141'  = 'DX group 2'
/* continue assigning codes to groups*/
;
run;

data have;
	input id disease $;
	cards;
	1 DG451
	2 DG313
	3 DG462
	4 DG461A
	5 DI14
	6 DI13B
	7 DI69
	8 DI141
;
run;

proc freq data=have;
  tables disease;
  format disease $dxcode.;
run;

If your actual data has multiple diagnosis codes per record then one way to count, which seems to be part of this exercise, is to transpose the data so you have one record per code and then use something like Proc Freq with the format applied.

One of the nice things to the format approach is that you can use the formats to create groups in almost every SAS procedure.

This is not the first or probably even the 10th time this ICD to text has been on this forum.

I just did a search just in the SAS Programming section for ICD-10 and found 47 responses. Likely there are more with ICD10 or other spellings, plus ICD-9 which would show similar techniques even though the codings changed.

Unfortunately for a "standard" people seem to have way too many ways of storing ICD values so other code may not work exactly or the were interested in different topics, but the principals are the same.

Malthe · Posted 11-06-2020 07:50 AM

I have tried to add yet another description to clarify what i mean

(I'm sorry, but i'm really trying to phrase it as easily understandable as i can, thanks for all your help)

I will look into the formatting method and see if i can apply that 🙂

Cynthia_sas · Posted 11-06-2020 10:29 AM

Hi:

Depending on the structure of the data, PROC TABULATE might produce the type of score you describe. With this fake data, and using the format for lookup approach, I have 2 patient_ids that fall into both format groups and 2 patient_ids that are just in one group:

TABULATE can give you a score or count for each patient for each group and then an overall score or count across all groups.

Cynthia

ballardw · Posted 11-06-2020 10:51 AM

@Malthe wrote:
I have tried to add yet another description to clarify what i mean

(I'm sorry, but i'm really trying to phrase it as easily understandable as i can, thanks for all your help)

I will look into the formatting method and see if i can apply that 🙂

One thing to consider with the Format approach: When your boss comes along and wants to add another group of interest you pretty much only need to add the new values to the format and most of your other code needs no changes. If you are coding loops then where/now to change the index values may be a non-trivial operation.

Tom · Posted 11-06-2020 10:37 AM

Not sure this is the right way to handle the actual business problem, but here is a technical answer for how to do something like this. Just use a 2-D array.

data have;
	input patientid icdcode $;
cards;
1 DG451
1 DG313
1 DG462
1 DG461A
2 DG45
2 DI13B
2 DI14xxxx
2 DI49
3 DI151
;


%let ncats=2;

data lookup;
do until(last.patientid);
	set have;
  by patientid;

  /* IN THE REAL WORLD I HAVE CATEGORIES UP TO 13 AND THEY CAN EACH CONTAIN UP TO 30 VALUES */
  array catcodes [&ncats,30] $4 _temporary_ 
    ('DG45' 'DG46' 28*' '
     'DI14' 29*' '
    );
  array category [&ncats] ;
	do cat = 1 to dim(category);
		do i = 1 to dim(catcodes,2) while(catcodes[cat,i] ne ' ');
			if catcodes[cat,i]=: icdcode then category[cat]=1;
		end;
	end;
end;
  drop icdcode cat i
run;

Obs    patientid    category1    category2

 1         1            1            .
 2         2            1            1
 3         3            .            .

Cynthia_sas · Posted 11-06-2020 11:16 AM

Hi:
This user group paper https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2019/3117-2019.pdf has some good examples. I searched at www.lexjansen.com to find papers related to using ICD codes. This older paper has some good examples of creating user-defined formats for lookups, including a simplified ICD code example: https://support.sas.com/resources/papers/proceedings/pdfs/sgf2008/041-2008.pdf .
Cynthia

Malthe · Posted 11-09-2020 06:56 AM

Thanks for all your help, all of you! It was hard choosing a 'right solution', but this one worked flawlessly for me.

The Proc Format/Proc Tabulate/Freq didn't really work as i have 10.000+ patients, and that methodology was simply too slow.

One little note though; If i made multiple arrays, and made a do-loop for each of them (i.e. ugly coding with a lot of lines), it was 60% faster than the solution above.

Using two layers of arrays in DO-loops

Re: Using two layers of arrays in DO-loops

Re: Using two layers of arrays in DO-loops

Re: Using two layers of arrays in DO-loops

Re: Using two layers of arrays in DO-loops

Re: Using two layers of arrays in DO-loops

Re: Using two layers of arrays in DO-loops

Re: Using two layers of arrays in DO-loops

Re: Using two layers of arrays in DO-loops

Re: Using two layers of arrays in DO-loops

Re: Using two layers of arrays in DO-loops

Re: Using two layers of arrays in DO-loops

Re: Using two layers of arrays in DO-loops

Re: Using two layers of arrays in DO-loops

Re: Using two layers of arrays in DO-loops

Re: Using two layers of arrays in DO-loops

SAS Innovate 2025: Call for Content

Classroom Training Available!