Solved: Using an array to retain max value by ID

dominiquec · Posted 12-19-2023 01:46 PM

I am working with patient hospitalization data, which includes a series of diagnosis code flags for each hospitalization. In our study design, patients have different exposures during different periods of time (exposure_type). I want to summarize the hospitalizations within each exposure type for each patient by creating an 'ever diagnosed flag' for each of the diagnosis codes in the study. This 'ever diagnosed flag' will indicate if a patient ever had that diagnosis code associated with any of their visits during each exposure period. This is my code so far:

data want;
set have;
by patient_id exposure_type;

array oldflags {20} diag1-diag20;
array overallflg {20} everdiag1-everdiag20;

if first.patient_id and first.exposure_type then do i = 1;
overallflg[i] = 0;
end;

do i = 1 to 20;
overallflg[i] = max(oldflags[i]);
end;
keep patient_id exposure_type everdiag1-everdiag20;
run;

When I run this above code, the max value for each diagnosis flag isn't carrying over. everdiag1-everdiag20 are basically copies of the original flags, and the max value within each unique ID and exposure date is not retained. What am I missing? Thank you for your time.

PaigeMiller · Posted 12-19-2023 02:47 PM

If I understand you properly (and I'm not 100% sure that I do understand you properly), then this doesn't require arrays at all. This is a job for some procedure that can look at different rows for each patient_id and exposure_type, and find out where the values of 1 exist.

In other words, this is a job for PROC SUMMARY (the SUPERMAN of SAS data aggregation™).

data have;
    input patient_id exposure_type hosp_dt :mmddyy10. diag1 diag2 diag3 diag4;
cards;
100100	1	01.01.1995	0	1	0	1
100100	1	06.15.1995	1	0	0	1
100100	2	10.01.1995	1	0	0	0
100100	2	12.15.1995	1	0	1	0
;

proc summary data=have nway;
    class patient_id exposure_type;
    var diag1-diag4;
    output out=want(drop=_:) max=everdiag1-everdiag4;
run;

Now, this output data set named WANT has only one row per patient_id and exposure_type. You could merge it back into the original data set if you want multiple rows per patient_id and exposure_type, as shown in your desired output. If you want to do that, I leave that to you to figure out what the code is.

--
Paige Miller

View solution in original post

PaigeMiller · Posted 12-19-2023 01:53 PM

It would be very helpful if you could show us a portion of the actual data, or even fake data arranged to be in a similar arrangement to the actual data. And then it would also be very helpful if you showed us the incorrect output and showed us what the correct output ought to be.

--
Paige Miller

dominiquec · Posted 12-19-2023 02:40 PM

Certainly!

An example of what the data currently look like is below:

patient_id	exposure_type	hosp_dt	diag1	diag2	diag3	diag4
100100	1	01.01.1995	0	1	0	1
100100	1	06.15.1995	1	0	0	1
100100	2	10.01.1995	1	0	0	0
100100	2	12.15.1995	1	0	1	0

Right now, my results look like a copy of the original file:

patient_id	exposure_type	everdiag1	everdiag2	everdiag3	everdiag4
100100	1	0	1	0	1
100100	1	1	0	0	1
100100	2	1	0	0	0
100100	2	1	0	1	0

For patient 100100, the everdiag for diag1 should be 1 for both exposure types 1 and 2, as at least one hospital visit had that diagnosis associated with it during each exposure type. However, the everdiag flag for diag3 should only be 1 for only exposure type 2, as this diagnosis only appeared during a hospitalization when the patient was in exposure type 2.

See below for how I would like the final dataset to look like. All the everdiag flags should look the same within each unique patient ID and exposure type:

patient_id	exposure_type	everdiag1	everdiag2	everdiag3	everdiag4
100100	1	1	1	0	1
100100	1	1	1	0	1
100100	2	1	0	1	0
100100	2	1	0	1	0

PaigeMiller · Posted 12-19-2023 02:47 PM

If I understand you properly (and I'm not 100% sure that I do understand you properly), then this doesn't require arrays at all. This is a job for some procedure that can look at different rows for each patient_id and exposure_type, and find out where the values of 1 exist.

In other words, this is a job for PROC SUMMARY (the SUPERMAN of SAS data aggregation™).

data have;
    input patient_id exposure_type hosp_dt :mmddyy10. diag1 diag2 diag3 diag4;
cards;
100100	1	01.01.1995	0	1	0	1
100100	1	06.15.1995	1	0	0	1
100100	2	10.01.1995	1	0	0	0
100100	2	12.15.1995	1	0	1	0
;

proc summary data=have nway;
    class patient_id exposure_type;
    var diag1-diag4;
    output out=want(drop=_:) max=everdiag1-everdiag4;
run;

Now, this output data set named WANT has only one row per patient_id and exposure_type. You could merge it back into the original data set if you want multiple rows per patient_id and exposure_type, as shown in your desired output. If you want to do that, I leave that to you to figure out what the code is.

--
Paige Miller

dominiquec · Posted 12-19-2023 03:01 PM

Thank you, this gave me exactly what I needed! I definitely overcomplicated this and need to practice more with proc summary.

ballardw · Posted 12-19-2023 02:45 PM

I would expect to see comparing if a value you have matches some other values, such as list of actual codes. I don't see any such comparison.

And when you use

do i = 1 to 20;
overallflg[i] = max(oldflags[i]);
end;

this is exactly the same as

do i = 1 to 20;
overallflg[i] = oldflags[i];
end;

because you are asking for the Max of exactly one value, oldflags[i];

If you want to use the largest value of an array, which doesn't make much sense to me in terms of your described problem the code would be

do i = 1 to 20;
overallflg[i] = max( of oldflags(*));
end;

but that sets all of the overallflg to the same value. Which is why I don't think that is what you actually want either.

Using an array to retain max value by ID

Re: Using an array to retain max value by ID

Re: Using an array to retain max value by ID

Re: Using an array to retain max value by ID

Re: Using an array to retain max value by ID

Re: Using an array to retain max value by ID

Re: Using an array to retain max value by ID

Catch up on SAS Innovate 2026

Using an array to retain max value by ID

Re: Using an array to retain max value by ID

Re: Using an array to retain max value by ID

Re: Using an array to retain max value by ID

Re: Using an array to retain max value by ID

Re: Using an array to retain max value by ID

Re: Using an array to retain max value by ID

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away