Hi All, I want to determine the proportion of each ICD10 category that appears in a dataset of approx 100,000 records containing 25 diagnosis variables (character), and compare these between exposed and non-exposed individuals. This would allow me to determine, for example, X% of exposed people have F10 diagnosis and X% of non-exposed people have F10 diagnosis (or any other diagnosis of interest). I'm currently doing this by writing a line of code for each new variable (A to Z by 1 to 99) which equates to 26x99 = 2574 new ICD variables (see code except below). This method is proving to be very slow and take a lot of processing time! data ICD_flags;
set ICD;
array dc{25} diagnosis ediag1-ediag20 ecode1-ecode4;
/* flag occurrences of all ICD categories */
DO i=1 to 25;
if dc{i} in : ('A01') then A01=1;
if dc{i} in : ('A02') then A02=1;
if dc{i} in : ('A03') then A03=1;
if dc{i} in : ('A04') then A04=1;
if dc{i} in : ('A05') then A05=1;
if dc{i} in : ('A06') then A06=1;
if dc{i} in : ('A07') then A07=1;
if dc{i} in : ('A08') then A08=1;
if dc{i} in : ('A09') then A09=1;
if dc{i} in : ('A10') then A10=1;
if dc{i} in : ('A11') then A11=1;
if dc{i} in : ('A12') then A12=1;
if dc{i} in : ('A13') then A13=1;
if dc{i} in : ('A14') then A14=1;
if dc{i} in : ('A15') then A15=1;
if dc{i} in : ('A16') then A16=1;
...
...
if dc{i} in : ('Z97') then Z97=1;
if dc{i} in : ('Z98') then Z98=1;
if dc{i} in : ('Z99') then Z99=1;
END;
RUN; Is there a more efficient way of doing this? I thought an alternative could be creating a new two dimensional array where each cell represents a diagnosis category. For example, each row represents ICD letters A to Z, and each column represents code categories 1 to 99. But I don't know how to do this. I hope this makes sense, thanks in advance!
... View more