Creating 2574 variables seems to be a bad idea, though maybe it is what you need for the next steps; I doubt it. Creating a single category variable with multiple observations per record could even reduce the code in subsequent steps. In that case, all you would need is something like
category = substr(dc[i], 1, 3);
output;
inside the loop.
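A minimal sketch of that long-format variant, untested like the rest. The output name ICD_long is made up for illustration, and the step assumes the diagnosis variables are character with a length of at least 3:

```sas
/* One output row per non-missing diagnosis code. */
/* ICD_long is a hypothetical data set name. */
data ICD_long;
   set ICD;
   array dc diagnosis ediag1-ediag20 ecode1-ecode4;
   length category $ 3;
   drop i;
   do i = 1 to dim(dc);
      if not missing(dc[i]) then do;
         category = substr(dc[i], 1, 3); /* first three characters of the code */
         output;
      end;
   end;
run;
```

A later step can then count or filter on category directly instead of scanning thousands of flag columns.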
If you have to create those variables (code is untested):
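data ICD_flags;
   set ICD;
   array dc diagnosis ediag1-ediag20 ecode1-ecode4;
   length _code_list $ 11000 _code $ 3;
   retain _code_list;
   drop _code_list _code i j;

   /* fill _code_list once with A01 ... Z99, separated by blanks */
   if _n_ = 1 then do;
      do i = 65 to 90; /* byte(65) = 'A', byte(90) = 'Z' on ASCII systems */
         do j = 1 to 99;
            _code = cats(byte(i), put(j, z2.));
            _code_list = catx(' ', _code_list, _code);
         end;
      end;
   end;

   /* flag variables, in the same order as the codes in _code_list */
   array flags
      A01-A99 B01-B99 C01-C99 D01-D99 E01-E99 F01-F99 G01-G99
      H01-H99 I01-I99 J01-J99 K01-K99 L01-L99 M01-M99 N01-N99
      O01-O99 P01-P99 Q01-Q99 R01-R99 S01-S99 T01-T99 U01-U99
      V01-V99 W01-W99 X01-X99 Y01-Y99 Z01-Z99;

   do i = 1 to dim(dc);
      if not missing(dc[i]) then do;
         /* with the "e" modifier, findw returns the word number, not the character position */
         j = findw(_code_list, substr(dc[i], 1, 3), ' ', 'e');
         if j > 0 then do;
            flags[j] = 1;
         end;
      end;
   end;
run;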
Idea: _code_list contains the variable names in the same order they have in the array flags. The function findw with the option "e" returns the word position of the first three characters of the ICD code within _code_list, and that position is the index of the matching flag variable in the array.
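To see what the "e" option does in isolation, here is a small check on a three-word list. The list and the variable name pos are made up for illustration:

```sas
/* With the "e" modifier, findw counts words instead of characters. */
data _null_;
   pos = findw('A01 A02 A03', 'A02', ' ', 'e');
   put pos=; /* writes pos=2 to the log: A02 is the second word */
run;
```

Without the "e" option the same call would return the character position 5, which could not be used as an array index.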