I have a long dataset with a 36-char STRING variable by ID and CLASS level. Some ID's CLASS values have multiple STRING values and I need to collapse the STRING by unique ID-CLASS group. Each character in the string represents a month and values should be collapsed as follows: if month X is 0 across all STRING within an ID-CLASS group, then month X should be 0 in final string; if month X is a 1 in any STRING within an ID-CLASS group, then month X should be 1; and if there are only 2s or 2s and 0s in month X in all STRING within an ID-CLASS group, then month X should be 2.
So for ID #1002, the final data should contain one row for CLASS #973, with the same info as before, and only one row for CLASS #934 where the STRING would be 010122111110000000000000000000000000.
ID
CLASS
STRING
10002
973
000000000000011111000000000000111110
10002
934
010122121110000000000000000000000000
10002
934
020222111100000000000000000000000000
I tried creating 36 month-level indicators based on the string and using them below. It's working in some cases, but not in others. I think when an earlier string contains a 2 and a later one has a 1, it updates to 1 correctly. But when a 1 appears in an earlier string and a 2 in a later string, it changes the 1 to 2.
data dsn;
length newstring $36;
set dsn;
by id class;
array mstr (36) STR01-STR36;
array nstr (36) NSTR01-NSTR36;
do i=1 to 36;
if first.unitid then nstr[i] = 0;
end;
retain nstr01--nstr36;
do i=1 to 36 ;
if mstr[i] > 0 then nstr[i] = min(of mstr[i]);
end;
newstring = cats(of nstr:);
run;
I'd appreciate any tips or suggestions. Thanks.
... View more