Hi:
Scott's approach (a DATA step with RETAIN and probably the FIRST. and LAST. variables) is one way to go.
I'd probably approach this in a different way -- you have long/skinny data. I would be tempted to use PROC TRANSPOSE to "flip" the data into WIDE data -- so that it would look like this:
[pre]
Obs   id    _NAME_    COL1      COL2       COL3          COL4   COL5
  1   875   Comment   TBI POS   NO-SHOW    COMPLETE
  2   886   Comment   TBI POS   COMPLETE
  3   912   Comment   TBI POS   COMPLETE   UNSCHEDULED
[/pre]
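Here is a minimal sketch of that transpose step. The input data set name HAVE and the comment length of 30 are my assumptions, not something from your post; TRANSOUT is just the name I use for the output. It also assumes the rows for each ID are already in the order the comments occurred.
[pre]
/* HAVE is a hypothetical name for the long/skinny input data */
data have;
  infile datalines dsd;
  length id 8 comment $ 30;
  input id comment $;
  datalines;
875,TBI POS
875,NO-SHOW
875,COMPLETE
886,TBI POS
886,COMPLETE
912,TBI POS
912,COMPLETE
912,UNSCHEDULED
;
run;

/* one output row per ID; the comments become COL1, COL2, ... */
proc transpose data=have out=transout;
  by id;          /* HAVE must be sorted (or at least grouped) by ID */
  var comment;
run;

proc print data=transout;
run;
[/pre]
PROC TRANSPOSE creates as many COL variables as the longest ID needs, so you do not have to know the maximum number of comments ahead of time.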
That way, you could use an ARRAY statement in a DATA step program, with COL1-COLn (however many columns PROC TRANSPOSE creates) as the elements of the array. You could then write IF statements something like this:
[pre]
data setgroup (keep=ID comment group)
     error_obs;
  set transout; /* output from PROC TRANSPOSE */
  array cmnt(*) col:; /* transposed comment variables COL1-COLn */

  /* find out how many comments per ID (CNTR) and set an indicator
     EVERCOMPLETE for whether there's ever a "COMPLETE" in the comments */
  cntr = 0;
  evercomplete = 0;
  do i = 1 to dim(cmnt);
    if not missing(cmnt(i)) then do;
      cntr = i;
      if cmnt(i) = "COMPLETE" then evercomplete = 1;
    end;
  end;

  /* then have IF statements like these: */
  if cmnt(1) = "TBI POS" then do;
    if cntr = 1 then do;
      group = 0;
      /* If TBI POS is the only field then "Group" = 0 */
    end;
    else if cmnt(2) = "COMPLETE" then do;
      group = 1;
      /* If TBI POS immediately followed by COMPLETE
         then "Group" = 1 */
    end;
    else if cmnt(2) ne "COMPLETE" then do;
      if evercomplete = 1 then group = 2;
      else if evercomplete = 0 then group = 0;
      /*
        IF TBI POS followed by anything except COMPLETE
        ... eventually has COMPLETE then "Group" = 2
        ... COMPLETE never shows up then "Group" = 0
      */
    end;
  end;
  else if cmnt(1) ne "TBI POS" then output error_obs;

  /* ...then another do loop to write out the ID, the comment and the group
     as a long skinny data set again. But the logic for the DO loop depends on
     whether the 9 coding is applied to just group 2 or to group 1 and 2.... */
run;
[/pre]
I did find something inconsistent in your description of the coding and what you showed as your desired results -- you said that if there were comments after the group had been determined to be 2, then those comments should be coded as 9. But in your desired results above, you showed these 2 IDs as having a 9 coded after a GROUP of 1 had been assigned.
[pre]
ID     Comment                Group
912    TBI POS                1
912    COMPLETE               1
912    UNSCHEDULED            9
1086   TBI POS                1
1086   COMPLETE               1
1086   COMPLETE               9
1086   CANCELLED BY PATIENT   9
1086   COMPLETE               9
[/pre]
Also, you do not say whether there would EVER be anything other than TBI POS as the first comment, but if it's possible, then you might want to catch that ID as an error observation.
This is just another approach to think about, since you do not know whether to code the FIRST obs for an ID as a 1, a 0, or a 2 until you know the value of the second comment. There are ways around that, but this is one possible approach. You could also just build an array in the DATA step without ever using PROC TRANSPOSE. It really depends on your comfort level with SAS programming.
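If you want to try the array-without-TRANSPOSE route, here is a rough sketch of one way it could look. The data set names HAVE and WIDE, the variable names C1-C10, and the cap of 10 comments per ID are all assumptions on my part, so adjust them to your data. It reads every row for an ID inside a DO UNTIL loop and fills the array as it goes, so there is one output row per ID.
[pre]
/* HAVE is a hypothetical name for the long/skinny input data */
data have;
  infile datalines dsd;
  length id 8 comment $ 30;
  input id comment $;
  datalines;
875,TBI POS
875,NO-SHOW
875,COMPLETE
886,TBI POS
886,COMPLETE
;
run;

data wide (drop=comment);
  /* assumption: no ID has more than 10 comments */
  array cmnt(10) $ 30 c1-c10;
  cntr = 0;
  evercomplete = 0;
  /* read all rows for one ID before the implicit OUTPUT */
  do until (last.id);
    set have;
    by id;        /* HAVE must be sorted by ID */
    cntr = cntr + 1;
    if cntr <= dim(cmnt) then cmnt(cntr) = comment;
    if comment = "COMPLETE" then evercomplete = 1;
  end;
run;
[/pre]
After this step you have the same kind of wide row that PROC TRANSPOSE gives you, plus CNTR and EVERCOMPLETE already set, so the IF logic for GROUP could follow in the same way.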
cynthia