Coding a Group Variable

Reply
Contributor
Posts: 41

Coding a Group Variable

I have a dataset consisting of multiple records per ID (patient). For each ID I want to code the "Group" variable with 0, 1, 2 or 9. The coding would be based on values in the "Comment" variable. Below are the possible values for the "Comments" variable:

TBI POS
CANCELLED BY CLINIC
NO-SHOW
NO ACTION TAKEN
CANCELLED BY PATIENT
COMPELTE

The dataset is sorted by id then CDATE. TBI POS will always be the first "Comment" value for each ID. Sometimes it may not, but should be ingored in this case. The "Group" variable should be coded according to the following:

For each ID (patient)
If TBI POS is the only field then "Group" = 0
If TBI POS immediately followed by COMPLETE then "Goup" = 1
IF TBI POS followed by anything except COMPLETE but eventually has COMPLETE then "Group" = 2.
IF TBI POS followed by anything except COMPLETE and COMPLETE nvever shows up then "Group" = 0.

After an ID is determined to be in Group 2 then all values after the COMPLETE should be coded "Group" = 9. So "Group" 9 is a subset of "Group" 2.

Below is example output
ID Comment Group
875 TBI POS 2
875 NO-SHOW 2
875 COMPLETE 2
886 TBI POS 1
886 COMPLETE 1
912 TBI POS 1
912 COMPLETE 1
912 UNSCHEDULED 9
931 TBI POS 0
946 TBI POS 0
946 CANCELLED BY PATIENT 0
946 CANCELLED BY CLINIC 0
946 NO ACTION TAKEN 0
946 NO-SHOW 0
1072 TBI POS 2
1072 NO-SHOW 2
1072 NO-SHOW 2
1072 COMPLETE 2
1086 TBI POS 1
1086 COMPLETE 1
1086 COMPLETE 9
1086 CANCELLED BY PATIENT 9
1086 COMPLETE 9
Super Contributor
Super Contributor
Posts: 3,174

Re: Coding a Group Variable

After reading and sorting your input, use a DATA step to input your sorted data, and assign your GROUP (based on using RETAIN statement variable), as your have explained. And if you know all possible COMMENT values, consider using a PROC FORMAT to read-up your input data, which would minimize your need for IF/THEN logic to assign GROUP.

Remember that PROC FORMAT can handle character-data range, such as your "starts with", but the VALUE statement or CNTLIN= coding is a bit tricky. If you do not have a large-volume of unique COMMENT strings, maybe the DATA step IF/THEN approach is sufficient, rather than the time required to setup a PROC FORMAT approach.

Scott Barry
SBBWorks, Inc.
Contributor
Posts: 41

Re: Coding a Group Variable

Thanks for the info. I'll give it shot.
SAS Super FREQ
Posts: 8,717

Re: Coding a Group Variable

Hi:
Scott's approach (with retain and data step and probably first. and last. variables) is one way to go.

I'd probably approach this in a different way -- you have long/skinny data. I would be tempted to use PROC TRANSPOSE to "flip" the data into WIDE data -- so that it would look like this:
[pre]
Obs id _NAME_ COL1 COL2 COL3 COL4 COL5

1 875 Comment TBI POS NO-SHOW COMPLETE
2 886 Comment TBI POS COMPLETE
3 912 Comment TBI POS COMPLETE UNSCHEDULED

[/pre]

That way, you could use an ARRAY statement in a DATA step program and make COL1-COL? the variables that composed the array. This would lead to the ability to have IF statements something like this:
[pre]
data setgroup (keep=ID comment group)
error_obs;

set transout; /* output from PROC TRANSPOSE */
ARRAY stmt for transposed comment variables;

... do loop to find out how many comments per ID, and set CNTR variable and also
set an indicator EVERCOMPLETE for whether there's ever a "COMPLETE"
in the comments...

* then have IF statements like these:;
if cmnt(1) = "TBI POS" then do;
if cntr = 1 then do;
group = 0;
/* If TBI POS is the only field then "Group" = 0 */
end;
else if cmnt(2) = "COMPLETE" then do;
group = 1;
/* If TBI POS immediately followed by COMPLETE
then "Group" = 1 */
end;
else if cmnt(2) ne "COMPLETE" then do;
if evercomplete = 1 then group = 2;
else if evercomplete = 0 then group = 0;
/*
IF TBI POS followed by anything except COMPLETE
... eventually has COMPLETE then "Group" = 2
... COMPLETE never shows up then "Group" = 0
*/
end;
end;
else if cmnt(1) ne "TBI POS" then output error_obs;

...then another do loop to write out the ID, the comment and the group
as a long skinny data set again. But the logic for the DO loop depends on
whether the 9 coding is applied to just group 2 or to group 1 and 2....
run;
[/pre]

I did find something inconsistent in your description of the coding and what you showed as your desired results -- you said that if there were comments after the group had been determined to be 2, then those comments should be coded as 9. But in your desired results above, you showed these 2 IDs as having a 9 coded after a GROUP of 1 had been assigned.
[pre]
ID Comment Group
912 TBI POS 1
912 COMPLETE 1
912 UNSCHEDULED 9


1086 TBI POS 1
1086 COMPLETE 1
1086 COMPLETE 9
1086 CANCELLED BY PATIENT 9
1086 COMPLETE 9

[/pre]

Also, you do not say whether there would EVER be anything other than TBI POS as the first comment, but if it's possible, then you might want to catch that ID as an error observation.

Another approach to think about -- since you do not know whether to code the FIRST obs for an ID as a 1 or a 0 or a 2 until you know what the value of the second comment is. There are ways around that, but this is a possible approach. And, you could just build an array in the DATA step without ever using PROC TRANSPOSE, too. It really depends on your comfort level with SAS programming.

cynthia
N/A
Posts: 0

Re: Coding a Group Variable

I think it could be done like this
data one(where=(last.id=1);
set yourdata;
by id;
if ~first.id and ~last.id then group=1;
lagid=lag(id);
lagc=lag(comment);
if id=lagid and comment~="complete" then group=lag(group);
if id=lagid and comment="complete" and lagc=TBI POS then group=lag(group)+1;
if id=lagid and comment="complete" and lagc=~TBI POS then group=2;
run;
data next;
merge yourdata new;
by id;
run;
data final;
set next;
by id;
lagid=lag(id);
lagc=lag(comment);
lagg=lag(group);
if group=2 and id=lagid and lagc="complete" then group=9;
if id=lagid and lagg=9 then group=9;
run;
I donot know whether it works , my idea is to use lag function.
Contributor
Posts: 41

Re: Coding a Group Variable

Thanks for the detailed explanations. I manually coded the groups since my dataset was so small but when I have some extra time I will try the suggested methods because I will run into this problem again.
Ask a Question
Discussion stats
  • 5 replies
  • 496 views
  • 0 likes
  • 4 in conversation