Solved: Re: Making groups for different outcomes

rookie21 · Posted 12-04-2021 11:46 AM

Hi,

I have a column with subjects (called record_id) and a list of variables (called lead_1, lead_2, lead_3 and soforth).

For the variable lead_x, its only possible to have the values: 1, 2, 99 or none present.

I would like to make groups of subjects and count these groups, according to

- how many is presented with a value "1" in all lead_types (from left to right),

- how many with the value 2 in all lead_types,

- how many is starting with 1 in som lead_types and then changes to lead_2 and vice verca

- how many is starting in group 99, changing to 1 and later 2

- and so on for possible variation outcomes

In summary I would like the number of subjects in all possible variations. Its only interesting for me, WHEN they change group

I tried this (first part to extract the needed information in one set called gr and then proc freq to see the outcomes and number in each group):

data gr; 
set lead_c; 
group = cat(lead_1,lead_2,lead_3,lead_4,lead_5,lead_6,lead_7,lead_8,lead_9,lead_10,lead_11,lead_12,lead_13,lead_14,lead_15);
run; 

proc freq data=gr; 
	tables group; 
run;

which gives me (part is shown below):

Here I would like SAS to read the previous number and only classify the ID when they change number.

I think I need som retain statement and a way to imply to ignore, if empty to get free of all those "...."

Anyone who can help how to get closer to the solution ?

PaigeMiller · Posted 12-05-2021 05:13 PM

data want;
    set have;
    length group $ 256;
    array lead lead:;
    group=cats(lead(1));
    do i=2 to dim(lead);
        if lead(i)^=lead(i-1) then do;
            group=cats(group,'->',lead(i));
        end;
    end;
    drop i;
run;

--
Paige Miller

View solution in original post

PaigeMiller · Posted 12-04-2021 12:00 PM

We can't work with data that is a screen capture. We need (a portion of) your data presented in SAS data step code (instructions) and not via any other method or format.

--
Paige Miller

rookie21 · Posted 12-04-2021 12:14 PM

I made a miniature version of dataset

Hope this works out

data have;
input s_id lead_1 lead_2 lead_3 lead_4 lead_5 lead_6 lead_7;
datalines;
1 1 1 1 1 1 1 1
2 1 1 1 3 3 3 3
3 1 1 1 1 1 2 2
4 1 2 2 2 2 2 2
5 1 1 1 1 2 2 2
6 1 1 1 1 1 1 1
7 2 2 2 2 2 1 2
8 2 1 2 2 2 2 2
9 3 1 1 1 3 2 1
10 2 2 2 1 2 2 3
11 3 99 1 3 3 3 2
;
run;

data want;
set have;
conc = cat(lead_1,lead_2,lead_3,lead_4,lead_5,lead_6,lead_7);
run;

proc freq data=want;
tables conc;
run;

PaigeMiller · Posted 12-04-2021 12:20 PM

You start by saying:

For the variable lead_x, its only possible to have the values: 1, 2, 99 or none present.

Now you present data where lead_1 = 3, explain this please.

Also, if I am understanding your original description properly, the groups you want to create are overlapping, an ID can be in multiple groups. Is that correct?

--
Paige Miller

rookie21 · Posted 12-04-2021 01:14 PM

Sorry, to clarify,
ID are the numbers present in the picture (from 68 to 94). These are persons.
So person nr 1 in my cohort is record nr 1, and person number 94 has the record id 90.

One person can change groups (the lead numbers). Lead numbers represents operations types. My main goal now is to calculate how many patients were operated with the same type of operation (represented as 1 - 1- 1- 1 or 2- 2- 2- 2) and how many people changed operation type during life time (represented as 1- 1- 2 or 2- 2- 2- 2- 1)
In total I have actually 4 operations variations - presentes as variable names 1, 2, 3, 99 in lead_types.

So lets say person nr 70 from the original picture I posted. This person only has 1 present in all leads numbers. This person will go in the group called only nr 1.
Person 79 starts in 2 (lead_1) and changes to 1 (in lead_2).

Person 70 will be continously in group 1. This is one outcome.
Person 79 will be in the group called 2-1 as this person changes.

As for that, I would like to count:
how many patient are only operated with type 1 operations during life tim e
how many only type 2
how many only type 3
how many only type 99
Then for combinations the same as fx,
how many have the combination of operation type 1 first couple of times and then changes to two and vice verca

Does it make sense now

PaigeMiller · Posted 12-04-2021 02:05 PM

@rookie21 wrote:

So lets say person nr 70 from the original picture I posted. This person only has 1 present in all leads numbers. This person will go in the group called only nr 1.
Person 79 starts in 2 (lead_1) and changes to 1 (in lead_2).

Person 70 will be continously in group 1. This is one outcome.
Person 79 will be in the group called 2-1 as this person changes.

As for that, I would like to count:
how many patient are only operated with type 1 operations during life tim e
how many only type 2
how many only type 3
how many only type 99
Then for combinations the same as fx,
how many have the combination of operation type 1 first couple of times and then changes to two and vice verca

Does it make sense now

Not completely.

Please enumerate all possible groups that a person can wind up in.

--
Paige Miller

rookie21 · Posted 12-04-2021 05:41 PM

Ok,

Im definitely not good to present my challenge. I try again.. Thank u so much for still trying to help me

Record_id is equal to a person.

Lead_x is equal to operation. (lead_1 presenting first operation, lead_2 presenting 2nd operation, lead_3 presenting 3rd operation and so on).

Some patients only had 1 operation, and some had multiple.
In lead_x its possible to have 4 different types of operation presented as nr 1, 2 or 3 and 99 is unknown. So in total 4 different types of operation.

For each patient I would like to know, whether they are in the group:
- only 1 in all lead_types

- only 2 in all lead_types

- only 3 in all lead_types

- only_99 in all lead_types

for which all 4 above mentioned is that they only hadd same type of operation during life.

However, a patient can have been operation with 1 for the first operations and afterwards changed to 2, 3, or 99. This goes for all combinations.

So possible groups for each patient consists of:

1 --> 1

2--> 2

3 --> 3

99 --> 99

1 -> 2

1 -> 3

1-->99

2--> 1

2-->3

2 -->99

3 --> 1

3 --> 2

3 --> 99

99 --> 1

99 --> 2

99 --> 3

Lets say patient 1 has following:

lead_1 lead_2 lead_3

1 1 2

Then this patient should be categorized as: "1-->2 group".

So only the change in operation type is of interest for me without how many of each .

Hope this makes more sense

ballardw · Posted 12-04-2021 01:44 PM

With a small data set show what you expect for output as well. It is pretty obvious that your "Want" data set in your example code is not what you actually want given that you asking for help.

Note that unless multiple observations are to be combined somehow then providing one of each type of case is what would be needed.

Include missing values since your first example had a lot of missing. For datalines if you enter . it will be read as a missing value.

You may want to consider exactly what you mean by "In summary I would like the number of subjects in all possible variations." and provide a more explicit description. Your initial picture showing variable names up to Lead_13 with 4 possible values, missing, 1, 2 and 99 means that there are 67108864 possible permutations (4 to the 13th power). If you have more variables that number of permutations goes up quickly.

It is easy to test if all of a bunch of variables has the same value excluding missing, the RANGE function returns 0 in that case (all 1, all 2 or all 99, a result of missing means all missing).

Range returns the difference of the smallest to largest value, so a return of 1 means the difference is 1 (which would indicate only 1 and 2 plus any number of missing in your original picture), 98 would mean combinations of at least one 99 and at least one 1 (might include 2), on the same observations, 97 would mean 99 and at least one 2 and no 1.

MAX, or Min combined with Range tells what value was present for all populated variables.

The N function returns how many variables have values..

Some example:

data have;
   input s_id lead_1 lead_2 lead_3 lead_4 lead_5 lead_6 lead_7;
   array l (*) lead_:;
   r = range(of l(*));
   m = max(of l(*));
   ones = (m=1 and r=0) ;
   twos = (m=2 and r=0) ;
   all_ones= (ones and n(of l(*))=dim(l));
   all_twos= (ones and n(of l(*))=dim(l));
   num_ones = ones*n(of l(*));
   num_twos = twos*n(of l(*));
   label
      r = 'Range of Lead'
      m = 'Max of Lead'
      ones = 'All populated lead are 1'
      twos = 'All populated lead are 2'
      all_ones = 'All Lead are 1'
      all_twos = 'All Lead are 2'
      num_ones = 'Number Lead with 1 when only 1'
      num_twos = 'Number Lead with 2 when only 2'
   ;
datalines;
1 1 1 1 1 1 1 1
2 1 1 1 . . . .
. 1 1 1 1 1 2 2
4 1 2 2 2 2 2 2
5 1 1 1 1 2 2 2
6 2 2 2 2 2 2 2
7 2 2 2 2 2 1 2
8 2 1 2 2 2 2 2
9 . 1 1 1 . 2 1
10 2 2 2 1 2 2 .
11 . 99 1 . . . 2
;
run;

Proc freq data = have;
   tables ones twos all_ones all_twos num_ones num_twos;
run;

If you haven't used arrays time to learn. It is one of the basic tools for doing multiple operations with multiple variables. The Dim(arrayname) function returns the number of elements defined for the array.

Note that using the shortcut Lead_: the array uses all variables whose names start with Lead_. If there were 54 lead variables the code would use all of them in the array and none of the code would be need.

As an aside you can use Group = cat(of L(*)); instead of listing all the variables.

If you set Options missing=' '; before running the data step you would not have the . in the group variable if you use CATS instead of CAT function.

The code above uses SAS behavior of returning 1 for a true comparison and 0 for false. So 0 in the Ones variable means not all the values are 1.

rookie21 · Posted 12-05-2021 07:27 AM

For sure, Im not being accurate as it is still not, what I would like. But thank u so much for still trying to help me

I try again from the beginning.

Record_id is equal to a person.

Lead_x is equal to operation. (lead_1 presenting first operation, lead_2 presenting 2nd operation, lead_3 presenting 3rd operation and so on).

Some patients only had 1 operation, and some had multiple.
In lead_x its possible to have 4 different types of operation presented as nr 1, 2 or 3 and 99 is unknown. So in total 4 different types of operation.

For each patient I would like to know, whether they are in the group:
- only 1 in all lead_types

- only 2 in all lead_types

- only 3 in all lead_types

- only_99 in all lead_types

for which all 4 above mentioned is that they only hadd same type of operation during life.

However, a patient can have been operation with 1 for the first operations and afterwards changed to 2, 3, or 99. This goes for all combinations.

So possible groups for each patient consists of:

1 --> 1

2--> 2

3 --> 3

99 --> 99

1 -> 2

1 -> 3

1-->99

2--> 1

2-->3

2 -->99

3 --> 1

3 --> 2

3 --> 99

99 --> 1

99 --> 2

99 --> 3

Lets say patient 1 has following:

lead_1 lead_2 lead_3

1 1 2

Then this patient should be categorized as: "1-->2 group".

So only the change in operation type is of interest for me without how many of each .

Hope this makes more sense ?

PaigeMiller · Posted 12-05-2021 07:31 AM

So only the first change matters when you do this grouping?

If a patient has 1-->2-->3, this patient still is in the 1->2 group?

Try this:

data want;
    set have;
    length group $ 6;
    array lead lead:;
    if range(of lead:)=0 then group=put(lead_1,6. -l);
    else do i=2 to dim(lead);
        if lead(i)^=lead(i-1) then do;
            group=cats(lead(i-1),'->',lead(i));
            leave;
        end;
    end;
    drop i;
run;

--
Paige Miller

rookie21 · Posted 12-05-2021 07:42 AM

No I would actuelly like all changes. 😕

So patient 1 going from 1-3-2-99 is one option

while if patient 2 is having 2-2-2-1 it should be categorized as 2->1 group.

All "changes" in operation Type I would like to track but not if previous type of operation is the same as the present.

rookie21 · Posted 12-05-2021 07:59 AM

Ok THANKS, I have tried this code u wrote:

data want;
    set have;
    length group $ 6;
    array lead lead:;
    if range(of lead:)=0 then group=put(lead_1,6. -l);
    else do i=2 to dim(lead);
        if lead(i)^=lead(i-1) then do;
            group=cats(lead(i-1),'->',lead(i));
            leave;
        end;
    end;
    drop i;
run;

It gives me the right output for the variable changes from first type of operation to the second type.

If I would like to incorporate subsequent changes again, can I do that in same code.

So patient have following:
2 - 2 - 1 - 1 - 2

Can I get the output 2 --> 1 --> 2 here?

PaigeMiller · Posted 12-05-2021 12:21 PM

@rookie21 wrote:

So patient have following:
2 - 2 - 1 - 1 - 2

Can I get the output 2 --> 1 --> 2 here?

Can you please state the whole problem, all the requirements, rather than in each message adding additional requirements that were not previously discussed?

--
Paige Miller

rookie21 · Posted 12-05-2021 01:10 PM

Yea, sorry. Shall be more specific.

Record-id present patient
Lead represent operation. Lead_1 first operation, lead_2 second operation, and so on. Possible variable names for all lead_x is: 1, 2, 3 and 99.
The combination could is what Im looking for.
One patient might have One or multiple operations. But what I look for:
Is every change in operation type

Patient 1 with 1 - 2 - 1 in lead_1, lead_2 and lead_3 respectively, I would like the outcome 1-2-1

For patient 2, if it is 1 - 1 - 1 in lead_1, lead_2 and lead_3 respectively, I would like outcome 1

For patient 3 lets say it is 3 - 99 - 1 with only 3 operation, I would like outcome 3 -> 99 -> 1
And so on…

When I have these . i would use proc freq to count number of patients in each combination

PaigeMiller · Posted 12-05-2021 01:32 PM

So could there be more than two changes to determine the group?

1->2->1->3->2

--
Paige Miller

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!