Re: Calculating combinations in test results

rabbit_hole13 · Posted 07-24-2024 01:57 PM

I am trying to calculate the most common combination of tests combination and test results,

For example I would like to know the count of how many times Test_1 was taken with Test_2 and both positive... There is over 300+ samples with over 100+ tests, and it has been difficult to find a solution/code on how to simply output this. I've tried proc freq to get the individual counts, but that only gets me so far. Any advice?

Sample ID	Test_1	Test_2	Test_3
1	positive	positive	negative
2	positive	positive	negative
3	negative	negative	negative
4	negative	negative	positive

PaigeMiller · Posted 07-24-2024 02:16 PM

Since you don't really say what you will want to do with Test_3, I will just make an assumption that it will be handled as combinations of pos/neg for all three variables. Please confirm that is correct.

This will do the counting you describe

proc freq data=have;
    tables test_1*test_2*test_3/list;
run;

If you have 100 variables, then somehow you need to whittle the process down, and do analyses on a manageable portion of these 100 variables.

--
Paige Miller

ballardw · Posted 07-24-2024 03:20 PM

A minor addition to @PaigeMiller 's suggestion: add ORDER=FREQ to the proc statement. That will have the most common "combination" appear at the top of the output.

For some types of analysis you may want to consider variables that are numeric 1/0 for Positive/negative.

Then things like the SUM indicate the number of Positive, so if the SUM is the number of variables tested you get "all positive" or if SUM is 0 then all negative.

Example: The first step is just to get data similar to yours, the second is to create numeric versions, then one way to count specific totals.

data have;
   input SampleID $	Test_1 $	Test_2 $	Test_3 $;
datalines;
1	positive	positive	negative
2	positive	positive	negative
3	negative	negative	negative
4	negative	negative	positive
;

data useful;
   set have;
   array t (*) Test_: ;
   array nt (*) nt_1 - nt_3;
   do i=1 to dim(t);
      nt[i] = t[i]='positive';
   end;
   drop i;
run;

proc means data=useful n;
   where sum(nt_1, nt_2)=2;
   var nt_1;
run;

or all the same:

proc freq data=useful;
  where range(nt_1,nt_2,nt_3)=0;
  tables test_1*test_2*test_3 /list;
run;

The various functions, SUM, Max, Min, Range basically can allow different sorts of if to be written simply

SUM = (specific number of positive), Sum > X means at least X positive,

MAX= 1 at least one of the values is positive

Min=0 at least one of the values is negative

Range=0 all the same, Range=1 some mixture of positive and negative

FreelanceReinh · Posted 07-24-2024 04:49 PM

@ballardw wrote:

A minor addition to @PaigeMiller 's suggestion: add ORDER=FREQ to the proc statement. That will have the most common "combination" appear at the top of the output.

@ballardw: That would be useful. But I think you need to sort the OUT= dataset by descending count to obtain the desired sort order. The printed output of

tables test_1*test_2*test_3 / list;

with ORDER=FREQ in effect is only sorted by descending one-way frequencies of the three variables: by descending frequencies of TEST_1, descending frequencies of TEST_2 (alone, not depending on the levels of TEST_1) and descending frequencies of TEST_3 (unless this has changed after SAS release 9.4M5).

Calculating combinations in test results