Calcite | Level 5

## Calculating combinations in test results

I am trying to calculate the most common combination of tests combination and test results,

For example I would like to know the count of how many times Test_1 was taken with Test_2 and both positive... There is over 300+ samples with over 100+ tests, and it has been difficult to find a solution/code on how to simply output this. I've tried proc freq to get the individual counts, but that only gets me so far. Any advice?

 Sample ID Test_1 Test_2 Test_3 1 positive positive negative 2 positive positive negative 3 negative negative negative 4 negative negative positive

3 REPLIES 3
Diamond | Level 26

## Re: Calculating combinations in test results

Since you don't really say what you will want to do with Test_3, I will just make an assumption that it will be handled as combinations of pos/neg for all three variables. Please confirm that is correct.

This will do the counting you describe

``````proc freq data=have;
tables test_1*test_2*test_3/list;
run;``````

If you have 100 variables, then somehow you need to whittle the process down, and do analyses on a manageable portion of these 100 variables.

--
Paige Miller
Super User

## Re: Calculating combinations in test results

A minor addition to @PaigeMiller 's suggestion: add ORDER=FREQ to the proc statement. That will have the most common "combination" appear at the top of the output.

For some types of analysis you may want to consider variables that are numeric 1/0 for Positive/negative.

Then things like the SUM indicate the number of Positive, so if the SUM is the number of variables tested you get "all positive" or if SUM is 0 then all negative.

Example: The first step is just to get data similar to yours, the second is to create numeric versions, then one way to count specific totals.

```data have;
input SampleID \$	Test_1 \$	Test_2 \$	Test_3 \$;
datalines;
1	positive	positive	negative
2	positive	positive	negative
3	negative	negative	negative
4	negative	negative	positive
;

data useful;
set have;
array t (*) Test_: ;
array nt (*) nt_1 - nt_3;
do i=1 to dim(t);
nt[i] = t[i]='positive';
end;
drop i;
run;

proc means data=useful n;
where sum(nt_1, nt_2)=2;
var nt_1;
run;```

or all the same:

```proc freq data=useful;
where range(nt_1,nt_2,nt_3)=0;
tables test_1*test_2*test_3 /list;
run;
```

The various functions, SUM, Max, Min, Range basically can allow different sorts of if to be written simply

SUM = (specific number of positive), Sum > X  means at least X positive,

MAX= 1 at least one of the values is positive

Min=0 at least one of the values is negative

Range=0 all the same, Range=1 some mixture of positive and negative

## Re: Calculating combinations in test results

@ballardw wrote:

A minor addition to @PaigeMiller 's suggestion: add ORDER=FREQ to the proc statement. That will have the most common "combination" appear at the top of the output.

@ballardw: That would be useful. But I think you need to sort the OUT= dataset by descending count to obtain the desired sort order. The printed output of

`tables test_1*test_2*test_3 / list;`

with ORDER=FREQ in effect is only sorted by descending one-way frequencies of the three variables: by descending frequencies of TEST_1, descending frequencies of TEST_2 (alone, not depending on the levels of TEST_1) and descending frequencies of TEST_3 (unless this has changed after SAS release 9.4M5).

Discussion stats
• 3 replies
• 312 views
• 0 likes
• 4 in conversation