BookmarkSubscribeRSS Feed
rabbit_hole13
Calcite | Level 5

I am trying to calculate the most common combination of tests combination and test results,

For example I would like to know the count of how many times Test_1 was taken with Test_2 and both positive... There is over 300+ samples with over 100+ tests, and it has been difficult to find a solution/code on how to simply output this. I've tried proc freq to get the individual counts, but that only gets me so far. Any advice?

 

 

 

Sample IDTest_1Test_2Test_3
1positivepositive

negative

2

positivepositivenegative
3negativenegativenegative
4negativenegativepositive
    

 

3 REPLIES 3
PaigeMiller
Diamond | Level 26

Since you don't really say what you will want to do with Test_3, I will just make an assumption that it will be handled as combinations of pos/neg for all three variables. Please confirm that is correct.

 

This will do the counting you describe

 

proc freq data=have;
    tables test_1*test_2*test_3/list;
run;

 

If you have 100 variables, then somehow you need to whittle the process down, and do analyses on a manageable portion of these 100 variables.

--
Paige Miller
ballardw
Super User

A minor addition to @PaigeMiller 's suggestion: add ORDER=FREQ to the proc statement. That will have the most common "combination" appear at the top of the output.

 

For some types of analysis you may want to consider variables that are numeric 1/0 for Positive/negative.

Then things like the SUM indicate the number of Positive, so if the SUM is the number of variables tested you get "all positive" or if SUM is 0 then all negative.

 

Example: The first step is just to get data similar to yours, the second is to create numeric versions, then one way to count specific totals.

data have;
   input SampleID $	Test_1 $	Test_2 $	Test_3 $;
datalines;
1	positive	positive	negative
2	positive	positive	negative
3	negative	negative	negative
4	negative	negative	positive
;

data useful;
   set have;
   array t (*) Test_: ;
   array nt (*) nt_1 - nt_3;
   do i=1 to dim(t);
      nt[i] = t[i]='positive';
   end;
   drop i;
run;

proc means data=useful n;
   where sum(nt_1, nt_2)=2;
   var nt_1;
run;

 

or all the same:

proc freq data=useful;
  where range(nt_1,nt_2,nt_3)=0;
  tables test_1*test_2*test_3 /list;
run;

The various functions, SUM, Max, Min, Range basically can allow different sorts of if to be written simply

SUM = (specific number of positive), Sum > X  means at least X positive,

MAX= 1 at least one of the values is positive

Min=0 at least one of the values is negative

Range=0 all the same, Range=1 some mixture of positive and negative

 

 

FreelanceReinh
Jade | Level 19

@ballardw wrote:

A minor addition to @PaigeMiller 's suggestion: add ORDER=FREQ to the proc statement. That will have the most common "combination" appear at the top of the output.


@ballardw: That would be useful. But I think you need to sort the OUT= dataset by descending count to obtain the desired sort order. The printed output of

tables test_1*test_2*test_3 / list;

with ORDER=FREQ in effect is only sorted by descending one-way frequencies of the three variables: by descending frequencies of TEST_1, descending frequencies of TEST_2 (alone, not depending on the levels of TEST_1) and descending frequencies of TEST_3 (unless this has changed after SAS release 9.4M5).

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 505 views
  • 0 likes
  • 4 in conversation