I am trying to calculate the most common combination of tests combination and test results,
For example I would like to know the count of how many times Test_1 was taken with Test_2 and both positive... There is over 300+ samples with over 100+ tests, and it has been difficult to find a solution/code on how to simply output this. I've tried proc freq to get the individual counts, but that only gets me so far. Any advice?
Sample ID | Test_1 | Test_2 | Test_3 |
1 | positive | positive | negative |
2 | positive | positive | negative |
3 | negative | negative | negative |
4 | negative | negative | positive |
Since you don't really say what you will want to do with Test_3, I will just make an assumption that it will be handled as combinations of pos/neg for all three variables. Please confirm that is correct.
This will do the counting you describe
proc freq data=have;
tables test_1*test_2*test_3/list;
run;
If you have 100 variables, then somehow you need to whittle the process down, and do analyses on a manageable portion of these 100 variables.
A minor addition to @PaigeMiller 's suggestion: add ORDER=FREQ to the proc statement. That will have the most common "combination" appear at the top of the output.
For some types of analysis you may want to consider variables that are numeric 1/0 for Positive/negative.
Then things like the SUM indicate the number of Positive, so if the SUM is the number of variables tested you get "all positive" or if SUM is 0 then all negative.
Example: The first step is just to get data similar to yours, the second is to create numeric versions, then one way to count specific totals.
data have; input SampleID $ Test_1 $ Test_2 $ Test_3 $; datalines; 1 positive positive negative 2 positive positive negative 3 negative negative negative 4 negative negative positive ; data useful; set have; array t (*) Test_: ; array nt (*) nt_1 - nt_3; do i=1 to dim(t); nt[i] = t[i]='positive'; end; drop i; run; proc means data=useful n; where sum(nt_1, nt_2)=2; var nt_1; run;
or all the same:
proc freq data=useful; where range(nt_1,nt_2,nt_3)=0; tables test_1*test_2*test_3 /list; run;
The various functions, SUM, Max, Min, Range basically can allow different sorts of if to be written simply
SUM = (specific number of positive), Sum > X means at least X positive,
MAX= 1 at least one of the values is positive
Min=0 at least one of the values is negative
Range=0 all the same, Range=1 some mixture of positive and negative
@ballardw wrote:
A minor addition to @PaigeMiller 's suggestion: add ORDER=FREQ to the proc statement. That will have the most common "combination" appear at the top of the output.
@ballardw: That would be useful. But I think you need to sort the OUT= dataset by descending count to obtain the desired sort order. The printed output of
tables test_1*test_2*test_3 / list;
with ORDER=FREQ in effect is only sorted by descending one-way frequencies of the three variables: by descending frequencies of TEST_1, descending frequencies of TEST_2 (alone, not depending on the levels of TEST_1) and descending frequencies of TEST_3 (unless this has changed after SAS release 9.4M5).
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.