Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Home
- /
- Analytics
- /
- Stat Procs
- /
- Calculating combinations in test results

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 2 weeks ago
(311 views)

I am trying to calculate the most common combination of tests combination and test results,

For example I would like to know the count of how many times Test_1 was taken with Test_2 and both positive... There is over 300+ samples with over 100+ tests, and it has been difficult to find a solution/code on how to simply output this. I've tried proc freq to get the individual counts, but that only gets me so far. Any advice?

Sample ID | Test_1 | Test_2 | Test_3 |

1 | positive | positive | negative |

2 | positive | positive | negative |

3 | negative | negative | negative |

4 | negative | negative | positive |

3 REPLIES 3

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Since you don't really say what you will want to do with Test_3, I will just make an assumption that it will be handled as combinations of pos/neg for all three variables. Please confirm that is correct.

This will do the counting you describe

```
proc freq data=have;
tables test_1*test_2*test_3/list;
run;
```

If you have 100 variables, then somehow you need to whittle the process down, and do analyses on a manageable portion of these 100 variables.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

A minor addition to @PaigeMiller 's suggestion: add ORDER=FREQ to the proc statement. That will have the most common "combination" appear at the top of the output.

For some types of analysis you may want to consider variables that are numeric 1/0 for Positive/negative.

Then things like the SUM indicate the number of Positive, so if the SUM is the number of variables tested you get "all positive" or if SUM is 0 then all negative.

Example: The first step is just to get data similar to yours, the second is to create numeric versions, then one way to count specific totals.

data have; input SampleID $ Test_1 $ Test_2 $ Test_3 $; datalines; 1 positive positive negative 2 positive positive negative 3 negative negative negative 4 negative negative positive ; data useful; set have; array t (*) Test_: ; array nt (*) nt_1 - nt_3; do i=1 to dim(t); nt[i] = t[i]='positive'; end; drop i; run; proc means data=useful n; where sum(nt_1, nt_2)=2; var nt_1; run;

or all the same:

proc freq data=useful; where range(nt_1,nt_2,nt_3)=0; tables test_1*test_2*test_3 /list; run;

The various functions, SUM, Max, Min, Range basically can allow different sorts of if to be written simply

SUM = (specific number of positive), Sum > X means at least X positive,

MAX= 1 at least one of the values is positive

Min=0 at least one of the values is negative

Range=0 all the same, Range=1 some mixture of positive and negative

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@ballardw wrote:

A minor addition to @PaigeMiller 's suggestion: add ORDER=FREQ to the proc statement. That will have the most common "combination" appear at the top of the output.

@ballardw: That would be useful. But I think you need to sort the OUT= dataset by descending count to obtain the desired sort order. The printed output of

tables test_1*test_2*test_3 / list;

with ORDER=FREQ in effect is only sorted by descending *one-way* frequencies of the three variables: by descending frequencies of TEST_1, descending frequencies of TEST_2 (alone, not depending on the levels of TEST_1) and descending frequencies of TEST_3 (unless this has changed after SAS release 9.4M5).

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.