I have a table with two variables:
(1) Account type (3 levels, categorical) across the columns.
(2) For each account type, I have listed the top 5 food products, and those top 5 products are different for each account type (see attached image for an example of my table).
I'm wondering if it's possible to perform any kind of statistical test of association for these variables (e.g., chi-square/Fisher's exact)? I assumed not, because the food product variable changes depending on the account type variable, but I just wanted to be sure. Or would I have to create a table for each account type and its respective food products and then perform a statistical test, such as chi-square?
> I assumed not because the food product variable changes depending on the account type variable, but I just wanted to be sure
I think you are correct. The margins for the rows would have to be the same, such as "Fast food", "Alcohol", etc., and the cells would have to contain the proportion of the i_th food item for the j_th column. Then you could test whether the proportions differ among columns.
If you want to pursue this, my advice is to look at the top 5 food items OVERALL (and probably an "Other" category so that the column margins add to 100%). Put those top items in the left margin. You can then test whether the proportions for those categories differ among the account types.
Thank you! That makes sense. A follow-up question then: if I did look at the top 5 food products overall, and I wanted to run a Fisher's exact test in SAS, would I need to run a separate Fisher's test for each food product (since each food category is its own column in my dataset)?
proc freq data=dataset1;
tables (fast_food candy alcohol energy_drink meals)*account_type;
exact fisher/mc;
run;
It depends on what you want to test.
If you want to test whether the proportions of Top 5 categories differ by account type, you can use a TABLE statement like (FoodCategory * account_type). This analysis is on a two-way 6x3 table (assuming there is an "Other" category). If the test rejects the null hypothesis ("no association"), you know that there is a difference, but you don't know which food(s) are responsible.
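A minimal sketch of that first analysis, assuming (as the question suggests) that dataset1 has one row per purchase with 0/1 indicator columns named fast_food, candy, alcohol, energy_drink, and meals. The dataset name long_food, the variable FoodCategory, the "other" label, and the seed value are hypothetical choices made for illustration. The DATA step stacks the indicators into a single categorical variable so that PROC FREQ sees one 6x3 table:

```sas
/* Assumption: dataset1 has 0/1 indicator columns for the top 5 foods. */
/* long_food and FoodCategory are hypothetical names.                  */
data long_food;
   set dataset1;
   length FoodCategory $ 12;               /* long enough for "energy_drink" */
   array flags {5} fast_food candy alcohol energy_drink meals;
   n_flagged = 0;
   do i = 1 to 5;
      if flags{i} = 1 then do;
         FoodCategory = vname(flags{i});   /* category label = column name */
         n_flagged + 1;
         output;
      end;
   end;
   if n_flagged = 0 then do;               /* none of the top 5: "Other" row */
      FoodCategory = "other";
      output;
   end;
   keep account_type FoodCategory;
run;

/* One test on the full two-way table */
proc freq data=long_food;
   tables FoodCategory*account_type / chisq;
   exact fisher / mc seed=12345;           /* Monte Carlo estimate of the exact p-value */
run;
```

If a single row can flag more than one food, the DATA step writes several rows for that purchase and the observations are no longer independent, so it is worth checking for that before relying on the p-value.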
If you want to test whether a particular food differs by account type, then you are doing an analysis on a one-way 1x3 table. You probably don't need to create the extra variables. Depending on the structure of your data, you might be able to use a WHERE clause and/or a BY statement.
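As a rough sketch of the WHERE and BY approaches, assume a hypothetical long-format dataset named long_food with one row per purchase, a character variable FoodCategory (with values such as "alcohol"), and account_type. None of these names come from the original data, so adjust them to match yours:

```sas
/* One food at a time: restrict the rows with WHERE.           */
/* For a one-way table, CHISQ tests the null hypothesis that   */
/* the levels of account_type have equal proportions.          */
proc freq data=long_food;
   where FoodCategory = "alcohol";
   tables account_type / chisq;
run;

/* Every food in one call: sort, then use a BY statement. */
proc sort data=long_food out=long_sorted;
   by FoodCategory;
run;

proc freq data=long_sorted;
   by FoodCategory;
   tables account_type / chisq;
run;
```

The default null hypothesis of equal proportions is only sensible if the account types are about the same size. Otherwise, the TESTP= option on the TABLES statement lets you supply the expected proportions. Running one test per food also raises a multiple-comparisons question that the single two-way test avoids.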
> I believe I am trying to test the first option you mentioned, that there is a difference in the proportion of food categories by account type. So just to confirm, I would re run the fisher's exact code for each food category (e.g., fast_food*account_type, then alcohol*account_type, etc.).
No, that is the opposite of what I said. Please re-read my response.
Regarding the p-value, see the article "Monte Carlo simulation for contingency tables in SAS."
The p-value for the chi-square test is the value next to the row that says "Pr >= ChiSq".