01-15-2018 02:07 PM

I get the feeling the following request is a lot simpler than I am making it out to be. Assume I have a variable with a finite number of possible nominal values (A - E, for example). According to a PROC ANOVA, there is a difference between the distribution of this variable at level 1 and at level 2. I would like to determine which, if any, of the values occur with significantly different frequencies across the two levels, and I am just flat out stuck figuring out a simple way to program this or which PROC statements to use to move forward.

## Re: Why is there a difference between two levels of a discrete distribution?

Posted in reply to MFLoGrasso

01-15-2018 03:21 PM - edited 01-15-2018 03:24 PM

Apologies if I'm misreading your question, but using ANOVA with a nominal dependent variable is not appropriate. It sounds like you want to see if the distribution of values in one nominal variable differs across levels of another variable. If so, I'd use PROC FREQ and add the CHISQ option to the TABLES statement to get the Pearson Chi-Square test. To investigate how much each cell in the two-way table deviates from its expected value under the null hypothesis of no association between the two variables, you could also add the CELLCHI2 option. This would add an extra number to each cell showing (observed - expected)^2 / expected. Higher values mean greater deviation. The code would look something like this:

```
proc freq data=mydata;
   /* CHISQ: Pearson chi-square test; CELLCHI2: each cell's contribution to it */
   tables var1 * var2 / chisq cellchi2;
run;
```
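
To make the output concrete, here is a minimal sketch on pre-tabulated data. The data set name `example_counts`, the variable names, and every count are invented for illustration and are not the original poster's data. It assumes one row per cell with a `count` column, which is why a WEIGHT statement is needed. The EXPECTED and DEVIATION options are additions beyond the code above; they print the expected count and the (observed - expected) difference next to each cell's chi-square contribution.

```sas
/* Hypothetical cell counts: 2 levels x 5 nominal values (A-E). */
/* All numbers are made up purely to illustrate the syntax.     */
data example_counts;
   input level value $ count;
   datalines;
1 A 30
1 B 25
1 C 20
1 D 15
1 E 10
2 A 12
2 B 24
2 C 21
2 D 18
2 E 25
;
run;

proc freq data=example_counts;
   weight count;                      /* each row is a cell count, not one observation */
   tables value * level
          / chisq                     /* Pearson chi-square test of association        */
            cellchi2                  /* (observed - expected)^2 / expected per cell   */
            expected                  /* expected count under no association           */
            deviation                 /* observed - expected                           */
            norow nopercent;          /* trim the cell display to column percents      */
run;
```

The cells with the largest contributions are the ones driving the overall statistic; the sign of the deviation shows whether a value is over- or under-represented at that level.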

Posted in reply to dagremu

01-15-2018 03:59 PM

Point taken. I was doing a distribution analysis for many variables, nominal, ordinal, interval, and ratio, so I just did a massive ANOVA for speed's sake.

That said, can a cell's contribution to the chi-square statistic be used to generate a p-value for how that cell differs from the corresponding cell in the other level?

Posted in reply to MFLoGrasso

01-19-2018 09:09 AM

I don't know about p-values, but one procedure that analyzes a cell's contribution to the chi-square value is the CORRESP procedure. The doc has an example that shows the deviation from the expected value (under the assumption of ind... In particular, see the table "Contributions to the Total Chi-Square Statistic."
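
A rough sketch of what that could look like, reusing the placeholder names `mydata`, `var1`, and `var2` from the PROC FREQ code earlier in the thread and assuming one row per observation. This is not the documentation example itself, only an illustration of the options that request these tables.

```sas
/* Sketch only: mydata, var1, and var2 are placeholders from earlier in the thread. */
proc corresp data=mydata
             observed     /* observed frequency table                         */
             expected     /* expected frequencies under independence          */
             deviation    /* observed minus expected                          */
             cellchi2     /* cell contributions to the total chi-square       */
             short;       /* suppress most of the point/coordinate statistics */
   tables var1, var2;     /* row variable(s), then column variable(s)         */
run;
```

Note the comma in the TABLES statement: PROC CORRESP separates row variables from column variables with a comma rather than the asterisk that PROC FREQ uses.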