Interpreting associations between rows & columns in PROC CORRESP


Posted 07-06-2017 09:19 AM (2042 views)

What do you look for in the output of Proc Corresp in order to learn which, if any, associations exist between the rows & columns of a contingency table? Specific instructions illustrated on the Getting Started Example (on doctorates in the sciences) would be appreciated!


I fear I’m still not explaining my question well. I’m interested in associations between rows and columns. For instance, the Getting Started example concludes from Figure 31.1 that Mathematics and Engineering are associated with the earlier years. How did they come to this conclusion? It’s not based on the proximity of the 1973 & 1974 points to the Math and Engineering points, right (since the distance between a row point and a column point has no meaning)?

More generally, how does one figure out which rows in a contingency table are associated with which columns from the output of Proc Corresp?


The closer two points are, the stronger their association.

For example:

If two points both fall in the northeast corner, they are positively associated.

If one point falls in the northeast corner and the other falls in the southwest corner, they are negatively associated.


As stated by Clausen (1998) when discussing the interpretation of distances between row and column points, "*Usually, however, the points i and j will be close to each other when f(ij) > e(ij), and the distance will be great when f(ij) < e(ij), where f(ij) is the observed and e(ij) is the expected frequency...*" Intuitively, observed counts larger than expected (see the EXPECTED and CELLCHI2 options in PROC CORRESP) indicate some association, and this tends to be depicted visually in the plot by closeness. But no, the distance between a row point and a column point is not a chi-square distance, as the distance between two row points or between two column points is.

Clausen, S-E (1998), *Applied Correspondence Analysis: An Introduction*, Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-121. Thousand Oaks, CA: Sage.
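
The comparison of observed and expected counts that Clausen describes can be computed directly. Below is a minimal pure-Python sketch of roughly what the EXPECTED and CELLCHI2 options report, assuming a small hypothetical table (not the doctorate data from the SAS example). The function names are illustrative only and are not part of SAS or any Python library.

```python
def expected_counts(table):
    """Expected counts under independence: e_ij = row_i total * col_j total / n."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    return [[ri * cj / n for cj in col_tot] for ri in row_tot]


def cell_chi2(table):
    """Each cell's contribution (f - e)^2 / e to the Pearson chi-square statistic."""
    exp = expected_counts(table)
    return [[(f - e) ** 2 / e for f, e in zip(frow, erow)]
            for frow, erow in zip(table, exp)]


def attraction(table):
    """'+' where the observed count exceeds the expected count, '-' where it falls short."""
    exp = expected_counts(table)
    return [["+" if f > e else "-" if f < e else "0" for f, e in zip(frow, erow)]
            for frow, erow in zip(table, exp)]


if __name__ == "__main__":
    table = [[30, 10], [10, 30]]   # hypothetical counts
    print(expected_counts(table))  # [[20.0, 20.0], [20.0, 20.0]]
    print(cell_chi2(table))        # [[5.0, 5.0], [5.0, 5.0]]
    print(attraction(table))       # [['+', '-'], ['-', '+']]
```

Cells marked "+" with a large chi-square contribution are the ones the plot tends to show as a nearby row/column pair; the plot itself only approximates this.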


To add to StatDave's response, both examples in the PROC CORRESP doc have a very strong eigendirection, so most of the inertia (the "variance" analog) is in one dimension. Thus although the row and column points are scaled differently, an extreme row point that is near an extreme column point indicates that these quantities differ from the expected values in the same direction.
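
Whether "most of the inertia is in one dimension" can be checked numerically: the principal inertias are the eigenvalues of SᵀS, where S is the matrix of standardized residuals, and their sum (the total inertia) equals chi-square / n. PROC CORRESP reports this breakdown in its inertia decomposition table. The sketch below is a minimal pure-Python version, assuming a hypothetical table and using power iteration (a swapped-in technique, chosen to avoid a full SVD) to get only the first principal inertia.

```python
import math


def standardized_residuals(table):
    """S_ij = (p_ij - r_i * c_j) / sqrt(r_i * c_j), with p = table / n and margins r, c."""
    n = sum(sum(row) for row in table)
    r = [sum(row) / n for row in table]
    c = [sum(col) / n for col in zip(*table)]
    return [[(table[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j])
             for j in range(len(c))] for i in range(len(r))]


def inertia_summary(table, iters=500):
    """Return (total inertia, first principal inertia, share of dimension 1)."""
    S = standardized_residuals(table)
    m = len(S[0])
    # M = S^T S; its eigenvalues are the principal inertias.
    M = [[sum(S[i][a] * S[i][b] for i in range(len(S))) for b in range(m)]
         for a in range(m)]
    total = sum(M[a][a] for a in range(m))  # trace of M = chi-square / n
    # Power iteration; the irregular start vector is an arbitrary choice meant to
    # avoid starting exactly orthogonal to the leading eigenvector.
    v = [1.0 + 0.37 * k for k in range(m)]
    for _ in range(iters):
        w = [sum(M[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no association at all
            return total, 0.0, 0.0
        v = [x / norm for x in w]
    Mv = [sum(M[a][b] * v[b] for b in range(m)) for a in range(m)]
    lam1 = sum(v[a] * Mv[a] for a in range(m))  # Rayleigh quotient, v has unit length
    return total, lam1, (lam1 / total if total else 0.0)
```

For a 2 x 2 table there is only one nontrivial dimension, so the share is 1; for larger tables, a share close to 1 is the situation Rick describes, where proximity of extreme points along that one axis is easiest to read.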

If you are interested in which CELLS in the table are deviating the most from their expected values (under independence), I wouldn't use this CA plot. I'd use PROC FREQ and request either a two-way stacked bar chart or a mosaic plot. You can even color-code the cells of the mosaic plot to represent deviations from expectation, which I think is easier to interpret and gives more information in the two-way case. I'd reserve CA for higher-dimensional problems.
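
One common way to color-code a mosaic plot is by each cell's Pearson residual, (f - e) / sqrt(e). The sketch below computes those residuals for a hypothetical table and maps them to color classes. The cutoffs of ±2 and ±4 and the blue/red scheme are an assumed convention for illustration, not necessarily what PROC FREQ uses.

```python
import math


def pearson_residuals(table):
    """(f - e) / sqrt(e) for every cell, with e the expected count under independence."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    res = []
    for i, rt in enumerate(row_tot):
        row = []
        for j, ct in enumerate(col_tot):
            e = rt * ct / n
            row.append((table[i][j] - e) / math.sqrt(e))
        res.append(row)
    return res


def shade(res, cuts=(2.0, 4.0)):
    """Assumed convention: blue = more than expected, red = fewer, darker = larger |res|."""
    hue = "blue" if res > 0 else "red"
    if abs(res) >= cuts[1]:
        return "dark " + hue
    if abs(res) >= cuts[0]:
        return "light " + hue
    return "white"


if __name__ == "__main__":
    table = [[60, 10], [10, 60]]  # hypothetical counts
    for row in pearson_residuals(table):
        print([shade(x) for x in row])
```

Unlike the CA plot, this reads each cell's deviation from independence directly, which is the advantage Rick points to for the two-way case.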


Thanks to both StatDave and Rick for their responses. So if I understand correctly, in order to find which rows are associated with which columns, you look in the Correspondence Analysis plot for row and column points that: 1) are among the farthest from the origin (which is what I assume Rick means by “extreme”), and 2) are close to one another.

I had been wondering why the Getting Started example didn’t conclude that Physical Sciences (instead of Math) and Engineering are associated with the earlier years, given that Phys Sci clearly beats Math on criterion 2). I think I understand now that it’s because Phys Sci is relatively close to the centroid.
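
The "close to the centroid" observation has a concrete meaning: in the full CA space (principal coordinates), a row point's distance from the origin equals the chi-square distance between that row's profile and the average profile, and the row's share of the total inertia is its mass times that distance squared. A minimal sketch, assuming a hypothetical table; note that a 2-D plot shows only a projection of this distance.

```python
import math


def row_profile_distances(table):
    """Chi-square distance from each row profile to the centroid (the column margin profile)."""
    n = sum(sum(row) for row in table)
    c = [sum(col) / n for col in zip(*table)]
    dists = []
    for row in table:
        rt = sum(row)
        d2 = sum((f / rt - cj) ** 2 / cj for f, cj in zip(row, c))
        dists.append(math.sqrt(d2))
    return dists


def row_inertias(table):
    """Each row's contribution to total inertia: row mass r_i times its squared distance."""
    n = sum(sum(row) for row in table)
    return [sum(row) / n * d ** 2
            for row, d in zip(table, row_profile_distances(table))]


if __name__ == "__main__":
    table = [[30, 10], [10, 30], [20, 20]]  # hypothetical counts
    print(row_profile_distances(table))     # the third row sits exactly at the centroid
    print(row_inertias(table))
```

A row whose profile barely differs from the average profile (like the third row here) lands near the origin no matter which column point it appears next to, which is why proximity to an extreme column point is informative only for extreme row points.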


That line in the SAS docs confused me too; I even wrote to SAS documentation support about it. It also seems to be contradicted by the next few lines in the passage.

When I read the explanations above, they made sense, but I am now nearly completely unclear as to how this output should be interpreted.
