interpreting associations between rows & columns in Proc Corresp

Occasional Contributor
Posts: 5


What do you look for in the output of Proc Corresp to learn which associations, if any, exist between the rows and columns of a contingency table? Specific instructions illustrated with the Getting Started example (on doctorates in the sciences) would be appreciated!

SAS Super FREQ
Posts: 3,752

Re: interpreting associations between rows & columns in Proc Corresp

In the Getting Started example, the doc says to look at the Inertia and Chi-Square Decomposition table, which reports "the total chi-square statistic, which is a measure of the association between the rows and columns." Be sure to use the CHI2P option to get the p-value.
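For anyone who wants to see what that table summarizes, here is a minimal Python sketch (not SAS code) that computes the Pearson chi-square statistic, its degrees of freedom, an upper-tail p-value, and the total inertia (chi-square divided by n) for a small two-way table. The 3x3 table is hypothetical, made up for illustration, and the p-value helper is a hand-rolled chi-square survival function, not a PROC CORRESP routine.

```python
import math

# Hypothetical 3x3 contingency table (NOT the Getting Started data):
# rows R1..R3, columns C1..C3.
TABLE = [
    [40, 20, 10],
    [25, 25, 20],
    [10, 20, 40],
]


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with a = df/2, y = x/2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi_square_summary(table):
    """Return (chi2, df, p_value, total_inertia) for a two-way table of counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n   # expected count under independence
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, chi2_sf(chi2, df), chi2 / n    # total inertia = chi2 / n


chi2, df, p, inertia = chi_square_summary(TABLE)
print(f"chi2={chi2:.4f} df={df} p={p:.3g} total inertia={inertia:.4f}")
```

For this made-up table the statistic is 504/13 (about 38.77) on 4 degrees of freedom, so the total inertia is 12/65 (about 0.185).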

Occasional Contributor
Posts: 5

Re: interpreting associations between rows & columns in Proc Corresp

I probably didn’t explain myself well.  How do you tell which particular rows and columns are associated with one another?

SAS Super FREQ
Posts: 3,752

Re: interpreting associations between rows & columns in Proc Corresp

Look at the "Simple Correspondence Analysis of US Population" example. The Row Profiles and Row Coordinates tables (and the CA plot) indicate that the "New England" row is similar to the "NY,NJ,PA" row but is not very similar to the "Pacific" row. Similarly, the "1920" column is similar to the "1930" column but not very similar to "1970."
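Row profiles are each row's counts divided by the row total, and correspondence analysis compares two profiles with the chi-square distance, which weights each squared difference by the inverse of the column mass. A minimal Python sketch of that comparison, using a hypothetical 3x3 table (the function names are illustrative, not part of SAS):

```python
import math

# Hypothetical 3x3 table (rows R1..R3, columns C1..C3); not data from the SAS doc.
TABLE = [
    [40, 20, 10],
    [25, 25, 20],
    [10, 20, 40],
]


def row_profiles(table):
    """Each row divided by its own total."""
    return [[x / sum(row) for x in row] for row in table]


def column_masses(table):
    """Column totals divided by the grand total (this is also the average row profile)."""
    n = sum(map(sum, table))
    return [sum(col) / n for col in zip(*table)]


def chi2_distance(p, q, masses):
    """Chi-square distance between two profiles: squared differences weighted by 1/mass."""
    return math.sqrt(sum((a - b) ** 2 / m for a, b, m in zip(p, q, masses)))


profiles = row_profiles(TABLE)
masses = column_masses(TABLE)
d12 = chi2_distance(profiles[0], profiles[1], masses)
d13 = chi2_distance(profiles[0], profiles[2], masses)
print(f"d(R1,R2)={d12:.3f}  d(R1,R3)={d13:.3f}")
```

Here R1 comes out much closer to R2 (about 0.45) than to R3 (about 1.03), which is the kind of row-to-row similarity the Row Profiles table and the CA plot convey.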

Occasional Contributor
Posts: 5

Re: interpreting associations between rows & columns in Proc Corresp

I fear I’m still not explaining my question well.  I’m interested in associations between rows and columns.  For instance, the Getting Started example concludes from Figure 31.1 that Mathematics and Engineering are associated with the earlier years.  How did they come to this conclusion?  It’s not based on the proximity of the 1973 & 1974 points with the Math and Engineering points, right (since the distance between a row point and a column point has no meaning)?

More generally, how does one figure out which rows in a contingency table are associated with which columns from the output of Proc Corresp?

Super User
Posts: 10,020

Re: interpreting associations between rows & columns in Proc Corresp

The closer two points are, the stronger the association between them.

For example:

If two points both fall in the northeast corner, they are positively associated.

If one point falls in the northeast corner and the other falls in the southwest corner, they are negatively associated.

SAS Employee
Posts: 281

Re: interpreting associations between rows & columns in Proc Corresp


As stated by Clausen (1998) when discussing the interpretation of distances between row and column points, "Usually, however, the points i and j will be close to each other when f(ij)>e(ij), and the distance will be great when f(ij)<e(ij), where f(ij) is the observed and e(ij) is the expected frequency... ." Intuitively, observed counts larger than expected (see the EXPECTED and CELLCHI2 options in PROC CORRESP) indicate some association, and this tends to be depicted visually in the plot by closeness. But no, the distance between a row point and a column point is not a chi-square distance, as the distance between two row points or between two column points is.
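To check the f(ij) > e(ij) idea directly, this sketch computes the expected counts under independence and each cell's contribution to chi-square, roughly the quantities the EXPECTED and CELLCHI2 options display. The table and function names are hypothetical, and the code is plain Python rather than anything PROC CORRESP runs.

```python
# Hypothetical 3x3 table (rows R1..R3, columns C1..C3); not SAS documentation data.
TABLE = [
    [40, 20, 10],
    [25, 25, 20],
    [10, 20, 40],
]


def expected_counts(table):
    """Expected counts under independence: e_ij = (row total * column total) / n."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    return [[r * c / n for c in col_tot] for r in row_tot]


def cell_report(table):
    """Tuples (row, col, f_ij, e_ij, cell chi-square), largest contribution first."""
    exp = expected_counts(table)
    cells = [
        (i, j, f, exp[i][j], (f - exp[i][j]) ** 2 / exp[i][j])
        for i, row in enumerate(table)
        for j, f in enumerate(row)
    ]
    return sorted(cells, key=lambda cell: cell[4], reverse=True)


for i, j, f, e, contrib in cell_report(TABLE):
    direction = "f > e" if f > e else ("f < e" if f < e else "f = e")
    print(f"R{i + 1} x C{j + 1}: f={f:>3} e={e:6.2f} {direction} cell chi2={contrib:.3f}")
```

In this made-up table the largest contribution comes from the R3 x C3 cell, where the observed count exceeds the expected one, so that row and column would be read as associated.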

Clausen, S.-E. (1998), Applied Correspondence Analysis: An Introduction, Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-121. Thousand Oaks, CA: Sage.

SAS Super FREQ
Posts: 3,752

Re: interpreting associations between rows & columns in Proc Corresp

To add to StatDave's response, both examples in the PROC CORRESP doc have a very strong eigendirection, so most of the inertia (the "variance" analog) is in one dimension. Thus although the row and column points are scaled differently, an extreme row point that is near an extreme column point indicates that these quantities differ from the expected values in the same direction.
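How dominant the first dimension is can be checked from the principal inertias, which are the eigenvalues of S^T S, where S is the matrix of standardized residuals (p(ij) - r(i)c(j)) / sqrt(r(i)c(j)) built from cell proportions p and row/column masses r and c; the trace of S^T S equals the total inertia. The sketch below is an illustration under stated assumptions: it uses a hypothetical table and plain power iteration for only the largest eigenvalue, whereas a full analysis would use a singular value decomposition.

```python
import math

# Hypothetical 3x3 table (rows R1..R3, columns C1..C3); not SAS documentation data.
TABLE = [
    [40, 20, 10],
    [25, 25, 20],
    [10, 20, 40],
]


def residual_cross_product(table):
    """Return S^T S, where S[i][j] = (p_ij - r_i * c_j) / sqrt(r_i * c_j)."""
    n = sum(map(sum, table))
    r = [sum(row) / n for row in table]          # row masses
    c = [sum(col) / n for col in zip(*table)]    # column masses
    s = [[(table[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(len(c))] for i in range(len(r))]
    k = len(c)
    return [[sum(s[i][a] * s[i][b] for i in range(len(r))) for b in range(k)]
            for a in range(k)]


def mat_vec(m, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(m_ab * v_b for m_ab, v_b in zip(row, v)) for row in m]


def dominant_eigenvalue(m, iters=200):
    """Power iteration for the largest eigenvalue of a symmetric positive semidefinite matrix."""
    # Arbitrary start vector, assumed not orthogonal to the dominant eigenvector.
    v = [float(k + 1) for k in range(len(m))]
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v with unit-length v
    return sum(a * b for a, b in zip(v, mat_vec(m, v)))


M = residual_cross_product(TABLE)
total = sum(M[a][a] for a in range(len(M)))   # trace = chi-square / n = total inertia
first = dominant_eigenvalue(M)                # principal inertia of dimension 1
share = first / total
print(f"total inertia={total:.4f}  dim 1={first:.4f}  share={share:.1%}")
```

For this made-up table the first dimension carries roughly 97% of the inertia, the same kind of single strong eigendirection described above.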

If you are interested in which CELLS in the table are deviating the most from their expected values (under independence), I wouldn't use this CA plot. I'd use PROC FREQ and request either a two-way stacked bar chart or a mosaic plot. You can even color-code the cells of the mosaic plot to represent deviations from expectation, which I think is easier to interpret and gives more information in the two-way case.  I'd reserve CA for higher-dimensional problems.

Occasional Contributor
Posts: 5

Re: interpreting associations between rows & columns in Proc Corresp

Thanks to both StatDave and Rick for their responses.  So if I understand correctly, in order to find which rows are associated with which columns, you look in the Correspondence Analysis plot for row and column points that: 1) are among the furthest from the origin (which is what I assume Rick means by “extreme”), and 2) are close to one another.

I had been wondering why the Getting Started example didn’t conclude that Physical Sciences (instead of Math) and Engineering are associated with the earlier years, given that Phys Sci clearly beats Math on criterion 2).  I think I understand now that it’s because Phys Sci is relatively close to the centroid.
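The "distance from the centroid" in criterion 1) can be computed without the plot: it is the chi-square distance between a row profile and the average row profile (the column masses). In the full-dimensional row principal coordinates this equals the row point's distance from the origin; a two-dimensional plot shows a projection of it. A minimal Python sketch on a hypothetical table (names are illustrative only):

```python
import math

# Hypothetical 3x3 table (rows R1..R3, columns C1..C3); not SAS documentation data.
TABLE = [
    [40, 20, 10],
    [25, 25, 20],
    [10, 20, 40],
]


def distances_from_centroid(table):
    """Chi-square distance of each row profile from the average row profile (the centroid)."""
    n = sum(map(sum, table))
    masses = [sum(col) / n for col in zip(*table)]   # average row profile
    out = []
    for row in table:
        total = sum(row)
        out.append(math.sqrt(sum((x / total - m) ** 2 / m for x, m in zip(row, masses))))
    return out


for name, d in zip(["R1", "R2", "R3"], distances_from_centroid(TABLE)):
    print(f"{name}: {d:.3f}")
```

Here R2 sits closest to the centroid (about 0.12), loosely analogous to where Physical Sciences falls in the Getting Started plot, while R3 (about 0.55) and R1 (about 0.49) are the more extreme rows.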
