dglassbrenner
Fluorite | Level 6

What do you look for in the output of Proc Corresp in order to learn which, if any, associations exist between the rows & columns of a contingency table?  Specific instructions illustrated on the Getting Started Example (on doctorates in the sciences) would be appreciated!

11 REPLIES
Rick_SAS
SAS Super FREQ

In the Getting Started example, the doc says to look at the Inertia and Chi-Square Decomposition table, which reports "the total chi-square statistic, which is a measure of the association between the rows and columns." Be sure to use the CHI2P option to get the p-value.
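For readers who want to see what that total chi-square measures, here is a minimal sketch in plain Python rather than PROC CORRESP output. The 3x3 table is hypothetical (it is not the doctorates data), and the function name `total_chi_square` is made up for illustration. It computes the Pearson chi-square under independence and the total inertia, which is the chi-square divided by the grand total.

```python
# Hypothetical 3x3 table of counts (illustrative only, not the doc's data).
table = [
    [20, 10, 5],
    [10, 20, 10],
    [5, 10, 20],
]

def total_chi_square(counts):
    """Return (chi-square, total inertia) for a two-way table of counts."""
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    chi2 = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Total inertia is the chi-square statistic divided by the grand total.
    return chi2, chi2 / n

chi2, inertia = total_chi_square(table)
print(f"chi-square = {chi2:.3f}, total inertia = {inertia:.4f}")
```

A table whose rows are exactly proportional to one another gives a chi-square of zero, which is the "no association" baseline.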

 

 

dglassbrenner
Fluorite | Level 6

I probably didn’t explain myself well.  How do you tell which particular rows and columns are associated with one another?

Rick_SAS
SAS Super FREQ

Look at the "Simple Correspondence Analysis of US Population" example. The Row Profiles and Row Coordinates tables (and the CA plot) indicate that the "New England" row is similar to the "NY,NJ,PA" row, but is not very similar to the "Pacific" row. Similarly, the "1920" column is similar to the "1930" column, but not very similar to "1970."

dglassbrenner
Fluorite | Level 6

I fear I’m still not explaining my question well.  I’m interested in associations between rows and columns.  For instance, the Getting Started example concludes from Figure 31.1 that Mathematics and Engineering are associated with the earlier years.  How did they come to this conclusion?  It’s not based on the proximity of the 1973 & 1974 points with the Math and Engineering points, right (since the distance between a row point and a column point has no meaning)?

 

More generally, how does one figure out which rows in a contingency table are associated with which columns from the output of Proc Corresp?

Ksharp
Super User

The closer two points are, the stronger their association.

For example:

If two points both fall in the northeast corner, they are positively associated.

If one point falls in the northeast corner and the other falls in the southwest corner, they are negatively associated.

StatDave
SAS Super FREQ

As stated by Clausen (1998) when discussing the interpretation of distances between row and column points, "Usually, however, the points i and j will be close to each other when f(ij)>e(ij), and the distance will be great when f(ij)<e(ij), where f(ij) is the observed and e(ij) is the expected frequency..."  Intuitively, observed counts larger than expected (see the EXPECTED and CELLCHI2 options in PROC CORRESP) indicate some association, and this tends to be depicted visually in the plot by closeness.  But no, the distances between a row point and a column point are not chi-square distances, as they are between two row points or two column points.

 

Clausen, S.-E. (1998), Applied Correspondence Analysis: An Introduction, Sage University Papers Series on Quantitative Applications in the Social Sciences, 07-121. Thousand Oaks, CA: Sage.
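The observed-versus-expected comparison can be sketched outside SAS as follows. This is only an illustration in plain Python: the counts and labels are hypothetical, the helper `cell_deviations` is made up, and attaching a sign to each cell's chi-square contribution is a convenience assumed here so that f(ij)>e(ij) and f(ij)<e(ij) can be told apart. It is not a claim about what the EXPECTED or CELLCHI2 options print.

```python
# Hypothetical counts and labels for illustration only (not the doc's data).
row_labels = ["RowA", "RowB", "RowC"]
col_labels = ["Col1", "Col2", "Col3"]
table = [
    [20, 10, 5],
    [10, 20, 10],
    [5, 10, 20],
]

def cell_deviations(counts):
    """Return (i, j, observed, expected, signed contribution) for every cell."""
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    cells = []
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            contribution = (observed - expected) ** 2 / expected
            # Positive sign: more observed than expected; negative: fewer.
            sign = 1.0 if observed >= expected else -1.0
            cells.append((i, j, observed, expected, sign * contribution))
    return cells

# List cells from the strongest positive deviation to the strongest negative.
for i, j, observed, expected, signed in sorted(
    cell_deviations(table), key=lambda cell: cell[4], reverse=True
):
    print(f"{row_labels[i]} x {col_labels[j]}: observed={observed}, "
          f"expected={expected:.2f}, signed contribution={signed:+.3f}")
```

Cells at the top of the listing are the row/column pairs that occur more often than independence predicts, which are the pairs Clausen says will usually plot close together.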

Rick_SAS
SAS Super FREQ

To add to StatDave's response, both examples in the PROC CORRESP doc have a very strong eigendirection, so most of the inertia (the "variance" analog) is in one dimension. Thus although the row and column points are scaled differently, an extreme row point that is near an extreme column point indicates that these quantities differ from the expected values in the same direction.

 

If you are interested in which CELLS in the table are deviating the most from their expected values (under independence), I wouldn't use this CA plot. I'd use PROC FREQ and request either a two-way stacked bar chart or a mosaic plot. You can even color-code the cells of the mosaic plot to represent deviations from expectation, which I think is easier to interpret and gives more information in the two-way case.  I'd reserve CA for higher-dimensional problems.

dglassbrenner
Fluorite | Level 6

Thanks to both StatDave and Rick for their responses.  So if I understand correctly, in order to find which rows are associated with which columns, you look in the Correspondence Analysis plot for row and column points that: 1) are among the furthest from the origin (which is what I assume Rick means by “extreme”), and 2) are close to one another. 

 

I had been wondering why the Getting Started example didn’t conclude that Physical Sciences (instead of Math) and Engineering are associated with the earlier years, given that Phys Sci clearly beats Math on criterion 2).  I think I understand now that it’s because Phys Sci is relatively close to the centroid.

dglassbrenner
Fluorite | Level 6

So I was just looking over the Simple Correspondence Analysis in Example 31.1, and I’m confused again.  It says: “The fact that the Married with Kids point is close to the American point and the fact that the Japanese point is near the Single point should be ignored.”  This would seem to be in direct contradiction to step 2) in my previous post (looking for points that are close to each other).  What gives?

chacreton190
Calcite | Level 5

That line in the SAS docs confused me also. I even wrote to SAS documentation support about it. That line even seems to be contradicted by the next few lines in the passage.

 

When I read the explanations above, they made sense, but I am now nearly completely unclear as to how this output should be interpreted.

Rick_SAS
SAS Super FREQ

As the doc says, "Distances between points within a variable have meaning, but distances between points from different variables do not." That's why the doc says to ignore distances between "Japanese" and "Single." These points belong to different variables.
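To make "distances within a variable have meaning" concrete, here is a minimal sketch of the chi-square distance between two row profiles, written in plain Python. The table is hypothetical and the function name `chi_square_distance` is made up for illustration. Each row is converted to a profile (its counts divided by its row total), and the squared differences between profiles are weighted by the inverse of the column masses.

```python
import math

# Hypothetical counts (illustrative only). Rows 0 and 1 are proportional,
# so they have identical profiles; row 2 has a different profile.
table = [
    [20, 10, 5],
    [40, 20, 10],
    [5, 10, 20],
]

def chi_square_distance(counts, i, k):
    """Chi-square distance between the profiles of rows i and k."""
    n = sum(sum(row) for row in counts)
    # Column masses: each column total as a share of the grand total.
    col_mass = [sum(col) / n for col in zip(*counts)]
    total_i = sum(counts[i])
    total_k = sum(counts[k])
    squared = 0.0
    for j, mass in enumerate(col_mass):
        diff = counts[i][j] / total_i - counts[k][j] / total_k
        squared += diff * diff / mass
    return math.sqrt(squared)

print(chi_square_distance(table, 0, 1))  # identical profiles
print(chi_square_distance(table, 0, 2))  # different profiles
```

The same calculation applies to two column profiles after transposing the table. No analogous quantity is defined here for a row paired with a column, which is the point the doc is making.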
