Hi, I have been reading through the documentation for PROC CORRESP, but I'm still not very confident in my interpretation of the output from this analysis. The Inertia and Chi-Square Decomposition is especially confusing for me. Are the rows in this output associated with the various dimensions? Also, can the correspondence analysis plot be interpreted using the distance between the elements in the plot? It seems like the SAS documentation has conflicting information on this point. Any other insights/tips about interpreting the default output from this analysis would be greatly appreciated.
As with almost every multivariate procedure, the interpretation is greatly aided by the scatterplots. That is a big advantage of most/all multivariate procedures. So you really ought to be looking at the scatter plot (similar to the one all the way at the bottom here).
Your scatterplot shows some red points (which I assume are the columns, but I can't read the names), and one red point is noticeably to the left of the others, which line up almost vertically. So there are really two things going on in the red: a left-right difference (similar to Pacific being different from the other geographies in the example) and an up-down difference. I leave it up to you to interpret this and decide if it is useful. Your rows, which I assume are the blue points, form a (I'm going to use a technical term here) "blob" with "tails". This tells me there are no clearly defined clusters, but a lot of variability in two dimensions, and maybe some outliers at the top right and far left. In addition, there is some correlation between dimension 1 and dimension 2: for example, low on dimension 1 is always high on dimension 2, and mid-to-far left on dimension 1 is always middle of dimension 2. As with any multivariate procedure, the next task (which is up to you) is to interpret this in some way, and decide if any of this is actually useful analysis.
There is a technique used in Principal Components and Partial Least Squares which might apply here in helping you understand what these "tails" are, although I don't know if it will work for correspondence analysis. That technique is called Contribution Plots, as described in Paige Miller (that's me), Ronald E. Swanson and Charles E. Heckler, "Contribution plots: a missing link in multivariate quality control", Applied Mathematics and Computer Science, 8, 775-792, 1998. This method is built into SAS PROC MVPDIAGNOSE which runs on Principal Components.
One more issue: I don't know what your rows and columns are. If they are two different "variables", as in the example, then this is appropriate for Correspondence Analysis. If columns are a "variable" and rows are "individuals", then I don't think this is appropriate for Correspondence Analysis but Principal Components is definitely a possible replacement method. With this many blue data points as rows, I suspect the rows are "individuals".
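On the Inertia and Chi-Square Decomposition question: as far as I understand it, yes, each row of that table corresponds to one dimension. The principal inertias are the squared singular values of the standardized residuals of the contingency table, and multiplying them by the total count splits the Pearson chi-square across the dimensions. Here is a minimal sketch of that arithmetic in Python with NumPy rather than SAS. The 4x3 table is made up for illustration (it is not your data), and the variable names are mine, not anything PROC CORRESP produces.

```python
import numpy as np

# Hypothetical 4x3 contingency table (made-up counts, for illustration only).
N = np.array([
    [20, 10,  5],
    [10, 20, 10],
    [ 5, 10, 20],
    [15,  5, 10],
], dtype=float)

n = N.sum()              # grand total
P = N / n                # correspondence matrix (cell proportions)
r = P.sum(axis=1)        # row masses
c = P.sum(axis=0)        # column masses

# Standardized residuals: (observed - expected) / sqrt(expected), as proportions.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# At most min(rows, cols) - 1 dimensions carry any inertia.
k = min(N.shape) - 1
singular_values = np.linalg.svd(S, compute_uv=False)[:k]

principal_inertia = singular_values ** 2        # one value per dimension
total_inertia = principal_inertia.sum()
chi_square_by_dim = n * principal_inertia       # chi-square share per dimension
percent = 100 * principal_inertia / total_inertia

# Ordinary Pearson chi-square of the table, for comparison.
expected = n * np.outer(r, c)
chi2 = ((N - expected) ** 2 / expected).sum()

for d in range(k):
    print(f"Dim {d + 1}: singular value={singular_values[d]:.4f}, "
          f"inertia={principal_inertia[d]:.4f}, "
          f"chi-square={chi_square_by_dim[d]:.3f}, "
          f"percent={percent[d]:.2f}")
print(f"Total inertia={total_inertia:.4f}, Pearson chi-square={chi2:.3f}")
```

The per-dimension chi-square values add up to the Pearson chi-square of the table, and the percent column shows how much of the total association each dimension accounts for. That percentage is what tells you whether a two-dimensional plot is a reasonable summary.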