Dear SAS users,
I am working with a high-dimensional clinical dataset that consists of about 750 individuals (rows) and 19-20 variables (columns). Each variable is a binary categorical response (0 vs. 1, indicating absence vs. presence of a given clinical finding). I was exploring the use of MCA to look for associations between these variables. My main issue is that in the graphical output there is no way for me to know which variable is which in the 2-D correspondence analysis plot. There are just clusters of points labeled 0 or 1, and none of the color, marker shape, or text lets the viewer see which variables are clustering together.
I noticed that when SAS produces the column coordinates in table form in the output (Dim1 and Dim2) for each response value, the variable names are also not indicated. Does SAS not carry forward the variable names when it generates the output? Is there a way to do so?
Here is the example of my code (SAS 9.4):
Ods graphics on;
Ods html sge=on;
Title 'Multiple Correspondence Analysis Test';
Title2 'All ANY_ variables';
PROC CORRESP Data=work.testA MCA Dimens=2 OutC=OutCA1 OutF=OutFCA1 ALL Plots=All Greenacre;
Tables ANY_CHIN ANY_CRANIUM ANY_EAR ANY_FACE ANY_FOREHEAD ANY_HANDFEET_CREASES ANY_HANDS_FEET ANY_LIPS
ANY_MANDIBLE ANY_MAXILLA_MIDFACE ANY_MOUTH ANY_NAILS ANY_NECK ANY_NOSE ANY_ORAL_CAVITY ANY_PERIORBITAL
ANY_PHILTRUM ANY_SCALP_HAIR;
RUN;
Ods _all_ close;
ods html;
Thank you for the help!
Use PROC FORMAT to create unique formatted values for the levels of each of your variables. Then assign the formats to the variables with a FORMAT statement in your PROC CORRESP step. The plot should then display the formatted values.
Thanks, but the issue is not changing the format of the responses from 0/1 to "Absent/Present" for each of the variables. The issue is that PROC CORRESP itself does not display the variable names in either the tabular output (Dim1, Dim2) or the graphical output.
So whether or not I change the format (note: the formats of the variables are all the same), I am still unable to easily see the MCA values for each variable or to distinguish the variables in the graphical output.
I think this may be an issue with the need to edit the output from the PROC CORRESP, but the SAS guides are unclear if or how this would be possible.
I am going for something like this: https://www.researchgate.net/figure/Biplot-of-the-first-two-axes-of-the-multiple-correspondence-anal...... which will allow me to use colors/shapes and text to adequately label the output for my variables (especially since all of them have the same binary response format).
I appreciate you commenting!
The points displayed on the plot are values of variables, not variables. You indicated that the plot you got has every point labeled as 0 or 1. That is because each of your variables has the same two values, 0 or 1. If you simply give the 0 and 1 values in each variable a unique label, then those points will be labeled meaningfully and distinctly. Using PROC FORMAT as I described is one easy way to do that. Or, you could simply recode each variable's values using a DATA step. Either way, you need to make the *values* of the variables distinct in some way so that those distinct labels appear in the plot. For example, if you have variable Stakeholder with values 0 or 1, you could assign value 1 a formatted value of "Resident" and value 0 a formatted value of "Visitor". Or create a new variable in a DATA step like: if stakeholder=1 then shnew="Resident"; else shnew="Visitor"; and then use the new variable instead of Stakeholder.
Also, to get a distinct color for the values of each variable, use the SOURCE option: Plots(source)=All Greenacre. You could also include the variable name in the labels as described above if you want.
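A minimal sketch of the DATA step recoding route mentioned above. The data set names WORK.HAVE and WORK.RECODED, the toy rows, and the $8 length are illustrative assumptions, not part of the original advice:

```sas
/* Hypothetical toy data: Stakeholder is a 0/1 variable */
data work.have;
   input stakeholder;
   datalines;
1
0
1
;
run;

/* Recode the 0/1 values into distinct character labels */
data work.recoded;
   set work.have;
   length shnew $8;   /* long enough for the longest label, "Resident" */
   if stakeholder = 1 then shnew = "Resident";
   else if stakeholder = 0 then shnew = "Visitor";
   /* any other value (including missing) leaves shnew blank */
run;
```

The new variable shnew would then be used in place of Stakeholder in the TABLES statement. Declaring the length up front avoids truncation when the first label assigned is not the longest one.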
Thank you StatDave_sas, I stand corrected! I realized that I missed your suggestion that the assigned formats should be unique to EACH variable's response. This worked.
My code ended up looking something like this:
PROC FORMAT;
Value Chinfmt 0='Chin_No' 1='Chin_Yes';
Value Craniumfmt 0='Cranium_No' 1='Cranium_Yes';
Value Earfmt 0='Ear_No' 1='Ear_Yes';
Value Facefmt 0='Face_No' 1='Face_Yes';
Value Foreheadfmt 0='Forehead_No' 1='Forehead_Yes';
Value HandfeetCreasesfmt 0='Creases_No' 1='Creases_Yes';
Value HandsFeetfmt 0='HandsFeet_No' 1='HandsFeet_Yes';
Value Lipsfmt 0='Lips_No' 1='Lips_Yes';
...
...
...
RUN;
I then reran my PROC CORRESP and added a Format statement and included all the variables that I wanted formatted (as specified above). This worked. I am able to more clearly distinguish these in the MCA plot that is generated. I tried the SOURCE option, but I am not getting different colors for the 0 vs. 1 responses. But I will tinker with that more.
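For reference, the rerun described above might look roughly like the following sketch. It lists only the first four variables for brevity and assumes that the formats from the PROC FORMAT step above already exist in the session; Plots(source) is the option suggested earlier in the thread, and the remaining options are carried over from the original call:

```sas
ods graphics on;

/* Sketch only: extend the TABLES and FORMAT lists to all ANY_ variables */
proc corresp data=work.testA mca dimens=2 outc=OutCA1 all
             plots(source)=all greenacre;
   tables ANY_CHIN ANY_CRANIUM ANY_EAR ANY_FACE;
   /* one distinct format per variable, so every plotted point gets a unique label */
   format ANY_CHIN    Chinfmt.
          ANY_CRANIUM Craniumfmt.
          ANY_EAR     Earfmt.
          ANY_FACE    Facefmt.;
run;
```

Because each variable carries its own format, the points appear as labels such as Chin_Yes or Ear_No instead of a bare 0 or 1.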
Otherwise, do you also know if it is possible to output the Burt matrix generated in the procedure, so that a heat map of the pairwise associations between the variables can be generated?