bmhelm
Fluorite | Level 6

Dear SAS users,

 

I am working with a high-dimensional clinical dataset of about 750 individuals (rows) and 19-20 variables (columns). Each variable is a binary categorical response (0 vs. 1, indicating absence vs. presence of a given clinical finding). I am exploring the use of MCA to look for associations between these variables. My main issue is that in the graphical output there is no way to tell which variable is which in the 2-D correspondence analysis plot. There are just clusters of points labeled 0 or 1, and neither the color, the point shape, nor the text lets the viewer see which variables are clustering together.

 

I noticed that when SAS produces the column coordinates in table form in the output (Dim1 and Dim2) for each response value, the variable names are also not indicated. Does SAS not carry forward the variable names when it generates the output? Is there a way to do so?

 

Here is an example of my code (SAS 9.4):

 

Ods graphics on;
Ods html sge=on;
Title 'Multiple Correspondence Analysis Test';
Title2 'All ANY_ variables';
/* MCA on the binary ANY_ variables: keep 2 dimensions, save the coordinates */
PROC CORRESP Data=work.testA MCA Dimens=2 OutC=OutCA1 OutF=OutFCA1 ALL Plots=All Greenacre;
   Tables ANY_CHIN ANY_CRANIUM ANY_EAR ANY_FACE ANY_FOREHEAD ANY_HANDFEET_CREASES ANY_HANDS_FEET ANY_LIPS
          ANY_MANDIBLE ANY_MAXILLA_MIDFACE ANY_MOUTH ANY_NAILS ANY_NECK ANY_NOSE ANY_ORAL_CAVITY ANY_PERIORBITAL
          ANY_PHILTRUM ANY_SCALP_HAIR;
RUN;
/* Close all ODS destinations, then reopen the default HTML destination */
Ods _all_ close;
ods html;

 

Thank you for the help!

5 REPLIES
StatDave
SAS Super FREQ

Use PROC FORMAT to create unique formatted values for the levels of each of your variables. Then assign the formats to the variables with a FORMAT statement in your PROC CORRESP step. The plot should then display the formatted values.

bmhelm
Fluorite | Level 6

Thanks, but the issue is not changing the format of the responses from 0/1 to "Absent/Present" for each variable. The issue is that PROC CORRESP itself does not display the variable names in either the tabular output (Dim1, Dim2) or the graphical output.

 

So whether or not I change the format (note: the format is the same for all of the variables), I am still unable to easily see the MCA values for each variable or to distinguish the variables in the graphical output.

 

I think this may be an issue with the need to edit the output from the PROC CORRESP, but the SAS guides are unclear if or how this would be possible.

 

I am going for something like this: https://www.researchgate.net/figure/Biplot-of-the-first-two-axes-of-the-multiple-correspondence-anal......  which will allow me to use colors/shapes and text to adequately label the output for my variables (especially since all of them have the same binary response format). 

 

I appreciate you commenting!

StatDave
SAS Super FREQ

The points displayed on the plot are values of variables, not variables. You indicated that the plot you got has every point labeled as 0 or 1 - that is because each of your variables has the same two values - 0 or 1. If you simply give the 0 and 1 values in each variable a unique label, then those points will be labeled meaningfully and distinctly. Using PROC FORMAT as I described is one easy way to do that. Or, you could simply recode each variable's values using a DATA step. Either way, you need to make the *values* of the variables distinct in some way so that those distinct labels appear in the plot. For example, if you have variable Stakeholder with values 0 or 1, you could assign value 1 a formatted value of "Resident" and value 0 a formatted value of "Visitor". Or create a new variable in a DATA step like: if stakeholder=1 then shnew="Resident"; else shnew="Visitor"; and then use the new variable instead of Stakeholder.

 

Also, to get a distinct color for the values of each variable, use the SOURCE option:   Plots(source)=All Greenacre .  You could also include the variable name in the labels as described above if you want.
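The DATA step recode described above might look like the following minimal sketch. The output dataset name (work.testA_labeled) and the new variable names (CHIN_LBL, EAR_LBL) are hypothetical; only work.testA, ANY_CHIN, and ANY_EAR come from the original post.

```sas
/* Sketch only: give each variable's 0/1 values labels that carry the variable name. */
/* work.testA_labeled, CHIN_LBL and EAR_LBL are illustrative names, not from the thread. */
data work.testA_labeled;
   set work.testA;
   length CHIN_LBL EAR_LBL $12;

   /* Missing values are left blank rather than forced into a category */
   if ANY_CHIN = 1 then CHIN_LBL = 'Chin_Yes';
   else if ANY_CHIN = 0 then CHIN_LBL = 'Chin_No';

   if ANY_EAR = 1 then EAR_LBL = 'Ear_Yes';
   else if ANY_EAR = 0 then EAR_LBL = 'Ear_No';
run;
```

The new character variables would then replace the originals in the TABLES statement. Compared with formats, this duplicates columns in the data, so the PROC FORMAT route is usually the lighter option when there are many variables.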

 

bmhelm
Fluorite | Level 6

Thank you StatDave_sas, I stand corrected! I realized that I missed your suggestion that the assigned formats should be unique to EACH variable's response. This worked.

 

My code ended up looking something like this:


PROC FORMAT;
Value Chinfmt 0='Chin_No' 1='Chin_Yes';
Value Craniumfmt 0='Cranium_No' 1='Cranium_Yes';
Value Earfmt 0='Ear_No' 1='Ear_Yes';
Value Facefmt 0='Face_No' 1='Face_Yes';
Value Foreheadfmt 0='Forehead_No' 1='Forehead_Yes';
Value HandfeetCreasesfmt 0='Creases_No' 1='Creases_Yes';
Value HandsFeetfmt 0='HandsFeet_No' 1='HandsFeet_Yes';
Value Lipsfmt 0='Lips_No' 1='Lips_Yes';

...

...

...

RUN;

 

I then reran my PROC CORRESP and added a Format statement and included all the variables that I wanted formatted (as specified above). This worked. I am able to more clearly distinguish these in the MCA plot that is generated. I tried the SOURCE option, but I am not getting different colors for the 0 vs. 1 responses. But I will tinker with that more.
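The rerun described here was not posted, so the following is only a sketch of what it might look like, trimmed to three of the variables and their formats from the PROC FORMAT step above. The option list is copied from the original PROC CORRESP call; the exact statement the poster used is an assumption.

```sas
/* Sketch only: attach one variable-specific format to each analysis variable. */
/* Formats Chinfmt, Craniumfmt and Earfmt are those defined in the PROC FORMAT step. */
proc corresp data=work.testA mca dimens=2 outc=OutCA1 all plots=all greenacre;
   tables ANY_CHIN ANY_CRANIUM ANY_EAR;
   format ANY_CHIN    Chinfmt.
          ANY_CRANIUM Craniumfmt.
          ANY_EAR     Earfmt.;
run;
```

Because the categories are built from the formatted values, the points in the plot and the rows of the coordinate tables should then read Chin_No, Chin_Yes, and so on, instead of 0 and 1.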

 

Otherwise, do you also know if it is possible to output the Burt matrix generated in the procedure, so that a heat map of the pairwise correlations of the variables can be generated?

 

 

bmhelm
Fluorite | Level 6
Update: I am less familiar with ODS output and select, though I found a resource that allowed me to use ods trace to see the names of the PROC CORRESP output tables/figures. Using ODS, I added the following code: "ods output burt" to select the Burt table and SAS creates a new dataset from that.
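A minimal sketch of that approach, assuming the ODS table is named Burt (as the post reports) and using a hypothetical output dataset name, work.BurtTable. Only three of the variables are listed to keep it short.

```sas
/* Sketch only: find the ODS table names, then capture the Burt table as a dataset. */
ods trace on;                      /* writes the name of each output object to the log */
ods output Burt=work.BurtTable;    /* work.BurtTable is an illustrative dataset name */

proc corresp data=work.testA mca dimens=2 all;
   tables ANY_CHIN ANY_CRANIUM ANY_EAR;
run;

ods trace off;

/* Inspect the layout of the captured table before building anything on it */
proc contents data=work.BurtTable;
run;
```

The ALL option is kept from the original call because the Burt table is only available to ODS OUTPUT when the procedure is asked to display it.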
It would still be interesting to see if it is possible to create heat maps of the correlations using ods output, but I am less familiar with that. I will continue exploring. Thank you again!
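One possible route to such a heat map is sketched below. Note that it does not use the Burt table: it computes pairwise Pearson correlations directly with PROC CORR (for 0/1 variables these equal the phi coefficient) and plots them with the HEATMAPPARM statement in PROC SGPLOT, which assumes a reasonably recent SAS 9.4 maintenance release. The dataset names CorrOut and CorrLong and the variables RowVar, ColVar, and Corr are hypothetical.

```sas
/* Sketch only: correlation heat map built from the raw 0/1 variables, not the Burt table. */
proc corr data=work.testA outp=CorrOut noprint;
   var ANY_CHIN ANY_CRANIUM ANY_EAR;
run;

/* Keep only the correlation rows and reshape to one row per variable pair */
data CorrLong;
   set CorrOut(where=(_TYPE_ = 'CORR'));
   array v{*} ANY_CHIN ANY_CRANIUM ANY_EAR;
   length RowVar ColVar $32;
   RowVar = _NAME_;
   do i = 1 to dim(v);
      ColVar = vname(v{i});
      Corr   = v{i};
      output;
   end;
   keep RowVar ColVar Corr;
run;

/* One colored cell per variable pair */
proc sgplot data=CorrLong;
   heatmapparm x=ColVar y=RowVar colorresponse=Corr / colormodel=(blue white red);
run;
```

Extending the VAR statement and the array to the full list of ANY_ variables would give the complete matrix.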
