Dear SAS Community,
I was looking for a way of visualizing the association between different categorical variables (nominal and ordinal variables). Since a heatmap will probably not work for categorical variables I think that a correspondence analysis should be a good option. Is it ok to include nominal (more than two levels) and also ordinal variables together in a correspondence analysis?
I would greatly appreciate your feedback!
Yes, you could, since PROC CORRESP just decomposes the chi-square statistic of the contingency table, regardless of whether the categorical variables are nominal or ordinal.
proc corresp data=sashelp.heart all chi2p;
tables bp_status,weight_status;
run;
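To see the chi-square connection for yourself, you can run the ordinary Pearson test on the same two-way table. This is just a sketch for comparison, assuming the same sashelp.heart variables as above: the total chi-square in the PROC CORRESP inertia table should match the Pearson chi-square from PROC FREQ, and the total inertia is that chi-square divided by the number of observations in the table.

```sas
/* Pearson chi-square for the same table that PROC CORRESP decomposes */
proc freq data=sashelp.heart;
   tables bp_status*weight_status / chisq;
run;
```

Both procedures drop observations with missing values by default, so the totals should be based on the same rows.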
I also think you could use a heatmap to visualize it. Here is an example:
That's great, thank you so much Ksharp!!
That heatmap looks great. I really want to try it with my data, but I had problems reading my data in the first data step. One of my variables is 'Ethnicity' (7 levels) and the other is 'overall1', a score that goes from 1 to 9 (1 2 3 4 5 6 7 8 9). I put it like this in Col, but it didn't work. I highlighted the "do" and "count" statements because I'm not sure what to include in them. Count should be the numbers that follow each ethnicity level, which are the frequencies of each score for that level of ethnicity. I hope that makes sense. I would greatly appreciate it if you could help me.
data one(drop=f: i);
input Row $ 1-7 f1-f9;
array f[9];
do i = 0 to 5;
Col = 1 2 3 4 5 6 7 8 9;
Count = i;
output;
end;
datalines;
Asian 0 0 0 1 4 3 6 4 0
African American 0 1 7 16 8 37 57 44 8
Caucasian 0 1 4 13 6 24 51 38 9
Hispanic 0 0 0 0 0 0 0 0 0
Native American 5 1 12 24 24 82 108 90 12
Middle Eastern 0 1 2 10 15 24 31 27 6
Pacific Islander 0 0 1 0 0 3 0 0 0
;
No time at this moment to work it out for you ... but maybe the author of that blog can help you out quicker.
@WarrenKuhfeld , can you?
See also ...
Thank you very much for the link to that article Koen!
Not sure what that data step is trying to do. But those datalines (with an extra space inserted after the Ethnicity value so the lines can be parsed properly) are trivial to read in.
data have;
  /* & allows single embedded blanks; two consecutive blanks end the value */
  input ethnicity &:$16. @;
  do overall=1 to 9;
    input count @;
    output;
  end;
datalines;
Asian  0 0 0 1 4 3 6 4 0
African American  0 1 7 16 8 37 57 44 8
Caucasian  0 1 4 13 6 24 51 38 9
Hispanic  0 0 0 0 0 0 0 0 0
Native American  5 1 12 24 24 82 108 90 12
Middle Eastern  0 1 2 10 15 24 31 27 6
Pacific Islander  0 0 1 0 0 3 0 0 0
;
proc corresp data=have all chi2p;
  tables ethnicity,overall;
  weight count;
run;
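A quick way to confirm that the long-format data were read correctly is to rebuild the 7 x 9 table from it. This is only a sanity-check sketch; it assumes the `have` data set from the step above, and the `zeros` option is there so that the all-zero Hispanic row still shows up in the table.

```sas
/* Rebuild the ethnicity-by-score table from the long-format counts */
proc freq data=have;
   weight count / zeros;
   tables ethnicity*overall / norow nocol nopercent;
run;
```

Each row of the resulting crosstab should match the corresponding datalines row.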
That was also very helpful, thank you Tom!
OK. Here you go.
data pop;
  infile cards truncover;
  /* two blanks must follow the Row value so the & modifier knows where it ends */
  input Row & $40. @;
  do col = '1','2','3','4','5','6','7','8','9';
    input count @;
    output;
  end;
datalines;
Asian  0 0 0 1 4 3 6 4 0
African American  0 1 7 16 8 37 57 44 8
Caucasian  0 1 4 13 6 24 51 38 9
Hispanic  0 0 0 0 0 0 0 0 0
Native American  5 1 12 24 24 82 108 90 12
Middle Eastern  0 1 2 10 15 24 31 27 6
Pacific Islander  0 0 1 0 0 3 0 0 0
;
proc freq data=pop; /* Row and column marginals */
  weight count/zero;
  tables row / noprint out=f2(drop=percent rename=(count=RowF row=mRow));
  tables col / noprint out=f3(drop=percent rename=(count=ColF col=mCol));
run;
data all1;
  if 0 then merge pop f2 f3;
  low = 0;
  x0 = 35; * 35 is the x axis value of margin freq;
  if _n_ = 1 then do;
    row = ' '; col = ' '; mrow = ' '; mcol = ' ';
    count = 0; rowf = 0; colf = 0; output;
    count = 358; rowf = 358; colf = 358; output; * 358 is the max count in margin freq;
  end;
  merge pop f2 f3;
  output;
run;
ods graphics on / height=4.9in width=6.2in;
title;
proc sgplot data=all1 noautolegend noborder ;
  heatmapparm y=row x=col colorresponse=count / colormodel=(white cx6767bb cxbb67bb cxdd2255);
  text y=row x=col text=count / textattrs=(size=8) strip contributeoffsets=none;
  highlow y=mrow low=low high=rowf / x2axis
    type=bar barwidth=0.95 nooutline colormodel=(white cx6767bb cxbb67bb cxdd2255) colorresponse=rowf;
  text y=mrow x=x0 text=rowf / x2axis textattrs=(size=8) strip contributeoffsets=none;
  highlow x=mcol low=low high=colf / y2axis
    type=bar barwidth=0.95 nooutline colormodel=(white cx6767bb cxbb67bb cxdd2255) colorresponse=colf;
  text x=mcol y=x0 text=colf / y2axis textattrs=(size=8) strip contributeoffsets=none;
  xaxis display=(nolabel noticks noline) offsetmax=.3;
  yaxis display=(nolabel noticks noline) offsetmin=.3 reverse;
  x2axis display=none offsetmin=.75 offsetmax=.03;
  y2axis display=none offsetmin=.75 offsetmax=.03;
run;
Fantastic, thank you so much Ksharp!! Just one more question if you don't mind:
What does the number 40 stand for in the input Row statement?
input Row & $40. @;
Thanks
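In short, `$40.` is a character informat with a width of 40, so Row is created as a 40-character variable and at most 40 characters are read into it. The `&` modifier lets the value contain single blanks; the value ends at two consecutive blanks, at the 40-character limit, or at the end of the line, whichever comes first. A small sketch to try it out, where the data set `demo` and the variables `name` and `value` are made up for illustration:

```sas
/* Hypothetical demo: name may contain single blanks, two blanks end it */
data demo;
   infile datalines truncover;
   input name & $40. value;
datalines;
Native American  5
Asian  0
;
proc print data=demo;
run;
```

With two blanks before the number, the first record should print with name as "Native American" and value as 5. The 40 only needs to be at least as long as the longest label; any sufficiently large width would do.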
Oh ok, thank you so much, you were so helpful!