Hi, hope someone can help me with this one!
One of the "limitations" of Proc Varclus is that only divides a set of numeric variables into disjoint or hierarchical clusters and from here you can remove redundant variables etc.. However, more often than not, your data set made of not only numerica variables but ordinal, binary etc. Then I thought I could use proc distance to produce a matrix that I could use as input for Varclus but sadly Proc VarClus doesn't accept type=DISTANCE as input data set.
I can produce a data set type=DISTANCE and then convert it manually to type=CORR in order to use it with Proc Varclus but I am not sure about the following:
Many Thanks,
Alberto
I think there are some problem (statistically speaking) with your approach. You can't have a correlation matrix with zeros on the diagonal; VARCLUS will know that it can't compute with such a nonsensical matrix.
If you have ORDINAL character values (like "small", "medium", and "large"), you can recode the values in various ways. The simplest way to do this is to assign the value j to the j_th ordered category. However, there are other ways as well. You can use PROC FREQ to do this: use the SCORES= option on the TABLES statement and request a SCOREOUT data set http://support.sas.com/documentation/cdl/en/procstat/63963/HTML/default/viewer.htm#procstat_freq_sec...
If you have general nominal data (for example, "red," "green," and "blue") then I don't know how to make sense of your question.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.