10-10-2011 09:18 AM
Hi, hope someone can help me with this one!
One of the "limitations" of PROC VARCLUS is that it only divides a set of numeric variables into disjoint or hierarchical clusters, from which you can then remove redundant variables, etc. However, more often than not, your data set is made up of not only numeric variables but also ordinal, binary, and other types. So I thought I could use PROC DISTANCE to produce a matrix to use as input for VARCLUS, but sadly PROC VARCLUS doesn't accept a TYPE=DISTANCE data set as input.
I can produce a data set type=DISTANCE and then convert it manually to type=CORR in order to use it with Proc Varclus but I am not sure about the following:
10-21-2011 09:47 AM
I think there are some problems (statistically speaking) with your approach. You can't have a correlation matrix with zeros on the diagonal; VARCLUS will know that it can't compute with such a nonsensical matrix.
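For comparison, here is a minimal sketch of what a well-formed TYPE=CORR data set might look like when passed to VARCLUS. The data set name, variable names, and correlation values are all made up for illustration; the point is only that the diagonal holds ones and the matrix is symmetric.

```sas
/* Hypothetical 3-variable correlation matrix; names and values are illustrative only */
data work.demo_corr(type=corr);
   input _type_ $ _name_ $ x1 x2 x3;
   datalines;
CORR x1 1.0 0.6 0.2
CORR x2 0.6 1.0 0.3
CORR x3 0.2 0.3 1.0
;
run;

/* VARCLUS reads the correlations directly instead of raw observations */
proc varclus data=work.demo_corr;
   var x1 x2 x3;
run;
```

A distance matrix relabeled as TYPE=CORR would have zeros where this one has ones, which is why it can't be treated as a correlation matrix.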
If you have ORDINAL character values (like "small", "medium", and "large"), you can recode the values in various ways. The simplest way to do this is to assign the value j to the jth ordered category. However, there are other ways as well. You can use PROC FREQ to do this: use the SCORES= option on the TABLES statement and request a SCOREOUT data set: http://support.sas.com/documentation/cdl/en/procstat/63963/HTML/default/viewer.htm#procstat_freq_sec...
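The simplest recoding (value j for the jth ordered category) could be sketched in a DATA step along these lines. The data set names, the variable `size`, and the new variable `size_score` are hypothetical, and this shows the manual approach rather than the PROC FREQ scores.

```sas
/* Hypothetical input data; names are for illustration only */
data work.sizes;
   input size $;
   datalines;
small
large
medium
;
run;

data work.sizes_scored;
   set work.sizes;
   /* Assign the value j to the j-th ordered category */
   select (size);
      when ('small')  size_score = 1;
      when ('medium') size_score = 2;
      when ('large')  size_score = 3;
      otherwise       size_score = .;   /* unexpected values become missing */
   end;
run;
```

The numeric `size_score` can then be listed on the VAR statement of VARCLUS alongside the other numeric variables.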
If you have general nominal data (for example, "red," "green," and "blue") then I don't know how to make sense of your question.