_Altons_
Calcite | Level 5

Hi, I hope someone can help me with this one!

One of the "limitations" of PROC VARCLUS is that it only divides a set of numeric variables into disjoint or hierarchical clusters, from which you can then remove redundant variables, etc. However, more often than not, your data set is made up not only of numeric variables but also of ordinal, binary, and other kinds of variables. I thought I could use PROC DISTANCE to produce a matrix to use as input for VARCLUS, but sadly PROC VARCLUS doesn't accept TYPE=DISTANCE as an input data set.

I can produce a TYPE=DISTANCE data set and then convert it manually to TYPE=CORR in order to use it with PROC VARCLUS, but I am not sure about the following:

  • How will VARCLUS interpret this data set, given that it has zeros on the diagonal instead of ones? Or
  • does TYPE=CORR only tell VARCLUS how to read the data, with no calculation that actually involves the correlations?
  • Perhaps there is a different approach to this problem using different procedures?
  • Does this idea make sense at all (statistically speaking)?

Many thanks,

Alberto

Rick_SAS
SAS Super FREQ

I think there are some problems (statistically speaking) with your approach. You can't have a correlation matrix with zeros on the diagonal; VARCLUS will detect that it can't compute with such a nonsensical matrix.
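To make the point concrete, here is a minimal sketch in Python (not SAS, and not how VARCLUS validates its input internally) that checks a few properties every correlation matrix must have: it is square and symmetric, has ones on the diagonal, and has off-diagonal entries in [-1, 1]. The function name `correlation_matrix_problems` and the example matrices are hypothetical, made up for illustration. A distance matrix fails the diagonal test right away, and unscaled distances also fall outside [-1, 1].

```python
def correlation_matrix_problems(m, tol=1e-9):
    """Return reasons why m cannot be a correlation matrix.

    Only necessary conditions are checked (square, symmetric, unit
    diagonal, off-diagonal entries in [-1, 1]). A complete check would
    also require positive semidefiniteness, which is omitted here.
    """
    n = len(m)
    if any(len(row) != n for row in m):
        return ["matrix is not square"]
    problems = []
    for i in range(n):
        # A variable is always perfectly correlated with itself.
        if abs(m[i][i] - 1.0) > tol:
            problems.append(f"diagonal entry [{i}][{i}] is {m[i][i]}, not 1")
        for j in range(i + 1, n):
            if abs(m[i][j] - m[j][i]) > tol:
                problems.append(f"entries [{i}][{j}] and [{j}][{i}] differ")
            if abs(m[i][j]) > 1.0 + tol:
                problems.append(f"entry [{i}][{j}] = {m[i][j]} is outside [-1, 1]")
    return problems


# Hypothetical distance matrix: zeros on the diagonal, unbounded entries.
distance = [[0.0, 2.5, 1.2],
            [2.5, 0.0, 3.1],
            [1.2, 3.1, 0.0]]

# Hypothetical correlation matrix that passes the necessary checks.
corr = [[1.0, 0.3, -0.2],
        [0.3, 1.0, 0.5],
        [-0.2, 0.5, 1.0]]

for name, m in [("distance", distance), ("corr", corr)]:
    print(name, correlation_matrix_problems(m))
```

Relabeling a distance matrix as TYPE=CORR does not change these numbers, so the result still fails every diagonal check.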

If you have ORDINAL character values (like "small", "medium", and "large"), you can recode the values in various ways. The simplest is to assign the value j to the jth ordered category, but there are other ways as well. You can use PROC FREQ to do this: use the SCORES= option on the TABLES statement and request a SCOREOUT data set: http://support.sas.com/documentation/cdl/en/procstat/63963/HTML/default/viewer.htm#procstat_freq_sec...
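As a rough illustration of the recoding idea, here is a Python sketch (not PROC FREQ) of two scoring schemes. The first is the simplest one described above: category j gets the value j. The second uses midrank scores, where each category gets the average rank its observations would receive if the data were sorted by category. My understanding is that this is similar in spirit to SCORES=RANK, but check the PROC FREQ documentation for the exact definitions. The function names, the `levels` ordering, and the sample data are hypothetical. The analyst has to supply the category order, because it cannot be inferred from character values.

```python
from collections import Counter


def integer_scores(levels):
    """Assign the value j to the j-th ordered category (1-based)."""
    return {level: j for j, level in enumerate(levels, start=1)}


def midrank_scores(values, levels):
    """Score each category by the mean rank its observations would get
    if all observations were sorted by category (ties share the mean)."""
    counts = Counter(values)
    scores = {}
    below = 0  # number of observations in lower categories
    for level in levels:
        n_j = counts.get(level, 0)
        # Ranks below+1 .. below+n_j have mean below + (n_j + 1) / 2.
        scores[level] = below + (n_j + 1) / 2
        below += n_j
    return scores


# Hypothetical ordinal variable and its user-supplied category order.
levels = ["small", "medium", "large"]
sizes = ["small", "large", "medium", "small", "large", "large"]

simple = integer_scores(levels)
print([simple[v] for v in sizes])
print(midrank_scores(sizes, levels))
```

Either recoded column is numeric, so it can sit alongside the other numeric variables in the data set that VARCLUS reads. The choice of scores changes the resulting correlations, which is why it is worth trying more than one scheme.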

If you have general nominal data (for example, "red," "green," and "blue"), then I don't know how to make sense of your question.
