This topic is solved and locked.
Greek
Obsidian | Level 7

Good afternoon!

I am running into a problem with a procedure that I have used before, and I have been troubleshooting it unsuccessfully for the past couple of days. I am using SAS Enterprise Guide 7.1. I have provided part of the data set and my code below. I am trying to cluster using the Gower distance because I have mixed data (this particular example contains only 3 continuous variables). I only get an error when I use Gower's distance; using another method (e.g., METHOD=EUCLID) resolves the issue. What am I missing? Can anybody help? I am getting the following error message:

NOTE: 20 observation(s) omitted due to missing values.
ERROR: All variables are constant.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TREE may be incomplete. When this step was stopped
there were 0 observations and 37 variables.
WARNING: Data set WORK.TREE was not replaced because this step was stopped.
NOTE: PROCEDURE CLUSTER used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds


data sample_data_set;
input ln_Reviewed_Total_count TMC_change cust_age_num @@;
datalines;
44 0 0.955151741
29 1.386294361 0.973456357
36 1.609437912 0.811390135
21 0.693147181 0.18526393
30 0 1
36 1.098612289 0.403113033
52 0 1
62 1.098612289 0.402174032
61 1.609437912 0.249291364
31 1.386294361 0.475851967
26 0 0.5
30 2.079441542 1
27 2.48490665 0.89780873
25 2.197224577 0.962191474
21 1.386294361 0.394982113
26 0 0.913725826
37 0 0.676080022
50 0 0.34542518
35 0 0.295087229
59 0 0.927908197
48 0 0.667473752
run;


%let inputs=ln_Reviewed_Total_count TMC_change cust_age_num;



PROC DISTANCE Data=sample_data_set METHOD=gower out=distances;
VAR interval(&inputs/std=range);
RUN;



PROC CLUSTER Data=distances METHOD=average OUTTREE=tree PSEUDO print=15; 
VAR dist:;
RUN;

Accepted Solution
DougWielenga
SAS Employee

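As @WarrenKuhfeld mentioned, you are creating a TYPE=SIMILAR matrix when you request METHOD=GOWER. If you specify METHOD=DGOWER instead, PROC DISTANCE produces a TYPE=DISTANCE matrix, which PROC CLUSTER reads as a distance matrix. The relevant rows from the PROC DISTANCE documentation: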

Table 36.2: Methods That Accept All Measurement Levels

Method    Description                             Range     TYPE=
GOWER     Gower and Legendre (1986) similarity    0 to 1    SIMILAR
DGOWER    1 minus GOWER                           0 to 1    DISTANCE


/*** TRY THIS CODE ***/


data sample_data_set;
input ln_Reviewed_Total_count TMC_change cust_age_num @@;
datalines;
44 0 0.955151741
29 1.386294361 0.973456357
36 1.609437912 0.811390135
21 0.693147181 0.18526393
30 0 1
36 1.098612289 0.403113033
52 0 1
62 1.098612289 0.402174032
61 1.609437912 0.249291364
31 1.386294361 0.475851967
26 0 0.5
30 2.079441542 1
27 2.48490665 0.89780873
25 2.197224577 0.962191474
21 1.386294361 0.394982113
26 0 0.913725826
37 0 0.676080022
50 0 0.34542518
35 0 0.295087229
59 0 0.927908197
48 0 0.667473752
run;


%let inputs=ln_Reviewed_Total_count TMC_change cust_age_num;

/* DGOWER (1 minus GOWER) writes a TYPE=DISTANCE data set */
PROC DISTANCE Data=sample_data_set METHOD=dgower out=distances;
VAR interval(&inputs/std=range);
RUN;

/* PROC CLUSTER reads a TYPE=DISTANCE input data set as a distance matrix */
PROC CLUSTER Data=distances METHOD=average OUTTREE=tree PSEUDO print=15;
VAR dist:;
RUN;
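The "1 minus GOWER" row in the table can be checked directly. The following minimal sketch converts the METHOD=GOWER similarity matrix into a distance matrix by hand and compares the result with the METHOD=DGOWER output. It assumes only the first five rows of the question's sample data and PROC DISTANCE's default Dist1, Dist2, ... column names (no ID statement). The data set names SMALL, GOWER_SIM, DGOWER_DIST, and GOWER_DIST are hypothetical. It is an illustration only; simply specifying METHOD=DGOWER remains the easier route.

```sas
/* First five rows of the sample data from the question */
data small;
   input ln_Reviewed_Total_count TMC_change cust_age_num;
   datalines;
44 0 0.955151741
29 1.386294361 0.973456357
36 1.609437912 0.811390135
21 0.693147181 0.18526393
30 0 1
;
run;

/* Similarity matrix: PROC DISTANCE labels this TYPE=SIMILAR */
proc distance data=small method=gower out=gower_sim;
   var interval(ln_Reviewed_Total_count TMC_change cust_age_num / std=range);
run;

/* Distance matrix straight from PROC DISTANCE: TYPE=DISTANCE */
proc distance data=small method=dgower out=dgower_dist;
   var interval(ln_Reviewed_Total_count TMC_change cust_age_num / std=range);
run;

/* Hand conversion: distance = 1 - similarity, labelled TYPE=DISTANCE */
data gower_dist(type=distance);
   set gower_sim;
   array d{*} dist:;              /* Dist1, Dist2, ... (assumed names) */
   do _i = 1 to dim(d);
      /* keep the missing upper triangle missing */
      if not missing(d{_i}) then d{_i} = 1 - d{_i};
   end;
   drop _i;
run;

/* If DGOWER = 1 - GOWER, no values should differ beyond the criterion */
proc compare base=dgower_dist compare=gower_dist method=absolute criterion=1e-10;
run;
```

The TYPE=DISTANCE data set option on the DATA statement is what should let PROC CLUSTER read GOWER_DIST as a distance matrix rather than as raw data.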

WarrenKuhfeld
Rhodochrosite | Level 12

You are creating a TYPE=SIMILAR data set with PROC DISTANCE. I am far from an expert on PROC CLUSTER, but I don't think it expects that type of data set; I see no mention of that type in the documentation. So PROC CLUSTER appears to be treating it as a raw (coordinate) data set.
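One way to check this diagnosis is to inspect the data set that METHOD=GOWER produces. This sketch assumes the question's DATA step has already created SAMPLE_DATA_SET, and that PROC DISTANCE names its output columns Dist1, Dist2, ... when there is no ID statement. It writes to a hypothetical GOWER_CHECK data set so that WORK.DISTANCES is left alone. If PROC CLUSTER does read this data set as raw coordinates, the log in the question would follow from the default lower-triangular output. Every row except the last has missing values, so 20 of the 21 rows are dropped, and the single remaining observation leaves every variable constant.

```sas
/* Assumes SAMPLE_DATA_SET from the question's DATA step already exists. */
/* GOWER_CHECK is a hypothetical name so WORK.DISTANCES is not touched.  */
proc distance data=sample_data_set method=gower out=gower_check;
   var interval(ln_Reviewed_Total_count TMC_change cust_age_num / std=range);
run;

/* The attributes table should list the data set type as SIMILAR */
proc contents data=gower_check;
run;

/* The default triangular shape leaves the upper triangle missing, */
/* so only the last row (observation 21) has no missing values.    */
proc print data=gower_check(obs=5);
   var dist1-dist5;
run;
```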
