Clustering Glitch (Gower's Distance)

Good afternoon!

I have hit a glitch in a procedure that I have used before, and I have been troubleshooting it unsuccessfully for the past couple of days. I am using SAS EG 7.1, and I have provided part of the data set and my code below. I am trying to cluster using Gower's distance because I have mixed data (this particular example contains 3 continuous variables). I only get an error when I use METHOD=gower in PROC DISTANCE; using another method (e.g., euclid) resolves the issue. What am I missing? Can anybody help? I am getting the following error message:

 

NOTE: 20 observation(s) omitted due to missing values.
ERROR: All variables are constant.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TREE may be incomplete. When this step was stopped
there were 0 observations and 37 variables.
WARNING: Data set WORK.TREE was not replaced because this step was stopped.
NOTE: PROCEDURE CLUSTER used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds

 


data sample_data_set;
input ln_Reviewed_Total_count TMC_change cust_age_num @@;
datalines;
44 0 0.955151741
29 1.386294361 0.973456357
36 1.609437912 0.811390135
21 0.693147181 0.18526393
30 0 1
36 1.098612289 0.403113033
52 0 1
62 1.098612289 0.402174032
61 1.609437912 0.249291364
31 1.386294361 0.475851967
26 0 0.5
30 2.079441542 1
27 2.48490665 0.89780873
25 2.197224577 0.962191474
21 1.386294361 0.394982113
26 0 0.913725826
37 0 0.676080022
50 0 0.34542518
35 0 0.295087229
59 0 0.927908197
48 0 0.667473752
run;


%let inputs=ln_Reviewed_Total_count TMC_change cust_age_num;



PROC DISTANCE Data=sample_data_set METHOD=gower out=distances;
VAR interval(&inputs/std=range);
RUN;



PROC CLUSTER Data=distances METHOD=average OUTTREE=tree PSEUDO print=15; 
VAR dist:;
RUN;

 


Accepted Solutions
Solution
‎02-27-2018 10:21 AM
SAS Employee
Posts: 231

Re: Clustering Glitch (Gower's Distance)

[ Edited ]
Posted in reply to WarrenKuhfeld

As @WarrenKuhfeld mentioned, requesting METHOD=GOWER in PROC DISTANCE creates a TYPE=SIMILAR matrix. If you specify METHOD=DGOWER instead, PROC DISTANCE produces a TYPE=DISTANCE matrix, as the table below shows.


Table 36.2: Methods That Accept All Measurement Levels

Method   Description                            Range    TYPE=
GOWER    Gower and Legendre (1986) similarity   0 to 1   SIMILAR
DGOWER   1 minus GOWER                          0 to 1   DISTANCE


/*** TRY THIS CODE ***/


data sample_data_set;
input ln_Reviewed_Total_count TMC_change cust_age_num @@;
datalines;
44 0 0.955151741
29 1.386294361 0.973456357
36 1.609437912 0.811390135
21 0.693147181 0.18526393
30 0 1
36 1.098612289 0.403113033
52 0 1
62 1.098612289 0.402174032
61 1.609437912 0.249291364
31 1.386294361 0.475851967
26 0 0.5
30 2.079441542 1
27 2.48490665 0.89780873
25 2.197224577 0.962191474
21 1.386294361 0.394982113
26 0 0.913725826
37 0 0.676080022
50 0 0.34542518
35 0 0.295087229
59 0 0.927908197
48 0 0.667473752
run;


%let inputs=ln_Reviewed_Total_count TMC_change cust_age_num;

 

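/* METHOD=dgower (1 minus GOWER) writes a TYPE=DISTANCE data set */
/* instead of the TYPE=SIMILAR data set that METHOD=gower writes. */
PROC DISTANCE Data=sample_data_set METHOD=dgower out=distances;
VAR interval(&inputs/std=range);
RUN;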

 

PROC CLUSTER Data=distances METHOD=average OUTTREE=tree PSEUDO print=15;
VAR dist:;
RUN;
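
If you would rather keep METHOD=gower, the "1 minus GOWER" relationship in the table suggests a manual conversion. The following is only a sketch under stated assumptions, not part of the original answer: it assumes the `sample_data_set` data set and the `&inputs` macro variable defined above, the default `Dist` prefix for the output variables, and a hypothetical intermediate data set named `similarities`. DGOWER remains the simpler route.

```sas
/* Assumption: sample_data_set and &inputs are already defined as above. */
/* Step 1: compute Gower similarities (TYPE=SIMILAR output).             */
proc distance data=sample_data_set method=gower out=similarities;
   var interval(&inputs / std=range);
run;

/* Step 2: convert each similarity s to the distance 1 - s and mark the  */
/* result as TYPE=DISTANCE. Missing cells (the unused upper triangle of  */
/* the default output shape) are left missing.                           */
data distances(type=distance);
   set similarities;
   array d{*} Dist:;
   do i = 1 to dim(d);
      if not missing(d{i}) then d{i} = 1 - d{i};
   end;
   drop i;
run;

/* Step 3: cluster on the converted distance matrix. */
proc cluster data=distances method=average outtree=tree pseudo print=15;
   var Dist:;
run;
```

The TYPE= data set option in step 2 is what tells PROC CLUSTER to read the rows as a distance matrix rather than as raw coordinates.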



All Replies
SAS Super FREQ
Posts: 502

Re: Clustering Glitch (Gower's Distance)

You are creating a TYPE=SIMILAR data set with PROC DISTANCE. I am far from an expert on PROC CLUSTER, but I don't think it expects that type of data set; I see no mention of that type in the documentation. So it appears to be treating it as a raw data set.
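
One way to check this is to look at the data set type and at how many rows are free of missing values. The sketch below assumes the `distances` data set from the question still exists in WORK, and that PROC DISTANCE used its default `Dist` variable prefix and its default lower-triangular output shape; the counter name `complete` is only illustrative. If the matrix is read as raw coordinates, every row with a missing upper-triangle cell would be dropped, which would be consistent with the note that 20 of the 21 observations were omitted and with the "All variables are constant" error on the single row left.

```sas
/* The "Data Set Type" attribute in this output shows whether the */
/* data set is marked SIMILAR or DISTANCE.                        */
proc contents data=distances;
run;

/* Count the rows in which none of the Dist variables is missing. */
/* 'complete' is an illustrative counter, not a SAS keyword.      */
data _null_;
   set distances end=last;
   if nmiss(of Dist:) = 0 then complete + 1;
   if last then put "Rows with no missing values: " complete;
run;
```

The count is written to the SAS log, so it can be compared directly with the number of observations that PROC CLUSTER reports as omitted.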

