Greek
Obsidian | Level 7

Good afternoon!

I have run into a problem with a procedure that I have used successfully before, and I have spent the past couple of days troubleshooting it without success. I am using SAS EG 7.1. I have provided part of the data set and my code below. I am trying to cluster using the Gower distance because I have mixed data (this particular example contains only 3 continuous variables). The error occurs only when I use METHOD=GOWER; using another method (e.g., EUCLID) resolves the issue. What am I missing? I am getting the following error message:

 

NOTE: 20 observation(s) omitted due to missing values.
ERROR: All variables are constant.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TREE may be incomplete. When this step was stopped
there were 0 observations and 37 variables.
WARNING: Data set WORK.TREE was not replaced because this step was stopped.
NOTE: PROCEDURE CLUSTER used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds

 


data sample_data_set;
input ln_Reviewed_Total_count TMC_change cust_age_num @@;
datalines;
44 0 0.955151741
29 1.386294361 0.973456357
36 1.609437912 0.811390135
21 0.693147181 0.18526393
30 0 1
36 1.098612289 0.403113033
52 0 1
62 1.098612289 0.402174032
61 1.609437912 0.249291364
31 1.386294361 0.475851967
26 0 0.5
30 2.079441542 1
27 2.48490665 0.89780873
25 2.197224577 0.962191474
21 1.386294361 0.394982113
26 0 0.913725826
37 0 0.676080022
50 0 0.34542518
35 0 0.295087229
59 0 0.927908197
48 0 0.667473752
run;


%let inputs=ln_Reviewed_Total_count TMC_change cust_age_num;



PROC DISTANCE Data=sample_data_set METHOD=gower out=distances;
VAR interval(&inputs/std=range);
RUN;



PROC CLUSTER Data=distances METHOD=average OUTTREE=tree PSEUDO print=15; 
VAR dist:;
RUN;

 

Accepted Solutions
DougWielenga
SAS Employee

As @WarrenKuhfeld mentioned, requesting METHOD=GOWER creates a TYPE=SIMILAR matrix (similarities), whereas PROC CLUSTER expects distances. If you specify METHOD=DGOWER instead, PROC DISTANCE produces a TYPE=DISTANCE matrix.

 

 

Table 36.2: Methods That Accept All Measurement Levels

Method   Description                            Range    TYPE=
GOWER    Gower and Legendre (1986) similarity   0 to 1   SIMILAR
DGOWER   1 minus GOWER                          0 to 1   DISTANCE
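
One way to confirm which kind of matrix PROC DISTANCE wrote is to look at the data set type that PROC CONTENTS reports. The sketch below assumes sample_data_set and the &inputs macro variable from the question already exist; the output names sim_check and dist_check are made up for illustration.

```sas
/* Illustrative check: sim_check and dist_check are hypothetical names. */
/* Assumes sample_data_set and &inputs are defined as in the question.  */
proc distance data=sample_data_set method=gower out=sim_check;
   var interval(&inputs / std=range);
run;

proc distance data=sample_data_set method=dgower out=dist_check;
   var interval(&inputs / std=range);
run;

/* The "Data Set Type" attribute should read SIMILAR for the first */
/* data set and DISTANCE for the second.                           */
proc contents data=sim_check;
run;

proc contents data=dist_check;
run;
```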

 

/*** TRY THIS CODE ***/


data sample_data_set;
input ln_Reviewed_Total_count TMC_change cust_age_num @@;
datalines;
44 0 0.955151741
29 1.386294361 0.973456357
36 1.609437912 0.811390135
21 0.693147181 0.18526393
30 0 1
36 1.098612289 0.403113033
52 0 1
62 1.098612289 0.402174032
61 1.609437912 0.249291364
31 1.386294361 0.475851967
26 0 0.5
30 2.079441542 1
27 2.48490665 0.89780873
25 2.197224577 0.962191474
21 1.386294361 0.394982113
26 0 0.913725826
37 0 0.676080022
50 0 0.34542518
35 0 0.295087229
59 0 0.927908197
48 0 0.667473752
run;


%let inputs=ln_Reviewed_Total_count TMC_change cust_age_num;

 

PROC DISTANCE Data=sample_data_set METHOD=dgower out=distances;
VAR interval(&inputs/std=range);
RUN;

 

PROC CLUSTER Data=distances METHOD=average OUTTREE=tree PSEUDO print=15;
VAR dist:;
RUN;
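
Since the table describes DGOWER as 1 minus GOWER, the same distances could in principle be derived by hand from METHOD=GOWER output. This is only a sketch under stated assumptions: similarities, distances_from_sim, and tree_check are hypothetical names, and the OUT= data set is assumed to contain only the Dist: columns, as in the question's code. METHOD=DGOWER remains the simpler route.

```sas
/* Hypothetical alternative: convert Gower similarities to distances manually. */
proc distance data=sample_data_set method=gower out=similarities;
   var interval(&inputs / std=range);
run;

/* TYPE=DISTANCE marks the new data set as a distance matrix. */
data distances_from_sim(type=distance);
   set similarities;
   array d{*} dist:;
   do i = 1 to dim(d);
      /* Leave missing cells (outside the stored triangle) untouched. */
      if not missing(d{i}) then d{i} = 1 - d{i};
   end;
   drop i;
run;

proc cluster data=distances_from_sim method=average outtree=tree_check pseudo print=15;
   var dist:;
run;
```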

2 Replies
WarrenKuhfeld
Ammonite | Level 13

You are creating a TYPE=SIMILAR data set with PROC DISTANCE. I am far from an expert on PROC CLUSTER, but I don't think it expects that type of data set; I see no mention of that type in the documentation. So PROC CLUSTER appears to be treating it as a raw data set.
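
If PROC CLUSTER does read the similarity data set as raw data, the log in the question is at least consistent with that. Assuming PROC DISTANCE stored the matrix in its default lower-triangular shape, every row except the last has missing values above the diagonal, which would explain 20 of the 21 observations being omitted and the single remaining row being reported as constant. A quick way to look, using the distances data set from the question:

```sas
/* Print the first few rows of the PROC DISTANCE output from the question. */
/* With a lower-triangular layout, row k has values only in the first k    */
/* Dist columns; the remaining columns are missing.                        */
proc print data=distances(obs=5);
run;
```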

