Good afternoon!
I am hitting an error in a procedure that I have used before, and I have spent the past couple of days troubleshooting it without success. I am using SAS EG 7.1. I have provided part of the data set and my code below. I am trying to cluster using the Gower distance because I have mixed data (this particular example contains 3 continuous variables). I only get the error when I use METHOD=GOWER; using another method (e.g., EUCLID) resolves the issue. What am I missing? Can anybody help? I am getting the following error message:
NOTE: 20 observation(s) omitted due to missing values.
ERROR: All variables are constant.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TREE may be incomplete. When this step was stopped
there were 0 observations and 37 variables.
WARNING: Data set WORK.TREE was not replaced because this step was stopped.
NOTE: PROCEDURE CLUSTER used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
data sample_data_set;
input ln_Reviewed_Total_count TMC_change cust_age_num @@;
datalines;
44 0 0.955151741
29 1.386294361 0.973456357
36 1.609437912 0.811390135
21 0.693147181 0.18526393
30 0 1
36 1.098612289 0.403113033
52 0 1
62 1.098612289 0.402174032
61 1.609437912 0.249291364
31 1.386294361 0.475851967
26 0 0.5
30 2.079441542 1
27 2.48490665 0.89780873
25 2.197224577 0.962191474
21 1.386294361 0.394982113
26 0 0.913725826
37 0 0.676080022
50 0 0.34542518
35 0 0.295087229
59 0 0.927908197
48 0 0.667473752
run;
%let inputs=ln_Reviewed_Total_count TMC_change cust_age_num;
PROC DISTANCE Data=sample_data_set METHOD=gower out=distances;
VAR interval(&inputs/std=range);
RUN;
PROC CLUSTER Data=distances METHOD=average OUTTREE=tree PSEUDO print=15;
VAR dist:;
RUN;
As @WarrenKuhfeld mentioned, you are creating a TYPE=SIMILAR matrix when you request METHOD=GOWER. If you specify METHOD=DGOWER instead, PROC DISTANCE produces a TYPE=DISTANCE matrix.
Table 36.2: Methods That Accept All Measurement Levels
Method | Description | Range | TYPE=
---|---|---|---
GOWER | Gower and Legendre (1986) similarity | 0 to 1 | SIMILAR
DGOWER | 1 minus GOWER | 0 to 1 | DISTANCE
/*** TRY THIS CODE ***/
data sample_data_set;
input ln_Reviewed_Total_count TMC_change cust_age_num @@;
datalines;
44 0 0.955151741
29 1.386294361 0.973456357
36 1.609437912 0.811390135
21 0.693147181 0.18526393
30 0 1
36 1.098612289 0.403113033
52 0 1
62 1.098612289 0.402174032
61 1.609437912 0.249291364
31 1.386294361 0.475851967
26 0 0.5
30 2.079441542 1
27 2.48490665 0.89780873
25 2.197224577 0.962191474
21 1.386294361 0.394982113
26 0 0.913725826
37 0 0.676080022
50 0 0.34542518
35 0 0.295087229
59 0 0.927908197
48 0 0.667473752
run;
%let inputs=ln_Reviewed_Total_count TMC_change cust_age_num;
PROC DISTANCE Data=sample_data_set METHOD=dgower out=distances;
VAR interval(&inputs/std=range);
RUN;
PROC CLUSTER Data=distances METHOD=average OUTTREE=tree PSEUDO print=15;
VAR dist:;
RUN;
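If you already have a GOWER similarity matrix and would rather not rerun PROC DISTANCE, the "1 minus GOWER" row of the table suggests converting it by hand. This is only a sketch: the output name distances_fixed is made up for illustration, it assumes the matrix columns use the default Dist prefix (as in the VAR dist: statement), and rerunning with METHOD=DGOWER remains the simpler route.

```sas
/* Sketch: turn a GOWER similarity matrix into a distance matrix.     */
/* distances_fixed is a hypothetical name; TYPE=DISTANCE labels the   */
/* output so that PROC CLUSTER reads it as a distance matrix.         */
data distances_fixed(type=distance);
   set distances;                 /* output of METHOD=gower */
   array d{*} dist:;              /* all matrix columns Dist1, Dist2, ... */
   do i = 1 to dim(d);
      /* leave missing cells alone; convert similarity to 1 - similarity */
      if not missing(d{i}) then d{i} = 1 - d{i};
   end;
   drop i;
run;
```

PROC CLUSTER would then be pointed at Data=distances_fixed instead of Data=distances.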
You are creating a TYPE=SIMILAR data set with PROC DISTANCE. I am far from an expert on PROC CLUSTER, but I don't think it expects that type of data set; I see no mention of that type in the PROC CLUSTER documentation. So PROC CLUSTER appears to be treating it as a raw data set.
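One way to check this is to look at the data set type that PROC DISTANCE wrote. The sketch below assumes the output data set is still named distances, as in the question. The Data Set Type field in the PROC CONTENTS output should read SIMILAR after METHOD=GOWER and DISTANCE after METHOD=DGOWER. Printing the matrix also hints at where the log messages may come from: as far as I know, PROC DISTANCE stores a lower-triangular matrix by default, so every row except the last contains missing values. If those rows are read as raw coordinates, only one complete observation would remain, which would be consistent with the "20 observation(s) omitted" note and the "All variables are constant" error for these 21 rows.

```sas
/* Check how PROC DISTANCE labeled its output data set:   */
/* look for the "Data Set Type" field in the output.      */
proc contents data=distances;
run;

/* Inspect the stored matrix, including the missing cells */
proc print data=distances;
run;
```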