Write and run SAS programs in your web browser

Scoring clusters

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Scoring clusters

Hello, I am currently using SAS university and I keep getting the error

"ERROR: Missing values are not allowed in the lower triangle of a distance data set."

 

I am trying to create a ranking system for a series of roads based on the number of daily cars. Basically I want to cluster the data to find more natural breaking points and get a score of 1-10. I've tried the "missing=" function with no luck. Could this be a SAS univeristy issue?

 

FILENAME REFFILE "/folders/myfolders/roaddata1.csv" TERMSTR=CR;

PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
OUT=roaddata1;
GETNAMES=YES;
RUN;

PROC CONTENTS DATA=roaddata1; RUN;


%web_open_table(WORK.IMPORT);

 

proc distance data=roaddata1 out=distmatrix method=Euclid;
var interval (FACT_AADT/ std=std) missing=0;
id SEGMENT_ID;
run;


proc cluster data=distmatrix outtree=trees
method=ward noprint;
id SEGMENT_ID;
run;

proc tree data=trees noprint out=Final n=10;
id SEGMENT_ID;

 

THANK YOU!

 

(I had to convert the file from .csv to .xlsx for upload. How is .csv not a valid upload for a data community!?)


Accepted Solutions
Solution
‎02-17-2016 04:03 PM
Respected Advisor
Posts: 4,608

Re: Scoring clusters

I don't get any error message when I run your code, after I remove the missing=0 option.

 

33
34   proc distance data=xl.'roaddata1$'n out=distmatrix method=Euclid;
35   var interval (FACT_AADT/ std=std);
36   id SEGMENT_ID;
37   run;

NOTE: The data set WORK.DISTMATRIX has 7999 observations and 8000 variables.
NOTE: PROCEDURE DISTANCE used (Total process time):
      real time           10.11 seconds
      cpu time            3.93 seconds


38
39   proc cluster data=distmatrix outtree=trees
40   method=ward noprint;
41   id SEGMENT_ID;
42   run;

NOTE: The input data set is a TYPE=DISTANCE data set. For such a data set, the
      procedure requires that the order of the rows match the order of the
      variables.
NOTE: Input distances have been squared.
WARNING: Ties for minimum distance between clusters have been detected at 6852
         level(s) in the cluster history.
NOTE: The data set WORK.TREES has 15997 observations and 11 variables.
NOTE: PROCEDURE CLUSTER used (Total process time):
      real time           10.18 seconds
      cpu time            10.09 seconds


43
44   proc tree data=trees noprint out=Final n=10;
45   id SEGMENT_ID;
46   run;

NOTE: The data set WORK.FINAL has 7999 observations and 3 variables.
NOTE: PROCEDURE TREE used (Total process time):
      real time           0.26 seconds
      cpu time            0.09 seconds
PG

View solution in original post


All Replies
Respected Advisor
Posts: 4,608

Re: Scoring clusters

I can see only two modes in your data. Why not take deciles

 

?SGPlot.png

PG
New Contributor
Posts: 4

Re: Scoring clusters

Thank you PG. Deciles will work fine for these data as a backup. I guess I just like clustering and I'm a believer that the numbers are maybe a little bit more meaningful than just decile cuts.

 

I'm absoultely confused whats wrong with these data to get that error message as there is no missing anything. For future reference for myself, I may wait to see if somebody wants to explain to me whats wrong with the script. Otherwise I may have to figure out how to script this is R for similar scoring projects...

 

Best

Solution
‎02-17-2016 04:03 PM
Respected Advisor
Posts: 4,608

Re: Scoring clusters

I don't get any error message when I run your code, after I remove the missing=0 option.

 

33
34   proc distance data=xl.'roaddata1$'n out=distmatrix method=Euclid;
35   var interval (FACT_AADT/ std=std);
36   id SEGMENT_ID;
37   run;

NOTE: The data set WORK.DISTMATRIX has 7999 observations and 8000 variables.
NOTE: PROCEDURE DISTANCE used (Total process time):
      real time           10.11 seconds
      cpu time            3.93 seconds


38
39   proc cluster data=distmatrix outtree=trees
40   method=ward noprint;
41   id SEGMENT_ID;
42   run;

NOTE: The input data set is a TYPE=DISTANCE data set. For such a data set, the
      procedure requires that the order of the rows match the order of the
      variables.
NOTE: Input distances have been squared.
WARNING: Ties for minimum distance between clusters have been detected at 6852
         level(s) in the cluster history.
NOTE: The data set WORK.TREES has 15997 observations and 11 variables.
NOTE: PROCEDURE CLUSTER used (Total process time):
      real time           10.18 seconds
      cpu time            10.09 seconds


43
44   proc tree data=trees noprint out=Final n=10;
45   id SEGMENT_ID;
46   run;

NOTE: The data set WORK.FINAL has 7999 observations and 3 variables.
NOTE: PROCEDURE TREE used (Total process time):
      real time           0.26 seconds
      cpu time            0.09 seconds
PG
New Contributor
Posts: 4

Re: Scoring clusters

Interesting. Did you run that on the desktop version or in sas studio? Using SAS Studio I ran this...

FILENAME REFFILE "/folders/myfolders/roaddata1.csv" TERMSTR=CR;
PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
OUT=roaddata1;
GETNAMES=YES;
RUN;
PROC CONTENTS DATA=roaddata1; RUN;

%web_open_table(WORK.IMPORT);
 
proc distance data=roaddata1 out=distmatrix method=Euclid;
var interval (FACT_AADT/ std=std);
id SEGMENT_ID;
run;

proc cluster data=distmatrix outtree=trees
method=ward noprint;
id SEGMENT_ID;
run;

proc tree data=trees noprint out=Final n=10;
id SEGMENT_ID;

and got this.... (I'm thinking it is a SAS Studio Problem)

 

 

 
NOTE: The data set WORK.DISTMATRIX has 8000 observations and 8001 variables.
 NOTE: PROCEDURE DISTANCE used (Total process time):
       real time           17.06 seconds
       cpu time            0.80 seconds
       
 
 91         
 92         proc cluster data=distmatrix outtree=trees
 93         method=ward noprint;
 94         id SEGMENT_ID;
 95         run;
 
 NOTE: The input data set is a TYPE=DISTANCE data set. For such a data set, the procedure requires that the order of the rows match 
       the order of the variables.
 ERROR: Missing values are not allowed in the lower triangle of a distance data set.
 NOTE: The SAS System stopped processing this step because of errors.
 WARNING: The data set WORK.TREES may be incomplete.  When this step was stopped there were 0 observations and 11 variables.
 NOTE: PROCEDURE CLUSTER used (Total process time):
       real time           58.22 seconds
       cpu time            1.48 seconds
       
 96         
 
 
 97         proc tree data=trees noprint out=Final n=10;
 98         id SEGMENT_ID;
 99         
 100        OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 WARNING: The data set WORK.FINAL may be incomplete.  When this step was stopped there were 0 observations and 3 variables.
 112        
 
Respected Advisor
Posts: 4,608

Re: Scoring clusters

I think you may have an empty line as the end of your csv file that is causing the problem.

PG
New Contributor
Posts: 4

Re: Scoring clusters

 

Thank you! Because of your posts I looked at it again and found that it was not a blank row, but rather I had to specify the type/shape of the matrix. It works now!

 

proc distance data=roaddata2 out=distmatrix method=Euclid shape=square;
var interval (FACT_AADT/ std=std);
id SEGMENT_ID;
run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 687 views
  • 2 likes
  • 2 in conversation