BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Brendankjansdkj
Fluorite | Level 6

Hello, I am currently using SAS university and I keep getting the error

"ERROR: Missing values are not allowed in the lower triangle of a distance data set."

 

I am trying to create a ranking system for a series of roads based on the number of daily cars. Basically I want to cluster the data to find more natural breaking points and get a score of 1-10. I've tried the "missing=" function with no luck. Could this be a SAS univeristy issue?

 

FILENAME REFFILE "/folders/myfolders/roaddata1.csv" TERMSTR=CR;

PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
OUT=roaddata1;
GETNAMES=YES;
RUN;

PROC CONTENTS DATA=roaddata1; RUN;


%web_open_table(WORK.IMPORT);

 

proc distance data=roaddata1 out=distmatrix method=Euclid;
var interval (FACT_AADT/ std=std) missing=0;
id SEGMENT_ID;
run;


proc cluster data=distmatrix outtree=trees
method=ward noprint;
id SEGMENT_ID;
run;

proc tree data=trees noprint out=Final n=10;
id SEGMENT_ID;

 

THANK YOU!

 

(I had to convert the file from .csv to .xlsx for upload. How is .csv not a valid upload for a data community!?)

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

I don't get any error message when I run your code, after I remove the missing=0 option.

 

33
34   proc distance data=xl.'roaddata1$'n out=distmatrix method=Euclid;
35   var interval (FACT_AADT/ std=std);
36   id SEGMENT_ID;
37   run;

NOTE: The data set WORK.DISTMATRIX has 7999 observations and 8000 variables.
NOTE: PROCEDURE DISTANCE used (Total process time):
      real time           10.11 seconds
      cpu time            3.93 seconds


38
39   proc cluster data=distmatrix outtree=trees
40   method=ward noprint;
41   id SEGMENT_ID;
42   run;

NOTE: The input data set is a TYPE=DISTANCE data set. For such a data set, the
      procedure requires that the order of the rows match the order of the
      variables.
NOTE: Input distances have been squared.
WARNING: Ties for minimum distance between clusters have been detected at 6852
         level(s) in the cluster history.
NOTE: The data set WORK.TREES has 15997 observations and 11 variables.
NOTE: PROCEDURE CLUSTER used (Total process time):
      real time           10.18 seconds
      cpu time            10.09 seconds


43
44   proc tree data=trees noprint out=Final n=10;
45   id SEGMENT_ID;
46   run;

NOTE: The data set WORK.FINAL has 7999 observations and 3 variables.
NOTE: PROCEDURE TREE used (Total process time):
      real time           0.26 seconds
      cpu time            0.09 seconds
PG

View solution in original post

6 REPLIES 6
PGStats
Opal | Level 21

I can see only two modes in your data. Why not take deciles

 

?SGPlot.png

PG
Brendankjansdkj
Fluorite | Level 6

Thank you PG. Deciles will work fine for these data as a backup. I guess I just like clustering and I'm a believer that the numbers are maybe a little bit more meaningful than just decile cuts.

 

I'm absoultely confused whats wrong with these data to get that error message as there is no missing anything. For future reference for myself, I may wait to see if somebody wants to explain to me whats wrong with the script. Otherwise I may have to figure out how to script this is R for similar scoring projects...

 

Best

PGStats
Opal | Level 21

I don't get any error message when I run your code, after I remove the missing=0 option.

 

33
34   proc distance data=xl.'roaddata1$'n out=distmatrix method=Euclid;
35   var interval (FACT_AADT/ std=std);
36   id SEGMENT_ID;
37   run;

NOTE: The data set WORK.DISTMATRIX has 7999 observations and 8000 variables.
NOTE: PROCEDURE DISTANCE used (Total process time):
      real time           10.11 seconds
      cpu time            3.93 seconds


38
39   proc cluster data=distmatrix outtree=trees
40   method=ward noprint;
41   id SEGMENT_ID;
42   run;

NOTE: The input data set is a TYPE=DISTANCE data set. For such a data set, the
      procedure requires that the order of the rows match the order of the
      variables.
NOTE: Input distances have been squared.
WARNING: Ties for minimum distance between clusters have been detected at 6852
         level(s) in the cluster history.
NOTE: The data set WORK.TREES has 15997 observations and 11 variables.
NOTE: PROCEDURE CLUSTER used (Total process time):
      real time           10.18 seconds
      cpu time            10.09 seconds


43
44   proc tree data=trees noprint out=Final n=10;
45   id SEGMENT_ID;
46   run;

NOTE: The data set WORK.FINAL has 7999 observations and 3 variables.
NOTE: PROCEDURE TREE used (Total process time):
      real time           0.26 seconds
      cpu time            0.09 seconds
PG
Brendankjansdkj
Fluorite | Level 6

Interesting. Did you run that on the desktop version or in sas studio? Using SAS Studio I ran this...

FILENAME REFFILE "/folders/myfolders/roaddata1.csv" TERMSTR=CR;
PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
OUT=roaddata1;
GETNAMES=YES;
RUN;
PROC CONTENTS DATA=roaddata1; RUN;

%web_open_table(WORK.IMPORT);
 
proc distance data=roaddata1 out=distmatrix method=Euclid;
var interval (FACT_AADT/ std=std);
id SEGMENT_ID;
run;

proc cluster data=distmatrix outtree=trees
method=ward noprint;
id SEGMENT_ID;
run;

proc tree data=trees noprint out=Final n=10;
id SEGMENT_ID;

and got this.... (I'm thinking it is a SAS Studio Problem)

 

 

 
NOTE: The data set WORK.DISTMATRIX has 8000 observations and 8001 variables.
 NOTE: PROCEDURE DISTANCE used (Total process time):
       real time           17.06 seconds
       cpu time            0.80 seconds
       
 
 91         
 92         proc cluster data=distmatrix outtree=trees
 93         method=ward noprint;
 94         id SEGMENT_ID;
 95         run;
 
 NOTE: The input data set is a TYPE=DISTANCE data set. For such a data set, the procedure requires that the order of the rows match 
       the order of the variables.
 ERROR: Missing values are not allowed in the lower triangle of a distance data set.
 NOTE: The SAS System stopped processing this step because of errors.
 WARNING: The data set WORK.TREES may be incomplete.  When this step was stopped there were 0 observations and 11 variables.
 NOTE: PROCEDURE CLUSTER used (Total process time):
       real time           58.22 seconds
       cpu time            1.48 seconds
       
 96         
 
 
 97         proc tree data=trees noprint out=Final n=10;
 98         id SEGMENT_ID;
 99         
 100        OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
 WARNING: The data set WORK.FINAL may be incomplete.  When this step was stopped there were 0 observations and 3 variables.
 112        
 
PGStats
Opal | Level 21

I think you may have an empty line as the end of your csv file that is causing the problem.

PG
Brendankjansdkj
Fluorite | Level 6

 

Thank you! Because of your posts I looked at it again and found that it was not a blank row, but rather I had to specify the type/shape of the matrix. It works now!

 

proc distance data=roaddata2 out=distmatrix method=Euclid shape=square;
var interval (FACT_AADT/ std=std);
id SEGMENT_ID;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2699 views
  • 2 likes
  • 2 in conversation