turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Studio
- /
- Scoring clusters

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-17-2016 08:49 AM

Hello, I am currently using SAS university and I keep getting the error

**"ERROR: Missing values are not allowed in the lower triangle of a distance data set."**

** **

I am trying to create a ranking system for a series of roads based on the number of daily cars. Basically I want to cluster the data to find more natural breaking points and get a score of 1-10. I've tried the "missing=" function with no luck. Could this be a SAS univeristy issue?

*FILENAME REFFILE "/folders/myfolders/roaddata1.csv" TERMSTR=CR;*

*PROC IMPORT DATAFILE=REFFILE**DBMS=CSV**OUT=roaddata1;**GETNAMES=YES;**RUN;*

*PROC CONTENTS DATA=roaddata1; RUN;*

*%web_open_table(WORK.IMPORT);*

*proc distance data=roaddata1 out=distmatrix method=Euclid;**var interval (FACT_AADT/ std=std) missing=0;**id SEGMENT_ID;**run;*

*proc cluster data=distmatrix outtree=trees**method=ward noprint;**id SEGMENT_ID;**run;**proc tree data=trees noprint out=Final n=10;**id SEGMENT_ID;*

**THANK YOU!**

** **

(I had to convert the file from .csv to .xlsx for upload. How is .csv not a valid upload for a data community!?)

Accepted Solutions

Solution

02-17-2016
04:03 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-17-2016 03:44 PM

I don't get any error message when I run your code, after I remove the missing=0 option.

33 34 proc distance data=xl.'roaddata1$'n out=distmatrix method=Euclid; 35 var interval (FACT_AADT/ std=std); 36 id SEGMENT_ID; 37 run; NOTE: The data set WORK.DISTMATRIX has 7999 observations and 8000 variables. NOTE: PROCEDURE DISTANCE used (Total process time): real time 10.11 seconds cpu time 3.93 seconds 38 39 proc cluster data=distmatrix outtree=trees 40 method=ward noprint; 41 id SEGMENT_ID; 42 run; NOTE: The input data set is a TYPE=DISTANCE data set. For such a data set, the procedure requires that the order of the rows match the order of the variables. NOTE: Input distances have been squared. WARNING: Ties for minimum distance between clusters have been detected at 6852 level(s) in the cluster history. NOTE: The data set WORK.TREES has 15997 observations and 11 variables. NOTE: PROCEDURE CLUSTER used (Total process time): real time 10.18 seconds cpu time 10.09 seconds 43 44 proc tree data=trees noprint out=Final n=10; 45 id SEGMENT_ID; 46 run; NOTE: The data set WORK.FINAL has 7999 observations and 3 variables. NOTE: PROCEDURE TREE used (Total process time): real time 0.26 seconds cpu time 0.09 seconds

PG

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-17-2016 02:54 PM

I can see only two modes in your data. Why not take deciles

?

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-17-2016 03:29 PM

Thank you PG. Deciles will work fine for these data as a backup. I guess I just like clustering and I'm a believer that the numbers are maybe a little bit more meaningful than just decile cuts.

I'm absoultely confused whats wrong with these data to get that error message as there is no missing anything. For future reference for myself, I may wait to see if somebody wants to explain to me whats wrong with the script. Otherwise I may have to figure out how to script this is R for similar scoring projects...

Best

Solution

02-17-2016
04:03 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-17-2016 03:44 PM

I don't get any error message when I run your code, after I remove the missing=0 option.

33 34 proc distance data=xl.'roaddata1$'n out=distmatrix method=Euclid; 35 var interval (FACT_AADT/ std=std); 36 id SEGMENT_ID; 37 run; NOTE: The data set WORK.DISTMATRIX has 7999 observations and 8000 variables. NOTE: PROCEDURE DISTANCE used (Total process time): real time 10.11 seconds cpu time 3.93 seconds 38 39 proc cluster data=distmatrix outtree=trees 40 method=ward noprint; 41 id SEGMENT_ID; 42 run; NOTE: The input data set is a TYPE=DISTANCE data set. For such a data set, the procedure requires that the order of the rows match the order of the variables. NOTE: Input distances have been squared. WARNING: Ties for minimum distance between clusters have been detected at 6852 level(s) in the cluster history. NOTE: The data set WORK.TREES has 15997 observations and 11 variables. NOTE: PROCEDURE CLUSTER used (Total process time): real time 10.18 seconds cpu time 10.09 seconds 43 44 proc tree data=trees noprint out=Final n=10; 45 id SEGMENT_ID; 46 run; NOTE: The data set WORK.FINAL has 7999 observations and 3 variables. NOTE: PROCEDURE TREE used (Total process time): real time 0.26 seconds cpu time 0.09 seconds

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-17-2016 04:03 PM

Interesting. Did you run that on the desktop version or in sas studio? Using SAS Studio I ran this...

FILENAME REFFILE "/folders/myfolders/roaddata1.csv" TERMSTR=CR; PROC IMPORT DATAFILE=REFFILE DBMS=CSV OUT=roaddata1; GETNAMES=YES; RUN; PROC CONTENTS DATA=roaddata1; RUN; %web_open_table(WORK.IMPORT); proc distance data=roaddata1 out=distmatrix method=Euclid; var interval (FACT_AADT/ std=std); id SEGMENT_ID; run; proc cluster data=distmatrix outtree=trees method=ward noprint; id SEGMENT_ID; run; proc tree data=trees noprint out=Final n=10; id SEGMENT_ID;

and got this.... (I'm thinking it is a SAS Studio Problem)

NOTE: The data set WORK.DISTMATRIX has 8000 observations and 8001 variables. NOTE: PROCEDURE DISTANCE used (Total process time): real time 17.06 seconds cpu time 0.80 seconds 91 92 proc cluster data=distmatrix outtree=trees 93 method=ward noprint; 94 id SEGMENT_ID; 95 run; NOTE: The input data set is a TYPE=DISTANCE data set. For such a data set, the procedure requires that the order of the rows match the order of the variables. ERROR: Missing values are not allowed in the lower triangle of a distance data set. NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.TREES may be incomplete. When this step was stopped there were 0 observations and 11 variables. NOTE: PROCEDURE CLUSTER used (Total process time): real time 58.22 seconds cpu time 1.48 seconds 96 97 proc tree data=trees noprint out=Final n=10; 98 id SEGMENT_ID; 99 100 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; WARNING: The data set WORK.FINAL may be incomplete. When this step was stopped there were 0 observations and 3 variables. 112

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-17-2016 04:06 PM

I think you may have an empty line as the end of your csv file that is causing the problem.

PG

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-17-2016 06:33 PM

Thank you! Because of your posts I looked at it again and found that it was not a blank row, but rather I had to specify the type/shape of the matrix. It works now!

proc distance data=roaddata2 out=distmatrix method=Euclid **shape=square**;

var interval (FACT_AADT/ std=std);

id SEGMENT_ID;

run;