Hello, I am currently using SAS university and I keep getting the error
"ERROR: Missing values are not allowed in the lower triangle of a distance data set."
I am trying to create a ranking system for a series of roads based on the number of daily cars. Basically I want to cluster the data to find more natural breaking points and get a score of 1-10. I've tried the "missing=" function with no luck. Could this be a SAS univeristy issue?
FILENAME REFFILE "/folders/myfolders/roaddata1.csv" TERMSTR=CR;
PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
OUT=roaddata1;
GETNAMES=YES;
RUN;
PROC CONTENTS DATA=roaddata1; RUN;
%web_open_table(WORK.IMPORT);
proc distance data=roaddata1 out=distmatrix method=Euclid;
var interval (FACT_AADT/ std=std) missing=0;
id SEGMENT_ID;
run;
proc cluster data=distmatrix outtree=trees
method=ward noprint;
id SEGMENT_ID;
run;
proc tree data=trees noprint out=Final n=10;
id SEGMENT_ID;
THANK YOU!
(I had to convert the file from .csv to .xlsx for upload. How is .csv not a valid upload for a data community!?)
I don't get any error message when I run your code, after I remove the missing=0 option.
33 34 proc distance data=xl.'roaddata1$'n out=distmatrix method=Euclid; 35 var interval (FACT_AADT/ std=std); 36 id SEGMENT_ID; 37 run; NOTE: The data set WORK.DISTMATRIX has 7999 observations and 8000 variables. NOTE: PROCEDURE DISTANCE used (Total process time): real time 10.11 seconds cpu time 3.93 seconds 38 39 proc cluster data=distmatrix outtree=trees 40 method=ward noprint; 41 id SEGMENT_ID; 42 run; NOTE: The input data set is a TYPE=DISTANCE data set. For such a data set, the procedure requires that the order of the rows match the order of the variables. NOTE: Input distances have been squared. WARNING: Ties for minimum distance between clusters have been detected at 6852 level(s) in the cluster history. NOTE: The data set WORK.TREES has 15997 observations and 11 variables. NOTE: PROCEDURE CLUSTER used (Total process time): real time 10.18 seconds cpu time 10.09 seconds 43 44 proc tree data=trees noprint out=Final n=10; 45 id SEGMENT_ID; 46 run; NOTE: The data set WORK.FINAL has 7999 observations and 3 variables. NOTE: PROCEDURE TREE used (Total process time): real time 0.26 seconds cpu time 0.09 seconds
I can see only two modes in your data. Why not take deciles
?
Thank you PG. Deciles will work fine for these data as a backup. I guess I just like clustering and I'm a believer that the numbers are maybe a little bit more meaningful than just decile cuts.
I'm absoultely confused whats wrong with these data to get that error message as there is no missing anything. For future reference for myself, I may wait to see if somebody wants to explain to me whats wrong with the script. Otherwise I may have to figure out how to script this is R for similar scoring projects...
Best
I don't get any error message when I run your code, after I remove the missing=0 option.
33 34 proc distance data=xl.'roaddata1$'n out=distmatrix method=Euclid; 35 var interval (FACT_AADT/ std=std); 36 id SEGMENT_ID; 37 run; NOTE: The data set WORK.DISTMATRIX has 7999 observations and 8000 variables. NOTE: PROCEDURE DISTANCE used (Total process time): real time 10.11 seconds cpu time 3.93 seconds 38 39 proc cluster data=distmatrix outtree=trees 40 method=ward noprint; 41 id SEGMENT_ID; 42 run; NOTE: The input data set is a TYPE=DISTANCE data set. For such a data set, the procedure requires that the order of the rows match the order of the variables. NOTE: Input distances have been squared. WARNING: Ties for minimum distance between clusters have been detected at 6852 level(s) in the cluster history. NOTE: The data set WORK.TREES has 15997 observations and 11 variables. NOTE: PROCEDURE CLUSTER used (Total process time): real time 10.18 seconds cpu time 10.09 seconds 43 44 proc tree data=trees noprint out=Final n=10; 45 id SEGMENT_ID; 46 run; NOTE: The data set WORK.FINAL has 7999 observations and 3 variables. NOTE: PROCEDURE TREE used (Total process time): real time 0.26 seconds cpu time 0.09 seconds
Interesting. Did you run that on the desktop version or in sas studio? Using SAS Studio I ran this...
FILENAME REFFILE "/folders/myfolders/roaddata1.csv" TERMSTR=CR; PROC IMPORT DATAFILE=REFFILE DBMS=CSV OUT=roaddata1; GETNAMES=YES; RUN; PROC CONTENTS DATA=roaddata1; RUN; %web_open_table(WORK.IMPORT); proc distance data=roaddata1 out=distmatrix method=Euclid; var interval (FACT_AADT/ std=std); id SEGMENT_ID; run; proc cluster data=distmatrix outtree=trees method=ward noprint; id SEGMENT_ID; run; proc tree data=trees noprint out=Final n=10; id SEGMENT_ID;
and got this.... (I'm thinking it is a SAS Studio Problem)
NOTE: The data set WORK.DISTMATRIX has 8000 observations and 8001 variables. NOTE: PROCEDURE DISTANCE used (Total process time): real time 17.06 seconds cpu time 0.80 seconds 91 92 proc cluster data=distmatrix outtree=trees 93 method=ward noprint; 94 id SEGMENT_ID; 95 run; NOTE: The input data set is a TYPE=DISTANCE data set. For such a data set, the procedure requires that the order of the rows match the order of the variables. ERROR: Missing values are not allowed in the lower triangle of a distance data set. NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.TREES may be incomplete. When this step was stopped there were 0 observations and 11 variables. NOTE: PROCEDURE CLUSTER used (Total process time): real time 58.22 seconds cpu time 1.48 seconds 96 97 proc tree data=trees noprint out=Final n=10; 98 id SEGMENT_ID; 99 100 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK; WARNING: The data set WORK.FINAL may be incomplete. When this step was stopped there were 0 observations and 3 variables. 112
I think you may have an empty line as the end of your csv file that is causing the problem.
Thank you! Because of your posts I looked at it again and found that it was not a blank row, but rather I had to specify the type/shape of the matrix. It works now!
proc distance data=roaddata2 out=distmatrix method=Euclid shape=square;
var interval (FACT_AADT/ std=std);
id SEGMENT_ID;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.