Ruwan
Calcite | Level 5

I tried to run a cluster analysis; however, it didn't create a dendrogram. The data set and the SAS code I used are attached. Could anyone tell me what's wrong with it?

WarrenKuhfeld
Ammonite | Level 13

Many of us won't open attachments, so you will get a better response if you post your code.

ballardw
Super User

When I run your data step, this is the log:

 

15   data chili;
16   length Acc $6;
17   input Acc &$ StC NA SP PGH BH LC LS LP NFA FP CC AC SE MS CP CM AS FCI
17 !  FCM FS FSPA NBF FSBE FBEA FC FSr SC NSF GU PH MLL MLW PHF CWF FL FW
17 ! FWt FT FPP DFt DFl DFF TSWt FPK Yd;
18   datalines;

NOTE: Invalid data for StC in line 46 1-2.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----
46         C2 2 7 5 7 5 3 2 3 1 7 4 3 7 0 0 2 0 3 9 4 4 0 4 1 7 3 1 2 2 38.
       65  0 6.4 2.1 30.3 27.6 4.4 2.2 94.2 0.1 35.2 55.0 59.0 79.0 3.2 373
      129  .0 35.0
Acc=C1 2 7 StC=. NA=2 SP=7 PGH=5 BH=7 LC=5 LS=3 LP=2 NFA=3 FP=1 CC=7 AC=4
SE=3 MS=7 CP=0 CM=0 AS=2 FCI=0 FCM=3 FS=9 FSPA=4 NBF=4 FSBE=0 FBEA=4 FC=1
FSr=7 SC=3 NSF=1 GU=2 PH=2 MLL=38 MLW=6.4 PHF=2.1 CWF=30.3 FL=27.6 FW=4.4
FWt=2.2 FT=94.2 FPP=0.1 DFt=35.2 DFl=55 DFF=59 TSWt=79 FPK=3.2 Yd=373
_ERROR_=1 _N_=14
NOTE: Invalid data for StC in line 48 1-2.
48         C4 2 7 7 7 3 7 1 7 1 5 8 5 7 0 0 2 1 6 9 5 3 0 4 1 3 2 1 3 2 70.
       65  7 7.5 2.8 35.7 19.2 6.4 2.1 85.0 0.2 19.3 39.0 45.0 72.0 2.9 227
      129  .0 19.0
Acc=C3 1 1 StC=. NA=2 SP=7 PGH=7 BH=7 LC=3 LS=7 LP=1 NFA=7 FP=1 CC=5 AC=8
SE=5 MS=7 CP=0 CM=0 AS=2 FCI=1 FCM=6 FS=9 FSPA=5 NBF=3 FSBE=0 FBEA=4 FC=1
FSr=3 SC=2 NSF=1 GU=3 PH=2 MLL=70.7 MLW=7.5 PHF=2.8 CWF=35.7 FL=19.2 FW=6.4
FWt=2.1 FT=85 FPP=0.2 DFt=19.3 DFl=39 DFF=45 TSWt=72 FPK=2.9 Yd=227
_ERROR_=1 _N_=15
NOTE: Invalid data for StC in line 50 1-2.
50         C6 2 7 7 7 5 7 3 7 1 7 5 5 7 0 0 3 1 6 9 1 2 0 1 1 3 2 1 3 2 60.
       65  4 7.2 2.6 32.3 18.2 7.6 1.3 169.0 0.1 70.3 34.0 39.0 72.0 2.9 41
      129  5.0 70.0
Acc=C5 2 7 StC=. NA=2 SP=7 PGH=7 BH=7 LC=5 LS=7 LP=3 NFA=7 FP=1 CC=7 AC=5
SE=5 MS=7 CP=0 CM=0 AS=3 FCI=1 FCM=6 FS=9 FSPA=1 NBF=2 FSBE=0 FBEA=1 FC=1
FSr=3 SC=2 NSF=1 GU=3 PH=2 MLL=60.4 MLW=7.2 PHF=2.6 CWF=32.3 FL=18.2 FW=7.6
FWt=1.3 FT=169 FPP=0.1 DFt=70.3 DFl=34 DFF=39 TSWt=72 FPK=2.9 Yd=415
_ERROR_=1 _N_=16
NOTE: Invalid data for StC in line 52 1-2.
52         C8 2 1 3 7 3 3 1 3 1 5 4 5 7 0 0 2 0 3 8 4 3 0 3 1 7 3 1 2 2 38.
       65  9 10.0 4.0 26.3 25.4 3.9 1.7 87.3 0.1 39.1 36.0 60.0 114.0 5.2 4
      129  00.0 39.0
Acc=C7 2 5 StC=. NA=2 SP=1 PGH=3 BH=7 LC=3 LS=3 LP=1 NFA=3 FP=1 CC=5 AC=4
SE=5 MS=7 CP=0 CM=0 AS=2 FCI=0 FCM=3 FS=8 FSPA=4 NBF=3 FSBE=0 FBEA=3 FC=1
FSr=7 SC=3 NSF=1 GU=2 PH=2 MLL=38.9 MLW=10 PHF=4 CWF=26.3 FL=25.4 FW=3.9
FWt=1.7 FT=87.3 FPP=0.1 DFt=39.1 DFl=36 DFF=60 TSWt=114 FPK=5.2 Yd=400
_ERROR_=1 _N_=17
NOTE: SAS went to a new line when INPUT statement reached past the end of
      a line.
NOTE: The data set WORK.CHILI has 19 observations and 46 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


57   ;

With that many data errors, I would not be surprised if you don't get good output.

 

One question: why do you have an & on the INPUT statement after reading the Acc variable?
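
The log hints at what the & is doing. With the & modifier, list input allows single blanks inside a character value and only ends the field at two consecutive blanks (or the end of the line). That would explain why Acc comes out as "C1 2 7" and why the next value read for StC is "C2" from the following line. Below is a minimal sketch of the same data step without the &. It assumes the Acc values contain no embedded blanks and that each record holds all 46 values on one line, separated by single blanks. The two records are reconstructed from lines 46 and 48 of the log above, not taken from the attachment.

```sas
data chili;
  length Acc $6;
  /* Plain $ (no &): Acc ends at the first blank, so StC is read from the second field */
  input Acc $ StC NA SP PGH BH LC LS LP NFA FP CC AC SE MS CP CM AS FCI
        FCM FS FSPA NBF FSBE FBEA FC FSr SC NSF GU PH MLL MLW PHF CWF FL FW
        FWt FT FPP DFt DFl DFF TSWt FPK Yd;
  datalines;
C2 2 7 5 7 5 3 2 3 1 7 4 3 7 0 0 2 0 3 9 4 4 0 4 1 7 3 1 2 2 38.0 6.4 2.1 30.3 27.6 4.4 2.2 94.2 0.1 35.2 55.0 59.0 79.0 3.2 373.0 35.0
C4 2 7 7 7 3 7 1 7 1 5 8 5 7 0 0 2 1 6 9 5 3 0 4 1 3 2 1 3 2 70.7 7.5 2.8 35.7 19.2 6.4 2.1 85.0 0.2 19.3 39.0 45.0 72.0 2.9 227.0 19.0
;
run;

/* Quick check: Acc should hold only the accession code and StC should not be missing */
proc print data=chili;
  var Acc StC NA Yd;
run;
```

If that assumption about the data layout holds, the "Invalid data for StC" notes should go away and the data set should have one observation per data line.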

 

However, the real problem is in the PROC CLUSTER code: its ID statement names COUNTRY, but that variable is not in the data set created by PROC DISTANCE (and not in the data set chili either).

107  proc distance data=chili out=dist method=euclid;
108  var interval (StC NA SP PGH BH LC LS LP NFA FP CC AC SE MS CP CM AS
108! FCI FCM FS FSPA NBF FSBE FBEA FC FSr SC NSF GU PH MLL MLW PHF CWF FL
108! FW FWt FT FPP DFt DFl DFF TSWt FPK Yd);
109  id Acc;
110  run;

NOTE: The data set WORK.DIST has 38 observations and 39 variables.
NOTE: PROCEDURE DISTANCE used (Total process time):
      real time           0.02 seconds
      cpu time            0.00 seconds


111  ods graphics on;
112  proc cluster data=Dist method=ward plots=dendrogram (height=rsq);
113  id country;
ERROR: Variable COUNTRY not found.
114  run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.DATA1 may be incomplete.  When this step was
         stopped there were 0 observations and 0 variables.
NOTE: PROCEDURE CLUSTER used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

If you use ID ACC; in the PROC CLUSTER code, as in the PROC DISTANCE code, it should work:

121  ods graphics on;
122  proc cluster data=Dist method=ward plots=dendrogram (height=rsq);
123  id acc;
124  run;

NOTE: The input data set is a TYPE=DISTANCE data set. For such a data set,
      the procedure requires that the order of the rows match the order of
      the variables.
NOTE: Writing HTML Body file: sashtml.htm
NOTE: Input distances have been squared.
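
Stripped of the log line numbers, the two steps would look roughly like this. It is only a sketch: it assumes WORK.CHILI has already been read without data errors and simply reuses the options shown in the log, with the ID variable made consistent between the two procedures.

```sas
ods graphics on;

/* Euclidean distances between accessions; the ID statement carries Acc into WORK.DIST */
proc distance data=chili out=dist method=euclid;
  var interval (StC NA SP PGH BH LC LS LP NFA FP CC AC SE MS CP CM AS
                FCI FCM FS FSPA NBF FSBE FBEA FC FSr SC NSF GU PH MLL MLW
                PHF CWF FL FW FWt FT FPP DFt DFl DFF TSWt FPK Yd);
  id Acc;
run;

/* Use the same ID variable that PROC DISTANCE wrote to WORK.DIST */
proc cluster data=dist method=ward plots=dendrogram(height=rsq);
  id Acc;
run;
```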

 

 

 
