Ruwan
Calcite | Level 5

I tried to run a cluster analysis; however, it didn't create a dendrogram. The data set and the SAS code I used are attached. Could anyone tell me what's wrong with it?

WarrenKuhfeld
Rhodochrosite | Level 12

Many of us won't open attachments, so you will get a better response if you post your code.

ballardw
Super User

When I run your data step, this is the log:

 

15   data chili;
16   length Acc $6;
17   input Acc &$ StC NA SP PGH BH LC LS LP NFA FP CC AC SE MS CP CM AS FCI
17 !  FCM FS FSPA NBF FSBE FBEA FC FSr SC NSF GU PH MLL MLW PHF CWF FL FW
17 ! FWt FT FPP DFt DFl DFF TSWt FPK Yd;
18   datalines;

NOTE: Invalid data for StC in line 46 1-2.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6----
46         C2 2 7 5 7 5 3 2 3 1 7 4 3 7 0 0 2 0 3 9 4 4 0 4 1 7 3 1 2 2 38.
       65  0 6.4 2.1 30.3 27.6 4.4 2.2 94.2 0.1 35.2 55.0 59.0 79.0 3.2 373
      129  .0 35.0
Acc=C1 2 7 StC=. NA=2 SP=7 PGH=5 BH=7 LC=5 LS=3 LP=2 NFA=3 FP=1 CC=7 AC=4
SE=3 MS=7 CP=0 CM=0 AS=2 FCI=0 FCM=3 FS=9 FSPA=4 NBF=4 FSBE=0 FBEA=4 FC=1
FSr=7 SC=3 NSF=1 GU=2 PH=2 MLL=38 MLW=6.4 PHF=2.1 CWF=30.3 FL=27.6 FW=4.4
FWt=2.2 FT=94.2 FPP=0.1 DFt=35.2 DFl=55 DFF=59 TSWt=79 FPK=3.2 Yd=373
_ERROR_=1 _N_=14
NOTE: Invalid data for StC in line 48 1-2.
48         C4 2 7 7 7 3 7 1 7 1 5 8 5 7 0 0 2 1 6 9 5 3 0 4 1 3 2 1 3 2 70.
       65  7 7.5 2.8 35.7 19.2 6.4 2.1 85.0 0.2 19.3 39.0 45.0 72.0 2.9 227
      129  .0 19.0
Acc=C3 1 1 StC=. NA=2 SP=7 PGH=7 BH=7 LC=3 LS=7 LP=1 NFA=7 FP=1 CC=5 AC=8
SE=5 MS=7 CP=0 CM=0 AS=2 FCI=1 FCM=6 FS=9 FSPA=5 NBF=3 FSBE=0 FBEA=4 FC=1
FSr=3 SC=2 NSF=1 GU=3 PH=2 MLL=70.7 MLW=7.5 PHF=2.8 CWF=35.7 FL=19.2 FW=6.4
FWt=2.1 FT=85 FPP=0.2 DFt=19.3 DFl=39 DFF=45 TSWt=72 FPK=2.9 Yd=227
_ERROR_=1 _N_=15
NOTE: Invalid data for StC in line 50 1-2.
50         C6 2 7 7 7 5 7 3 7 1 7 5 5 7 0 0 3 1 6 9 1 2 0 1 1 3 2 1 3 2 60.
       65  4 7.2 2.6 32.3 18.2 7.6 1.3 169.0 0.1 70.3 34.0 39.0 72.0 2.9 41
      129  5.0 70.0
Acc=C5 2 7 StC=. NA=2 SP=7 PGH=7 BH=7 LC=5 LS=7 LP=3 NFA=7 FP=1 CC=7 AC=5
SE=5 MS=7 CP=0 CM=0 AS=3 FCI=1 FCM=6 FS=9 FSPA=1 NBF=2 FSBE=0 FBEA=1 FC=1
FSr=3 SC=2 NSF=1 GU=3 PH=2 MLL=60.4 MLW=7.2 PHF=2.6 CWF=32.3 FL=18.2 FW=7.6
FWt=1.3 FT=169 FPP=0.1 DFt=70.3 DFl=34 DFF=39 TSWt=72 FPK=2.9 Yd=415
_ERROR_=1 _N_=16
NOTE: Invalid data for StC in line 52 1-2.
52         C8 2 1 3 7 3 3 1 3 1 5 4 5 7 0 0 2 0 3 8 4 3 0 3 1 7 3 1 2 2 38.
       65  9 10.0 4.0 26.3 25.4 3.9 1.7 87.3 0.1 39.1 36.0 60.0 114.0 5.2 4
      129  00.0 39.0
Acc=C7 2 5 StC=. NA=2 SP=1 PGH=3 BH=7 LC=3 LS=3 LP=1 NFA=3 FP=1 CC=5 AC=4
SE=5 MS=7 CP=0 CM=0 AS=2 FCI=0 FCM=3 FS=8 FSPA=4 NBF=3 FSBE=0 FBEA=3 FC=1
FSr=7 SC=3 NSF=1 GU=2 PH=2 MLL=38.9 MLW=10 PHF=4 CWF=26.3 FL=25.4 FW=3.9
FWt=1.7 FT=87.3 FPP=0.1 DFt=39.1 DFl=36 DFF=60 TSWt=114 FPK=5.2 Yd=400
_ERROR_=1 _N_=17
NOTE: SAS went to a new line when INPUT statement reached past the end of
      a line.
NOTE: The data set WORK.CHILI has 19 observations and 46 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds


57   ;

With that many data errors, I would not be surprised that you don't get good output.

 

One question: why do you have an & on the INPUT statement right after the Acc variable?
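For context, in list input the & modifier means a single blank does not end a character value; the value runs on until two consecutive blanks (or the end of the line). That would be consistent with the log above, where Acc comes out as "C1 2 7" and StC is then read from the "C2" at the start of the next line. A minimal sketch of the difference, using a made-up three-variable layout (the data set names demo_amp and demo_plain and the variables x and y are hypothetical, not from the original post):

```sas
/* Hypothetical mini example; data set names and x, y are made up. */

/* With &: single blanks do not end Acc, so it should swallow the   */
/* rest of the line and x is then read from the following line.     */
data demo_amp;
  length Acc $6;
  input Acc $ & x y;
  datalines;
C1 2 7
C2 3 8
;
run;

/* Without &: Acc ends at the first blank, so each line should      */
/* yield one observation with Acc, x and y filled in.               */
data demo_plain;
  length Acc $6;
  input Acc $ x y;
  datalines;
C1 2 7
C2 3 8
;
run;

proc print data=demo_plain;
run;
```

If the Acc values never contain blanks, dropping the & from the original INPUT statement is probably all that is needed there, though that depends on how the actual data lines are laid out.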

 

However, the real problem is in the PROC CLUSTER code: its ID statement names the variable COUNTRY, which is not in the data set created by PROC DISTANCE (and not in the data set chili either).

107  proc distance data=chili out=dist method=euclid;
108  var interval (StC NA SP PGH BH LC LS LP NFA FP CC AC SE MS CP CM AS
108! FCI FCM FS FSPA NBF FSBE FBEA FC FSr SC NSF GU PH MLL MLW PHF CWF FL
108! FW FWt FT FPP DFt DFl DFF TSWt FPK Yd);
109  id Acc;
110  run;

NOTE: The data set WORK.DIST has 38 observations and 39 variables.
NOTE: PROCEDURE DISTANCE used (Total process time):
      real time           0.02 seconds
      cpu time            0.00 seconds


111  ods graphics on;
112  proc cluster data=Dist method=ward plots=dendrogram (height=rsq);
113  id country;
ERROR: Variable COUNTRY not found.
114  run;

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.DATA1 may be incomplete.  When this step was
         stopped there were 0 observations and 0 variables.
NOTE: PROCEDURE CLUSTER used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

If you use ID ACC; in the PROC CLUSTER code, as in the PROC DISTANCE step, it should work:

121  ods graphics on;
122  proc cluster data=Dist method=ward plots=dendrogram (height=rsq);
123  id acc;
124  run;

NOTE: The input data set is a TYPE=DISTANCE data set. For such a data set,
      the procedure requires that the order of the rows match the order of
      the variables.
NOTE: Writing HTML Body file: sashtml.htm
NOTE: Input distances have been squared.
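Without the log line numbers, the two steps shown above would read roughly as follows. This is only a cleaned-up restatement of the code in the log, and it assumes the chili data step has been fixed first so that all rows read without invalid-data notes:

```sas
/* Euclidean distance matrix over the interval variables;           */
/* ID Acc carries the accession label into the DIST data set.       */
proc distance data=chili out=dist method=euclid;
  var interval (StC NA SP PGH BH LC LS LP NFA FP CC AC SE MS CP CM AS
                FCI FCM FS FSPA NBF FSBE FBEA FC FSr SC NSF GU PH MLL
                MLW PHF CWF FL FW FWt FT FPP DFt DFl DFF TSWt FPK Yd);
  id Acc;
run;

ods graphics on;

/* The ID variable must exist in DIST, so use Acc, not COUNTRY.     */
proc cluster data=dist method=ward plots=dendrogram(height=rsq);
  id Acc;
run;
```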

 

 

 
