chemicalab

Hi all,

I did some research but could use a confirmation.

1) Cluster procedure:

proc cluster data=<input data>     /* WHICH DATA */
             method=ward            /* WHAT LINKAGE */
             outtree=<tree data>    /* OUTPUT DATA INCLUDING SOLUTION */
             ccc pseudo print=15;
   var <clustering variables>;      /* VARS TO BE CLUSTERED */
   id <ID variable>;                /* IDENTIFIES THE OBSERVATIONS BEING CLUSTERED */
run;

/* Produce tree so as to see the shape of the solution */
ods graphics on;

proc tree data=<tree data>         /* OUTPUT DATA OF STEP 1 IS INPUT DATA IN STEP 2 */
          nclusters=5              /* HOW MANY CLUSTERS */
          out=<tree out>;          /* FINAL DATA INCLUDING SOLUTION */
   id <ID variable>;
run;

ods graphics off;

proc sort data=<...> out=<...>;
   by cluster;
run;

proc means data=<...>;
   output out=<...>;
run;

proc fastclus data=<newdata>
              maxclusters=<nclusters>
              seed=<centroids>
              maxiter=0
              out=<scored>;
   var <clustering variables>;
run;

My question is what the PROC MEANS syntax should be, because it doesn't make sense to use the output from PROC TREE, which contains only the cluster and ID variables.

2) Fastclus

proc fastclus data=<input data>
              outstat=<cluster stats>
              maxclusters=5;
   var <clustering variables>;
   id <ID variable>;
run;

proc means data=<...>;
run;

proc fastclus data=<new data>
              instat=<cluster stats>
              out=<scored data>;
run;

Same question as above regarding the PROC MEANS syntax: what should my input be?

Thanks in advance

Rick_SAS

1) If I understand your code, the DATA= option for PROC MEANS is the OUT= data set from PROC TREE. The OUT= data set for PROC MEANS would be "centroids," which is the SEED= input for PROC FASTCLUS. You are correct that there doesn't seem to be a need for the MEANS step (or the SORT?).
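A minimal sketch of the chain described above (PROC TREE OUT= feeds PROC MEANS, whose OUT= becomes the SEED= for PROC FASTCLUS). It assumes SASHELP.IRIS as stand-in data, a hypothetical split into WORK.TRAIN and WORK.NEWDATA, and three clusters; all data set names are illustrative. The key addition is the COPY statement in PROC TREE: the OUTTREE= data set from PROC CLUSTER carries the analysis variables, and COPY puts them into the PROC TREE OUT= data set, so it no longer holds only the cluster and ID.

```sas
/* Hypothetical setup: split SASHELP.IRIS into a training half and a "new" half */
data work.train work.newdata;
   set sashelp.iris;
   if mod(_n_, 2) = 1 then output work.train;
   else output work.newdata;
run;

/* Step 1: Ward hierarchical clustering of the training data */
proc cluster data=work.train method=ward outtree=work.tree ccc pseudo print=15;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Step 2: cut the tree at 3 clusters; COPY carries the analysis variables into OUT= */
proc tree data=work.tree nclusters=3 out=work.treeout noprint;
   copy SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Step 3: BY-group processing in PROC MEANS needs the data sorted by CLUSTER */
proc sort data=work.treeout;
   by cluster;
run;

/* Cluster means (centroids) keep the original variable names via MEAN= */
proc means data=work.treeout noprint;
   by cluster;
   var SepalLength SepalWidth PetalLength PetalWidth;
   output out=work.centroids(drop=_type_ _freq_) mean=;
run;

/* Step 4: assign each new observation to the nearest centroid; no seed updates */
proc fastclus data=work.newdata seed=work.centroids maxclusters=3
              maxiter=0 replace=none out=work.scored;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

In this sketch the SEED= data set is WORK.CENTROIDS, the PROC MEANS output, rather than either the OUTTREE= or the PROC TREE OUT= data set directly.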

2) Why do you need to find means at all? Use the OUTSTAT= option the first time, and read that data set back in with the INSTAT= option to score the new data.
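A minimal sketch of the OUTSTAT=/INSTAT= approach, assuming the same hypothetical SASHELP.IRIS split and three clusters (names are illustrative). When INSTAT= is specified, PROC FASTCLUS performs no clustering iterations; it only assigns the new observations to the saved clusters, so no PROC MEANS step is involved.

```sas
/* Hypothetical setup: split SASHELP.IRIS into a training half and a "new" half */
data work.train work.newdata;
   set sashelp.iris;
   if mod(_n_, 2) = 1 then output work.train;
   else output work.newdata;
run;

/* Fit k-means once and save the cluster statistics, including the final seeds */
proc fastclus data=work.train maxclusters=3 outstat=work.clusstat;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Score the new data against the saved clusters */
proc fastclus data=work.newdata instat=work.clusstat out=work.scored;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```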

chemicalab

Ok,

So for the second method I need the OUTSTAT= data set; that I understand.

Regarding the first method, which will be my SEED= data set: the OUTTREE= data set from PROC CLUSTER or the OUT= data set from PROC TREE?

Thanks in advance, Rick
