12-03-2011 10:37 AM
I did some research but would appreciate confirmation.
1) Cluster procedure:
proc cluster data= /* WHICH DATA */
   method=ward /* WHAT LINKAGE */
   ccc pseudo print=15
   outtree= /* OUTPUT DATA INCLUDING SOLUTION */;
   id /* CHOSEN ID: IDENTIFIES THE OBSERVATIONS BEING CLUSTERED */;
   var /* VARS TO BE CLUSTERED */;
run;
/* Produce tree so as to see the shape of the solution */
ods graphics on;
proc tree nclusters=5 /* HOW MANY CLUSTERS */
   data= /* OUTPUT DATA OF STEP 1 IS INPUT DATA IN STEP 2 */
   out= /* FINAL DATA INCLUDING SOLUTION */;
   id /* CHOSEN ID */;
run;
ods graphics off;
proc sort data= /* ... */;
run;
proc means data= /* ... */;
run;
proc fastclus data=<newdata>;
run;
My question is what the PROC MEANS syntax should be, because it doesn't make sense to use the output from PROC TREE, which contains only the cluster and the ID.
2) Second method:
proc fastclus data= /* ... */;
   id /* CHOSEN ID */;
run;
proc means data= /* ... */;
run;
proc fastclus instat= /* ... */;
run;
Same question here as above regarding the PROC MEANS syntax: what should my input be?
Thanks in advance.
12-04-2011 09:57 PM
1) If I understand your code, the DATA= option for PROC MEANS is the OUT= data set from PROC TREE. The OUT= data set for PROC MEANS would be "centroids," which is the SEED= input for PROC FASTCLUS. You are correct that there doesn't seem to be a need for the MEANS step (or the SORT?).
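A minimal sketch of the first method, assuming SASHELP.IRIS as stand-in data, a hypothetical numeric ID variable `id`, and hypothetical WORK data set names (`tree`, `treeout`, `centroids`, `kmeans`). It addresses the concern that the PROC TREE output holds only the ID and the cluster: a COPY statement in PROC TREE carries the analysis variables (which are in the OUTTREE= data set because they were VAR variables in PROC CLUSTER) into OUT=. PROC MEANS then produces one row of means per cluster, and that data set is passed to PROC FASTCLUS as SEED=.

```sas
/* Stand-in data: SASHELP.IRIS plus a hypothetical ID variable */
data work.iris;
   set sashelp.iris;
   id = _n_;
run;

/* Step 1: Ward's hierarchical clustering, saving the tree */
proc cluster data=work.iris method=ward ccc pseudo print=15
             outtree=work.tree;
   var SepalLength SepalWidth PetalLength PetalWidth;
   id id;
run;

/* Step 2: cut the tree at 5 clusters; COPY brings the analysis
   variables along so OUT= is not just ID + CLUSTER */
proc tree data=work.tree nclusters=5 out=work.treeout noprint;
   id id;
   copy SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Step 3: one row of variable means (centroid) per CLUSTER value */
proc means data=work.treeout noprint nway;
   class cluster;
   var SepalLength SepalWidth PetalLength PetalWidth;
   output out=work.centroids mean=;
run;

/* Step 4: use the Ward centroids as initial seeds for k-means */
proc fastclus data=work.iris seed=work.centroids maxclusters=5
              maxiter=20 out=work.kmeans;
   var SepalLength SepalWidth PetalLength PetalWidth;
   id id;
run;
```

Using CLASS with NWAY in PROC MEANS avoids the PROC SORT step that a BY statement would require. MAXITER=20 is an arbitrary choice; as I understand it, MAXITER=0 would only assign observations to the nearest Ward centroid without refining the centroids.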
2) Why do you need to find means at all? Use the OUTSTAT= data set from the first PROC FASTCLUS run, and read it back in with the INSTAT= option to score the new data.
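A minimal sketch of the OUTSTAT=/INSTAT= approach, again assuming SASHELP.IRIS as stand-in data for both the original and the "new" observations, with hypothetical data set names (`clusstat`, `newdata`, `new_clusters`). The first PROC FASTCLUS run saves the cluster statistics; the second reads them back with INSTAT=, which performs only scoring (no clustering iterations), so no PROC MEANS step is involved.

```sas
/* Stand-in data: SASHELP.IRIS plus a hypothetical ID variable */
data work.iris;
   set sashelp.iris;
   id = _n_;
run;

/* Hypothetical "new" data; in practice, the data set to be scored */
data work.newdata;
   set work.iris;
run;

/* First run: k-means with 5 clusters, saving cluster statistics */
proc fastclus data=work.iris maxclusters=5 outstat=work.clusstat noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
   id id;
run;

/* Second run: INSTAT= reads the saved statistics; each new
   observation is assigned to its nearest existing cluster */
proc fastclus data=work.newdata instat=work.clusstat out=work.new_clusters;
   var SepalLength SepalWidth PetalLength PetalWidth;
   id id;
run;
```

The VAR statement in the scoring run should list the same variables as the run that created the OUTSTAT= data set.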
12-05-2011 04:01 AM
So for the second method I need the OUTSTAT= data set; that I understand.
Regarding the first method, which will be my SEED= data set: the OUTTREE= data set from PROC CLUSTER or the OUT= data set from PROC TREE?
Thanks in advance, Rick.