Programming the statistical procedures from SAS

Scoring new obs after clustering, need help with syntax

Frequent Contributor
Posts: 126

Hi all,

I did some research but could use a confirmation.

1) Cluster procedure:

proc cluster data=<input>       /* WHICH DATA */
             method=ward        /* WHAT LINKAGE */
             outtree=<tree>     /* OUTPUT DATA INCLUDING SOLUTION */
             ccc pseudo print=15;
   id <chosen_id>;              /* VAR THAT IDENTIFIES EACH OBSERVATION */
   var <vars>;                  /* VARS TO BE CLUSTERED */
run;

/*Produce tree so as to see the shape of solution */

ods graphics on;

proc tree nclusters=5           /* HOW MANY CLUSTERS */
          data=<tree>           /* OUTPUT DATA OF STEP 1 IS INPUT DATA IN STEP 2 */
          out=<clusters>;       /* FINAL DATA INCLUDING SOLUTION */
   id <chosen_id>;
run;

ods graphics off;

proc sort data=<clusters>
          out=<sorted>;
   by Cluster;
run;

proc means data= ;
   output out=<centroids>;
run;

proc fastclus data=<newdata>
              maxclusters=<nclusters>
              seed=<centroids>
              maxiter=0
              out=<scored>;
run;

My question here is what the PROC MEANS syntax should be, because it doesn't make sense to use the output from PROC TREE, which contains only the cluster and the ID.

2) Fastclus

proc fastclus data=<input>
              outstat=<stats>
              maxclusters=5;
   var <vars>;
   id <chosen_id>;
run;

proc means data= ;

proc fastclus data=<newdata>
              instat=<stats>
              out=<scored>;
run;

Same question as above regarding the PROC MEANS syntax: what should my input be?

Thanks in advance.

SAS Super FREQ
Posts: 3,310

1) If I understand your code, the DATA= option for PROC MEANS is the OUT= data set for PROC TREE. The OUT= dataset for PROC MEANS would be "centroids," which is the SEED= input for PROC FASTCLUS.  You are correct that there doesn't seem to be a need for the MEANS step (or the SORT?).

2) Why do you need to find means at all? Use the OUTSTAT= data set the first time, and read that back in with the INSTAT= option in order to score the new data.
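The OUTSTAT=/INSTAT= route described in 2) could look roughly like the sketch below. The data set names (train, newdata, clus_stats, train_scored, new_scored) and the use of SASHELP.IRIS as stand-in data are assumptions for illustration only and do not come from the original post; substitute your own data and VAR list.

```sas
/* Stand-in data (assumption): split SASHELP.IRIS into a training set */
/* and a set of "new" observations to be scored.                      */
data train newdata;
   set sashelp.iris;
   if mod(_n_, 5) = 0 then output newdata;
   else output train;
run;

/* Cluster the training data and save the cluster statistics. */
proc fastclus data=train maxclusters=3
              outstat=clus_stats out=train_scored noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Score the new data. INSTAT= reads the saved statistics, so no */
/* clustering iterations are run; observations are only assigned. */
proc fastclus data=newdata instat=clus_stats out=new_scored;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

The OUT= data set from the second step should contain the original variables plus CLUSTER and DISTANCE for each new observation. No PROC MEANS step is involved.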

Frequent Contributor
Posts: 126

OK,

So for the second method I need the OUTSTAT= data set; that I understand.

Regarding the first method, which will be my SEED= data set: the OUTTREE= from PROC CLUSTER or the OUT= from PROC TREE?

Thanks in advance, Rick.
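For the first method, one possible arrangement is that neither data set is used as the seed directly: the SEED= data set is the PROC MEANS output, computed from the PROC TREE OUT= data set after the clustering variables have been carried into it with a COPY statement. The sketch below assumes SASHELP.IRIS as stand-in data and hypothetical data set names (train, newdata, tree, clusters, centroids, scored); none of these come from the thread.

```sas
/* Stand-in data (assumption): training set plus "new" observations. */
data train newdata;
   set sashelp.iris;
   if mod(_n_, 5) = 0 then output newdata;
   else output train;
run;

/* Step 1: hierarchical clustering; OUTTREE= stores the cluster tree. */
proc cluster data=train method=ward outtree=tree noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Step 2: cut the tree at 3 clusters. COPY carries the clustering   */
/* variables into OUT=, so it holds more than the ID and CLUSTER.    */
proc tree data=tree nclusters=3 out=clusters noprint;
   copy SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Step 3: compute one centroid (mean vector) per cluster. */
proc sort data=clusters;
   by cluster;
run;

proc means data=clusters noprint;
   by cluster;
   var SepalLength SepalWidth PetalLength PetalWidth;
   output out=centroids mean=;
run;

/* Step 4: assign each new observation to the nearest centroid. */
/* MAXITER=0 keeps the seeds fixed, so nothing is re-estimated. */
proc fastclus data=newdata seed=centroids maxclusters=3
              maxiter=0 out=scored;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

With this layout the input to PROC MEANS is the OUT= data set from PROC TREE, and the PROC MEANS output is what PROC FASTCLUS reads through SEED=.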
