chemicalab
Fluorite | Level 6

Hi all,

I did some research but could use a confirmation.

1) Cluster procedure:

proc cluster data=<indata>        /* WHICH DATA */
             method=ward          /* WHAT LINKAGE */
             outtree=<tree>       /* OUTPUT DATA INCLUDING SOLUTION */
             ccc pseudo print=15;
   id <idvar>;                    /* VAR THAT IDENTIFIES EACH OBSERVATION */
   var <var1 var2 ...>;           /* VARS TO BE CLUSTERED */
run;

/* Produce the tree so as to see the shape of the solution */
ods graphics on;

proc tree data=<tree>             /* OUTPUT DATA OF STEP 1 IS INPUT DATA IN STEP 2 */
          nclusters=5             /* HOW MANY CLUSTERS */
          out=<clusters>;         /* FINAL DATA INCLUDING SOLUTION */
   id <idvar>;
run;

ods graphics off;

proc sort data=<clusters> out=<clusters_sorted>;
   by cluster;
run;

proc means data=<?>;              /* WHICH INPUT? */
   output out=<centroids>;
run;

proc fastclus data=<newdata>
              maxclusters=<nclusters>
              seed=<centroids>
              maxiter=0
              out=<scored>;
   var <var1 var2 ...>;
run;

My question here is what the PROC MEANS syntax should be, because it doesn't make sense to use the output from PROC TREE, which contains only the cluster and the ID.

2) Fastclus

proc fastclus data=<indata>
              outstat=<stats>
              maxclusters=5;
   var <var1 var2 ...>;
   id <idvar>;
run;

proc means data=<?>;              /* WHICH INPUT? */
run;

proc fastclus data=<newdata>
              instat=<stats>
              out=<scored>;
   var <var1 var2 ...>;
run;

Same question here as above regarding the PROC MEANS syntax: what should be my input?

Thnx in advance

Rick_SAS
SAS Super FREQ

1) If I understand your code, the DATA= data set for PROC MEANS is the OUT= data set from PROC TREE. The OUT= data set for PROC MEANS would be "centroids," which is the SEED= input for PROC FASTCLUS. You are correct that there doesn't seem to be a need for the MEANS step (or the SORT?).
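A minimal sketch of that data flow, assuming SASHELP.IRIS as stand-in data (its four numeric measurements play the clustering variables, and the same table plays the "new" data) and 3 clusters. It assumes the COPY statement of PROC TREE is used to carry the clustering variables from the OUTTREE= data set into the OUT= data set, which is one way around the concern that PROC TREE's output holds only the cluster and the ID. The data set names work.tree, work.clusters, work.centroids, and work.scored are hypothetical.

```sas
/* Assumption: SASHELP.IRIS stands in for both the original and the new data */

/* Step 1: Ward hierarchical clustering on the clustering variables */
proc cluster data=sashelp.iris method=ward outtree=work.tree noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Step 2: cut the tree at 3 clusters. COPY carries the clustering
   variables from the OUTTREE= data set into OUT=, so OUT= holds more
   than just the cluster number and the ID */
proc tree data=work.tree nclusters=3 out=work.clusters noprint;
   copy SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Step 3: one observation per cluster with the cluster means.
   MEAN= without names keeps the original variable names, which is
   what PROC FASTCLUS looks for in a SEED= data set */
proc sort data=work.clusters;
   by cluster;
run;

proc means data=work.clusters noprint;
   by cluster;
   var SepalLength SepalWidth PetalLength PetalWidth;
   output out=work.centroids mean=;
run;

/* Step 4: assign each observation of the new data to the nearest
   centroid. MAXITER=0 skips the k-means iterations and REPLACE=NONE
   suppresses seed replacement, so the centroids are used unchanged */
proc fastclus data=sashelp.iris seed=work.centroids maxclusters=3
              maxiter=0 replace=none out=work.scored;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;
```

In this sketch the SEED= data set is the PROC MEANS output computed from PROC TREE's OUT= data set, not the OUTTREE= data set itself, which contains a row for every node of the tree rather than one row per cluster.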

2) Why do you need to find means at all? Use the OUTSTAT= data set the first time, and read it back in with the INSTAT= option in order to score the new data.
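A minimal sketch of the OUTSTAT=/INSTAT= route, again assuming SASHELP.IRIS stands in for both the training data and the new data, with 3 clusters and hypothetical data set names work.clusstat and work.scored. The first step fits k-means and saves the cluster statistics; the second step reads them back with INSTAT=, which performs scoring only, so no PROC MEANS step is involved.

```sas
/* Assumption: SASHELP.IRIS stands in for both the training and the new data */

/* Fit k-means and save the cluster statistics (including the centers) */
proc fastclus data=sashelp.iris maxclusters=3 outstat=work.clusstat noprint;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Score: INSTAT= skips the clustering iterations and only assigns each
   observation to the nearest stored cluster center. The VAR list is
   kept identical to the fitting step (assumed to be required) */
proc fastclus data=sashelp.iris instat=work.clusstat out=work.scored;
   var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Quick look at how many observations landed in each cluster */
proc freq data=work.scored;
   tables cluster;
run;
```

The OUT= data set from the scoring step contains the CLUSTER and DISTANCE variables alongside the input variables, so it can be used directly as the scored data.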

chemicalab
Fluorite | Level 6

OK.

So for the second method I need the OUTSTAT= data set; that I understand.

Regarding the first method, which will be my SEED= data set: the OUTTREE= data set from PROC CLUSTER or the OUT= data set from PROC TREE?

Thnx in advance, Rick
