Hi,
I would like to run a KNN procedure and display the clusters on a two-dimensional plot.
I'm used to running the k-means algorithm with:
Proc fastclus DATA=CORR_ACC maxclusters=8 maxiter=100 outseed=Mathis out=resultats;
VAR dim: ;
ID id ;
RUN ;
proc sgplot data=resultats noautolegend ;
scatter x=dim1 y=dim2 / Group= CLUSTER /*datalabel=CLUSTER*/ name="ACM"
legendlabel="ACM";
keylegend "ACM";
run;
Here CORR_ACC is the output of a proc corresp, but I'm really struggling with the KNN.
Is there a simple way to do this?
Which output(s) of the KNN do you want to plot in two dimensions?
@Mathis1 wrote:
This kind of plot: https://www.mathworks.com/help/examples/stats/win64/ClassifyingQueryDataUsingKnnsearchExample_01.png
I'm afraid you haven't answered the question. I asked "which outputs" and you showed me "what the plot should look like". So, which outputs from the KNN do you want to plot? Any output from a KNN will be stored in a SAS data set or printed to the output. Please be specific. Show me.
Are you referring to the kth-Nearest-Neighbor method of proc cluster?
You need to combine proc cluster and proc tree, like this (simplified example data from the proc corresp documentation):
title 'United States Population, 1920-1970';
data USPop;
input Region $14. y1920 y1930 y1940 y1950 y1960 y1970;
label y1920 = '1920' y1930 = '1930' y1940 = '1940'
y1950 = '1950' y1960 = '1960' y1970 = '1970';
datalines;
New England     7401  8166  8437  9314 10509 11842
NY, NJ, PA     22261 26261 27539 30146 34168 37199
Great Lakes    21476 25297 26626 30399 36225 40252
Midwest        12544 13297 13517 14061 15394 16319
South Atlantic 13990 15794 17823 21182 25972 30671
KY, TN, AL, MS  8893  9887 10778 11447 12050 12803
AR, LA, OK, TX 10242 12177 13065 14538 16951 19321
Mountain        3336  3702  4150  5075  6855  8282
Pacific         5567  8195  9733 14486 20339 25454
;
* Perform Simple Correspondence Analysis;
proc corresp data=uspop out=dims plots=none;
var y1920 -- y1970;
id Region;
run;
proc cluster data=dims method=density k=3 outtree=tree;
where _type_ = "OBS";
var dim1 dim2;
id region;
run;
proc tree data=tree nclusters=3 out=treegraph noprint;
copy dim1 dim2;
run;
proc sgplot data=treegraph noautolegend ;
scatter x=dim1 y=dim2 / Group= CLUSTER datalabel=_name_
name="ACM" legendlabel="ACM";
keylegend "ACM" / title="Cluster";
run;
Hello PG and thank you very much for your reply 🙂
I had tried something like this, but the issue I have is that I get way too many clusters, even when specifying "nclusters=5" in the proc tree.
Please look at the table "TreeGraph"; you'll see there are at least 70 clusters:
Do you have any idea how to remedy this?
There are exactly 5 clusters defined in that dataset. They are identified as 70, 3, 71, 59 and 22.
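One way to check how many distinct clusters proc tree actually assigned is to tabulate the cluster variables in its OUT= data set. This is only a minimal sketch, assuming the data set is named treegraph as in the code above; the MISSING option is included so that any observations left without a cluster assignment are counted too.

```sas
/* Count observations per assigned cluster in the proc tree output. */
/* CLUSTER is the numeric cluster index, CLUSNAME its character name. */
proc freq data=treegraph;
tables cluster*clusname / list missing;
run;
```

Each row of the listing corresponds to one cluster, so the number of rows is the number of clusters that were really created.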
Look, this is the table
I am not sure that meaningful clusters can be defined on these coordinates. I suspect that the problem you encountered with kth-Nearest-Neighbor method is due to ties, a problem which is discussed in the documentation but that I don't fully understand.
Anyway, increasing the number of neighbors (k) can bring the number of identifiable clusters down, but I doubt this is very useful. For reference, starting from your dim1 and dim2 values and k=36:
proc cluster data=sasforum.knn method=density k=36 outtree=tree plots=none;
var dim1 dim2;
run;
proc tree data=tree nclusters=5 out=treegraph noprint;
copy dim1 dim2;
run;
proc sql;
select cluster, count(*) as n
from treegraph
group by cluster;
quit;
proc sgplot data=treegraph;
scatter x=dim1 y=dim2 / Group=CLUSTER;
run;
CLUSTER     n
      1  1080
      2   124
      3     1
      4     1
      5     1
It is hard to see what meaningful clusters would look like for these data.
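To see how the cluster sizes react to the choice of k, the same steps can be repeated for several values. The following is a rough sketch rather than a recommended procedure: the macro name try_k, its klist parameter, the example k values and the work data set names tree_k and tg_k are all hypothetical, and sasforum.knn is assumed to hold dim1 and dim2 as in the code above.

```sas
/* Hypothetical helper: rerun the density clustering for several k values */
/* and print the cluster sizes for each one.                              */
%macro try_k(klist=10 20 36);
%local i k;
%do i = 1 %to %sysfunc(countw(&klist));
%let k = %scan(&klist, &i);
proc cluster data=sasforum.knn method=density k=&k outtree=tree_k plots=none noprint;
var dim1 dim2;
run;
proc tree data=tree_k nclusters=5 out=tg_k noprint;
copy dim1 dim2;
run;
title "Cluster sizes for k=&k";
proc freq data=tg_k;
tables cluster / missing;
run;
%end;
title;
%mend try_k;

%try_k(klist=10 20 36)
```

If most observations land in one large cluster with a few singletons for every k, as in the k=36 result above, that is a hint that the coordinates do not separate into distinct groups.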