Data_Guy
Calcite | Level 5

Hi Experts,

 

After creating 5 cluster groups (using the k-means algorithm) from my data set based on 4 continuous variables, I was wondering whether it is valid to use the cluster group IDs (1 to 5) as the dependent variable in a multinomial logistic regression, with the same 4 continuous variables from the clustering as independent variables, to predict the cluster groups of new observations (which have the same 4 continuous variables). Note that the data for 3 of my 4 independent variables are highly skewed.

 

If the above method is valid, I am not sure which other types of classifiers (e.g., KNN, decision trees, SVMs) would be best for predicting the cluster group of new observations.

 

Thanks much!

 

3 REPLIES
stat_sas
Ammonite | Level 13

Hi,

 

This can be used, but you can also predict the cluster membership of a new observation directly from its distance to the cluster centers: assign it to the cluster whose center is closest.
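
As a minimal sketch of this nearest-center idea in SAS 9 (not CAS): PROC FASTCLUS can write the final cluster centers to a data set with OUTSEED=, read them back with SEED=, and, with MAXITER=0, assign each observation to the nearest center without recomputing the centers. The data set names TRAIN, NEWOBS, CENTERS, TRAIN_CLUSTERS and NEWOBS_CLUSTERS and the variable names x1-x4 are placeholders, not names from the original post.

```sas
/* Hypothetical names: TRAIN holds the original data, NEWOBS the new      */
/* observations, and x1-x4 stand in for the four continuous variables.   */

/* Step 1: cluster the original data and save the final cluster centers. */
proc fastclus data=train maxclusters=5 maxiter=100
              outseed=centers out=train_clusters noprint;
   var x1-x4;
run;

/* Step 2: assign each new observation to the nearest saved center.      */
/* MAXITER=0 keeps the centers fixed, so the clusters are not refit.     */
proc fastclus data=newobs seed=centers maxclusters=5 maxiter=0
              out=newobs_clusters noprint;
   var x1-x4;
run;
```

The OUT= data set contains a CLUSTER variable with the assigned cluster and a DISTANCE variable with the distance to that cluster's center. If the variables were standardized or transformed before the original clustering, the new observations need the same transformation, with the same parameters, before Step 2.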

cj_blake
SAS Employee

I know that this is an old post but I found it during my own searching on this topic and wanted to provide the solution that I used so that it might help others. There is really good documentation on this but it is in different documents and this seems like a good place to link it all together.

 

A quick note before the solution below: I am working with data in CAS on SAS Viya, but the steps below should also be relevant if you are working on Viya 3.x. The general concept is also valid in SAS 9 (I believe), but the steps will be significantly different.

 

Like you, I performed a k-means clustering on some data and wanted a model that puts new records into the appropriate clusters that I had already created. Starting from this example from the documentation, we can add the following option just above the run statement of our proc cas call. It saves the model in CAS as an ASTORE, which can then be used to "score" other records:

 

 saveState={name="PetalModel", replace=True}

 

From this, I can use the aStore action set to pass one or more records into my model:

 

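 proc cas;
   /* Load the action set that scores tables against a saved ASTORE */
   loadactionset "aStore";
   /* Score NEWPETALS with the saved model; results go to SCORED_NEWPETALS */
   action aStore.score /
     table={name='NEWPETALS'},
     out={name='SCORED_NEWPETALS'},
     rstore={name='PetalModel'};
 run;
 quit;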
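
To check the result, one option is to fetch a few rows of the scored table. This is only a sketch: it assumes the SCORED_NEWPETALS table from the step above exists in the active caslib, and the row limit of 10 is arbitrary.

```sas
proc cas;
   /* Print the first 10 rows of the scored output table */
   table.fetch /
     table={name='SCORED_NEWPETALS'},
     to=10;
run;
quit;
```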

 

For more information about your model, including what shape the input data should have and what you can expect in the output table, you can use the describe action:

 

 proc cas;
    aStore.describe rstore={name='PetalModel'}, epcode=TRUE;
    run;
 quit;

 

Hope that helps!

sbxkoenk
SAS Super FREQ

Usage Note 22544: Assigning new observations to clusters defined using previous data
https://support.sas.com/kb/22/544.html

 

Koen

 
