
SEMMA Cluster Analysis

SAS_ASS — Fri, 18 Jan 2019 08:35:49 GMT

Hey,

I'm doing a cluster analysis on a large data set in SAS Enterprise Miner (EM).

I would like to use the SEMMA approach for data mining.

According to this approach, though, cluster analysis is part of Explore.

In my case, however, I guess the cluster analysis is actually my model.

My nodes are:

Input Data

Stat Explore

Drop

Filter

Impute

Data Partition

Varclus

Cluster

Score/Assess

Can someone tell me the right order?

Thank you!

Re: SEMMA Cluster Analysis

MikeStockstill — Mon, 28 Jan 2019 14:58:39 GMT

Good morning-

The "right" order depends on what you are trying to do. The flow that you describe explores the data, drops some variables, filters some observations, imputes missing values, partitions the data, clusters the variables, and then clusters the observations based on the results of the variable clustering. If that strategy is your intent, then the order is probably right. It is not clear, though, how assessment is involved.
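For illustration only, here is a rough Python sketch of that ordering outside Enterprise Miner. It is not what the EM nodes do internally: the function names, the made-up data, the mean imputation, and the plain k-means are all assumptions, and the variable clustering step is left out to keep it short.

```python
import random
import statistics

def drop_columns(rows, names):
    """Drop step: remove variables that should not enter the analysis."""
    return [{k: v for k, v in r.items() if k not in names} for r in rows]

def filter_rows(rows, column, low, high):
    """Filter step: keep observations inside [low, high]; missing values pass through."""
    return [r for r in rows if r[column] is None or low <= r[column] <= high]

def impute_mean(rows):
    """Impute step: replace missing values (None) with the column mean."""
    columns = list(rows[0].keys())
    means = {c: statistics.mean(r[c] for r in rows if r[c] is not None) for c in columns}
    return [{c: means[c] if r[c] is None else r[c] for c in columns} for r in rows]

def partition(rows, train_fraction, seed):
    """Data Partition step: random split into training and validation rows."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def kmeans(points, k, iterations=20):
    """Cluster step: plain k-means, seeded with the first k points."""
    centers = points[:k]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            groups[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [tuple(statistics.mean(dim) for dim in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

# Hypothetical data: one missing value and one obvious outlier.
rows = [
    {"id": 1, "income": 20.0, "spend": 2.0},
    {"id": 2, "income": 22.0, "spend": None},
    {"id": 3, "income": 80.0, "spend": 9.0},
    {"id": 4, "income": 82.0, "spend": 8.0},
    {"id": 5, "income": 900.0, "spend": 5.0},
    {"id": 6, "income": 21.0, "spend": 3.0},
]

dropped = drop_columns(rows, {"id"})                 # Drop
filtered = filter_rows(dropped, "income", 0, 200)    # Filter
imputed = impute_mean(filtered)                      # Impute
train, valid = partition(imputed, 0.8, seed=1)       # Data Partition
centers = kmeans([(r["income"], r["spend"]) for r in train], k=2)  # Cluster
print(centers)
```

Note that the filter runs before the imputation here, so the outlier does not influence the imputed mean. That is one reason the order of the steps matters.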

SEMMA is usually applied when you have a target variable, because the target predictions can then be assessed against the target values that were observed. In cluster analysis there is no target variable. Instead, unsupervised learning is performed, so there are no observed target values to assess against.
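To make the difference concrete, here is a small Python sketch with made-up numbers. The first function needs observed target values; the second needs only the data and the cluster labels. Using the within-cluster sum of squares as the unsupervised measure is an assumption for illustration, not something Enterprise Miner is claimed to report in this form.

```python
def misclassification_rate(predicted, observed):
    """Supervised assessment: share of predictions that differ from the observed target."""
    return sum(p != o for p, o in zip(predicted, observed)) / len(observed)

def within_cluster_ss(points, labels):
    """Unsupervised measure: squared distances from each point to its cluster mean, summed."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        center = [sum(dim) / len(members) for dim in zip(*members)]
        total += sum((x - c) ** 2 for m in members for x, c in zip(m, center))
    return total

# With a target: predictions are compared against observed values.
error = misclassification_rate([1, 0, 1, 1], [1, 0, 0, 1])
print(error)  # 0.25

# Without a target: only the compactness of the clusters can be compared.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
tight = within_cluster_ss(points, [0, 0, 1, 1])
loose = within_cluster_ss(points, [0, 1, 0, 1])
print(tight, loose)  # 4.0 132.0
```

A lower within-cluster sum of squares only says that the clusters are more compact. It does not say that they are "correct", because there are no observed target values to compare against.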

Have a good week.