SAS miner internal standardization property

04-17-2017 04:49 PM

Hello everyone,

I have a clustering case I am working on. Before the Cluster node I have a Transformation node that takes the log of all the variables used in the Cluster node.

However, the Cluster node itself has an internal standardization property that can be set to None, Range, or Standardization. My question is: if my data are already roughly normally distributed after the log transformation, should this be set to None? If not, how do I decide whether Range or Standardization is the way to go?

I am using only interval variables for this analysis. Thanks.
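The setup in the question can be sketched in Python as two steps: a log transform of every interval variable, then one of the three internal standardization choices applied to each variable separately. This is a hypothetical illustration, not SAS Enterprise Miner code; the helper names (log_transform, rescale, prepare), the sample data, and the use of the sample standard deviation for 'std' are all assumptions.

```python
import math
import statistics

def log_transform(column):
    # Natural log of each value; assumes every value is strictly positive.
    return [math.log(x) for x in column]

def rescale(column, method):
    # Hypothetical stand-in for the three options named in the question:
    # "none", "range", or "std". Illustrative only, not the SAS implementation.
    if method == "none":
        return list(column)
    if method == "range":
        lo, hi = min(column), max(column)
        return [(x - lo) / (hi - lo) for x in column]
    if method == "std":
        # Assumption: sample standard deviation (n - 1 denominator).
        mu, sd = statistics.mean(column), statistics.stdev(column)
        return [(x - mu) / sd for x in column]
    raise ValueError(f"unknown method: {method!r}")

def prepare(data, method):
    # data: dict mapping variable name -> list of positive interval values.
    # Step 1 (Transformation node): log each variable.
    # Step 2 (Cluster node): rescale each logged variable independently.
    return {name: rescale(log_transform(col), method) for name, col in data.items()}

# Hypothetical interval inputs on very different raw scales.
data = {
    "income": [20000.0, 35000.0, 50000.0, 120000.0],
    "visits": [1.0, 3.0, 8.0, 20.0],
}

for method in ("none", "range", "std"):
    prepared = prepare(data, method)
    print(method, {k: [round(v, 2) for v in vals] for k, vals in prepared.items()})
```

In this made-up data the log transform does not put the variables on a common spread: log(income) spans ln(6) ≈ 1.79 units and log(visits) spans ln(20) ≈ 3.00 units, so with 'none' the visits variable tends to contribute more to Euclidean distances. 'range' equalizes the ranges of the two variables, while 'std' equalizes their standard deviations.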

Posted in reply to hassan_masood90

04-19-2017 01:31 PM

Hi,

- Even when the input to clustering is already normally distributed, we can still standardize it. For example, if the input follows a normal distribution with mean \mu and standard deviation \sigma and we choose 'std', each value x becomes (x - \mu) / \sigma, so the input is converted to a normal distribution (still) with mean 0 and standard deviation 1.

- Setting the standardization to 'std' or 'range' gives different outputs. 'std' subtracts the mean and divides by the standard deviation of the data; 'range' subtracts the minimum and divides by the range (max - min), so 'range' maps every input value into [0, 1], making all values non-negative.

- Both methods, 'std' and 'range', are linear transforms. They don't change the cluster structure in the data when Euclidean distance is used.
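A small simulation can make the points above concrete. It is a Python sketch (the thread contains no code of its own): zscore and minmax are hypothetical helpers written to match the formulas described for 'std' and 'range', not SAS Enterprise Miner's implementation, and dividing by the sample rather than the population standard deviation is an assumption.

```python
import random
import statistics

def zscore(column):
    # 'std': subtract the mean, divide by the standard deviation.
    # Assumption: sample standard deviation (n - 1 denominator).
    mu, sd = statistics.mean(column), statistics.stdev(column)
    return [(x - mu) / sd for x in column]

def minmax(column):
    # 'range': subtract the minimum, divide by the range (max - min).
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]

# Simulated normal input with mean 50 and standard deviation 10 (arbitrary choice).
random.seed(0)
x = [random.gauss(50, 10) for _ in range(1000)]

z = zscore(x)
r = minmax(x)

print("std:   mean %.3f, sd %.3f" % (statistics.mean(z), statistics.stdev(z)))
print("range: min %.3f, max %.3f" % (min(r), max(r)))

# The same observation gets a different number under each method.
print("first observation:", round(x[0], 2), round(z[0], 2), round(r[0], 2))
```

Both outputs are increasing linear functions of the input, so the ordering of the observations and the shape of the distribution are unchanged; what differs is the location and scale of the transformed values (mean 0 and standard deviation 1 for 'std', exactly [0, 1] for 'range').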

Posted in reply to YingjianWang

04-19-2017 02:40 PM

Hi,

Thank you for that response. The distinction between normalization and standardization is clearer with your answer.

In your answer you said that 'std' and 'range' give different outputs. You also said that, since both are linear transformations, they don't change the structure of the clusters when Euclidean distance is used.

So what is the difference in the output if the clusters don't change?

And how would one generally decide which internal standardization method is best for a particular dataset?