Contributor
Posts: 49

# Euclidean length option for Standardization method in Standardize Data Task (SAS Studio 3.5)

While working with the Standardize Data Task in SAS Studio 3.5 I've come across 'Euclidean length' as a standardisation method. I understand how z-scores are obtained by subtracting the mean from each observation and dividing the result by the standard deviation. What is Euclidean length and how does it help in standardisation?

For example, if we are using the sashelp.baseball dataset, what would using the 'Euclidean length' method of standardisation for the 'nhits' (number of hits) variable do for us?

Accepted Solutions
Solution
‎08-12-2016 10:38 AM
SAS Super FREQ
Posts: 3,634

## Re: Euclidean length option for Standardization method in Standardize Data Task (SAS Studio 3.5)

A SAS Studio task generates SAS code, usually in the form of a call to a SAS procedure.  If you click on the Code tab, you can see the program.  In this case, the call is to PROC STDIZE and the METHOD=EUCLEN option is specified.

So the general way to answer the question "What does a task do?" is to

1. Go to the SAS/STAT User's Guide documentation.

2. Scroll down and click on the doc for the relevant procedure.

For this question, the PROC STDIZE documentation lists the formulas that are applied for each method. For the EUCLEN option, the location is 0 and the scale is the Euclidean length of the variable:

scale = sqrt(ssq(x)) = sqrt( x1**2 + x2**2 + ... + xN**2 ),

where N is the number of observations in the sample.

The new variable is therefore

X_New[i] = (X[i] - 0) / scale

The transformation has the property that the new variable has unit Euclidean length. Geometrically, you can think of the transformation as a projection onto the surface of the unit N-dimensional sphere. This transformation might be useful for spherically symmetric problems in which the angle that the observation makes with the origin is important.
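
To make the formula concrete outside of SAS, here is a minimal Python sketch of the same arithmetic (location 0, scale equal to the Euclidean length). It is not what PROC STDIZE runs internally, only an illustration of the formula above. The function names and the three sample values are made up for the example and are not taken from sashelp.baseball.

```python
import math

def euclidean_length(values):
    """Return sqrt(x1**2 + x2**2 + ... + xN**2)."""
    return math.sqrt(sum(v * v for v in values))

def scale_to_unit_length(values):
    """Divide each value by the Euclidean length of the whole variable."""
    scale = euclidean_length(values)
    if scale == 0:
        raise ValueError("all values are zero, so the scale is undefined")
    # Location is 0, so nothing is subtracted before dividing.
    return [(v - 0) / scale for v in values]

# Hypothetical values standing in for a numeric variable such as nHits.
hits = [3.0, 4.0, 12.0]
scaled = scale_to_unit_length(hits)

print(euclidean_length(hits))    # 13.0, because 9 + 16 + 144 = 169
print(euclidean_length(scaled))  # approximately 1 (unit Euclidean length)
```

Note that the values are not centered, so their ordering and relative proportions are unchanged; only the overall scale changes.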

All Replies
Contributor
Posts: 49

## Re: Euclidean length option for Standardization method in Standardize Data Task (SAS Studio 3.5)

Hi @Rick_SAS, thanks very much for the detailed explanation. I will read through the documentation for the PROC STDIZE procedure for a better understanding.

Am I correct in assuming then that a transformation using the Euclidean Length would only be used for scientific / mathematical data and cannot be used in domains like marketing? If this is incorrect would there be an example from the marketing / business domain that you can point me to in which this transformation is used to analyse data and generate insight?

SAS Super FREQ
Posts: 3,634

## Re: Euclidean length option for Standardization method in Standardize Data Task (SAS Studio 3.5)

I am not familiar with marketing, so I can't answer your question. However, I will say that METHOD=EUCLEN is more geeky/scientific than the more intuitive standardization by the standard deviation.

It's not that strange, though. If your data are centered, then the formula for the standard deviation is closely related to the Euclidean length. For centered data, the Euclidean length is sqrt(N-1) times the (sample) standard deviation.
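
The relationship can be checked numerically with a small Python sketch. The data values are invented for the illustration, and the comparison assumes the sample standard deviation (divisor N-1), which is what `statistics.stdev` computes.

```python
import math
import statistics

def euclidean_length(values):
    """Return sqrt(x1**2 + x2**2 + ... + xN**2)."""
    return math.sqrt(sum(v * v for v in values))

# Hypothetical data, invented for this check.
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Center the data by subtracting the mean.
mean = statistics.mean(raw)
centered = [v - mean for v in raw]

n = len(centered)
length = euclidean_length(centered)  # Euclidean length of the centered data
sd = statistics.stdev(raw)           # sample standard deviation (divisor N-1)

# For centered data, the two quantities differ by a factor of sqrt(N-1).
print(length, math.sqrt(n - 1) * sd)
```

The two printed numbers agree up to floating-point rounding, so for centered data the two scalings differ only by a constant factor that depends on the sample size.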

☑ This topic is solved.