Re: Normalize A Variable

🔒 This topic is **solved** and **locked**.

Posted 06-08-2014 07:00 PM
(7208 views)

If a variable has values ranging from -5000 to 5000, how does normalizing it to the range -1 to 1 affect the regression model results? Will this variable become less influential?

1 ACCEPTED SOLUTION

> how does normalizing it to the range of 1 and -1 affect the regression model result? Will this variable become less influential?

If you are talking about ordinary least squares regression, then changing the units like this will not make the variable "less influential". If you are fitting the model with some other technique (for example, partial least squares regression), then it is possible that this will make the variable "less influential".

At least, that's for my definition of "influential". Maybe you have a different definition? What do you mean by "influential"? Is there a standard mathematical description that you can provide for it? My definition would be based on the sum of squares the variable accounts for, in which case rescaling an independent variable in ordinary least squares regression leaves that sum of squares unchanged.

--

Paige Miller
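
The invariance described in the accepted answer can be checked numerically. The following is a minimal sketch, not code from the thread: it uses Python's standard library rather than SAS, a single predictor for brevity, and made-up data. The helper name `fit_simple_ols` and all numeric values are assumptions chosen for illustration.

```python
import random

def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1*x.

    Returns (b0, b1, model_ss, error_ss).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * xi for xi in x]
    model_ss = sum((fi - mean_y) ** 2 for fi in fitted)
    error_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return b0, b1, model_ss, error_ss

random.seed(1)  # made-up data, for illustration only
x_raw = [random.uniform(-5000, 5000) for _ in range(200)]
y = [3.0 + 0.002 * xi + random.gauss(0, 2) for xi in x_raw]

# Rescale the predictor from roughly [-5000, 5000] to [-1, 1].
x_scaled = [xi / 5000 for xi in x_raw]

raw = fit_simple_ols(x_raw, y)
scaled = fit_simple_ols(x_scaled, y)

print("slope    (raw, scaled):", raw[1], scaled[1])
print("model SS (raw, scaled):", raw[2], scaled[2])
print("error SS (raw, scaled):", raw[3], scaled[3])
```

Under these assumptions the slope is multiplied by 5000 while the model and error sums of squares agree up to floating-point rounding, because the fitted values are the same in both fits.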

9 REPLIES

Hi,

Do you want to standardize the variable using its range?

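Okay, you can use this syntax to standardize the variable. Independent variables in regression often come in different units (kilograms, meters, etc.), so we convert them to comparable units by standardizing them. Note that METHOD=RANGE, as used here, subtracts the minimum and divides by the range, so the result lies in [0, 1]; to get a mean of zero and a standard deviation of 1, use METHOD=STD instead. Hope this helps.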

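    /* METHOD=RANGE computes (x - min) / (max - min) for each VAR variable */
    proc stdize data=have out=want method=range;
      var var;  /* the variable to rescale */
    run;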
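
For comparison, the two transformations discussed in this thread can be written out explicitly. This is only an illustrative sketch in Python's standard library, not a SAS equivalent: the function names `scale_to_range` and `z_score` and the sample values are assumptions, and the target interval of -1 to 1 is taken from the original question.

```python
from statistics import mean, stdev

def scale_to_range(values, low=-1.0, high=1.0):
    """Linearly map min(values) to low and max(values) to high."""
    lo, hi = min(values), max(values)
    return [low + (high - low) * (v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

x = [-5000, -1200, 0, 800, 5000]  # made-up values

ranged = scale_to_range(x)  # bounded by -1 and 1
zs = z_score(x)             # mean 0, sample standard deviation 1

print(ranged)
print(zs)
```

Both are linear transformations of the original values, so they differ only in the location and scale they use: range scaling fixes the endpoints, while the z-score fixes the mean and standard deviation.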

Are you normalizing the dependent variable or the independent variables?

And to carry this further, are you simply rescaling, or are you in fact normalizing? In the first case, you would just divide the values by 5000. In the second, since the values are already centered on zero (I assume), you would divide the values by the standard deviation. As Paige points out, this should have no effect on the "influence" of the variable, EXCEPT...

If this is a multiple regression, where there are several independent variables, and you normalize only one of them, it's pretty obvious that you are going to change the (partial) covariance between that variable and the dependent variable. If you normalize all of the variables, influence on the dependent variable is expressed in standard deviation units rather than on the original scale.

Short answer: Normalizing MAY (and probably WILL) change influence as measured by partial R squared or by the change in an information criterion (IC). Rescaling should NOT change influence.

Steve Denham

It would help the discussion if aha123 would state the actual model used, how many independent variables there are, and whether there are interactions or quadratic or other polynomial terms (or whether it is a nonlinear model). It would further help to know whether we are discussing ordinary least squares or some other model-fitting technique.

--

Paige Miller

Good call. REG, PLS, ROBUSTREG, QUANTREG and MIXED may all give different results, as they use five different algorithms for optimizing the function. And that is just for a linear function.

Steve Denham
