frupaul
Quartz | Level 8

Hi everyone,

In regression modelling (logistic regression and linear regression), when is it not best to transform a variable? In other words:

 

- If a variable is normally distributed but has a very large range, should it still be transformed?

- Should binary variables be transformed? If not, why?

- What other reasons are there not to transform a variable?

 

Thanks

 

Paul

 

3 REPLIES
art297
Opal | Level 21

I am not a statistician, so I am responding based only on experience (and the stats I learned getting a PhD in Educational Psychology), and to ensure that I see the responses from those with more expertise (hi @Rick_SAS).

 

In my experience, the principal reason for transforming a variable is that you assume, or theory suggests, that it comes from a factor that has something other than a normal distribution.

 

I know that, regarding insurance claims, it holds for binary variables, as the frequency of insurance claims (a binary variable: have or don't have a claim) is one such distribution.

 

Art, CEO, AnalystFinder.com

 

fbgeoff
Calcite | Level 5

Thanks, but surely transforming a binary variable will completely ruin your chances of drawing any meaningful interpretation from it.
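
One way to see what a transformation does to a binary predictor is a minimal sketch like the one below (Python, standard library only). The data are made up for illustration and `ols_fit` is a hypothetical helper, not part of any package. A 0/1 variable has only two distinct values, so a transformation such as log(x + 1) merely relabels them: the fitted values stay the same, and only the scale of the coefficient changes, which makes it harder to read.

```python
import math

def ols_fit(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Made-up illustrative data: a 0/1 predictor and a continuous response.
x = [0, 0, 0, 1, 1, 1, 0, 1]
y = [2.1, 1.9, 2.4, 3.8, 4.1, 3.6, 2.0, 4.3]

# "Transform" the binary predictor: 0 stays 0, 1 becomes log(2).
x_log = [math.log(xi + 1) for xi in x]

b0, b1 = ols_fit(x, y)       # fit on the raw 0/1 coding
c0, c1 = ols_fit(x_log, y)   # fit on the transformed coding

fitted_raw = [b0 + b1 * xi for xi in x]
fitted_log = [c0 + c1 * xi for xi in x_log]

print("slope on raw 0/1 variable:", b1)
print("slope on log(x + 1):      ", c1)
print("identical fitted values:  ",
      all(math.isclose(a, b) for a, b in zip(fitted_raw, fitted_log)))
```

On the raw coding the slope is simply the difference between the two group means; on the transformed coding it is that same difference divided by log(2), so nothing is gained from the transformation.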

 

Is the presence of outliers a good enough reason to warrant a transformation? Some variables are normally distributed but have outliers. In this case, will it still be necessary to transform the variable?

Thanks

PaigeMiller
Diamond | Level 26

@fbgeoff wrote:

 

Is the presence of outliers a good enough reason to warrant a transformation? Some variables are normally distributed but have outliers. In this case, will it still be necessary to transform the variable?


The answer is: it depends!

 

If you want your analysis to be less sensitive to outliers, then you take some action to reduce their effect. Transformation is one way to do that, but it also changes the distribution of the data (which you may or may not want). A better way to reduce the effect of outliers is to run a "robust" analysis on the untransformed data, if such a "robust" analysis exists.
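
As a rough illustration of the "robust analysis on untransformed data" idea, here is a minimal sketch in Python (standard library only). It compares an ordinary least-squares slope with a Theil-Sen slope, which is the median of all pairwise slopes. Theil-Sen is just one example of a robust estimator, picked here for brevity; the data are made up, and the function names `ols_slope` and `theil_sen_slope` are hypothetical.

```python
from itertools import combinations
from statistics import median

def ols_slope(x, y):
    """Ordinary least-squares slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def theil_sen_slope(x, y):
    """Median of the slopes through every pair of points with distinct x."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    return median(slopes)

# Made-up data: y = 2x exactly, then one response is replaced by an outlier.
x = list(range(1, 11))
y = [2.0 * xi for xi in x]
y[-1] = 100.0  # single outlier; the data are otherwise left untransformed

print("OLS slope:      ", ols_slope(x, y))
print("Theil-Sen slope:", theil_sen_slope(x, y))
```

With this data the Theil-Sen slope stays at 2, while the least-squares slope is pulled to roughly 6.4 by the single outlier, and no variable had to be transformed.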

--
Paige Miller

