frupaul

Hi everyone,

In regression modelling (logistic regression and linear regression), when is it best not to transform a variable? In other words:

 

- If a variable is normally distributed but has a very large range, should it still be transformed?

- Should binary variables be transformed? If not, why not?

- What other reasons are there not to transform a variable?

 

Thanks

 

Paul

 

art297

I am not a statistician, so I am responding based only on experience (and the stats I learned getting a PhD in Educational Psychology), and partly to make sure I see the replies from those with more expertise (hi @Rick_SAS).

 

In my experience, the principal reason for transforming a variable is that you assume, or theory suggests, that it comes from something other than a normal distribution.
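
As a rough illustration of that point (not Art's own example): if theory says a variable is lognormal, taking logs turns it into a normally distributed one. The sketch below assumes a lognormal variable with mu = 0 and sigma = 1 and uses a hypothetical helper, `sample_skewness`, to compare skewness before and after the log transform.

```python
import math
import random


def sample_skewness(values):
    """Moment-based sample skewness: third central moment over SD cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(2024)  # fixed seed so the run is reproducible

# Hypothetical variable drawn from a lognormal distribution (mu=0, sigma=1),
# i.e. a variable that theory says is *not* normally distributed.
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
logged = [math.log(v) for v in raw]  # the log of a lognormal variable is normal

raw_skew = sample_skewness(raw)
log_skew = sample_skewness(logged)
print(f"skewness before log: {raw_skew:.2f}")
print(f"skewness after log:  {log_skew:.2f}")
```

The raw values are strongly right-skewed, while the logged values have skewness close to zero, which is the situation where a transformation is usually justified.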

 

Regarding insurance claims, I know that it holds for binary variables: claim frequency (a binary variable: have or don't have a claim) is one such distribution.

 

Art, CEO, AnalystFinder.com

 

fbgeoff

Thanks, but surely by transforming a binary variable, you would completely ruin your chances of making any meaningful interpretation of it.
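
One way to see what happens: any function of a 0/1 variable still takes only two values, so it is just an affine recoding, f(x) = f(0) + (f(1) - f(0)) * x. In the minimal linear-regression sketch below (hypothetical toy data and a hypothetical helper `ols_fit`), regressing on log(x + 1) gives exactly the same fitted values as regressing on x; only the slope is rescaled, which makes it harder to read. The same reparametrization argument applies to the linear predictor in logistic regression.

```python
import math


def ols_fit(x, y):
    """Simple least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical toy data: x is a 0/1 indicator, y is a continuous outcome.
x = [0, 0, 0, 1, 1, 1, 0, 1]
y = [2.0, 2.5, 1.5, 4.0, 4.5, 5.0, 2.0, 4.5]

# log(x + 1) maps 0 -> 0 and 1 -> ln 2: still only two distinct values.
fx = [math.log(xi + 1) for xi in x]

a_raw, b_raw = ols_fit(x, y)
a_log, b_log = ols_fit(fx, y)

fitted_raw = [a_raw + b_raw * xi for xi in x]
fitted_log = [a_log + b_log * fi for fi in fx]

print(f"slope on x:        {b_raw:.4f}")  # difference in group means (2.5)
print(f"slope on log(x+1): {b_log:.4f}")  # same difference divided by ln 2
```

The slope on the raw indicator is the difference between the two group means, which is easy to interpret; the slope on the transformed indicator carries the same information in awkward units.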

 

Is the presence of outliers a good enough reason to warrant a transformation? Some variables are normally distributed but have outliers. In this case, will it still be necessary to transform the variable?

Thanks

PaigeMiller

@fbgeoff wrote:

 

Is the presence of outliers a good enough reason to warrant a transformation? Some variables are normally distributed but have outliers. In this case, will it still be necessary to transform the variable?


The answer is: it depends!

 

If you want your analysis to be less sensitive to outliers, then you take some action to reduce their effect. Transformation is one way to do that, but it also changes the distribution of the data (which you may or may not want). A better way to reduce the effect of outliers is to run a "robust" analysis on the untransformed data, if such a "robust" analysis exists.
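
To make the "robust analysis" suggestion concrete, here is a minimal sketch comparing an ordinary least-squares slope with a Theil-Sen slope (the median of all pairwise slopes) on untransformed data containing one gross outlier. Theil-Sen is only one example of a robust estimator, chosen here because it fits in a few lines; the thread does not name a specific method, and in SAS the usual tool for robust regression would be PROC ROBUSTREG. The data and helper names are hypothetical.

```python
import statistics
from itertools import combinations


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def theil_sen_slope(x, y):
    """Robust slope: median of the slopes between every pair of points."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    return statistics.median(slopes)


# Hypothetical data lying exactly on y = 1 + 2x ...
x = list(range(1, 11))
y = [1 + 2 * xi for xi in x]
# ... and the same data with a single gross outlier at the last point.
y_outlier = y[:-1] + [100.0]

print(f"OLS slope, clean data:         {ols_slope(x, y):.3f}")
print(f"OLS slope, with outlier:       {ols_slope(x, y_outlier):.3f}")
print(f"Theil-Sen slope, with outlier: {theil_sen_slope(x, y_outlier):.3f}")
```

The single outlier roughly triples the OLS slope, while the Theil-Sen slope stays at the true value of 2, and the variable itself never had to be transformed.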

--
Paige Miller
