Transforming non-normally distributed variables

05-04-2016 03:00 PM

I am trying to find the best transformation for a set of non-normally distributed continuous variables. I see that I can use PROC PRINQUAL w/ the TRANSFORM statement and select various options (e.g. Log, Exp), but is there a function or proc that will help me select the best one?

Stata has a command, ladder, that applies a range of transformations to a variable and reports a chi-square statistic for each one, so the "best" transformation is the one with the lowest chi-square statistic.

Does SAS have anything like this?

Thanks!

Accepted Solutions


05-05-2016 01:49 PM

You may want to look at PROC TRANSREG, and specifically at the Box-Cox transformation material there.

Steve Denham
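As a minimal sketch of that suggestion, the step below asks PROC TRANSREG to search a grid of Box-Cox exponents for a response variable. The dataset name mydata and the variables y and x are placeholders rather than names from this thread, and the lambda grid is only an example range.

```sas
/* Sketch only: mydata, y, and x are hypothetical names.            */
/* LAMBDA= sets the grid of Box-Cox exponents to evaluate.          */
/* CONVENIENT asks for a "convenient" exponent (such as 0, 0.5, 1)  */
/* when one falls inside the confidence interval for lambda.        */
proc transreg data=mydata;
   model boxcox(y / lambda=-2 to 2 by 0.25 convenient) = identity(x);
run;
```

The output includes a table of log likelihoods over the lambda grid, which plays a role similar to comparing candidate transformations side by side.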

All Replies


05-09-2016 05:15 PM

@SteveDenham The Box-Cox transformation in PROC TRANSREG uses a MODEL statement w/ what looks like a dependent (Y) and an independent (X) variable:

   proc transreg data=x test;
      model BoxCox(y) = identity(x);
   run;

Pardon my ignorance, but why is the independent variable required if I am just looking for a transformation of the dependent variable?

My model has a categorical independent variable (ANOVA), and PROC TRANSREG seems to require a continuous variable in the MODEL statement.

Thanks!
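A possible answer, offered as a sketch rather than a reply from the thread: the Box-Cox method picks the exponent that maximizes the likelihood of a linear model, so it needs the right-hand side of that model in order to judge the residuals. The independent variable does not have to be continuous. The example below assumes a categorical variable named group (a placeholder) and expands it with the CLASS transform in place of IDENTITY.

```sas
/* Sketch only: mydata, y, and group are hypothetical names.        */
/* CLASS expands the categorical variable into dummy variables,     */
/* so the Box-Cox search is run against a one-way ANOVA model.      */
proc transreg data=mydata;
   model boxcox(y) = class(group);
run;
```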


05-05-2016 02:21 PM

See the doc for Box-Cox transformations in PROC TRANSREG.


05-05-2016 03:16 PM

Of course, now that a couple of us have recommended something, I can ask the important question: Why? Why do you want to transform through an "optimal transformation"? I would consider the process that generated the values to be of far greater importance in determining the distribution than a best fit. Suppose you do find an optimal transformation, but consideration of the process suggests an alternative. Which approach would likely have greater inferential utility?

In the end, if you just want the transformed data to look Gaussian, recall that linear models (regression, ANOVA, etc.) make no assumption that the data themselves are normally distributed. It is all about the normality of the residuals, which is a different cat altogether. And if the normality-of-residuals assumption is not met, there are powerful techniques available that may not require pre-transforming the data.

Steve Denham
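To illustrate the point about residuals, here is a minimal sketch that fits a one-way ANOVA and then checks the residuals, not the raw response, for normality. The names mydata, y, group, resids, and resid are placeholders.

```sas
/* Sketch only: all dataset and variable names are hypothetical.    */
/* Step 1: fit the model and save the residuals.                    */
proc glm data=mydata;
   class group;
   model y = group;
   output out=resids r=resid;
run;
quit;

/* Step 2: normality tests and a Q-Q plot of the residuals.         */
proc univariate data=resids normal;
   var resid;
   qqplot resid / normal(mu=est sigma=est);
run;
```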


05-06-2016 03:40 PM

@SteveDenham Thanks for the thought-provoking question, the reminder, and the useful explanation!


05-16-2016 03:19 PM

I'm getting an error message that there are invalid values when using the Box-Cox transformation in PROC TRANSREG. There are no missing values, no zero values and no negative values in the dataset. Any ideas?

<ERROR: 31 invalid values were encountered while attempting to transform variable var1;>

Thanks!


05-16-2016 03:48 PM

Depends on the transformation, but probably you are encountering nonpositive values for a transformation that requires positive values. For example, the log() transformation requires positive values. The square-root transformation requires nonnegative values. The inverse transformation (1/x) requires non-zero values, and so forth.
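A quick way to confirm whether any such values are present is sketched below. The dataset name mydata is a placeholder; var1 is the variable named in the error message.

```sas
/* Sketch only: mydata is a hypothetical dataset name.              */
/* Summary check: count, missing count, and range of var1.          */
proc means data=mydata n nmiss min max;
   var var1;
run;

/* Explicit count of values that a log or Box-Cox step would reject. */
data _null_;
   set mydata end=last;
   if not missing(var1) and var1 <= 0 then n_nonpos + 1;
   if last then put "Nonpositive values of var1: " n_nonpos;
run;
```

If the minimum reported by PROC MEANS is greater than zero and the count in the log is 0, the cause of the error lies elsewhere.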


05-16-2016 03:51 PM

It looks like you have ruled out almost everything, so I would start to suspect character strings where numbers are expected, especially if you imported Excel data.

Steve Denham
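One way to follow up on that suspicion is sketched below, assuming var1 turned out to be stored as a character variable after import. The names mydata, mydata_num, and var1_num are placeholders.

```sas
/* Sketch only: mydata, mydata_num, and var1_num are hypothetical.  */
/* Step 1: check whether var1 is stored as Char or Num.             */
proc contents data=mydata;
run;

/* Step 2 (only relevant if var1 is character): convert to numeric. */
/* The ?? modifier suppresses conversion errors, so unparseable     */
/* strings become missing and are written to the log for review.    */
data mydata_num;
   set mydata;
   var1_num = input(var1, ?? best32.);
   if missing(var1_num) and not missing(var1) then
      put "Unparseable value: " var1=;
run;
```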