
11-30-2012 10:04 AM

I am experimenting with using the Transform Variables node to transform the target variable. In this case, I have an interval target variable, but I hypothesize that I want to model the log of the variable.

Here is my issue: once I am done, my goal is to create score code and then export the model to use in-database with Scoring Accelerator. The problem is that once I've transformed the target, EM_PREDICTION will be the log of my target variable, rather than the actual target. That means my scoring function will also be transformed.

Is there a way to let EM know that you want to transform your target for the purpose of fitting your model, but undo the transformation for scoring? It seems like this is what you would always want to have happen, anyway.

Accepted Solutions


Wednesday


If you have modeled the log of your target variable rather than the target variable itself, the score that is created from the modeling node will predict the log of the target variable. It is relatively easy to obtain the target value by exponentiating the prediction but you need to take into account whether or not there was an initial adjustment prior to taking the log. For example, suppose a variable takes on values greater than or equal to zero. As the value of a non-negative variable gets closer to zero, the log of that value approaches negative infinity. For this reason, SAS Enterprise Miner will add 1 to the target value in this example so that the formula becomes

new_target = log (target + 1)

which yields values in the range from log(1) and up. Since log(1) = 0, the values of the transformed target (log(target + 1)) will always be non-negative. Once you exponentiate, you are left with an estimate of (target + 1), so you still need to subtract 1 from the exponentiated value to obtain the estimate of the target. These are trivial calculations that are easy to implement, but they underscore the need to know what adjustment was actually made. SAS Enterprise Miner cannot anticipate whether you really want the target value or the log of the target value, so no reverse adjustment is made. You could code the back-transformation manually if you are deploying SAS score code, but if you are publishing the model with Scoring Accelerator you will more likely need to apply that adjustment separately after scoring.
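To make the arithmetic concrete, here is a minimal sketch of the back-transformation in Python (the score code EM actually generates is SAS; the function name and constant below are illustrative, not part of EM's output):

```python
import math

# Illustrative back-transform for a model fit on new_target = log(target + 1).
# ADJUSTMENT is the constant added before taking the log (1 in this example);
# you must know the actual adjustment that was made to your data.
ADJUSTMENT = 1

def back_transform(em_prediction, adjustment=ADJUSTMENT):
    """Undo new_target = log(target + adjustment) to recover the target scale."""
    return math.exp(em_prediction) - adjustment

# A prediction of log(10 + 1) on the transformed scale
# back-transforms to a target estimate of 10.
estimate = back_transform(math.log(11))
```

The same exp-then-subtract step would need to run wherever the scores land, e.g. in-database after the published scoring function produces the transformed prediction.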

Note that if your training data ranges from -99 to infinity, the adjustment would be

new_target = log (target + 100)

and if you have a training data set whose most negative value is not as low as it possibly could be, the adjustment might be inadequate for some of the observations you are scoring. For instance, if delinquency in the training data reaches 12 months (let's say this is coded as -12) but the scoring data has values reaching 15 months of delinquency (-15 in my example), the adjustment based on the training data would be

new_target = log (target +13)

which still leads to the undefined value of

new_target = log (-15 + 13) = log (-2)

for the observation in the scoring data. As a result, you must be careful to make sure you have an adequate adjustment prior to modeling and you need to make sure you back-transform the target taking the correct adjustment into consideration.
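The delinquency scenario above can be sketched as follows (a hypothetical illustration; the variable names are mine and EM's actual adjustment logic may differ):

```python
import math

# Hypothetical sketch: derive the log offset from the training data's minimum
# and reject scoring values that the offset cannot handle.
train_min = -12               # most negative target seen in training
adjustment = 1 - train_min    # 13 here, giving new_target = log(target + 13)

def transform(target):
    shifted = target + adjustment
    if shifted <= 0:
        # The training-based offset is inadequate for this scoring value,
        # mirroring log(-15 + 13) = log(-2) being undefined.
        raise ValueError(f"target {target} out of range for offset {adjustment}")
    return math.log(shifted)

at_minimum = transform(-12)   # log(1) = 0.0, the training minimum maps to zero
```

Calling `transform(-15)` raises, which is exactly the failure the paragraph above describes: an adjustment chosen from the training range cannot cover scoring values below that range.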

One last comment -- the 'optimal solution' for the transformed target variable does not necessarily translate back to the 'optimal solution' you would have obtained by modeling the non-transformed target variable. Transformations are often used with regression models because of their inherent lack of flexibility, but for all of the reasons noted above you might be better off considering a more flexible modeling method instead.

I hope this helps!

Doug
