Building models with SAS Enterprise Miner, SAS Factory Miner, SAS Visual Data Mining and Machine Learning or just with programming

Transformed Target Variables

Occasional Contributor

I am experimenting with using the Transform Variables node to transform the target variable.  In this case, I have an interval target variable, and I suspect it would be better to model the log of the variable.

Here is my issue: once I am done, my goal is to create score code and then export the model for in-database scoring with Scoring Accelerator.  The problem is that once I've transformed the target, EM_PREDICTION will be the log of my target variable rather than the actual target.  That means the output of my scoring function will also be on the transformed scale.

Is there a way to tell EM that you want to transform your target for the purpose of fitting your model, but undo the transformation for scoring?  It seems like this is what you would always want anyway.


Accepted Solutions
Wednesday
SAS Employee

Re: Transformed Target Variables

If you have modeled the log of your target variable rather than the target variable itself, the score code created by the modeling node will predict the log of the target variable.  It is relatively easy to recover the target value by exponentiating the prediction, but you need to take into account whether an adjustment was made before taking the log.  For example, suppose a variable takes on values greater than or equal to zero.  As a non-negative value gets closer to zero, its log approaches negative infinity, and log(0) is undefined.  For this reason, SAS Enterprise Miner will add 1 to the target value in this example, so the formula becomes

     new_target = log(target + 1)

which yields values of log(1) and up.  Since log(1) = 0, the transformed values log(target + 1) are never negative.  Once you exponentiate a prediction, you are left with an estimate of (target + 1), so you still need to subtract 1 from the exponentiated value to obtain an estimate of the target.  These are trivial calculations that are easy to implement, but they underscore the need to know what adjustment was actually made.  SAS Enterprise Miner cannot anticipate whether you want the target value or the log of the target value, so no reverse adjustment is made.  You could code the back-transformation manually if you are deploying SAS code, but with Scoring Accelerator it is more likely that you will need to apply it as a separate step.
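Because the forward and reverse steps are easy to get out of sync, here is a minimal sketch of the round trip in Python (not SAS score code), assuming a natural log and the log(target + 1) adjustment described above. The function names `forward_transform` and `back_transform` are hypothetical; in a real deployment the equivalent back-transformation, exp(EM_PREDICTION) - 1, would have to be applied to the score output wherever the scoring runs.

```python
import math


# Hypothetical helper names; they mirror new_target = log(target + shift),
# assuming a natural log and a default shift of 1.
def forward_transform(target: float, shift: float = 1.0) -> float:
    """Transformed value the model is trained on: log(target + shift)."""
    return math.log(target + shift)


def back_transform(prediction: float, shift: float = 1.0) -> float:
    """Undo the transform: exponentiate, then subtract the same shift."""
    return math.exp(prediction) - shift


# Round trip for a few non-negative targets with the default shift of 1.
for y in [0.0, 2.5, 40.0]:
    z = forward_transform(y)       # stands in for a model prediction such as EM_PREDICTION
    y_hat = back_transform(z)      # back on the original scale
    print(f"target={y}  log(target + 1)={z:.4f}  back-transformed={y_hat:.4f}")
```

Forgetting the `- shift` step leaves every back-transformed value too high by exactly the shift, since exp(log(target + shift)) = target + shift.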

Note that if your training data ranges from -99 to infinity, the adjustment would be

     new_target = log(target + 100)

and if the most negative value in your training data is not as low as the values that can actually occur, the adjustment might be inadequate for some of the observations you are scoring.  For instance, if delinquency in the training data reaches 12 months (let's say this is coded as -12) but the scoring data has values reaching 15 months of delinquency (-15 in my example), the adjustment based on the training data would be

     new_target = log(target + 13)

which still leads to the undefined value

     new_target = log(-15 + 13) = log(-2)

for that observation in the scoring data.  As a result, you must make sure the adjustment is adequate before modeling, and you must back-transform the predictions using that same adjustment.
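The delinquency example can be sketched the same way. This Python sketch assumes the shift is chosen as 1 minus the training minimum, which reproduces the +1, +100 and +13 adjustments in the examples above but is an inference from those examples, not documented Transform Variables node behavior. The helper names and the extra training values (-6, 0, 3) are hypothetical. Instead of silently producing an invalid log, it flags scoring values for which target + shift is not positive.

```python
import math
from typing import Optional, Sequence


# ASSUMPTION: shift = 1 - (training minimum), inferred from the examples
# (min 0 -> +1, min -99 -> +100, min -12 -> +13); not documented node behavior.
def shift_from_training(train_targets: Sequence[float]) -> float:
    """Shift that maps the smallest training value to log(1) = 0."""
    return 1.0 - min(train_targets)


def safe_transform(value: float, shift: float) -> Optional[float]:
    """Return log(value + shift), or None when value + shift <= 0 (log undefined)."""
    adjusted = value + shift
    if adjusted <= 0:
        return None
    return math.log(adjusted)


# Delinquency example: training data bottoms out at -12, scoring data reaches -15.
train = [-12.0, -6.0, 0.0, 3.0]      # hypothetical training targets
shift = shift_from_training(train)   # 13.0, i.e. new_target = log(target + 13)

for y in [-10.0, -12.0, -15.0]:      # hypothetical scoring values
    z = safe_transform(y, shift)
    if z is None:
        print(f"target {y}: log({y} + {shift}) is undefined; flag this observation")
    else:
        print(f"target {y}: transformed value {z:.4f}")
```

Choosing a larger shift up front (based on the lowest value that can realistically occur, not just the lowest value observed) avoids the undefined case, as long as the same shift is used when back-transforming.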

One last comment: the 'optimal solution' for the transformed target variable does not necessarily translate back to the 'optimal solution' you would get by modeling the untransformed target variable.  Transformations are often used with regression models because of their inherent lack of flexibility, but for all of the reasons noted above you might be better off considering a more flexible modeling method.
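To make the last point concrete, here is a tiny Python illustration using made-up target values and the simplest possible model, a single constant fit by squared error. The best constant on the log(target + 1) scale, once back-transformed, is not the best constant on the original scale. This is a simplified sketch of the general effect, not a statement about any particular Enterprise Miner node.

```python
import math
from statistics import mean

# Hypothetical, non-negative training targets (made up for illustration).
targets = [0.0, 1.0, 3.0, 7.0, 15.0]

# Best constant under squared error on the ORIGINAL scale: the arithmetic mean.
best_original = mean(targets)

# Best constant under squared error on the LOG scale: the mean of log(target + 1),
# which is then back-transformed with exp(...) - 1.
best_log_scale = mean(math.log(y + 1) for y in targets)
back_transformed = math.exp(best_log_scale) - 1

print(f"optimal on original scale:             {best_original:.3f}")
print(f"optimal on log scale, back-transformed: {back_transformed:.3f}")
```

With these values the arithmetic mean is 5.2, while the back-transformed log-scale optimum is 3.0 (the geometric mean of target + 1, which is 4, minus 1).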

I hope this helps!

Doug
