jlevine
Fluorite | Level 6

I am experimenting with using the Transform Variables node to transform the target variable.  In this case, I have an interval target variable, but I hypothesize that I want to model the log of the variable.

Here is my issue: once I am done, my goal is to create score code and then export the model to use in-database with Scoring Accelerator.  The problem is that once I've transformed the target, EM_PREDICTION will be the log of my target variable, rather than the actual target.  That means my scoring function will also be transformed.

Is there a way to let EM know that you want to transform your target for the purpose of fitting your model, but undo the transformation for scoring?  It seems like this is what you would always want to have happen, anyway.

Accepted Solutions
DougWielenga
SAS Employee

If you have modeled the log of your target variable rather than the target variable itself, the score code created by the modeling node will predict the log of the target variable.  It is relatively easy to obtain the target value by exponentiating the prediction, but you need to take into account whether there was an initial adjustment prior to taking the log.  For example, suppose a variable takes on values greater than or equal to zero.   As the value of a non-negative variable gets closer to zero, the log of that value approaches negative infinity (and the log of zero itself is undefined).   For this reason, SAS Enterprise Miner will add 1 to the target value in this example so that the formula becomes

 

     new_target = log (target + 1)

 

which yields values from log(1) upward.  Since log(1)=0, the values of the transformed target (log(target+1)) will never be negative.   Once you exponentiate, you are left with an estimate of (target + 1), so you still need to subtract 1 from the exponentiated value to obtain the estimate of the target.  These are of course trivial calculations that can be easily implemented, but this underscores the need to know what adjustment was actually made.  SAS Enterprise Miner cannot anticipate whether you really want the target value or the log of the target value, so no reverse adjustment is made.  You could code it manually if you are deploying SAS code, but the use of Scoring Accelerator makes it more likely you will need to do that separately.
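As a rough illustration of that back-transformation, here is a minimal sketch in Python (not SAS score code, and not something Enterprise Miner generates). The function name `back_transform` and its `offset` parameter are hypothetical; the only thing taken from the explanation above is the formula new_target = log(target + 1).

```python
import math

def back_transform(pred_log, offset=1.0):
    """Undo new_target = log(target + offset) for one predicted value.

    `offset` is the constant that was added before taking the log
    (1 in the example above). It must match the adjustment actually used.
    """
    return math.exp(pred_log) - offset

# Round trip on a made-up value: transform, then recover the original scale.
target = 250.0
new_target = math.log(target + 1.0)   # what the model is trained to predict
recovered = back_transform(new_target, offset=1.0)
print(recovered)                      # approximately 250
```

The same two operations (exponentiate, then subtract the offset) would need to be applied to the scored output wherever the model is deployed.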

 

Note that if your training data ranges from -99 to infinity, the adjustment would be 

 

     new_target = log (target + 100)

 

and if you have a training data set whose most negative value is not as low as it possibly could be, the adjustment might be inadequate for some of the observations you are scoring.  For instance, if delinquency in the training data reaches 12 months (let's say this is coded as -12) but the scoring data has values reaching 15 months of delinquency (-15 in my example), the adjustment based on the training data would be

 

      new_target = log (target +13)

 

which still leads to the undefined value of

 

      new_target = log (-15 + 13) = log (-2)

 

for the observation in the scoring data.   As a result, you must be careful to make sure you have an adequate adjustment prior to modeling and you need to make sure you back-transform the target taking the correct adjustment into consideration.  
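To make the delinquency example concrete, here is a small Python sketch. It assumes the offset is chosen as 1 minus the training minimum, which is inferred from the three examples above (minimum 0 gives +1, -99 gives +100, -12 gives +13) rather than taken from SAS documentation. The helper names and the sample values are made up for illustration.

```python
def offset_from_training(training_values):
    """Assumed rule: shift so the training minimum maps to 1, i.e. log(...) = 0."""
    return 1.0 - min(training_values)

def out_of_domain(values, offset):
    """Return the values for which log(value + offset) is undefined."""
    return [v for v in values if v + offset <= 0]

# Hypothetical data: training reaches -12, scoring reaches -15.
training = [-12, -5, 0, 3]
scoring = [-15, -12, -1, 2]

offset = offset_from_training(training)   # 13.0
bad = out_of_domain(scoring, offset)      # [-15], since -15 + 13 = -2
print(offset, bad)
```

A check like this on new data is one way to notice that the adjustment derived from the training data no longer covers the values being scored.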

 

One last comment -- the 'optimal solution' for the transformed target variable does not necessarily translate back to the 'optimal solution' had you modeled the non-transformed target variable.   Transformations are often done with regression models due to their inherent lack of flexibility but you might be better off considering a modeling method which is more flexible for all of the reasons noted above.  

 

I hope this helps!

Doug


