NicolasC
Fluorite | Level 6

Hi

I am curious about the criteria used by Model Comparison to choose the best model. I have an interval-scaled target, so I chose average squared error as the criterion for comparing my 5 models. I was wondering whether the time required to fit each model is taken into account by Model Comparison. If one model has a slightly larger average squared error than another but runs 10 times faster, will it still be ranked second?

Thanks

Accepted Solution
CatTruxillo
SAS Employee

If it has the smallest ASE, it is chosen as the best, regardless of how long it took to fit the model.

 

This is usually fine because the time it took to fit the model isn't the same as the time it takes to score the model. If you wanted to use a practicality factor to override the ASE selection, you might look at how complex the score code has to be (for example, did that model require imputing, transforming, standardizing, replacing values, or using 10 times more variables, while another one just took raw input variables?), or whether it can be embedded in the database directly.

But, the simple answer to the question is, if you ask the Model Comparison node to select the best based on ASE, it will select the model with the best ASE. If you want to select a different one for another reason, then direct your arrow right past the Model Comparison node and go directly from your preferred model to the Score node.
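To make the selection rule concrete, here is a minimal Python sketch of "smallest ASE wins, fit time is ignored". It is only an illustration of the rule described above, not the node's actual implementation; the model names, ASE values, and timings are made up.

```python
from dataclasses import dataclass


@dataclass
class FittedModel:
    name: str            # hypothetical model label
    ase: float           # average squared error on the validation data
    fit_seconds: float   # training time: recorded, but never used for selection


def select_by_ase(models):
    """Return the model with the smallest ASE; fit time plays no role."""
    return min(models, key=lambda m: m.ase)


# Made-up candidates: the slowest model has a marginally better ASE.
candidates = [
    FittedModel("neural_net", ase=0.100, fit_seconds=600.0),
    FittedModel("regression", ase=0.102, fit_seconds=60.0),
    FittedModel("tree", ase=0.150, fit_seconds=5.0),
]

best = select_by_ase(candidates)
print(best.name, best.ase, best.fit_seconds)
```

Here the regression is 10 times faster to fit and almost as accurate, yet it is ranked second. Choosing it anyway is a manual decision made outside the ASE comparison.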

 

Good question -- I hope that helps!

Cat


NicolasC
Fluorite | Level 6

Thanks very much for your answer. It does help. I was curious about it. 

I am actually comparing 5 models, and for each of them I have 2 different transformations of the interval-scaled variables. The first is Log (inputs + target), the other is STD (inputs + target). So I have a total of 10 outputs I would like to compare. If my selection criterion is ASE, then the Model Comparison node will choose the lowest ASE, and this calculation will be biased by the fact that one is calculated on STD_variables and the other on LOG_variables. I suspect the ASE calculated on the STD variables would be lower, so one of the models using that transformation would be picked, even though this is not related to the fit of the model per se. Do you have any thoughts about that? Many thanks!!

CatTruxillo
SAS Employee

If I follow your explanation correctly, you have tried two different transformations (either log or standardization) on both the inputs AND the response? In that case, the manual transformations of the target variable make it very tricky to compare those models directly because of the trouble with back-transformation to the original scale.
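A small Python sketch may help show why ASE values computed on differently transformed targets cannot be compared directly. The data and the "every prediction is 10% too high" model are invented for illustration; the point is only that the same predictions produce three different ASE values depending on the scale on which the error is measured.

```python
import math
import statistics


def ase(actual, predicted):
    """Average squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Invented target values and one fixed set of predictions (each 10% too high).
y = [10.0, 20.0, 40.0, 80.0, 160.0]
pred = [1.1 * v for v in y]

mean_y = statistics.fmean(y)
sd_y = statistics.stdev(y)

# The same predictions, scored on three different scales.
ase_raw = ase(y, pred)
ase_log = ase([math.log(v) for v in y], [math.log(p) for p in pred])
ase_std = ase([(v - mean_y) / sd_y for v in y],
              [(p - mean_y) / sd_y for p in pred])

print("ASE on raw scale:         ", ase_raw)
print("ASE on log scale:         ", ase_log)
print("ASE on standardized scale:", ase_std)
```

The standardized ASE is just the raw ASE divided by the target's variance, and the log-scale ASE measures squared relative error, so neither number says which model predicts better in the original units. For a fair comparison, all predictions would need to be returned to one common scale before the ASE is computed.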

 

Going to discuss in terms of a regression model here, for simplicity's sake.

 

Taking the log stretches and squishes the distribution, regularizing it so that cases way out at the ends have less leverage, giving cases nearer the center of mass more influence on the slope estimate than they would have in the raw scale, using OLS estimates. The interpretation of the log-scale estimates is quite different from the interpretation of the raw or STD estimates, and requires back-transformation and preferably an adjustment for bias. I'd prefer to use a link function in a generalized linear model instead of logging the response, because then the predictions and estimates can be obtained in the original scale and are trustworthy for model comparisons.
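The back-transformation bias can be seen in a short simulation. This Python sketch uses an intercept-only "model" on simulated lognormal data (the parameters mu and sigma are arbitrary), and the adjustment shown is the common normal-theory correction exp(s²/2), which is an assumption on my part about which adjustment is meant and which relies on the log-scale errors being roughly normal.

```python
import math
import random
import statistics

random.seed(0)

# Simulated lognormal response; mu and sigma are arbitrary illustration values.
mu, sigma = 1.0, 0.8
y = [math.exp(random.gauss(mu, sigma)) for _ in range(20000)]

# "Model" on the log scale: intercept only, so the prediction is mean(log y).
log_y = [math.log(v) for v in y]
m = statistics.fmean(log_y)
s2 = statistics.variance(log_y)

naive = math.exp(m)              # plain back-transform: estimates the median, not the mean
adjusted = math.exp(m + s2 / 2)  # normal-theory bias adjustment (assumed correction)
actual_mean = statistics.fmean(y)

print("naive back-transform:", round(naive, 3))
print("adjusted:            ", round(adjusted, 3))
print("mean of raw response:", round(actual_mean, 3))
```

The naive back-transform lands well below the mean of the response in its original units, while the adjusted value is close to it. A model that predicts the mean directly on the original scale avoids this step altogether.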

 

The STD is a useful trick for comparing different parameter estimates in terms of SD units, when the predictors have different variances. If you apply a generalized linear model, you can still obtain standardized weights in order to compare different input effects.

 

Have I understood you correctly? I hope that helps!

Cat

NicolasC
Fluorite | Level 6

Thanks very much for your detailed answer. The reason I wanted to transform my interval-scaled target was to render its distribution normal and be able to use a generalized linear model. Strangely, the STD node does not seem to perform as well as the log: when I plot the distribution after the STD node, it seems to have only performed some sort of scaling (normalization). Many thanks.

CatTruxillo
SAS Employee

RE: Standardization -- that's correct. Standardization does not change the shape of the distribution; it only performs centering and scaling to change the location and scale. Different methods can be used, but the most common is to subtract the estimate of the mean (centering -- this gives the new variable a mean of 0) and then divide by the estimate of the standard deviation (scaling -- this gives the new variable a variance of 1). Skewness and kurtosis are not affected by this transformation. So, nonnormal data remain nonnormal.
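A quick numerical check of this, as a Python sketch on made-up right-skewed data: standardizing gives mean 0 and variance 1 but leaves the skewness exactly where it was, whereas the log transformation does change the shape.

```python
import math
import statistics


def skewness(xs):
    """Moment-based sample skewness (third standardized moment)."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - m) / sd) ** 3 for x in xs) / len(xs)


# Made-up, strongly right-skewed values.
x = [1.0, 2.0, 2.0, 3.0, 4.0, 7.0, 15.0, 40.0]

# Standardize: subtract the mean, divide by the standard deviation.
m, sd = statistics.fmean(x), statistics.stdev(x)
z = [(v - m) / sd for v in x]

# Log-transform for comparison.
logged = [math.log(v) for v in x]

print("mean, variance of z:", statistics.fmean(z), statistics.variance(z))
print("skewness raw:         ", skewness(x))
print("skewness standardized:", skewness(z))
print("skewness logged:      ", skewness(logged))
```

The raw and standardized skewness values agree, while the logged values are noticeably less skewed. That matches the plot you saw after the STD node: same shape, new location and scale.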

 

If you want to use a generalized linear model, you do not need to manually transform the response; you can use the HPGLM node. You can specify the distribution and link function (the link function is where you would specify the log) in the properties panel.

 

What you have described, taking the log of a variable and modeling it as a linear regression, is lognormal regression. It is not, technically, a generalized linear model: it requires additional back-transformation to interpret and is subject to the bias I described above. The lognormal distribution is not in the family of distributions used by generalized linear models (the exponential family). However, some procedures will allow you to do something similar by modeling a normal distribution with a log link function. But again, you don't have to do the transformation yourself if you use the HPGLM node.

I hope that helps!

Cat

 

 

NicolasC
Fluorite | Level 6

Thanks Cat for your reply, it did help! 

 

AnnaBrown
Community Manager

Hi NicolasC,

 

I'm glad Cat's responses were helpful! If one of the replies was the exact solution to your problem, can you "Accept it as a solution"? Or if one was particularly helpful, feel free to "Like" it. This will help other community members who may run into the same issue know what worked.

Thanks!
Anna

