Hi
I am curious about the criterion used by Model Comparison to choose the best model. I have an interval-scaled target, so I chose average squared error (ASE) as the criterion for comparing my models (5 of them). I was wondering whether the time required to run the different algorithms is taken into account by Model Comparison. If one model has an average squared error slightly larger than another but runs 10 times faster, will it still be ranked second?
Thanks
If it has the smallest ASE, it is chosen as the best, regardless of how long it took to fit the model.
This is usually fine because the time it took to fit the model isn't the same as the time it takes to score the model. If you wanted to use a practicality factor to override the ASE selection, you might look at, for example, how complex the score code has to be (did that model require imputing, transforming, standardizing, replacing values, or using 10 times more variables, while another one just took raw input variables?), or whether it can be embedded in the database directly.
But, the simple answer to the question is, if you ask the Model Comparison node to select the best based on ASE, it will select the model with the best ASE. If you want to select a different one for another reason, then direct your arrow right past the Model Comparison node and go directly from your preferred model to the Score node.
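To make the selection rule concrete, here is a minimal Python sketch (not SAS code) of best-by-ASE selection; the model names, ASE values, and fit times are made-up values for illustration only:

```python
# Best-by-ASE selection: the model with the smallest average squared error
# wins; fit time is never consulted. All numbers here are hypothetical.
candidates = {
    "regression":    {"ase": 0.0412, "fit_seconds": 3},
    "neural_net":    {"ase": 0.0405, "fit_seconds": 30},  # 10x slower, but lowest ASE
    "decision_tree": {"ase": 0.0518, "fit_seconds": 1},
}

best = min(candidates, key=lambda name: candidates[name]["ase"])
print(best)  # "neural_net" -- chosen despite the longest fit time
```

The slow model is still selected first, which is exactly the behavior described above.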
Good question -- I hope that helps!
Cat
Thanks very much for your answer. It does help. I was curious about it.
I am actually comparing 5 models, and for each of them I have two different transformations of my interval-scaled variables. The first is log (inputs + target); the other is standardization (inputs + target). So I have a total of 10 outputs I would like to compare. If my selection criterion is ASE, the Model Comparison node will choose the lowest ASE, but that comparison will be biased by the fact that one ASE is calculated on the standardized variables and the other on the log variables. I suspect the ASE calculated on the standardized target would be lower, so one of the models using that transformation would be picked, even though this has nothing to do with the fit of the model per se. Do you have any thoughts about that? Many thanks!!
If I follow your explanation correctly, you have tried two different transformations (either log or standardization) on both the inputs AND the response? In that case, the manual transformations of the target variable make it very tricky to compare those models directly because of the trouble with back-transformation to the original scale.
Going to discuss in terms of a regression model here, for simplicity's sake.
Taking the log stretches and squishes the distribution, pulling in the right tail so that cases way out at the ends have less leverage, and cases nearer the center of mass have more influence on the OLS slope estimate than they would in the raw scale. The interpretation of the log-scale estimates is quite different from the interpretation of the raw or standardized estimates, and requires back-transformation and preferably an adjustment for bias. I'd prefer to use a link function in a generalized linear model instead of logging the response, because then the predictions and estimates are obtained in the original scale and are trustworthy for model comparisons.
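The back-transformation bias mentioned above can be seen numerically: by Jensen's inequality, exponentiating the mean of log(y) understates the mean of y, and the usual lognormal correction exp(mean + variance/2) fixes it. A small simulation in plain Python (not SAS; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Lognormal data: log(y) ~ Normal(mu, sigma), so E[y] = exp(mu + sigma^2/2)
mu, sigma = 1.0, 0.8
y = rng.lognormal(mean=mu, sigma=sigma, size=200_000)

log_y = np.log(y)
naive = np.exp(log_y.mean())                        # back-transform with no correction
corrected = np.exp(log_y.mean() + log_y.var() / 2)  # lognormal bias adjustment

print(y.mean(), naive, corrected)
# The naive estimate sits near exp(mu) ~ 2.72 and understates the true mean
# exp(mu + sigma^2/2) ~ 3.74; the corrected estimate lands close to it.
```

This is the bias a model on log(y) inherits the moment its predictions are exponentiated back to the original scale.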
Standardization is a useful trick for comparing different parameter estimates in terms of SD units when the predictors have different variances. If you apply a generalized linear model, you can still obtain standardized weights in order to compare different input effects.
Have I understood you correctly? I hope that helps!
Cat
Thanks very much for your detailed answer. The reason I wanted to transform my interval-scaled target was to make its distribution normal so I could use a generalized linear model. Strangely, the STD node does not seem to perform as well as the log -- when I plot the distribution after the STD node, it looks like it has only performed some sort of scaling (normalization). Many thanks.
RE: standardization -- that's correct. Standardization does not change the shape of the distribution; it only performs centering and scaling to change the location and scale. Different methods can be used, but the most common is to subtract the estimate of the mean (centering -- this gives the new variable a mean of 0) and then divide by the estimate of the standard deviation (scaling -- this gives the new variable a variance of 1). Skewness and kurtosis are not affected by this transformation, so nonnormal data remains nonnormal.
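You can verify this directly: standardizing a skewed variable fixes its mean and variance but leaves skewness untouched. A quick check in plain Python (not SAS; exponential data is used here just because it is strongly right-skewed):

```python
import numpy as np

rng = np.random.default_rng(1)

def skewness(v):
    # Standardized third moment
    z = (v - v.mean()) / v.std()
    return (z ** 3).mean()

# Right-skewed data (exponential), then standardized
x = rng.exponential(scale=2.0, size=100_000)
z = (x - x.mean()) / x.std()

print(round(z.mean(), 6), round(z.std(), 6))  # ~0 and ~1 after standardization
print(skewness(x), skewness(z))               # identical: shape is unchanged
```

The standardized variable has mean 0 and standard deviation 1, but exactly the same skewness as the raw variable, which is why the plotted distribution after the STD node still looks nonnormal.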
If you want to use a generalized linear model, you do not need to manually transform the response: you can use the HPGLM node. You can specify the distribution and link function (the link function is where you would specify the log) in the properties panel.
What you have described, taking the log of a variable and modeling it as a linear regression, is lognormal regression, which is not, technically, a generalized linear model and requires additional back-transformation to interpret, and is subject to the bias I described above. Lognormal distribution is not in the same family of distributions as generalized linear models (exponential family). However, some procedures will allow you to do this by modeling a normal distribution with a log link function. But again, you don't have to do the transformation yourself if you use the HPGLM node.
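The distinction between modeling log(y) and modeling y with a log link can be sketched numerically. Below is a rough Python illustration (not SAS, and not the HPGLM node): a Gaussian model whose mean is exp(b0 + b1*x), fit by nonlinear least squares on the untransformed response, so predictions come out directly on the original scale with no back-transformation step. The coefficient values are arbitrary simulation inputs.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)

# Simulated data whose mean is exp(b0 + b1*x), with additive Gaussian noise
b0_true, b1_true = 0.5, 1.2
x = rng.uniform(0.0, 2.0, size=5_000)
y = np.exp(b0_true + b1_true * x) + rng.normal(0.0, 0.3, size=x.size)

# Log-link mean function: the mean is exp(b0 + b1*x), but the response
# itself is never transformed.
def mean_fn(x, b0, b1):
    return np.exp(b0 + b1 * x)

popt, _ = curve_fit(mean_fn, x, y, p0=[0.0, 1.0])

# Predictions are already on the original scale of y: no exponentiating of
# log-scale predictions, and no lognormal bias adjustment is needed.
preds = mean_fn(x, *popt)
print(popt)  # close to the true (0.5, 1.2)
```

A lognormal regression would instead fit OLS to log(y), and its exponentiated predictions would need the bias correction discussed earlier; with the link-function approach, that whole step disappears.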
I hope that helps!
Cat
Thanks Cat for your reply, it did help!
Hi NicolasC,
I'm glad Cat's responses were helpful! If one of the replies was the exact solution to your problem, can you "Accept it as a solution"? Or if one was particularly helpful, feel free to "Like" it. This will help other community members who may run into the same issue know what worked.
Thanks!
Anna