Hello,
I have about 7 independent variables that have a nonlinear relationship with my 2 potential dependent variables (nonlinearity is expected, and the loess curves I generated confirm it). The real dependent variable is actually a 4-category ranking (it would be okay to use data from only 2 of the rankings if it needs to be binary, as the dataset is quite large). It is believed (and true for the most part) that the 4-category ranking is affected by the 2 potential dependent variables, which is why I treat them as potential dependent variables.
The goal of this analysis is to build a scoring system using the trends displayed by the independent variables. I used PROC LOESS to look at trends and establish various cutoffs for a potential score. However, some of the independent variables interact, and it would be nice to take that into account when determining the "score" that combines the 7 independent variables. This score will be used to predict the 2- or 4-category ranking on future data. I have not run nonparametric models before, and I am trying to create a formula for the "score".
I tried PROC GAM for the multivariate model (dependent variable = 2 categories of the 4-category ranking), but I am not sure if that is the best method. Please offer your ideas and suggestions. Thanks!
I have never done it myself, but GAM does imply additivity among the independent variables, which makes it hard to include interactions.
If you are trying to fit a single multinomial model, you might try GLIMMIX, and use the EFFECT statement liberally to create splines. You can then interact those and maybe get something that works.
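A minimal sketch of that spline-effect idea, assuming a data set named mydata with an ordinal response rank4 and two continuous predictors x1 and x2 (all of these names are hypothetical, and the degree and knot settings are placeholders rather than recommendations):

```sas
/* Sketch only: mydata, rank4, x1, and x2 are hypothetical names. */
proc glimmix data=mydata;
   /* Build cubic B-spline bases for two of the predictors. */
   effect spl_x1 = spline(x1 / basis=bspline degree=3 knotmethod=equal(3));
   effect spl_x2 = spline(x2 / basis=bspline degree=3 knotmethod=equal(3));
   /* Cumulative-logit model for the 4-category ranking.    */
   /* spl_x1*spl_x2 crosses the two spline bases.           */
   model rank4 = spl_x1 spl_x2 spl_x1*spl_x2
         / dist=multinomial link=cumlogit solution;
run;
```

Crossing two spline effects multiplies the number of columns quickly, so with 7 predictors it is probably worth crossing only the pairs that are believed to interact and keeping the number of knots small.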
I guess it all boils down to the interaction(s). If you could come up with a good way to parameterize that for some points in your design manifold, then you could add the interactions as additional independent variables. Then GAM seems like the logical approach, at least for the two-category approach.
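As a rough illustration of adding an interaction as its own independent variable, here is a sketch that assumes a simple product term is an adequate parameterization (it may not be), with hypothetical names mydata, y, x1, and x2, and y coded 0/1:

```sas
/* Sketch only: mydata, y, x1, and x2 are hypothetical names. */
data mydata_int;
   set mydata;
   x1x2 = x1 * x2;   /* one possible parameterization of the interaction */
run;

proc gam data=mydata_int;
   /* Binary response; each term, including the product, gets its own smoother. */
   model y(event='1') = spline(x1) spline(x2) spline(x1x2) / dist=binomial;
run;
```

The model is still additive in the terms it is given; the product column is simply treated as one more predictor.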
I don't often go here, but maybe TRANSREG, with all of its possible transformations, including semi-parametric, might be a possibility.
Steve Denham
Thanks, Steve, for the ideas.
When I ran LOESS on the individual variables, I noted the inflection points on the curves and used them to create cutoffs for each variable, which was also an attempt to make the relationships linear. The resulting scores were then multiplied together to give a single score for predicting the binary response. I was worried that this method would lose valuable information, and so I wanted to try using the variables in their original form. You have a good point about additivity, and I will see whether I can use only the clearly additive variables in GAM. I haven't used GLIMMIX, so I will try that as well, and TRANSREG if time permits. Thanks again for your suggestions.