Hi,
I used the bootstrapping method to create 16 random samples and ran a logistic regression on each one. A variable named occupation did not show consistent significance: in 4 of the 16 trials it was not significant at the 1% level. The sample size of the data is 60245. I am wondering whether I should include this variable in the final model. Can you please give some advice on this question?
Thanks in advance.
That's one of the reasons that we do bootstraps: to see which variables are consistently in or out of a model.
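To make "consistently in or out" concrete, here is a rough Python sketch (not SAS) of tallying how often a variable reaches significance across bootstrap replicates. The function name `inclusion_frequency` and the p-values are hypothetical placeholders; only the 12-of-16 split at the 1% level comes from the question.

```python
def inclusion_frequency(p_values, alpha=0.01):
    """Fraction of bootstrap replicates in which the p-value is below alpha."""
    if not p_values:
        raise ValueError("need at least one replicate")
    significant = sum(1 for p in p_values if p < alpha)
    return significant / len(p_values)

# Placeholder p-values for occupation: 12 below 0.01 and 4 above,
# matching the counts in the question (the actual values are not known).
occupation_p = [0.001] * 12 + [0.05] * 4

freq = inclusion_frequency(occupation_p)
print(f"significant in {freq:.0%} of replicates")
```

With only 16 replicates this fraction is coarse, so it is probably best read as a rough indication rather than a precise estimate.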
As to whether you need to include occupation in the end, that is more subtle than just looking at statistical significance. Maybe it is needed for "face validity," etc.
One of the things that you may want to look at is how Occupation is coded. If it is a class variable with lots of levels, then you may just have too many levels with low frequencies, and lumping the codes would provide you with more information and a more stable model.
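As a rough illustration of lumping, the Python sketch below (not SAS) collapses every level that occurs fewer than `min_count` times into a single "Other" level. The helper name `lump_rare_levels`, the level names, and the threshold are all made up for the example; a sensible cutoff depends on the data.

```python
from collections import Counter

def lump_rare_levels(values, min_count, other_label="Other"):
    """Replace levels seen fewer than min_count times with other_label."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]

# Hypothetical occupation codes: two common levels and two rare ones.
occupation = ["clerical"] * 5 + ["sales"] * 4 + ["farming"] + ["mining"]

lumped = lump_rare_levels(occupation, min_count=3)
print(Counter(lumped))
```

Here the two single-observation levels end up in one combined level, so the model would estimate one coefficient for them instead of two poorly supported ones.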
It could also be that you have an interaction term that you have not accounted for (e.g. is education also in the model?).
Doc Muhlbaier
Duke