Programming the statistical procedures from SAS

A variable not consistently significant over tests

Hi,

I used the bootstrap method to create 16 random samples and ran a logistic regression on each one. A variable named occupation was not consistently significant: across the 16 trials, it failed to reach significance 4 times (at the 1% significance level). The data set has 60,245 observations. Should I include this variable in the final model? Any advice would be appreciated.

Thanks in advance.


Accepted Solutions
Solution
‎07-11-2011 02:52 PM

That's one of the reasons we do bootstraps: to see which variables are consistently in or out of a model.
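One way to make "consistently in or out" concrete is to tally how often the term clears the threshold across replicates. The sketch below is only an illustration: the function name `inclusion_frequency` and the p-values are hypothetical placeholders, chosen so that 12 of 16 replicates are significant at the 1% level as in the question. It assumes you have one overall p-value for occupation per replicate (for a class variable, that would be the joint test of all its levels).

```python
from typing import Sequence


def inclusion_frequency(p_values: Sequence[float], alpha: float = 0.01) -> float:
    """Return the fraction of replicates whose p-value is below alpha."""
    if not p_values:
        raise ValueError("need at least one replicate")
    hits = sum(1 for p in p_values if p < alpha)
    return hits / len(p_values)


# Hypothetical p-values for occupation, one per bootstrap replicate.
# These are made-up placeholders, not results from the poster's data.
occupation_p = [
    0.0004, 0.0021, 0.0310, 0.0008,
    0.0002, 0.0150, 0.0033, 0.0061,
    0.0009, 0.0420, 0.0015, 0.0072,
    0.0005, 0.0044, 0.0190, 0.0027,
]

freq = inclusion_frequency(occupation_p, alpha=0.01)
print(f"significant in {freq:.0%} of replicates")
```

A frequency like this is a summary, not a decision rule; with only 16 replicates it is a rough estimate, and more replicates would give a steadier picture.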

As to whether you need to include occupation in the end, that is more subtle than just looking at statistical significance. It may be needed for "face validity," for example.

One thing you may want to look at is how Occupation is coded. If it is a class variable with lots of levels, you may simply have too many levels with low frequencies, and lumping those codes together would give you more information and a more stable model.
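To illustrate the lumping idea, here is a minimal sketch that folds low-frequency levels into a single catch-all category before modeling. The function name `lump_rare_levels`, the `min_count` cutoff of 10, the "Other" label, and the occupation values are all assumptions made for the example. In practice you would more likely group codes by subject-matter similarity than by frequency alone.

```python
from collections import Counter
from typing import Iterable, List


def lump_rare_levels(
    values: Iterable[str], min_count: int, other_label: str = "Other"
) -> List[str]:
    """Replace every level seen fewer than min_count times with other_label."""
    values = list(values)
    counts = Counter(values)
    return [v if counts[v] >= min_count else other_label for v in values]


# Hypothetical occupation codes: two common levels and two sparse ones.
occupation = ["clerk"] * 40 + ["nurse"] * 35 + ["welder"] * 3 + ["pilot"] * 2

lumped = lump_rare_levels(occupation, min_count=10)
print(Counter(lumped))  # Counter({'clerk': 40, 'nurse': 35, 'Other': 5})
```

Refitting the model on each bootstrap sample with the lumped variable would show whether the instability came from the sparse levels.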

It could also be that you have an interaction term that you have not accounted for (e.g. is education also in the model?).

Doc Muhlbaier

Duke
