Solved: HPLogistic Chi-Square Calculation

Kaliv-1776 · Posted 04-21-2017 02:53 PM

I'm new to SAS and am trying to recreate in SAS a project I built in JMP. However, I am getting different Chi-Square results that I cannot explain. I'm using HPLOGISTIC with technique=congra and forward selection, which is fairly straightforward. In the results, the Selection Details table gives a Chi-Square of 5.9197, whereas JMP reports 4.6225. The Chi-Square, also known as the G statistic, should be the difference in -2LL between the full and reduced models. SAS reports the correct -2LL values, and their difference is 4.6225, yet the Selection Details table shows 5.9197. Is there an option I need to set so that SAS calculates the Chi-Square the way JMP does? I suspect the two are using different DF.
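The G statistic described above can be written out as follows. This is a sketch in standard notation; the symbols $\ell_{\text{reduced}}$, $\ell_{\text{full}}$, and $k$ are introduced here for illustration and do not appear in the SAS or JMP output.

```latex
% \ell denotes the maximized log-likelihood of each fitted model,
% so -2\ell is the "-2LL" (or "-2 Log L") value reported in the output.
G = \left(-2\,\ell_{\text{reduced}}\right) - \left(-2\,\ell_{\text{full}}\right)
  = 2\left(\ell_{\text{full}} - \ell_{\text{reduced}}\right),
\qquad
G \;\overset{\text{approx.}}{\sim}\; \chi^2_{k}
% k = number of parameters added when going from the reduced to the full model
% (k = 1 when a single continuous effect enters).
```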

thomp7050 · Posted 04-21-2017 03:22 PM

It is possible. What is the proper calculation +/- 1 DF?

thomp7050 · Posted 04-21-2017 03:28 PM

Also, can you take the variables that belong in the best-fit model, execute the process using only those variables in your model using proc logistic and review the results? Does it give you different point estimates?

Kaliv-1776 · Posted 04-21-2017 03:44 PM

Thanks for the idea. I ran PROC LOGISTIC with only the first variable that would be selected for inclusion in the model, and the LLR is 4.6225. Underneath that is Score, at 5.9197, so that appears to be the value that HPLOGISTIC reports as Chi-Square. I'm not familiar with Score and would prefer HPLOGISTIC to use LLR in the calculation. I say this because something still looks wrong in the calculation for the second variable to be included in the model. SAS is saying one variable is more significant when JMP's calculations show this to be incorrect.

Kaliv-1776 · Posted 04-21-2017 04:36 PM

So upon some investigation... select=sl ... means to use the Score value. I changed this to select=BIC and received the same result. The Selection Details table still shows the Score output, but it appears that the selection method did indeed change. Getting the same result shouldn't be much of a shock, seeing how Wald, Score, and LLR should all give approximately the same values.

The issue with changing select from SL is that I appear to lose control over the p-value threshold for entering variables. This is an option I would like to retain. I would feel more comfortable if the calculations were done by LLR rather than Score. If this is an option, please show me where in the code it can be changed.

Secondly, the second variable selected is not the one I would expect. The first variable is X30, as expected. The second variable that SAS wants to include is X16. However, when I add X16 in JMP, the LLR p-value is 0.0931. The variable I expect to be added is X42, which JMP shows has a p-value of 0.0442. Based on these values, X42 provides greater explanatory power to the model than X16 would. SAS, on the other hand, provides the following Score values: X16=0.0584 and X42=0.1627. I understand why SAS would choose X16 over X42, but LLR should be more accurate than Score.

ballardw · Posted 04-21-2017 03:28 PM

At a minimum, I would suggest posting the code used for both processes that compute the chi-squares. Someone with experience in both JMP and the SAS HP procedures may recognize an option that was used or left out, or a difference in defaults.

Kaliv-1776 · Posted 04-21-2017 03:48 PM

Not sure how much this will help without the data file, but here are the two SAS programs. As I mentioned in an earlier post, it appears that the Chi-Square value in HPLOGISTIC comes from Score when I was expecting it to be LLR. PROC LOGISTIC shows both, which is how I figured that out.

/* Forward selection with HPLOGISTIC; entry is based on significance level (sle=0.25) */
proc hplogistic data=work.Master_Imputed_Work_XLSX technique=congra;
  model Y=x1-x24 x26-x37 x39-x47 / link=logit;
  where Region = 'A' and State = 1 and Year LT 2011;
  selection method=forward (select=sl sle=0.25) details=all;
run;

/* Single-variable check with PROC LOGISTIC; its output lists both Likelihood Ratio and Score */
proc logistic data=work.Test;
  model Y=x30 / link=logit;
run;
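A possible way to get the LLR test for a candidate second variable by hand is to fit the reduced and full models with PROC LOGISTIC and difference their -2 Log L values. The following is a minimal sketch, not something posted in the thread. It assumes that the ODS table is named FitStatistics with columns Criterion and InterceptAndCovariates (worth confirming with `ods trace on`), and that work.Test already holds the subset of rows used in the HPLOGISTIC run. The dataset names fit_reduced, fit_full, and lr_test and the variable names m2ll_reduced and m2ll_full are made up for illustration.

```sas
/* Reduced model: x30 only. Capture the Model Fit Statistics table. */
/* Assumption: the ODS table name is FitStatistics.                 */
ods output FitStatistics=fit_reduced;
proc logistic data=work.Test;
  model Y = x30 / link=logit;
run;

/* Full model: x30 plus the candidate effect (x16 here). */
ods output FitStatistics=fit_full;
proc logistic data=work.Test;
  model Y = x30 x16 / link=logit;
run;

/* Likelihood ratio (G) statistic: difference of the two -2 Log L values. */
/* Both tables list the criteria in the same order, so a one-to-one merge */
/* lines up the rows; only the -2 Log L row is kept.                      */
data lr_test;
  merge fit_reduced(keep=Criterion InterceptAndCovariates
                    rename=(InterceptAndCovariates=m2ll_reduced))
        fit_full(keep=Criterion InterceptAndCovariates
                 rename=(InterceptAndCovariates=m2ll_full));
  if Criterion = '-2 Log L';
  G  = m2ll_reduced - m2ll_full;   /* LLR chi-square                  */
  df = 1;                          /* one added continuous effect     */
  p_value = 1 - probchi(G, df);    /* upper-tail chi-square p-value   */
run;

proc print data=lr_test noobs;
run;
```

Rerunning the second and third steps with x42 in place of x16 would give the LLR p-value for the other candidate, which can then be compared with the values JMP reports.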

StatDave · Posted 04-24-2017 11:09 AM

Yes, HPLOGISTIC (and HPGENSELECT), like PROC LOGISTIC, use the score statistic because doing likelihood ratio tests requires refitting each model which can become very costly (time-consuming) in larger problems, and as you note, the score, Wald, and likelihood ratio tests are asymptotically the same.
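For reference, the score statistic mentioned above can be sketched in standard textbook notation. This is not taken from the SAS documentation; the symbols $U$, $\mathcal{I}$, $\tilde\beta$, $X$, $p$, and $W$ are introduced here for illustration.

```latex
% \tilde\beta: estimate from the reduced (current) model,
%              with the candidate effect's coefficient fixed at 0.
% U: score vector (gradient of the log-likelihood); \mathcal{I}: information matrix.
S = U(\tilde\beta)^{\top}\,\mathcal{I}(\tilde\beta)^{-1}\,U(\tilde\beta)

% For the logit link, with design matrix X that includes the candidate column
% and fitted probabilities p evaluated at \tilde\beta:
U(\beta) = X^{\top}(y - p),
\qquad
\mathcal{I}(\beta) = X^{\top} W X,
\qquad
W = \operatorname{diag}\{\,p_i(1 - p_i)\,\}
```

Because $S$ is evaluated only at the reduced-model estimate, every candidate effect can be screened from a single fit, whereas the likelihood ratio statistic needs a separate refit for each candidate. Both are referred to a chi-square distribution with the same degrees of freedom, but their values can differ in finite samples, as with 5.9197 versus 4.6225 in this thread.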

Kaliv-1776 · Posted 04-24-2017 12:56 PM

Thank you for the reply. I was working through a known data set to familiarize myself with SAS and increase confidence in using an automated system. I was frustrated to see SAS not providing the solution I expected. My known data set has high correlation in it, so when one variable is selected over the other, the whole solution changes. I've looked at it a bit more and the SAS solution appears to be acceptable...although it is different than how I manually recreated models in JMP. Like you said, each model needed to be refitted which is very time consuming.

The data set has only 52 rows, so perhaps the sample size isn't large enough for the tests to give similar results.
