Kaliv-1776
Calcite | Level 5
I'm new to SAS and am trying to recreate in SAS a project I made in JMP. However, I am getting different Chi-Square results that I cannot figure out. I'm using HPLOGISTIC with the congra technique and a forward selection method. It's fairly straightforward. When reviewing the results, the selection details give me a Chi-Square of 5.9197, whereas JMP states 4.6225. The Chi-Square, also known as the G statistic, should be the difference in the -2LL between the full and reduced models. SAS provides the correct numbers for the -2LL, and doing the math on them yields 4.6225. However, the Chi-Square result in the Selection Details states 5.9197. Is there an option I need to invoke to change how SAS calculates the Chi-Square so that it is in line with JMP? I'm thinking that they are using different DF.
1 ACCEPTED SOLUTION

Accepted Solutions
StatDave
SAS Super FREQ

Yes, HPLOGISTIC (and HPGENSELECT), like PROC LOGISTIC, use the score statistic because doing likelihood ratio tests requires refitting each model which can become very costly (time-consuming) in larger problems, and as you note, the score, Wald, and likelihood ratio tests are asymptotically the same.


8 REPLIES 8
thomp7050
Pyrite | Level 9

It is possible.  What is the proper calculation +/- 1 DF?

thomp7050
Pyrite | Level 9

Also, can you take the variables that belong in the best-fit model, execute the process using only those variables in your model using proc logistic and review the results?  Does it give you different point estimates?

Kaliv-1776
Calcite | Level 5
Thanks for the idea. I used PROC LOGISTIC with only the first variable that would be selected for inclusion in the model, and the LLR is 4.6225. Underneath that is the Score statistic, 5.9197, and that appears to be the value that HPLOGISTIC reports as Chi-Square. I'm not familiar with Score, and would prefer HPLOGISTIC to use LLR in the calculation. I say this because something is still being calculated wrong for the second variable to be included in the model. SAS is saying one variable is more significant when JMP's calculations show this to be incorrect.
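To put the two statistics on the same footing, they can both be converted to p-values. The following is a minimal sketch, assuming each test has 1 DF (x30 entered as a single continuous effect, which the thread does not state explicitly). The data set name chisq_pvalues and its variable names are hypothetical; the two chi-square values are the ones quoted above.

```sas
/* Sketch: p-values for the two statistics reported for x30.        */
/* Assumption: 1 degree of freedom for each test.                   */
data chisq_pvalues;
  length test $ 16;
  df = 1;
  /* PROBCHI returns the chi-square CDF, so 1 - CDF is the upper tail */
  test = 'Likelihood ratio'; chisq = 4.6225; p_value = 1 - probchi(chisq, df); output;
  test = 'Score';            chisq = 5.9197; p_value = 1 - probchi(chisq, df); output;
run;

proc print data=chisq_pvalues noobs;
  format p_value pvalue6.4;
run;
```

Both values can then be compared directly against the entry threshold (sle=0.25 in the HPLOGISTIC call shown later in the thread).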
Kaliv-1776
Calcite | Level 5

So upon some investigation... select=sl means that the Score value is used. I changed this to select=BIC and received the same result. The selection details still show the Score output, but it appears that the method for selection did indeed change. Getting the same result shouldn't be much of a shock, seeing how Wald, Score, and LLR should all give approximately the same values.

The issue with changing select from SL is that I appear to lose control over the p-value threshold at which variables enter the model. This is an option I would like to retain. I would feel more comfortable if the calculations were done by LLR rather than Score. If this is an option, please show me where in the code this can be changed.

Secondly, the result for the second variable to be included is not what I would expect. The first variable is X30, as expected. The second variable that SAS wants to include is X16. However, when I choose X16 for inclusion in JMP, the LLR p-value is 0.0931. The variable I expect to be added is X42, for which JMP shows a p-value of 0.0442. Based on these values, X42 provides greater explanatory power to the model than X16 would. SAS, on the other hand, provides the following Score p-values: X16=0.0584 and X42=0.1627. I understand why SAS would choose X16 over X42, but LLR should be more accurate than Score.
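One way to check the likelihood ratio comparison in SAS is to refit the candidate models by hand, which is what JMP is effectively being asked to do. The following is a sketch only, assuming work.Test holds the same 52 rows used in the HPLOGISTIC run; the output data set names (fit_x30, fit_x30_x16, fit_x30_x42, fit_all) are hypothetical.

```sas
/* Sketch: refit the reduced model and the two candidate second steps, */
/* capturing the Model Fit Statistics table from each fit.             */
ods output FitStatistics=fit_x30;
proc logistic data=work.Test;
  model Y = x30 / link=logit;
run;

ods output FitStatistics=fit_x30_x16;
proc logistic data=work.Test;
  model Y = x30 x16 / link=logit;
run;

ods output FitStatistics=fit_x30_x42;
proc logistic data=work.Test;
  model Y = x30 x42 / link=logit;
run;

/* Stack the three tables and record which fit each row came from */
data fit_all;
  length source $ 41;
  set fit_x30 fit_x30_x16 fit_x30_x42 indsname=src;
  source = src;
run;

proc print data=fit_all noobs;
run;
```

The likelihood ratio chi-square for adding X16 (or X42) is the -2 Log L of the x30-only model minus the -2 Log L of the corresponding two-variable model, taken from the intercept-and-covariates column and referred to a chi-square distribution with 1 DF.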

ballardw
Super User

At a minimum, I would suggest posting the code used for both processes that compute the chi-squares. Someone with experience in both JMP and the SAS HP procedures may recognize an option that was used or left out, or a difference in defaults.

Kaliv-1776
Calcite | Level 5
Not sure how much this will help without the data file, but here are the two SAS programs. As I mentioned in an earlier post, it appears that the Chi-Square value in HPLOGISTIC is the Score statistic, when I was expecting it to be the LLR. PROC LOGISTIC shows both, which is how I figured that out.

proc hplogistic data=work.Master_Imputed_Work_XLSX technique=congra;
  model Y=x1-x24 x26-x37 x39-x47 / link=logit;
  where Region = 'A' and State = 1 and Year LT 2011;
  selection method=forward (select=sl sle=0.25) details=all;
run;

proc logistic data=work.Test;
  model Y=x30 / link=logit;
run;
StatDave
SAS Super FREQ

Yes, HPLOGISTIC (and HPGENSELECT), like PROC LOGISTIC, use the score statistic because doing likelihood ratio tests requires refitting each model which can become very costly (time-consuming) in larger problems, and as you note, the score, Wald, and likelihood ratio tests are asymptotically the same.
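For comparison, the same forward selection can be run in PROC LOGISTIC, where the entry decisions are likewise based on the score chi-square. This is a sketch rather than a tested reproduction: it reuses the data set, WHERE clause, and variable list from the HPLOGISTIC call above and assumes they apply unchanged.

```sas
/* Sketch: forward selection in PROC LOGISTIC with the same entry level. */
/* DETAILS requests the step-by-step tables, including the score         */
/* chi-square for each effect eligible for entry.                        */
proc logistic data=work.Master_Imputed_Work_XLSX;
  where Region = 'A' and State = 1 and Year LT 2011;
  model Y = x1-x24 x26-x37 x39-x47
        / link=logit selection=forward slentry=0.25 details;
run;
```

If the entry order here matches the HPLOGISTIC selection details, that supports the reading that the difference from JMP comes from score versus likelihood ratio testing rather than from the DF.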

Kaliv-1776
Calcite | Level 5

Thank you for the reply.  I was working through a known data set to familiarize myself with SAS and increase confidence in using an automated system.  I was frustrated to see SAS not providing the solution I expected.  My known data set has high correlation in it, so when one variable is selected over the other, the whole solution changes.  I've looked at it a bit more and the SAS solution appears to be acceptable...although it is different than how I manually recreated models in JMP.  Like you said, each model needed to be refitted which is very time consuming.

 

The data set only had 52 rows, so perhaps the sample size isn't large enough for the tests to be similar.
