PSIOT2
Calcite | Level 5

Hi,

I am trying to calculate the Hosmer-Lemeshow p-value using PROC LOGISTIC with the LACKFIT option, but the expected proportions given by SAS are not the ones I have. How can I supply my own expected proportions to the model?

Here is my code:

 

proc logistic data=noduleData;
model GT_finding(event='1')= report_score/ lackfit (DF=8 NGROUPS=13034 321 338 77 80 151 109 86 73 36 ) ;
run; 

 

Thank you

6 REPLIES
Ksharp
Super User

Could you explain your issue more clearly?

According to the Documentation:

[Screenshot: excerpt from the documentation]

 

So you can check whether the output of the logistic model is right.

Use code something like this:

proc logistic data=sashelp.heart(obs=1000 where=(height is not missing));
  model status(EVENT='Dead') = height / lackfit(DF=8 NGROUPS=10);
  /* save the predicted probabilities that the Hosmer-Lemeshow test uses to form its groups */
  output out=want p=pred;
run;



/* Check whether the expected counts match the LACKFIT partition table */
proc rank data=want out=want2 groups=10;
  var pred;
  ranks groups;
run;

proc sql;
  select groups,
         count(*)               as count,
         sum(Status = 'Dead')   as Status_Dead_Obs,  /* observed events per group */
         sum(pred)              as Expected          /* expected events per group */
  from want2
  group by groups;
quit;

Here I take Status = 'Dead' as the event and get expected values that match the PROC LOGISTIC output.

[Two screenshots of the output]

 

I think the main difference is how the data are split into groups. Check the documentation to see how the H-L test forms its groups; once you know those details, I believe you will get matching results.
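For reference, the Hosmer-Lemeshow statistic is usually written as below. The symbols are my own notation, not taken from this thread: G is the number of groups, N_g the number of observations in group g, O_g the observed number of events in group g, and \bar{\pi}_g the average predicted probability in group g. The product N_g \bar{\pi}_g is the expected number of events, which is what sum(pred) computes in the PROC SQL check above.

```latex
\chi^2_{HL} \;=\; \sum_{g=1}^{G} \frac{\left(O_g - N_g\,\bar{\pi}_g\right)^2}{N_g\,\bar{\pi}_g\,\left(1 - \bar{\pi}_g\right)}
```

The statistic is then compared with a chi-square distribution. With 10 groups and DF=8, as in the code in this thread, that means 8 degrees of freedom.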

PSIOT2
Calcite | Level 5

Thanks for your answers.

I have 10 groups with expected proportions of 0.0003, 0.01, 0.02, 0.06, 0.18, 0.29, 0.39, 0.71, 0.89, and 0.93.

The number of cases in each group is different too: 13034, 321, 338, 77, 80, 151, 109, 86, 73, and 36.

It seems that it is not possible to specify the expected proportions in the logistic model.

 

Ksharp
Super User
I think Rick is right.
According to the documentation, the groups are balanced (that is, each group has almost the same number of observations), so you cannot get unbalanced groups.
Check the documentation again.
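If the goal is to test user-defined groups against externally supplied proportions, one option may be to compute an H-L style chi-square by hand instead of relying on LACKFIT. The following is only a minimal sketch under several assumptions that are not confirmed anywhere in this thread: noduleData already contains a group variable (here called grp, coded 1 to 10), GT_finding is numeric 0/1, the proportions quoted above are listed in group order, and 8 degrees of freedom is appropriate. The dataset and variable names exp_props, hl_parts, hl_test, grp, and p_exp are hypothetical.

```sas
/* Hypothetical lookup table: one row per user-defined group.
   The proportions are the ones quoted in this thread. */
data exp_props;
  input grp p_exp;
  datalines;
1 0.0003
2 0.01
3 0.02
4 0.06
5 0.18
6 0.29
7 0.39
8 0.71
9 0.89
10 0.93
;
run;

/* Group sizes and observed events from the raw data.
   Assumes noduleData has grp (1-10) and a numeric 0/1 GT_finding. */
proc sql;
  create table hl_parts as
  select d.grp,
         count(*)              as n,
         sum(d.GT_finding = 1) as observed,
         e.p_exp
  from noduleData as d
       inner join exp_props as e
       on d.grp = e.grp
  group by d.grp, e.p_exp
  order by d.grp;
quit;

/* Accumulate the chi-square over the groups and compute the p-value. */
data hl_test;
  set hl_parts end=last;
  expected = n * p_exp;                        /* expected events in this group */
  chisq + (observed - expected)**2 / (expected * (1 - p_exp));
  ngroups + 1;
  if last then do;
    df = ngroups - 2;                          /* assumption: matches DF=8 for 10 groups */
    p_value = sdf('chisq', chisq, df);         /* upper-tail chi-square probability */
    output;
  end;
  keep chisq df p_value;
run;

proc print data=hl_test noobs;
run;
```

Whether ngroups - 2 is the right number of degrees of freedom depends on where the expected proportions come from, so treat the p-value from this sketch with caution.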
Rick_SAS
SAS Super FREQ

Where are the sizes 13034 321 338 77 80 151 109 86 73 36 coming from?
The documentation discusses how to get the observed and expected numbers for the groups in the H-L test.
Please run your model as

proc logistic data=noduleData;
  model GT_finding(event='1') = report_score / lackfit(DF=8 NGROUPS=10);
  ods select LackFitPartition;
run;

and upload the LackFitPartition table for your data.
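If a data set is easier to share than a screenshot, the same tables can probably be saved with ODS OUTPUT. This is a sketch only; the output data set names hl_partition and hl_chisq are hypothetical.

```sas
proc logistic data=noduleData;
  model GT_finding(event='1') = report_score / lackfit(DF=8 NGROUPS=10);
  /* save the H-L tables as data sets */
  ods output LackFitPartition=hl_partition   /* groups with observed and expected counts */
             LackFitChiSq=hl_chisq;          /* chi-square statistic, DF, and p-value */
run;

proc print data=hl_partition noobs;
run;
```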

 

PSIOT2
Calcite | Level 5

Here is the LackFitPartition table:

[Screenshot: LackFitPartition table]

 

Ksharp
Super User
You have extremely unbalanced data:
367 : 13938 = 0.026
That is, the probability of the event is too small.
A logistic model is not well suited to this data.
You need to oversample the data so that this ratio is about 0.1, 0.2, 0.3, and so on, and then run PROC LOGISTIC again.
Or try another model, such as a decision tree or random forest.
