## Statistical programming, matrix languages, and more

Occasional Contributor
Posts: 5

Hi, I encounter a problem on Hosmer-Lemeshow goodness-of-fit test getting a warning massage as below.

I do not understand the macro running that well, so I copied whole macro here, very long though.

I would appreciate very much if you could help me.

Error message-----------------------------------------------------------------

... ...

MPRINT(EXP_OBS): *===========================================;
MPRINT(EXP_OBS): *Hosmer-Lemeshow goodness-of-fit test ;
MPRINT(EXP_OBS): *chi-square=sum{[(Oi-Ei)**2]/[Ei*(1-Ei/Ni)]};
MPRINT(EXP_OBS): *===========================================;
MPRINT(EXP_OBS): H_L[ci,]=ndecci*(observci-expectci)*(observci-expectci)/(expectci*(ndecci-expectci));
MPRINT(EXP_OBS): end;
WARNING: Division by zero, result set to missing value.

operation : / at line 2842 column 1
operands : _TEM1004, _TEM1006

_TEM1004 1 row 1 col (numeric)

.

_TEM1006 1 row 1 col (numeric)

0

statement : ASSIGN at line 2842 column 1
WARNING: Division by zero, result set to missing value.

... ...

--------------------------------------------------------------------------------

Code-------------------------------------------------------------------------

%MACRO EXP_OBS;

*=====================================================;
* Expected and observed count and Hosmer Lemeshow Test;
*=====================================================;
PROC PHREG DATA=&infile outest=betas noprint;
%IF %UPCASE(&Model.)=BASIC %THEN %DO;
MODEL &Dep_Time*&Dep_Cens(0)=&Basic_XVARS. /RL COVB;
%END;

%IF %UPCASE(&Model.)=EXT %THEN %DO;
MODEL &Dep_Time*&Dep_Cens(0)=&Extend_XVARS. /RL COVB;
%END;
ID &IDN.;
output OUT=observed SURVIVAL=SURVobs xbeta=xbetaobs;
RUN;

***Go into IML to compute ;
data foriml;
set observed ;
run;

proc means data=foriml n noprint;
var xbetaobs;
output out=ncount n=n;
run;

data foriml;
set foriml;
if _n_=1 then set ncount(keep=n);
run;

*Now sort by futime*descendingevent;
proc sort data=foriml;
by &Dep_Time. descending &Dep_Cens;
run;

proc iml;
use foriml;
read all var {&idn. xbetaobs &Dep_Time. &Dep_Cens survobs} into dat;
close foriml;
**Note that survobs is survival probability at observed follow-up time;

s0=dat[,5]##exp(-dat[,2]);**Survival probaility at observed follow-up time at zero level of covariates;

n = nrow(dat); *number of persons in dataset;
cnter = J(n,1,0);

do cntJ = 1 to n;
cnter[cntJ,] = cntJ;
end;

dat = dat || cnter;
dat2 = dat[,2];
dattmp = dat;
dat[rank(dat2),] = dattmp;

* Now concatenate a Decile variable ;
decile = J(n,1,0);
do cnti = 1 to n;
do dec = 1 to 10;
if round((dec - 1)*(n/10) + 1) <= cnti & cnti <= round(dec*(n/10))
then decile[cnti,] = dec;
end;
end;

dat = dat || decile;

* Now re-sort dat as it was;
dat6 = dat[,6];
dattmp = dat;
dat[rank(dat6),] = dattmp;

* Now drop the counter column;
dat = dat[,1] || dat[,2] || dat[,3] || dat[,4] || dat[,5] || dat[,7];
dat=dat||s0;

alldat=dat[,2:3]||s0; * alldat[,1] = xbetaobs unless replaced by xbetanew, alldat[,2] = &FUTIME;
tmp=loc(dat[,4]=1); *location of events;
evtdat=alldat[tmp,];*auc at all time points of events;
nevt=nrow(evtdat);*number of events;

*select max time of event <= &YEAR;
xx=0;zz=0;
do ii=1 to n;
if dat[ii,4]=1 & dat[ii,3] <= &YEAR then do;**events at time <=YEAR;
zz=dat[ii,3];
end;
if dat[ii,3]>&YEAR. & dat[ii,4]=1 & xx=0 then do;
maxfu=zz;*max futime <= &YEAR, since data is sorted by follow-up time;
xx=1;
end;
end;

tmpa=evtdat[,2];
obsmaxfu=loc(tmpa=maxfu);*events with max fu <=&YEAR -there could be more than one;
ntmp=ncol(obsmaxfu);
obsmaxfulast=obsmaxfu[,ntmp];**the coordinate of the last obs, in the given order of evtdat,
that is an event with fu <=&YEAR - in case there is more than one such;

*compute expected and observed number of events at each person痴 actual follow-up time and
prob of event within &YEAR (usually 10*365) by decile of risk;
expected=j(10,1,0);
observed=expected;
probevt=expected;
ndec=expected;
dec=expected;
h_l=expected;

do ci=1 to 10;
dec[ci,]=ci;
expectci= 0;observci=0;probevci=0;ndecci=0;
do ei=1 to nevt;
do fi=1 to n;
if dat[fi,3]>=evtdat[ei,2] & dat[fi,6]=ci then do;
if ei>1 then expectci=expectci + 1 - (evtdat[ei,3]/evtdat[ei - 1,3])**(exp(dat[fi,2]));
if ei=1 then do;
expectci=expectci + 1 - (evtdat[ei,3]/1)**(exp(dat[fi,2]));
observci=observci + dat[fi,4];
end;
end;

if ei=1 then do;
if dat[fi,6]=ci then do;
probevci=probevci + (1 - evtdat[obsmaxfulast,3]**exp(dat[fi,2]));
ndecci=ndecci+1;
end;
end;
end;
end;
expected[ci,]=expectci;
observed[ci,]=observci;
ndec[ci,]=ndecci;
probevt[ci,]=probevci/ndecci;

*===========================================;
*Hosmer-Lemeshow goodness-of-fit test ;
*chi-square=sum{[(Oi-Ei)**2]/[Ei*(1-Ei/Ni)]};
*===========================================;
H_L[ci,]=ndecci*(observci-expectci)*(observci-expectci)/(expectci*(ndecci-expectci));
end;

obsexp=dec||ndec||observed||expected||probevt||h_l;

obsexnam={"decile" "n" "observed" "expected" "probevt" "h_l"};

create obsexp from obsexp[colname=obsexnam];
append from obsexp;
close obsexp;
quit;

TITLE1 "=============================================================";
TITLE2 "*** OBSERVED AND EXPECTED EVENT COUNTs at actual f-U times FOR &Model. MODEL ***";
TITLE3 nd prob of event (probevt) at given time(YEAR)・
TITLE4 "=============================================================";
PROC PRINT DATA=obsexp Label NOOBS;
var decile n observed expected probevt;
sum observed expected;
RUN;

*===========================================;
*Hosmer-Lemeshow goodness-of-fit test ;
*chi-square=sum{[(Oi-Ei)**2]/[Ei*(1-Ei/Ni)]};
*===========================================;
ods exclude all;
PROC MEANS data=obsexp sum;
Var H_L;
ods output summary=HLTest;
RUN;
ODS exclude none;

Data HLTest;
SET HLTest;
RENAME H_L_SUM=Chi_Square;
p_value=1-probchi(H_L_SUM,8);
RUN;

TITLE1 "===============================================================";
TITLE2 "*** Hosmer-Lemeshow goodness-of-fit test FOR &Model. MODEL ***";
TITLE3 "===============================================================";
PROC PRINT Data=HLTest noobs; RUN;

*=========================================;
*Gronnesby and Borgan Tests(Wald) for model;
*=========================================;
data anal2;
set observed;
rename xbetaobs=xbeta;
run;

ODS exclude all;
proc univariate data=anal2;
var xbeta;
output out=dec_xbeta pctlpts=10 to 100 by 10 pctlpre=xbeta;
run;

data anal2;
SET anal2;
If _N_=1 then set dec_xbeta;
xbeta_dec1=1*(.Z<xbeta<xbeta10);
xbeta_dec2=1*(xbeta10<= xbeta <xbeta20);
xbeta_dec3=1*(xbeta20<= xbeta <xbeta30);
xbeta_dec4=1*(xbeta30<= xbeta <xbeta40);
xbeta_dec5=1*(xbeta40<= xbeta <xbeta50);
xbeta_dec6=1*(xbeta50<= xbeta <xbeta60);
xbeta_dec7=1*(xbeta60<= xbeta <xbeta70);
xbeta_dec8=1*(xbeta70<= xbeta <xbeta80);
xbeta_dec9=1*(xbeta80<= xbeta <xbeta90);
xbeta_dec10=1*(xbeta90<=xbeta);
RUN;

PROC SORT data=anal2; by &IDN; run;
PROC SORT data=&infile; by &IDN; run;

%Let xbeta_decs= xbeta_dec2 xbeta_dec3 xbeta_dec4 xbeta_dec5 xbeta_dec6
xbeta_dec7 xbeta_dec8 xbeta_dec9 xbeta_dec10;

Data infile_xbeta_dec;
merge &infile anal2(Keep=&idn &xbeta_decs);
by &idn;
run;

PROC PHREG DATA=INFILE_xbeta_dec ;
%IF %UPCASE(&Model.)=BASIC %THEN %DO;
MODEL &Dep_Time*&Dep_Cens(0)=&Basic_XVARS. &xbeta_decs.;
%END;

%IF %UPCASE(&Model.)=EXT %THEN %DO;
MODEL &Dep_Time*&Dep_Cens(0)=&Extend_XVARS. &xbeta_decs.;
%END;

ID &idn;
test1: Test xbeta_dec2, xbeta_dec3, xbeta_dec4, xbeta_dec5, xbeta_dec6,
xbeta_dec7, xbeta_dec8, xbeta_dec9, xbeta_dec10;
ods output TestStmts=GBChisq;
RUN;
ODS exclude none;

TITLE1 "==============================================================";
TITLE2 "*** Gronnesby and Borgan Tests(Wald) for &Model. Model ***";
TITLE3 "==============================================================";
Proc Format;
Value \$LableFmt "test1"="GBTest for &Model. Model";
RUN;

PROC PRINT data=GBChisq noobs;
var Label WaldChisq DF ProbChiSq;
Format Label \$LableFmt.;
RUN;

TITLE1 ; TITLE2 ; TITLE3 ; TITLE4 ; TITLE5 ; TITLE6 ;
%MEND;

------------------------------------------------------------------------------------------------

SAS Super FREQ
Posts: 4,240

Yes, the warning is a "division by zero" warning due to the statement

H_L[ci,]=ndecci*(observci-expectci)*(observci-expe​ctci)/(expectci*(ndecci-expectci));

In your data, there is probably a cell for which the observed and expected decile counts are equal.

Occasional Contributor
Posts: 5

But is there any way to slove this problem, like modifying a little bit the program, otherwise we can not perform this test for our prediction model.

SAS Super FREQ
Posts: 4,240

Where did this macro come from? Did you write it or did you download it from somewhere?

Occasional Contributor
Posts: 5

I copied the macro from the website below.

Thank you very much.

SAS Super FREQ
Posts: 4,240

Debugging someone's macro is usually very time-consuming, especially when you don't know what it does.

At the following link, the authors of that macro say that the macro has been superceeded by later macros:

Rather than try to modify an old (2005) macro, I recommend that you upgrade to the latest program  If you get the same error, contact the macro authors.  You can get the contact information of the lead author from the journal publication: http://onlinelibrary.wiley.com/enhanced/doi/10.1002/sim.4026

Occasional Contributor
Posts: 5

The same error still happened at the upgrated program, so I will try to contact macro authors as you suggested.

Thank you very much for spending your valuable time to guid and help on this problem.

Posts: 2,655

[ Edited ]

You could try programming a "floor trap" to catch this:

devci=(ndecci-expectci);

if abs(devci)<1e-6 then devci=sign(ndecci-expectci)*1e-6;

Place this code before the first time the deviation is used and be sure to replace all examples of (ndecci - expectci) with devci, and see if that helps.  You may want to crank this down to 1e-8 if the values at the extreme deciles are quite small.

Steve Denham

Occasional Contributor
Posts: 5

Thank you for your kind help.

Can I ask tow more questiongs before running the "floor trap" by placing the code just before the deviation function as below.

*===========================================;
*Hosmer-Lemeshow goodness-of-fit test ;
*chi-square=sum{[(Oi-Ei)**2]/[Ei*(1-Ei/Ni)]};
--------------------------------------------------------------------

devci=(ndecci-expectci);

if abs(devci)<1e-8 then devci=sign(ndecci-expectci)*1e-8;

if abs(expectci))<1e-8 then expectci)=sign(expectci)*1e-8;
H_L[ci,]=ndecci*(observci-expectci)*(observci-expectci)/(expectci*(devcindecci-expectci));
end;

*===========================================;

1. I am not that clear about "replace all examples of (ndecci - expectci) with devci", dose this mean replace all (ndecci - expectci) in the macro into devci?

2.  If (ndecci - expectci) or expectci in our data equals to reall zero, the "floor trap" will not work even cranking down 1e-6 gradually to 1e-8,  1e-10, 1e-12 ?

Thank you.

Posts: 2,655

1.  The problem most likely lies in what @Rick_SAS said earlier--that you are dividing by zero.  The floor trap tests the difference that could possibly be zero, and if it is smaller than the tested value, sets it at that value so that division by zero is avoided.  The only problem is that it should be applied consistently whenever the difference is calculated--OOPS, maybe not.  Just thought of a situation where this could lead to a horrible problem.  I retract whatever I said about replacing throughout the code.  So instead, try:

devci=(ndecci-expectci);

if abs(devci)<1e-8 then devci=sign(ndecci-expectci)*1e-8;

In this case, the variable devci should ONLY be used in a denominator.  Putting it into a numerator as well would push the result away from zero, which is not what you want.

2.  If the difference is truly zero, then all that this should accomplish is avoiding the divide by zero error.  You could choose any non-zero value for devci, and it should work, so long as the difference everywhere but in a denominator remains defined as ndecci-expectci.

Steve Denham

SAS Super FREQ
Posts: 4,240

[ Edited ]

@SteveDenham 's reply inspired me to write a blog post about ways that I trap numerical overflows and cap extreme values when I plot data&colon;  "Trap and cap: Avoid division-by-zero and domain errors when evaluating functions"

SAS Super FREQ
Posts: 4,240