BookmarkSubscribeRSS Feed
olliegeorge
Calcite | Level 5

I have a macro for performing HLM and logistic regression and then outputting a results table with the odds ratios and 95% confidence intervals from both models. When I apply the macro and try to run it, it is not able to generate the upper 95% CI for the logistic regression (OR_lr95UCI). If I remove this line from the code, it generates a table with all other results but when I include this variable, the code does not run. I am calculating it the same way as the lower limit  (OR_lr95LCI), so I don't know what is causing the issue. I have copied the macro below, with the problem variable bolded. 

 

%MACRO HLM2(data=, outcome=, covariates=, data2=, covariates2=, b2index=);

/* Run PROC LOGISTIC, result is in the data set logistic_output */

proc logistic data=&DATA desc COVOUT OUTEST=logistic_output;
model &OUTCOME = &COVARIATES ;
run;

proc iml;

/****************************************************************
*** Hirearchical model using empirical-Bayes or semi-Bayes estimation. ***
****************************************************************/
start hm_reg(b,v,z,t2,max_iter,c_criter,b_hm,bcov_hm,b2);
/****************************************************************
** Input: b : 1st-stage parameter estimates
** v : 1st-stage covariance matrix estimates for b
** z : 2nd-stage design matrix (prior design matrix)
** t2 : prior variance for semi-Bayes estimation (vector),
** (0 if prior variance is to be estimated using
** emirical-Bayes estimation.)
** max_iter: maximum number of iteration (default is 20)
** c_criter: convergence criteria (default is 1e-3)
** Output: b_hm : posterier estimates
** bcov_hm : covariance matrix for b
** b2 : 2nd-stage parameter estimates
****************************************************************/

/*** Number of first-stage parameters.***/
np = nrow(Z);
ncolz = ncol(Z);
/*** Second-stage degrees of freedom. */
df2 = np-ncolz; Inp = I(np);
/*** Empirical-Bayes estimation, else semi-Bayes estimation. ***/
if (min(t2) <= 0) then t2pass = 0; else t2pass = 1;
/*** Maximum number of iterations, and convergence criterion for t2. ***/
if max_iter = 0 then _maxiter = 20; else _maxiter = max_iter;
if c_criter = 0 then _criter = 1e-3; else _criter = c_criter;

if t2pass = 0 then do;
print np[format=4.0] " first stage parameters were regressed on"
ncolz[format=4.0] " second stage covariate(s)",
" using empirical-Bayes estimation method (prior variance (t2) to be estimated).";
end;
else do;
print np[format=4.0] " first stage parameters were regressed on"
ncolz[format=4.0] " second stage covariate(s)",
" using semi-Bayes estimation with prior variance (t2) fixed at" t2[format=8.4];
end;

/*** Undertake second-stage regression: ***/
t2old = t2+1;
do iter = 1 to _maxiter while(abs(t2-t2old) > _criter);
t2old = t2;
/*** Weight matrix. ***/
w = inv(v+t2#Inp); wv = w*v; wz = w*z; ws = sum(w);
/*** Invert 2nd-stage information. ***/
vs = inv(t(Z)*w*Z);
/*** 2nd-stage coefficient estimates. ***/
b2 = vs*(t(wz)*b);
if t2pass=0 then do;
/*** Residual from 2nd-stage estimates. ***/
e = b - (Z*b2);
/*** Total residual sums of squares. ***/
rsst = t(e)*w*e;
/*** Residual mean square. ***/
rms = np*rsst/(df2*ws);
/*** Update t2 for empirical Bayes. ***/
t2 = max(rms-sum(wv)/ws,0);
end;
end;

/*** Calculate posterior expectations of 1st-stage parameters: ***/
/*** Weight for prior expectations of the 1st-stage parameters (curvature correction for EB.) ***/
if t2pass=0 then wvc = ((df2-2)/df2)*wv; else wvc = wv;
/*** Projection of b to prior mean. ***/
hatw = Z*vs*t(wz);
st2 = sqrt(t2*t(t2));
/*** Projection of b to posterior mean. ***/
hatp = wvc*hatw + w#st2 + (wv-wvc);
/*** Estimated posterior mean. ***/
b_hm = hatp*b;
/*** Estimated posterior covariance. ***/
bcov_hm = v - wvc*(Inp-hatw)*v;
/*** Approx df in hierarchical model. ***/
npa = trace(hatp);

if t2pass=0 then do;
/*** Add variance component due to estimation of the prior variance: ***/
wvce = wvc*(e#(sqrt((sum(wv)/ws + t2)/(vecdiag(v)+t2))));
bcov_hm = bcov_hm + (2/(df2-2))*wvce*t(wvce);

/*** Check for fitting problems: ***/
if iter >= _maxiter then print " Empirical-Bayes did not converge.";
else do; if t2=0 then print " Empirical-Bayes converged after" iter[format=4.0] " iterations,"
" but potential underdispersion: t2 set to zero.";
else print " Empirical-Bayes converged after" iter[format=4.0] " iterations,"
" with prior variance (t2) estimate:" t2[format=8.4];
end;
end;

do i=1 to np;
if bcov_hm[i,i]<0 then do;
print "Negative second-stage variance: set to zero.";
bcov_hm[i] = 0;
end;
end;
finish;
/****************************************************************/

/****************************************************************
*** Main procedure for the estimation. Check model setting for each run ***
****************************************************************/

/*************************
**** Ordinary Logistic Regression Model ***
*************************/
bnames = {INTERCEPT &COVARIATES}`;
/* Make use of logistic_output */
use logistic_output;
read all into lr_est;
b_lr = lr_est[1,1:(ncol(lr_est)-1)]`;
bcov_lr = lr_est[2:nrow(lr_est),1:(ncol(lr_est)-1)];

/*********************************
*** Output logistic result ***
*********************************/
SE_lr = sqrt(vecdiag(bcov_lr));
ward = (b_lr # b_lr) / (se_lr # se_lr);
P_value = 1 - probchi(ward,1);

print "Ordinary logistic regression estimates:";
print bnames b_lr[format=10.4] se_lr[format=10.4] p_value[format=8.4];

/*******************
*** HLM Model ***
*******************/
/*** Initialize 1st-stage parameters to be modelled in the hierarchical model ***/
b_lr0 = b_lr[&b2index];
v_lr0 = bcov_lr[&b2index,&b2index];
bnames_hm = bnames[&b2index];

/*** Specify 2nd-stage data and parameters to be modelled in the hierarchical model ***/
use &DATA2 var {&COVARIATES2};
read all into Z_mat;
bnames2 = {&COVARIATES2}`;
print bnames2 Z_mat;

Z_mat = repeat(1,nrow(b_lr0),1);
bnames2 = {INTERCEPT};

t2 = 0.5;
run hm_reg(b_lr0, v_lr0, Z_mat, t2, 0, 0, b_hm, bcov_hm, b_hm2);

/****************************
*** Output HLM result ***
****************************/
SE_hm = sqrt(vecdiag(bcov_hm));
ward = (b_hm # b_hm) / (SE_hm # SE_hm);
P_value = 1 - probchi(ward,1);
print "Hierarchical logistic regression estimates:";
print bnames_hm b_hm[format=10.4] SE_hm[format=10.4] P_value[format=8.4];

/*** OR estimates from both ordinary logistic model and hierarchical model ***/
OR_lr = exp(b_lr[&b2index]);
OR_lr95LCI = exp(b_lr[&b2index] - 1.96*SE_lr[&b2index]);
OR_lr95UCI = exp(b_lr[&b2index] + 1.96*SE_lr[&b2index]);
OR_hm = exp(b_hm);
OR_hm95LCI = exp(b_hm - 1.96*SE_hm);
OR_hm95UCI = exp(b_hm + 1.96*SE_hm);

/*** Print results: ***/
print "Second stage parameter estimates using hierarchical model.", bnames2 b_hm2[format=10.4];

print " Odds Ratios and 95% CIs",
" Conventional Model Hierarchical Model",
bnames_hm OR_lr[format=8.2] OR_lr95LCI[format=8.2] OR_lr95UCI[format=8.2]
OR_hm[format=8.2] OR_hm95LCI[format=8.2] OR_hm95UCI[format=8.2];

quit;
run;

%MEND;

4 REPLIES 4
sbxkoenk
SAS Super FREQ

Hello,

 

I have moved this topic thread to another board that is more appropriate for this query of yours.
Home > Analytics > SAS/IML
"SAS/IML software and matrix computations"-board

 

You say :
"... but when I include this variable, the code does not run."

Does it generate a WARNING or an ERROR in the LOG??

 

Also, when copying and pasting SAS-code, click the little "running man" squared icon in the header, ... you then get a pop-up window in which you can paste your code. That way, formatting and structure are preserved and font is mono-spaced.

 

Koen

Rick_SAS
SAS Super FREQ

Are you claiming that the statement
OR_lr95LCI = exp(b_lr[&b2index] - 1.96*SE_lr[&b2index]);
runs without error but the statement
OR_lr95UCI = exp(b_lr[&b2index] + 1.96*SE_lr[&b2index]);    /* same statement but with a PLUS sign */

gives an error?  If so, perhaps the standard error is so large the EXP function is overflowing.

 

It would be extremely helpful if you copy/paste the EXACT statements in the SAS log that show the error, along with 5 statements before and after the error to provide context.

 

Try this and report back:

lowerArg = b_lr[&b2index] - 1.96*SE_lr[&b2index];
upperArg = b_lr[&b2index] + 1.96*SE_lr[&b2index];
logbig = constant('logbig');   /* largest value that can be exponentiated */
print lowerArg upperArg logbig;

OR_lr95LCI = exp(lowerArg);
OR_lr95UCI = exp(upperArg);

 

olliegeorge
Calcite | Level 5

Hello, thanks for your response and advice! Yes, 

OR_lr95LCI = exp(b_lr[&b2index] - 1.96*SE_lr[&b2index]) runs without error but the statement but OR_lr95UCI = exp(b_lr[&b2index] + 1.96*SE_lr[&b2index]) gives an error. 

 

I ran your suggested code and have pasted the log contents below, showing the error statements. I have also attached a PDF of the results output. 

 

When I include OR_lr95UCI in the macro, there are no results for Odds Ratios and 95% CIs but if I remove OR_lr95UCI, the results include a table listing all  bnames_hm, OR_lr, OR_lr95LCI, OR_hm, OR_hm95LCI, OR_hm95UCI[format=8.2] further validating that the issue is with OR_lr95UCI .

 

 

 


4089 ods html body=test;
NOTE: Writing HTML Body file: TEST
4090 %HLM2(DATA = pyear.Pregev_Dataall, OUTCOME = neuro,
4091 COVARIATES = %NUMLIST(ResExp2-ResExp52)
4092 birthyear quin2 quin3 quin4 quin5 mraco mracw mage /* mage20_24mage25_29 mage30_34 mage35 female*/,
4093 DATA2 = pyear.HLM2rXLS,
4094 COVARIATES2 = _2_6_Dinitroaniline Amide Anilide Anthranilic_diamide Azole Chloroacetanilide Dicarboximide
4095 Halogenated_organic N_Methyl_Carbamate Organochlorine Organophosphorus Pyrazole Pyridine
4096 Pyrethroid Substituted_Benzene Triazine Unclassified Urea,
4097 B2INDEX = (2:52) );

NOTE: PROC LOGISTIC is modeling the probability that NEURO='1'.
NOTE: There were 37887 observations read from the data set PYEAR.PREGEV_DATAALL.
NOTE: The data set WORK.LOGISTIC_OUTPUT has 61 observations and 66 variables.
NOTE: Compressing data set WORK.LOGISTIC_OUTPUT increased size by 100.00 percent.
Compressed is 2 pages; un-compressed would require 1 pages.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 2.08 seconds
cpu time 0.39 seconds


NOTE: IML Ready
NOTE: Module HM_REG defined.
ERROR: (execution) Invalid argument to function.

count : number of occurrences is 9
operation : EXP at line 5817 column 18
operands : upperArg
upperArg 51 rows 1 col (numeric)

statement : ASSIGN at line 5817 column 2
ERROR: Matrix OR_lr95UCI has not been set to a value.

statement : PRINT at line 5819 column 25
NOTE: Exiting IML.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE IML used (Total process time):
real time 0.14 seconds
cpu time 0.04 seconds

4098 RUN;
4099 ods html close;

 

Rick_SAS
SAS Super FREQ

Yes, as I suspected, you are getting numerical overflows in EXP for several elements of the upper CL.

The CL for those indices are essentially (0, Infinity).

 

There are a few ways to handle this situation. One is to report the statistics on the log-scale and not back-transform the statistics and CLs.

Another is to trap the arguments that will result in domain errors and manually insert missing values for the upper CLs.

 

Hopefully, it is clear how to proceed. If not, study the following small example:

proc iml;
b_lr    = {0.5,  1,   0,  0.2, 0.8};
SE_lr   = {0.6, 80, 800,  2.1, 3};
lowerArg = b_lr - 1.96*SE_lr;
upperArg = b_lr + 1.96*SE_lr;

OR_lr95LCI = exp(lowerArg);
OR_lr95UCI = j(nrow(upperArg), 1, .);  /* use missing value for infinity */
idx = loc(upperArg < constant('logbig'));
if ncol(idx)>0 then 
   OR_lr95UCI[idx,] = exp(upperArg[idx,]);

print lowerArg upperArg OR_lr95LCI OR_lr95UCI;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 4 replies
  • 1306 views
  • 3 likes
  • 3 in conversation