Kunko
Obsidian | Level 7

Hi there,

I am trying to calculate stabilised inverse probability weights to account for missing data due to loss to follow-up in my cohort study. I calculated the weights and they had a mean value of 1. Then I repeated the analysis using the calculated weight (ipw), as can be seen in the following PROC GENMOD step.

My question is: is the WEIGHT statement placed correctly in the PROC GENMOD step below? I am obtaining output and the results seem meaningful; however, I want to hear from experts in SAS. Thank you in advance for your time and support.


proc genmod descending data=data;
  class covariates;
  model chol_status30 = covariates / dist=binomial link=log lrci type3;
  Covariate1/exp;
  Covariate2/exp;
  .
  .
  .
  weight ipw;
run;

 

Accepted Solutions
MichaelL_SAS
SAS Employee

I might suggest considering a few small changes to your analysis. One concern, discussed in this SAS Note, is that:

 

"In some modeling procedures such as PROC LOGISTIC and PROC GENMOD, arbitrarily inflating the values of a FREQ or WEIGHT variable (for instance, multiplying them by a constant) drives all effects toward significance."
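To see why rescaling matters for the model-based standard errors, here is a short sketch. It assumes the weights enter the log-likelihood multiplicatively and that the dispersion is held fixed (as for the binomial); the symbols $c$, $w_i$ and $\ell_i$ are notation introduced here for illustration, not taken from the SAS Note.

```latex
\begin{align*}
\ell_w(\beta) &= \sum_i w_i\,\ell_i(\beta), &
\mathcal{I}_w(\beta) &= -\sum_i w_i\,
  \frac{\partial^2 \ell_i(\beta)}{\partial\beta\,\partial\beta^\top} \\
w_i \to c\,w_i:\quad
\mathcal{I}_{cw}(\beta) &= c\,\mathcal{I}_w(\beta), &
\widehat{\operatorname{Var}}_{\text{model}}(\hat\beta)
  &= \mathcal{I}_{cw}(\hat\beta)^{-1}
   = \tfrac{1}{c}\,\mathcal{I}_w(\hat\beta)^{-1}
\end{align*}
```

Under these assumptions the point estimate is unchanged (the score equation is only multiplied by $c$), but every model-based standard error shrinks by a factor of $1/\sqrt{c}$, which is what pushes effects toward significance when $c > 1$.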

 

That note mentions normalizing weights as one typical approach for avoiding this issue. What I would suggest in your case, since these are inverse probability weights, is to add an ID variable (if one doesn't already exist) and refit the model with a REPEATED statement with the ID variable listed as the SUBJECT= effect. The REPEATED statement is typically used to request a GEE model fit for clustered data or repeated measures. The reason I recommend it here, even though you only have one observation per subject, is that by default for GEE models the procedure will report standard errors and confidence limits based on the empirical/robust/sandwich covariance matrix estimate. This estimate is not influenced by rescaling the weights the same way the MLE-based estimate from a generalized linear model would be. Moreover, for IPW models, the empirical covariance matrix is known to lead to conservative (but appropriate) inference if the weights are treated as fixed and known values (which is what PROC GENMOD would do) even when they were estimated from the data. For that reason, if you decide to take this approach, I'd also suggest replacing the LRCI option in the MODEL statement with the WALD option to request the use of the Wald statistic for the Type 3 test that is requested.
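A minimal sketch of what that refit might look like is below. The data set name (work.cohort), the ID variable (id), and the covariates (exposure, age_group) are hypothetical placeholders, since the original post elides the actual variable list; only chol_status30 and ipw come from the question. The ESTIMATE statement is one possible way to get an exponentiated contrast and assumes exposure has exactly two levels.

```sas
/* Hypothetical names: work.cohort, id, exposure, age_group.        */
/* chol_status30 and ipw are taken from the original post.          */
proc genmod data=work.cohort descending;
  /* The SUBJECT= variable must also appear in the CLASS statement  */
  class id exposure age_group;

  /* Log-binomial model; WALD replaces LRCI so that the Type 3      */
  /* tests use Wald statistics                                      */
  model chol_status30 = exposure age_group
        / dist=binomial link=log type3 wald;

  /* Stabilised inverse probability weights                         */
  weight ipw;

  /* One record per subject: an independence working correlation    */
  /* still yields empirical (sandwich) standard errors by default   */
  repeated subject=id / type=ind;

  /* Assumes exposure has two levels; EXP reports the contrast on   */
  /* the risk-ratio scale                                           */
  estimate 'exposure: first vs. second level' exposure 1 -1 / exp;
run;
```

With this setup the "Analysis Of GEE Parameter Estimates" table, based on the empirical standard errors, is the one to report.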

 

 

Kunko
Obsidian | Level 7
Hi Michael,
Thank you so much for your suggestion and reasoning.
I fitted the model both ways, with and without the REPEATED statement, listing ID as the SUBJECT= effect as well as in the CLASS statement.
I got very similar results, with only small differences in some estimates and 95% CIs.
With REPEATED statement:
RR=1.71 (95% CI=1.16, 2.53);
Without REPEATED statement:
RR=1.77 (95% CI=1.18, 2.65);
As can be seen, using the REPEATED statement gave a slightly more precise estimate, and I want to use this in my paper. Would you please point me to a citable article explaining this issue, so that I can include it in the paper as supporting evidence for readers?

Thank you so much again.
Tol
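As a rough check on the precision comparison above, an approximate standard error on the log scale can be backed out of each reported interval. This assumes the limits are symmetric around log(RR), as they would be for a Wald interval; it is only an approximation for the fit without REPEATED if those limits came from the LRCI (profile likelihood) option.

```latex
\begin{align*}
\widehat{\mathrm{SE}}(\log \mathrm{RR})
  &\approx \frac{\log(\text{upper}) - \log(\text{lower})}{2 \times 1.96} \\
\text{with REPEATED:}\quad
  & \frac{\log 2.53 - \log 1.16}{3.92} \approx 0.199 \\
\text{without REPEATED:}\quad
  & \frac{\log 2.65 - \log 1.18}{3.92} \approx 0.206
\end{align*}
```

On this approximation the two fits differ by only a few percent in standard error, consistent with the "very similar results" described in the post.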
MichaelL_SAS
SAS Employee

A few references that touch on this point are provided at the end of the "Fitting Algorithm for Weighted GEE" section of the PROC GEE documentation (PROC GEE supports weighted GEEs for data missing due to dropout in longitudinal studies). I don't have the book with me now, but looking at the table of contents for Fitzmaurice, Laird, and Ware 2011, I think section 18.5 might have the relevant discussion.

jiweihe1223
Fluorite | Level 6

I recently learned that the WEIGHT statement in PROC LOGISTIC actually behaves like a frequency weight, meaning a weight of 2 is equivalent to 2 independent observations in the analysis. I found that very bizarre. A regular generalized linear model does not deal with weights this way; it would treat a weight of 2 as a weight of 2 in the estimating equation instead of as 2 independent observations. The glm() function in R does so. When you use the REPEATED statement with an ID variable in PROC GENMOD, you nullify this independent-observation behaviour, and this makes a difference.
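One way to make this contrast concrete is to write out a generic sandwich estimator for a weighted estimating equation with one observation per subject. This is a sketch under assumed notation ($U_i$, $D_i$, $V_i$, $A$, $B$, $c$ are introduced here), not a statement of exactly what PROC GENMOD computes.

```latex
\begin{align*}
\sum_i w_i\,U_i(\beta) &= 0, \qquad
U_i(\beta) = D_i^\top V_i^{-1}\,\bigl(y_i - \mu_i(\beta)\bigr) \\
\widehat{\operatorname{Var}}_{\text{robust}}(\hat\beta) &= A^{-1} B A^{-1},
\qquad
A = \sum_i w_i\,D_i^\top V_i^{-1} D_i, \quad
B = \sum_i w_i^2\,U_i U_i^\top \\
w_i \to c\,w_i:\quad
(cA)^{-1}\,(c^2 B)\,(cA)^{-1} &= A^{-1} B A^{-1}
\end{align*}
```

In this form each subject contributes one term weighted by $w_i^2$ in the middle matrix, rather than counting as $w_i$ independent observations, and multiplying all weights by a constant $c$ leaves the robust covariance unchanged.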

I wonder in which scenario the frequency weight is useful for making inference. 

Could you clarify?
