Proc genmod - estimate at each visit

hetuvio · Posted 04-09-2013 06:22 AM

Dear all,

I have to perform a trt comparison by visit using a logistic regression model with repeated measures including factors for treatment and country/region as well as visit as repeated measure. In addition, the least-square means including 95% confidence intervals of both trt groups will be splotted versus time.

So, I have started to work on a modeling using proc genmod.

* USUBJID is the subject identifier.

* Dependent variable = RESPFN is coded 1 for responder and 0 for non-responder

* TRT01PN is coded 0 for PBO, 1 for Active

* AVISITN is the numeric version of AVISIT (BASELINE, WEEK 1, WEEK 2, ..., WEEK 24, ..., WEEK 52) - Note that: AVISITN doesn't reflect the real time. For example, BASELINE is coded 20, WEEK 1 is coded 30, WEEK 2 is coded 40, but: WEEK 4 is coded 50 (not 60). Should I re-create a new numeric variable containing the real time ?

The responder status is asessed at each visit for all patients.

* COUNTRY contains to country code. We can have up to 28 countries. It represents a lot of categories. We may not need this variable at the end - since it may have a lots of cats, which may cause issues in the modelling.

Could you please let me know what do you think of the code below ?

Also :

- Which type of working correlation matrix would you use?

- I'm not sure that the inclusion of AVISITN in the proc genmod is really good (I've seen some examples on SAS documentation with some visit variables, but the variable was not included in the proc genmod). But, if I do not include it in my model, I don't see how I could estimate OR at each visit?

ods output estimates=ESTIMtest ;

proc genmod data = efftab descending ;

class usubjid trt01pn avisitn country ;

model respfn = avisitn trt01pn country / nointdist=bin link=logit ;

repeated subject=usubjid / withinsubject=avisitn ;

estimate "OR Active versus PBO at WEEK 1" trt01pn -1 1 avisitn 30 / exp ;

run ;

I'm working on SAS version 9.3. It would be great if you could provide me with your input on this statistical analysis! I have started to work on it since a while without managing to go very much further.

Thanks so much in advance for your help,

Best regards,

Violaine

SteveDenham · Posted 04-09-2013 07:28 AM

I would add one more term to the model statement: avistn*trt01pn. As it currently stands the model fits one repeated "trajectory" in avistn for both PBO and Active. I would think that one objective would be to compare the OR at various time points for the two treatments, and without the interaction, the difference between the two treatments will be identical at all time points. Then, the estimate statement looks like it is treating avisitn as a continuous variable, while it has been used as a class variable in the model. Thus, the estimate is going to be 30 times the first component of avisitn, rather than the value at avisitn=30 (Week 1). Finally, do you want specific estimates per country, or is it more of a random effect, adding only variablility? For now, I am assuming it is a fixed effect.

Other than that, I worry that using a GEE rather than a generalized linear mixed model will lead to estimates that are marginal over the time points rather than conditional on them (see Stroup (2013) Generalized Linear Mixed Models), where he points out the difference in estimates under the two models.

I suggest:

proc glimmix data = efftab descending ;

class usubjid trt01pn avisitn country ;

model respfn = avisitn|trt01pn|country / noint dist=bin link=logit ;

random intercept avisitn/subject=usubjid type=csh; /*Data are unevenly spaced in time, and likely have differing variances at each time point due to the distribution*/

lsmestimate "OR Active versus PBO at WEEK 1" avisitn*trt01pn /*<This will depend on the number of visits to correctly compare. It will look like -1 followed by (avisitn - 1) zeros, and then 1 followed by (avisitn - 1) zeros. > */ / exp ;

run ;

Good luck.

Steve Denham

hetuvio · Posted 04-09-2013 08:19 AM

Dear Steve,

Thank you very much for your support. Please see further details/questions belows:

1) "Thus, the estimate is going to be 30 times the first component of avisitn, rather than the value at avisitn=30 (Week 1)."

--> I'm sorry, I don't get what you mean here.

2) "Finally, do you want specific estimates per country, or is it more of a random effect, adding only variablility? For now, I am assuming it is a fixed effect."

--> Actually, this is just an adjustment variable - we don't need to use it for our estimates.

3) random statement : why do you include the intercept while we "noint" is specified in the model statement? do you think that creating a numeric variable for visit representing the 'real time' could prevent us to use the random statement ?

I'll try to find the documentation you suggested: Stroup (2013) Generalized Linear Mixed Models

Thank you again,

Best regards,

Violaine

SteveDenham · Posted 04-09-2013 08:43 AM

1) Your estimate statement comes down to: the difference between the first level of trt01pn and the second level of trt01pn plus 30 times the first level of avisitn. Does that make things clearer? If not, check in the Shared Concepts and Topics part of the documentation for the ESTIMATE and LSMESTIMATE statements.

2) Then country is a random effect, for which we would like to remove the variability. The GLIMMIX code for this would look something like:

proc glimmix data = efftab descending ;

class usubjid trt01pn avisitn country ;

model respfn = avisitn|trt01pn / noint dist=bin link=logit ;

random intercept/subject=country;

random intercept avisitn/subject=usubjid*country type=csh; /*Data are unevenly spaced in time, and likely have differing variances at each time point due to the distribution*/

lsmestimate "OR Active versus PBO at WEEK 1" avisitn*trt01pn /*<This will depend on the number of visits to correctly compare. It will look like -1 followed by (avisitn - 1) zeros, and then 1 followed by (avisitn - 1) zeros. > */ / exp ;

run ;

3) The use of the intercept term in the random statements doesn't have anything to do with the fixed effects parameterization. It is a technique designed to improve computational performance, by utilizing block diagonalization of the variance-covariance matrix.

As far as fitting the "real time" as an effect--are you willing to accept that the effect is strictly linear in time? If not, then comes the problem of how to model time adequately. That advantage of a categorical approach is that you do not assume any particular structure to the time dependence (although you DO assume various correlations through the type= option). It could be linear, polynomial, non-linear or non-describable.

Steve Denham

hetuvio · Posted 04-09-2013 10:19 AM

Dear Steve,

Yes, I got it now.

I'll make a try with your proposed code, and look further into the documentation of the proc glimmix.

Thank you very much,

Violaine

hetuvio · Posted 04-09-2013 11:49 AM

Dear Steve,

I have the below warning when running the following code :

(For now, I haven't included the lsmestimate statement as I want to solve the error below first)

NOTE: The model does not contain an intercept. Columns of X are scaled only and not centered.

ERROR: Model is too large to be fit by PROC GLIMMIX in a reasonable amount of time on this system. Consider changing your model.

NOTE: The SAS System stopped processing this step because of errors.

proc glimmix data = efftab ;

class usubjid trt01pn avisitn country ;

model respfn = avisitn trt01pn avisitn*trt01pn / noint dist=bin link=logit ;

random intercept / subject=country ;

random intercept avisitn / subject=usubjid*country type=csh ;

run ;

Thank you in advance for your advise,

Best regards,

Violaine

SteveDenham · Posted 04-09-2013 01:56 PM

How many subjects do you have? That has to be what is hurting this, as 50 levels of avisitn, 2 of trt01pn, and 100 of the interaction is NOT a big model. Try dropping the intercept from the second random statement as a start.

Steve Denham

hetuvio · Posted 04-10-2013 04:23 AM

I have 1445 subjects, 2 levels of TRT01PN, 18 levels of AVISITN and 28 levels of COUNTRY.

Number of Observations Read 162927

Number of Observations Used 162927

This is the information from the previous model I did:

Dimensions

----------------------------------------------

G-side Cov. Parameters 21

Columns in X 56

Columns in Z per Subject 3022

Subjects (Blocks in V) 28

Max Obs per Subject 17946

Fyi, the modeling is done on data from the pool of 5 studies. I will also have to do it on each individual studies, in which we may have less patients, less countries.

I have tried what you suggested and the same error occurs in the log :

proc glimmix data = efftab ;

class usubjid trt01pn avisitn country ;

model respfn = avisitn trt01pn avisitn*trt01pn / noint dist=bin link=logit ;

random intercept / subject=country ;

random /*intercept*/ avisitn / subject=usubjid*country type=csh ;

run ;

Violaine

hetuvio · Posted 04-10-2013 04:27 AM

Sorry, I have just seen an issue in my code. I have to fix it first.

hetuvio · Posted 04-10-2013 05:19 AM

Actually, I missed to add a selection previsouly.

I've re-run with the original code you proposed, and also dropping the intercept in the 2nd random statement as you suggested.

Both ways lead to the same information in the log :

I have 546 subjects, 2 levels of TRT01PN, 18 levels of AVISITN, 25 levels of countries. And below are additional informations:

Thanks for your help,

Violaine

SteveDenham · Posted 04-10-2013 08:24 AM

So it comes down to not enough observations to fit the Z matrix. The heterogeneous variance type (CSH) is just not going to work. I would suggest recoding avisitn to the actual week, at this point, and then trying a spatial covariance structure appropriate to unevenly spaced time points.:

proc glimmix data = efftab ;

weekcont=week;

class usubjid trt01pn week country ;

model respfn = week trt01pn week*trt01pn / noint dist=bin link=logit ;

random intercept / subject=country ;

random week/ subject=usubjid*country type=sp(pow)(weekcont) ;

run ;

Steve Denham

hetuvio · Posted 04-10-2013 08:38 AM

I've just tried the new proc glimmix. Unfortunately, the error still occur in the log (insuficient memory). And this is the info I get in the SAS output:

Violaine

SteveDenham · Posted 04-10-2013 08:48 AM

I am missing something obvious today. For now, try ignoring the country effect, and let's see what is happening with the repeated measures. Later, after that is working, we can try incorporating country as a random effect, and not use the subject= option. First things first, though. What happens when you comment out the random intercept/subject=country line?

Steve Denham

hetuvio · Posted 04-10-2013 09:52 AM

I have tried the following codes :

proc glimmix data = efftab ;

class usubjid trt01pn avisitn country ;

model respfn = avisitn trt01pn avisitn*trt01pn / nointdist=bin link=logit ;

random intercept avisitn / subject=usubjid*country type=csh ;

run ;

And also:

proc glimmix data = efftab ;

weekcont=week;

class usubjid trt01pn week country ;

model respfn = week trt01pn week*trt01pn / noint dist=bin link=logit ;

random week/ subject=usubjid*country type=sp(pow)(weekcont) ;

run ;

And also:

proc glimmix data = efftab ;

class usubjid trt01pn avisitn ;

model respfn = avisitn trt01pn avisitn*trt01pn / noint dist=bin link=logit ;

random intercept avisitn / subject=usubjid type=csh ;

run ;

And also:

proc glimmix data = efftab ;

weekcont=week;

class usubjid trt01pn week ;

model respfn = week trt01pn week*trt01pn / noint dist=binlink=logit ;

random week/ subject=usubjid type=sp(pow)(weekcont) ;

run;

And also:

proc glimmix data = efftab ;

class usubjid trt01pn avisitn country;

model respfn = avisitn trt01pn avisitn*trt01pn country / nointdist=bin link=logit ;

random _residual_ / subject=usubjid type=csh;

run ;

The 5 codes above lead to the same results:

Fyi, when I remove the statement random, it converges. But this is not what we want.

If you have any other ideas for me to try, feel free to let me know.

Thank you!

SteveDenham · Posted 04-10-2013 10:16 AM

Actually, we have made progress. "Did not converge" is much better than "Insufficient memory."

First, try adding the nloptions command. For the code I first recommended, try:

proc glimmix data = efftab ;

weekcont=week;

class usubjid trt01pn week country ;

model respfn = week trt01pn week*trt01pn / noint dist=bin link=logit ;

random week/ subject=usubjid*country type=sp(pow)(weekcont) ;

nloptions maxiter=200;

run ;

Glimmix defaults to 20 iterations, and when the likelihood space is relatively flat, or the initial conditions aren't close to the maximum, you may need to increase the number. This should get you close (look at the iteration history and what the objective function is doing). You may need to relax the convergence criterion if the objective function has flattened out, but still no convergence.

Steve Denham

Proc genmod - estimate at each visit

Re: Proc genmod - estimate at each visit

Re: Proc genmod - estimate at each visit

Re: Proc genmod - estimate at each visit

Re: Proc genmod - estimate at each visit

Re: Proc genmod - estimate at each visit

Re: Proc genmod - estimate at each visit

Re: Proc genmod - estimate at each visit

Re: Proc genmod - estimate at each visit

Re: Proc genmod - estimate at each visit

Re: Proc genmod - estimate at each visit

Re: Proc genmod - estimate at each visit

Re: Proc genmod - estimate at each visit

Re: Proc genmod - estimate at each visit

Re: Proc genmod - estimate at each visit