Re: how to adjust for clustered variable when doing multinomial regression?

Jie111 · Posted 08-17-2020 08:27 AM

I am using PROC LOGISTIC to analyze the association between blood pressure and CVD, a categorical outcome (heart disease only, stroke only, or both heart disease and stroke).

I also want to account for a cluster variable such as hospital (the data come from 3 hospitals). PROC GENMOD supports only the ordinal multinomial model, not the nominal one.

How can I adjust for the cluster variable when fitting a multinomial regression with a nominal (unordered categorical) dependent variable?

PaigeMiller · Posted 08-17-2020 08:31 AM

You could add the variable HOSPITAL to the model in PROC LOGISTIC.
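A minimal sketch of that suggestion, assuming the cluster variable is named hospital (a hypothetical name) and borrowing the dataset, outcome, predictor, and reference levels from the code posted later in this thread:

```sas
/* Sketch only: hospital is an assumed variable name; the other names and
   the reference levels follow code posted later in this thread. */
proc logistic data=data;
   /* Treat both predictors as categorical with reference-cell coding */
   class blood_pressure_g(ref="2") hospital / param=ref;
   /* Generalized logit model for a nominal outcome;
      hospital enters as an ordinary fixed effect */
   model inc_multinorminal(ref='0') = blood_pressure_g hospital / link=glogit;
run;
```

With only 3 hospitals this adds two parameters per logit, so it should stay cheap. It adjusts for differences between hospitals but does not model correlation within a hospital.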

--
Paige Miller

Jie111 · Posted 08-17-2020 08:38 AM

Thanks for the quick reply.

Sorry, I did not describe the question correctly. Suppose the cluster variable is family, and we have about 1000 families. Then it is not possible to adjust for it directly in the logistic model.

proc logistic data=data descend;
   class blood_pressure_g(ref="2") / param=ref;
   model inc_multinorminal=blood_pressure_g / link=glogit;
run;

proc genmod data=data descend;
   class family blood_pressure_g(ref="2") / param=ref;
   model inc_multinorminal=blood_pressure_g / dist=MULTINOMIAL link=cumlogit;
   repeated family=alt_pairid / corr=IND covb;
run;

PaigeMiller · Posted 08-17-2020 08:43 AM

I don't really have a good answer.

When you have a categorical variable with one thousand levels, I don't really know of any modeling technique that will do a good job here. There are two reasons for this: a potentially small number of data points for most (all?) levels, and the risk of running out of memory. Of course, you can still try it and see what results, but I am not optimistic.

--
Paige Miller

SteveDenham · Posted 08-17-2020 08:55 AM

To me, this sounds like an analysis where PROC GLIMMIX can be applied. It can model nominal multinomial responses, and by treating clustering variables as random effects, should be able to accomplish what you wish to do.

proc glimmix data=data descend method=laplace;
      class family blood_pressure_g(ref="2");
       model inc_multinorminal=blood_pressure_g/ dist=MULTINOMIAL link=glogit;
         RANDOM family/subject=alt_pairid ;
run;

For this, be sure to sort your dataset by alt_pairid (which I assume is numeric). If it is not numeric, then it should be added to the CLASS statement. I removed the global param=ref, as it is not supported in GLIMMIX, which uses GLM parameterization.

SteveDenham

Jie111 · Posted 08-17-2020 09:20 AM

Thanks for the kind reply.

I tried the code below, but got: ERROR: The SAS System stopped processing this step because of insufficient memory.

Maybe family has too many levels...

proc glimmix data=data  method=laplace;
      class family blood_pressure_g(ref="2");
       model inc_multinorminal=blood_pressure_g/ dist=MULTINOMIAL link=glogit;
         RANDOM blood_pressure_g/subject=family group=inc_multinorminal;
run;

PaigeMiller · Posted 08-17-2020 09:43 AM

Yes, this is a major issue when trying to model a categorical variable with 1000 levels.

You could try to somehow cluster the families together, so instead of 1000 families, you have 25 clusters ... but I don't have any ideas off the top of my head how to do this.

--
Paige Miller

Jie111 · Posted 08-17-2020 09:51 AM

Thanks a lot, Paige.
I will try it.

SteveDenham · Posted 08-17-2020 01:16 PM

This line is causing a lot of the problem:

  RANDOM blood_pressure_g/subject=family group=inc_multinorminal;

The way this reads, blood pressure is both a fixed and a random effect, and the random effect has >1000 levels, which are further subdivided by the number of levels in inc_multinorminal, which is your dependent variable. Unless I am missing something, you have clustering by family. You may also have heterogeneity of variance by blood pressure group, so the possible RANDOM statements would be:

  RANDOM intercept/subject=family;

/*OR*/

  RANDOM intercept/subject=family group=blood_pressure_g;

The first (random intercept model) should not stress GLIMMIX - there is a single estimate, with as many BLUPs from that as you have subjects.

Please try that and see how the memory situation works.

SteveDenham

Jie111 · Posted 08-18-2020 03:59 AM

Hi SteveDenham, thanks for the help.

I changed the code as follows:

proc glimmix data=data   method=laplace;
         class   family    blood_pressure_g(ref="2");
        model inc_multinorminal (ref='0')=blood_pressure_g/ dist=MULTINOMIAL link=glogit;
          RANDOM intercept/subject=family   group=blood_pressure_g;
 run;

SAS then reports:

ERROR: Nominal models require that the response variable is a group effect on RANDOM statements.
You need to add 'GROUP=inc_multinorminal'.

So I changed the GROUP= option as follows:

proc glimmix data=data   method=laplace;
         class   family    blood_pressure_g(ref="2");
        model inc_multinorminal (ref='0')=blood_pressure_g/ dist=MULTINOMIAL link=glogit;
          RANDOM intercept/subject=family   group=inc_multinorminal;
 run;

I still get: ERROR: The SAS System stopped processing this step because of insufficient memory.

SteveDenham · Posted 08-18-2020 08:47 AM

Digging around, I came up with one possibility, and one question for the wider audience.

The possibility would be to bootstrap your results, by using several subsets of the subject variable family. You could use PROC SURVEYSELECT to sample with replacement from all families, then fit each of these subsets separately. The parameter estimates and standard errors for the full group could then be obtained by model averaging (either straightforward or using PROC MIANALYZE in a clever way). The key would be finding out what size the subsets need to be to avert the memory issue.
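A rough sketch of how the subsetting could be set up, under several assumptions: the data set names fam_list, fam_draws, boot, and pe and the variable draw_id are made up for illustration, n=200 and reps=50 are placeholders, and the outcome reference level '0' is taken from the code above. Whether any particular subset size avoids the memory error is untested.

```sas
/* Sketch only: fam_list, fam_draws, draw_id, boot, and pe are made-up names;
   n=200 and reps=50 are placeholders to tune against the memory limit. */

/* 1. One row per family */
proc sort data=data(keep=family) out=fam_list nodupkey;
   by family;
run;

/* 2. Draw 50 subsets of 200 families each, with replacement.
      OUTHITS writes one row per draw, so a family drawn twice appears twice. */
proc surveyselect data=fam_list out=fam_draws
                  method=urs n=200 reps=50 outhits seed=20200818;
run;

/* 3. Give each draw its own ID so a repeated family counts as a separate cluster */
data fam_draws;
   set fam_draws;
   draw_id = _n_;
run;

/* 4. Attach the observations of each drawn family */
proc sql;
   create table boot as
   select s.Replicate, s.draw_id, d.*
   from fam_draws as s inner join work.data as d
     on s.family = d.family
   order by s.Replicate, s.draw_id;
quit;

/* 5. Fit the random-intercept model separately in each subset */
proc glimmix data=boot method=laplace;
   by Replicate;
   class draw_id blood_pressure_g(ref="2");
   model inc_multinorminal(ref='0') = blood_pressure_g
         / dist=multinomial link=glogit solution;
   random intercept / subject=draw_id group=inc_multinorminal;
   ods output ParameterEstimates=pe;   /* estimates stacked by Replicate */
run;
```

The pe data set holds one set of estimates per replicate, which could then be summarized per parameter or combined with PROC MIANALYZE.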

The question for the wider audience is this: It is not obvious to me that you must include group=<response_variable> in the RANDOM statement. The error message points out that it must be included. Could someone point me to where this is covered in the documentation (@StatDave, @Rick_SAS)? Thanks to anyone who has info on this.

SteveDenham

StatDave · Posted 08-18-2020 09:34 AM

While PROC GENMOD does not support nominal multinomial logistic regression with clustered data, the newer PROC GEE does. You can specify DIST=MULT and LINK=GLOGIT and then use the REPEATED statement. For this and other types of logistic models that are available, see this note.

SteveDenham · Posted 08-18-2020 10:10 AM

This really looks like a great alternative, @StatDave .

The example here looks directly applicable to this analysis. Here is the code I would consider:

proc gee data=data descend;
      class family blood_pressure_g ;
       model inc_multinorminal=blood_pressure_g/ dist=MULTINOMIAL link=glogit;
            repeated subject=family / within=altpairid;
run;

You may need to sort the data by family and altpairid to get this to work. Also, I don't know if altpairid is numeric so that it could be used in the within= option without including it in the CLASS statement. Also, it appears that PROC GEE only uses a GLM parameterization, and doesn't appear to support the ref= option in the CLASS statement, so interpretation will have to be made carefully.

SteveDenham

StatDave · Posted 08-18-2020 11:03 AM

For both GENMOD and GEE:
- The data do not need to be sorted by the SUBJECT= variable.
- Any variable in the SUBJECT= or WITHIN= option must be specified in the CLASS statement.
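Applying those two points to the PROC GEE code posted earlier gives roughly the following sketch. It assumes altpairid identifies the members within a family and that PROC GEE is available in your release:

```sas
/* Sketch only: assumes altpairid indexes observations within a family */
proc gee data=data descend;
   /* The SUBJECT= and WITHIN= variables are both listed in CLASS */
   class family altpairid blood_pressure_g;
   model inc_multinorminal = blood_pressure_g / dist=multinomial link=glogit;
   /* No prior sort by family is needed */
   repeated subject=family / within=altpairid;
run;
```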

Jie111 · Posted 08-19-2020 02:32 AM

Thanks a lot for your help, @SteveDenham @StatDave .

It seems that PROC GEE could solve the problem.

Unfortunately, my SAS installation reports that procedure GEE is not found.

I will try it and report back here once PROC GEE is available. 😀
