Jerem
Calcite | Level 5

Hi everybody!

I'm quite new to SAS procedures, so my question might sound easy to a lot of you, but I couldn't find the answer so far.

Here is my problem: I'm trying to calculate the marginal effects at the means of my independent variables. For this research, it was decided to report marginal effects rather than odds ratios.
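For reference, here is the usual textbook definition of a marginal effect at the means for a plain logit model. This is a sketch that ignores the commune-level structure. The symbols are notation introduced here, not taken from the thread: \bar{x} is the vector of covariate means, \hat\beta the estimated coefficients, and \hat p the predicted probability.

$$
\hat p(\bar{x}) = \frac{1}{1 + \exp\left(-\bar{x}^{\top}\hat\beta\right)},
\qquad
\left.\frac{\partial \hat p}{\partial x_k}\right|_{x=\bar{x}}
= \hat\beta_k \,\hat p(\bar{x})\,\bigl(1 - \hat p(\bar{x})\bigr)
$$

For a 0/1 indicator such as EurostatA, the difference in predicted probability between the indicator set to 1 and to 0 (other covariates at their means) is often reported instead of the derivative.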

I'm using the proc glimmix to integrate the multi-level dimension of my data. Here is the code I wrote:

proc glimmix data=candidates;

  class commune ;

  model  binouverture (event='1') = EurostatA EurostatC sizelog ENP

      / solution dist=binary link=logit ddfm=satterth ;

  random intercept / subject= commune solution;

run;

Based on this, what would be the easiest procedure to calculate the marginal effects?

Many thanks for your insights!

Jerem

18 REPLIES
SteveDenham
Jade | Level 19

The code you have gives the conditional effects.  I assume that there are multiple measurements for each level of the variable 'commune'.  If so, then the marginal values may be obtained by treating this as a repeated measures design using the following code:

proc glimmix data=candidates;

  class commune ;

  model  binouverture (event='1') = EurostatA EurostatC sizelog ENP

      / solution dist=binary link=logit ddfm=satterth ;

  random intercept / subject= commune solution residual; /* Converts to a repeated measures design */

run;

Do you wish to get marginal estimates for each level of commune?  If so, then I would suggest:

proc glimmix data=candidates;

  class commune ;

  model  binouverture (event='1') = EurostatA EurostatC sizelog ENP commune commune*EurostatA commune*EurostatC commune*sizelog commune*ENP

      / solution dist=binary link=logit ddfm=kenwardroger ; /* Addresses individual commune levels, and adds the Kenward-Roger adjustment for repeated measures designs */

  random intercept / subject= commune solution residual; /* Converts to a repeated measures design */

lsmeans commune/ilink ci; /* Outputs the estimates for each level of commune, on both the logit and original scale */

run;

I hope this is helpful.

Steve Denham

Jerem
Calcite | Level 5

Dear Steve,

Thanks very much for your quick response! It already helps, although I'm not familiar enough with multi-level regression to be sure that I'm on the right track.

So, just to be sure, here is my intention:

-I've got 1012 observations (=1012 electoral lists). Those 1012 lists were presented in 262 different 'communes'. In other words, in each municipality (=commune), there is a certain number of electoral lists: sometimes only a couple, sometimes more. I'm seeking to use a logistic regression explaining whether or not these electoral lists include a certain kind of candidate (yes=1; no=0). My independent variables are EurostatA, EurostatC, sizelog and ENP, as written in the model.

Because lists are embedded in distinct communes, I want to take the multi-level structure of my data into account. For that purpose, I was told to use a fixed effects model.

Furthermore, to report my results, I'm asked to present the marginal effects of the independent variables on the dependent variable.

Is that possible with the solution you proposed?

Many thanks in advance!

Jeremy

SteveDenham
Jade | Level 19

So an individual record would look something like:

commune list binouverture eurostata eurostatc sizelog enp

The list variable ranges from one or two to some larger number, commune is indexed so that there are 262 different municipalities.

Now comes the question, with several possibilities:  Are the municipalities a sample that you wish to use to infer to an entire population where you know how many sampling units exist?  Are the municipalities a sample that represents a population that could be considered infinite in some sense (i.e., a population of all POSSIBLE units)?  Are the municipalities the ONLY ones for which you wish to draw conclusions?

One of these is where your research question is directed, and each would be addressed by a different kind of analysis in SAS.  I await your answer.  If the second scenario is the most likely, then the model I presented earlier is appropriate for obtaining marginal estimates.  If either of the other scenarios applies, then the analysis should be changed.

Steve Denham

Jerem
Calcite | Level 5

Actually, the observation is the electoral list itself (I did not code individual candidates). So for each of 1012 lists, I coded the information for :

-commune (one of the 262 municipalities),

-Eurostat (the municipality is either eurostatA, eurostatB, eurostatC. Reference used is EurostatB),

-sizelog (log of the population of the municipality),

-enp (effective number of parties in the municipality).

-binouverture of the list (Yes/no),


In other words, for each list from the same municipality, the variables 'commune' (which identifies the municipality), 'EurostatA', 'EurostatC', 'ENP' and 'sizelog' are always the same, while the dependent variable 'binouverture' varies from list to list.

Regarding the key questions you asked: the 262 municipalities are the population. We coded information for all 1012 lists presented in all 262 municipalities at the last local elections. I don't want to generalize my results to other cases but to evaluate the impact of the independent variables on 'binouverture' for that particular election. To some extent, it's an electoral report, not theory-building. Thus statistical significance is not important, because my sample is my population of interest in that paper.


The goal is thus to assess the marginal effect of each of the independent variables  'Eurostata' 'Eurostatc' 'Sizelog' 'Enp'  on the dependent variable 'binouverture' but taking into account the fact that lists are presented in 262 municipalities.

Does that make it clearer? Thanks for your help Steve!

Jeremy

SteveDenham
Jade | Level 19

Excellent answers.  OK, everything is a fixed effect, and you have multiple measurements on the 262 municipalities.

proc glimmix data=candidates;

  class commune ;

  model  binouverture (event='1') = EurostatA EurostatC sizelog ENP commune commune*EurostatA commune*EurostatC commune*sizelog commune*ENP

      / solution dist=binary link=logit ddfm=kenwardroger ; /* Addresses individual commune levels, and adds the Kenward-Roger adjustment for repeated measures designs */

  random intercept / subject= commune solution residual; /* Converts to a repeated measures design.  Even though this is a RANDOM statement, the residual option means treat the values within a commune as correlated.  Note also the inclusion of commune and commune by covariate interactions in the model statement. */

lsmeans commune/ilink ci; /* Outputs the estimates for each level of commune, on both the logit and original scale */

run;

Be careful about the interpretation of the solution.  The values under commune*eurostatA for instance are deviations from the slope estimated by the value under EurostatA, for each commune.  The standard reference level is the last level of commune, so these values are the differences in slopes between each commune and the reference.

The least squares means here are the expected marginal means, at the mean level of EurostatA, EurostatC, sizelog, and ENP.  You may want to estimate marginal means for specific values of these four continuous covariates.  For that, see the documentation of the LSMEANS statement and the AT= option.
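As a rough illustration of the AT option mentioned above, the sketch below requests least squares means twice: once at the covariate means (the default) and once at chosen covariate values. The values 9.5 and 3 are hypothetical placeholders, not values from the data, and the MODEL statement is cut down to main effects to keep the example short.

```sas
/* Sketch only: the AT values are hypothetical placeholders */
proc glimmix data=candidates;
  class commune;
  model binouverture (event='1') = EurostatA EurostatC sizelog ENP commune
        / solution dist=binary link=logit;
  random intercept / subject=commune residual;

  /* Default: continuous covariates are held at their means */
  lsmeans commune / ilink cl;

  /* Covariates sizelog and ENP fixed at user-chosen (hypothetical) values */
  lsmeans commune / at (sizelog ENP) = (9.5 3) ilink cl;
run;
```

Covariates not listed in AT stay at their means, so the two LSMEANS tables can be compared directly.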

Steve Denham

Jerem
Calcite | Level 5

I've got my answers regarding the model now, thanks!

I submitted the model but it is still running. Is proc glimmix a time-consuming procedure?

By the way, for the line lsmeans commune/ilink ci, SAS proposed replacing ci with cl. Is that correct in your opinion?

I'll keep you updated!

Jeremy

Jerem
Calcite | Level 5

Actually, just got the answer in the meantime:  after the Iteration History it is mentioned "Did not converge"...

I should mention that I used only 691 of the 1012 observations (for reasons specific to the subject under investigation).

When I include all observations, the output stops after the description of the model dimensions:

Number of Observations Read    1013
Number of Observations Used    1012

Response Profile
Ordered Value    binouverture    Total Frequency
1                0               614
2                1               398

Dimensions
R-side Cov. Parameters      1
Columns in X                2104
Columns in Z per Subject    0
Subjects (Blocks in V)      262
Max Obs per Subject         14

SteveDenham
Jade | Level 19

Welcome to the fine tuning part of GLIMMIX.  It really is not much fun.  It involves the use of the NLOPTIONS statement.

If it looks like the objective function is closing in on some value, it may be that all you need to do is add:

nloptions maxiter=100;

to the code.  GLIMMIX has a default of 20 iterations, which is very often not enough for a complex binary model to reach convergence.  Try inserting this line, and see if results are obtained.  If not, we will need to fine tune the convergence criteria.

Good luck.

Steve Denham

Jerem
Calcite | Level 5

Well, at least I'll be able to handle GLIMMIX in the future thanks to your help 😉

The only thing that I do not understand is that it does not show the results. Here is the only result given by SAS.

Results of The GLIMMIX Procedure

                                   Number of Observations Read        1013

                                   Number of Observations Used        1012

                                               Response Profile

                                     Ordered                        Total

                                       Value    binouverture    Frequency

                                           1    0                     614

                                           2    1                     398

                  The GLIMMIX procedure is modeling the probability that binouverture='1'.

                                                 Dimensions

                                     R-side Cov. Parameters           1

                                     Columns in X                  1315

                                     Columns in Z per Subject         0

                                     Subjects (Blocks in V)         262

                                     Max Obs per Subject             14

proc glimmix data=candidats_ouverture;

nloptions maxiter=100;

class commune ;

model  binouverture (event='1') = EurostatA EurostatC sizelog ENP commune commune*EurostatA commune*EurostatC commune*sizelog commune*ENP

      / solution dist=binary link=logit ddfm=kenwardroger  ; /* Addresses individual commune levels, and adds the Kenward-Rogers adjustment for repeated measures designs */

  random intercept / subject= commune solution residual; /* Converts to a repeated measures design.  Even though this is a RANDOM statement, the residual option means treat the values within a commune as correlated.  Note also the inclusion of commune and commune by covariate interactions in the model statement. */

lsmeans commune/ilink cl ; /* Outputs the estimates for each level of commune, on both the logit and original scale */

run;

SteveDenham
Jade | Level 19

If there is no Iteration History in the output, the log must have something indicating a problem.  What NOTEs and WARNINGs might be showing up?

If there is an Iteration History, could you please post it?

Steve Denham

Jerem
Calcite | Level 5

Yes, sorry Steve. There is indeed a note:

WARNING: Obtaining minimum variance quadratic unbiased estimates as starting values for the covariance parameters failed.

SteveDenham
Jade | Level 19

And so it goes...

With the full dataset, it can't get good starting values.  It looks like the reduced set (691 records) did something but didn't converge.  Maybe we can use the parameters from that as starting points.  There should be something in the output that looks like Parameter Values at the Last Iteration.  Making sure that the order is maintained, try adding a PARMS statement.  Check the documentation for its use, but it will look something like:

parms (value1) (value2); /* Where value1 and value2 are from the output parameter values at the last iteration */

If this doesn't help, then simplifying the model is the next option, or going to a conditional model (G side repeated effects rather than R side).  Estimates will be conditional (subject specific) instead of marginal (population averaged).  I hope that the PARMS statement will help.
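A minimal sketch of where a PARMS statement could go, assuming the model has a single R-side covariance parameter (as the Dimensions table in this thread reports). The starting value 0.8 is a hypothetical placeholder; the real value would come from the last iteration of the run on the reduced data set. The MODEL statement is shortened to main effects for brevity.

```sas
/* Sketch only: 0.8 is a hypothetical starting value, not taken from any output */
proc glimmix data=candidates;
  nloptions maxiter=100;
  class commune;
  model binouverture (event='1') = EurostatA EurostatC sizelog ENP commune
        / solution dist=binary link=logit;
  random intercept / subject=commune residual;

  /* One value per covariance parameter, in the order shown in the output */
  parms (0.8);
run;
```

If the model had more covariance parameters, each would get its own parenthesized value, in the same order as the Covariance Parameter Estimates table.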

Steve Denham 

1zmm
Quartz | Level 8

One problem could be that the number of columns in X [=fixed effects], 1315, exceeds the number of observations, 1012.  Reduce the number of fixed-effect parameters in your MODEL statement to be less than the number of observations.
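The figure of 1315 can be reproduced from the MODEL statement, assuming GLIMMIX's usual less-than-full-rank coding in which each of the 262 commune levels gets its own column:

$$
\underbrace{1}_{\text{intercept}}
+ \underbrace{4}_{\text{covariates}}
+ \underbrace{262}_{\text{commune}}
+ \underbrace{4 \times 262}_{\text{commune} \times \text{covariate}}
= 1 + 4 + 262 + 1048 = 1315 > 1012
$$

Dropping the four interaction terms leaves 1 + 4 + 262 = 267 columns, well below the number of observations.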

SteveDenham
Jade | Level 19

I am afraid that is my fault.  To include the separate slopes models for communes, I suggested adding the interactions.  Since this is all about marginal effects, those interactions should be deleted.  Try instead:

proc glimmix data=candidates;

nloptions maxiter=100;

  class commune ;

  model  binouverture (event='1') = EurostatA EurostatC sizelog ENP commune

      / solution dist=binary link=logit ddfm=kenwardroger ; /* Addresses individual commune levels, and adds the Kenward-Roger adjustment for repeated measures designs.  This also applies the averaged effects of the continuous covariates across all communes */

  random intercept / subject= commune solution residual; /* Converts to a repeated measures design.  Even though this is a RANDOM statement, the residual option means treat the values within a commune as correlated.  */

lsmeans commune/ilink cl; /* Outputs the estimates and confidence limits for each level of commune, on both the logit and original scale */

run;

I hope this addresses the column issue, which should in turn address the "can't get started" problem.  All thanks should go to 1zmm for pointing out the (what should have been) obvious source of the problem.

Steve Denham
