JeremyGelb
Calcite | Level 5

Dear all,

 

I'm a student and I want to model migration using individual-level data. Because the individuals are nested within many municipalities, I want to fit a multilevel model with the intercept as the only random effect. My response variable is multinomial (not ordinal) and has 3 categories:

   0 : no migration (reference)

   1 : short migration (less than 40km)

   2 : long migration (more than 40km)

 

So I'm trying to use PROC GLIMMIX, but all the options are confusing and I didn't find an example for multinomial data.

Could you help me choose the right syntax?

 

For example, for the empty model I use this syntax:

 

proc glimmix data=Mob_06.Datas method=LAPLACE NOCLPRINT;
   class DCRAN Migration;
   model Migration (ref=first) = /CL link=glogit dist=MULTINOMIAL solution;
   RANDOM intercept/SUBJECT=DCRAN GROUP=Migration  TYPE=VC SOLUTION CL;
   COVTEST / WALD;
run;

Migration is the Y variable, DCRAN is the municipal code.

I'm not sure that Migration must be put in the CLASS statement, but otherwise the model fails with this error:

"Model is too large to be fit by PROC GLIMMIX in a reasonable amount of time on this
system. Consider changing your model"

 

Thank you for your help.

2 REPLIES
Damien_Mather
Lapis Lazuli | Level 10

My advice would be to use PROC SQL to generate a unique list of municipalities, then use PROC SURVEYSELECT with method=srs to select a much smaller random sample of them, then PROC SQL again to inner join the resulting municipality sample with your original data. Run your model on that sample. Keep taking smaller or larger samples until you find the tipping point for the error. That model might then be your stopping point, or it might allow you to usefully investigate other approaches that give equivalent results but are not so memory hungry.
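
The sampling steps described above could be sketched roughly as follows. This is only an illustration: the data set and variable names (Mob_06.Datas, DCRAN) are taken from the original post, while the work data set names, the sample size n=200, and the seed are assumptions to adjust.

```sas
/* Step 1: one row per municipality */
proc sql;
   create table work.muni_list as
   select distinct DCRAN
   from Mob_06.Datas;
quit;

/* Step 2: simple random sample of municipalities
   (n=200 and seed=12345 are arbitrary illustrative values) */
proc surveyselect data=work.muni_list out=work.muni_sample
                  method=srs n=200 seed=12345;
run;

/* Step 3: keep only the individuals living in a sampled municipality */
proc sql;
   create table work.datas_sample as
   select d.*
   from Mob_06.Datas as d
        inner join work.muni_sample as s
        on d.DCRAN = s.DCRAN;
quit;
```

Pointing data= at work.datas_sample in the PROC GLIMMIX step and rerunning with a larger or smaller n is one way to search for the tipping point.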

 

SteveDenham
Jade | Level 19

First, I believe your multinomial response is ordinal. Consider that it will be generated by the following:

 

if distance_migrated = 0 then migration=0;
else if 0<distance_migrated<=40 then migration=1;
else if distance_migrated>40 then migration=2;

 

Consequently, you could change the link from glogit to cumlogit, which would go a long way toward reducing the model size and memory requirements.
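
As a minimal sketch of that change, assuming the ordinal interpretation is acceptable and reusing the data set and variable names from the original post, the empty model with a cumulative logit link might look like this:

```sas
proc glimmix data=Mob_06.Datas method=laplace noclprint;
   class DCRAN;                      /* only the clustering variable is a CLASS effect here */
   model Migration = / dist=multinomial link=cumlogit solution cl;
   random intercept / subject=DCRAN; /* one random intercept per municipality, no GROUP= */
   covtest / wald;
run;
```

With this link there is a single random-intercept variance to estimate instead of one per non-reference category, which is where the reduction in model size comes from.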

 

But why categorize the response variable? You will always lose some power by categorizing it (see Frank Harrell's website for more on this: http://biostat.mc.vanderbilt.edu/wiki/Main/CatContinuous).

 

If you fit a continuous model (such as a spline) with an appropriate distribution, I believe your results will be more interpretable, more powerful and much more precise.

 

Steve Denham
