Hi, I'm analyzing a complex, hierarchical dataset examining the habitat selection of animals. I'm using an analysis procedure that examines habitat selection by generating random GPS locations and pairing them with the actual animal locations to model the probability of an animal using a resource. To start, I developed quadratic terms because animals often avoid the lowest and highest values associated with a given landscape feature. When modeling higher-order (i.e., quadratic) terms it is necessary to also include the lower-order terms in the model. In a quadratic polynomial, the lower-order (linear) term represents the overall effect of the covariate; without the linear term, the fitted parabola is forced to have its minimum or maximum at the origin, so over positive covariate values the effect can only increase or decrease monotonically (Darlington 1990). I also natural log-transformed road distance to allow for a decreasing magnitude of influence with increasing distance (i.e., a non-linear association). To ensure that a natural log transformation was not attempted on a cell with a value of 0, I added 0.1 to all original values (new = log(original + 0.1)).
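For reference, a minimal sketch of how these derived covariates can be built in a DATA step. The input dataset name HABITAT_RAW and the raw road-distance column DIST_ROAD are assumptions for illustration; the derived names match the model statement further down.

```sas
/* Sketch only: HABITAT_RAW and DIST_ROAD are assumed names */
DATA HABITAT;
  SET HABITAT_RAW;
  /* Quadratic terms: original value * original value */
  SLOPE_QUAD     = SLOPE * SLOPE;
  ELEVATION_QUAD = ELEVATION * ELEVATION;
  /* Natural log of road distance; 0.1 is added so that log(0) is never attempted */
  ROAD_LOG       = LOG(DIST_ROAD + 0.1);
RUN;
```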
This morning I ran a simple analysis comparing slope and slope_quad (original value*original value), elevation and elevation_quad (original value*original value), and distance to nearest road and log_distance to nearest road, to see which form of each covariate fit the data best (AIC model selection). I then used the lowest-AIC models to build a full model (see below):
PROC GLIMMIX DATA=HABITAT;
CLASS ID YEAR EXPOSURE TREAT HABVALUE;
MODEL VALUE (EVENT = '1') = TREAT EXPOSURE HABVALUE ROAD_LOG ELEVATION ELEVATION_QUAD SLOPE SLOPE_QUAD TREAT*EXPOSURE*HABVALUE TREAT*EXPOSURE*ROAD_LOG TREAT*EXPOSURE*ELEVATION TREAT*EXPOSURE*ELEVATION_QUAD TREAT*EXPOSURE*SLOPE TREAT*EXPOSURE*SLOPE_QUAD / DIST=BINARY LINK=LOGIT SOLUTION;
RANDOM ID(TREAT) TREAT YEAR /TYPE=VC;
RANDOM DTID / TYPE = AR(1) SUBJECT=ID;
RANDOM ELEVATION SLOPE ROAD HABVALUE/ TYPE=VC SUBJECT=ID;
ID = Animal Identification (unique value)
YEAR = 2008 AND 2009
EXPOSURE: Initial and Prolong
TREAT: Control, Low, and High
HABVALUE: (1: Mixed forest/grassland; 2: Forest; 3: Grassland)
RANDOM ID(TREAT) TREAT YEAR /TYPE=VC;
/*MEANING: SELECTIONS OF RESOURCES MADE BY A DEER ARE MORE SIMILAR (CORRELATED) WITHIN EACH TREATMENT; TREATMENTS ARE SIMILAR FROM YEAR TO YEAR (ASSUMING THEY HAVE THE SAME INFLUENCE EVEN WHEN TREATMENTS WERE RANDOMLY ASSIGNED IN YEAR 2); OBSERVATIONS WITHIN A YEAR ARE MORE SIMILAR THAN OBSERVATIONS BETWEEN THE 2 YEARS*/
RANDOM TIME / TYPE = AR(1) SUBJECT=ID;
/*NEED TO HAVE A COLUMN THAT IS A CONTINUOUS VARIABLE THAT IS A DATE AND TIME INDICATOR (MERGE DATE AND TIME INTO 1 DATE/TIME STAMP, I CREATED THIS USING SAS); MODEL WITH AR(1); THIS WILL ACCOUNT FOR THE TEMPORAL AUTOCORRELATION IN THE DATASET FOR BOTH OBSERVED AND RANDOM LOCATIONS*/
RANDOM ELEVATION SLOPE DIST_ROAD HABITAT / TYPE=VC SUBJECT=ID;
/*THIS MODELS THE CORRELATION OF RESOURCE SELECTION WITHIN INDIVIDUALS - MEANING THE SELECTION OF ELEVATION BETWEEN INTERVALS IS CORRELATED WITHIN AN INDIVIDUAL; WITH TYPE=VC IT ALSO ASSUMES THAT THERE IS RANDOM SELECTION OF RESOURCES (IND. VARS.) AND THAT THE IND. VARS. ARE NOT CORRELATED WITH OTHER IND. VARS.; STATED ANOTHER WAY - EACH ANIMAL HAS ITS OWN RELATIONSHIP WITH ELEVATION AND THESE RELATIONSHIPS ARE NORMALLY DISTRIBUTED AMONG ANIMALS; MODELING IT THIS WAY IS CLOSER TO ECOLOGICAL REALITY BECAUSE THE ANIMALS ARE A SAMPLE AND EACH ANIMAL IS USING A SAMPLE OF THE AVAILABLE ELEVATIONS*/
Unfortunately, when I run this model I continue to receive the following message:
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
ERROR: Integer overflow on computing amount of memory required.
NOTE: The SAS System stopped processing this step because of insufficient memory.
NOTE: PROCEDURE GLIMMIX used (Total process time):
real time 17.03 seconds
cpu time 6.04 seconds
I tried running this model on another computer with more RAM, but with no luck. I believe the RANDOM effects are causing the insufficient-memory problem and may need to be revised somehow. A friend of mine has run even more complex datasets than mine on a normal desktop computer, but for some reason I can't get my model to run correctly. Any thoughts on how to resolve my problem? Thank you very much!
You've obviously given thought to the construction of your model. It's possible that the model you would like on theoretical grounds is too optimistic--in other words, you might like it to do more than it might be able to.
I agree with your suspicion: you may be getting a bit carried away with random-effects factors. Take a look at the Dimensions table, in particular the "Columns in Z" entry to get a sense of how big a task you've set for GLIMMIX.
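One way to get at the Dimensions table without attempting the fit is the NOFIT option on the PROC GLIMMIX statement, which skips the optimization and prints only the introductory tables, including Dimensions. A sketch using the statements as you posted them; whether this step still hits the memory limit with your data is something I can't confirm:

```sas
/* NOFIT: build the model and report its dimensions, but do not fit it */
PROC GLIMMIX DATA=HABITAT NOFIT;
  CLASS ID YEAR EXPOSURE TREAT HABVALUE;
  MODEL VALUE (EVENT = '1') = TREAT EXPOSURE HABVALUE ROAD_LOG
        ELEVATION ELEVATION_QUAD SLOPE SLOPE_QUAD
        TREAT*EXPOSURE*HABVALUE TREAT*EXPOSURE*ROAD_LOG
        TREAT*EXPOSURE*ELEVATION TREAT*EXPOSURE*ELEVATION_QUAD
        TREAT*EXPOSURE*SLOPE TREAT*EXPOSURE*SLOPE_QUAD
        / DIST=BINARY LINK=LOGIT;
  /* RANDOM statements copied unchanged from the original post */
  RANDOM ID(TREAT) TREAT YEAR / TYPE=VC;
  RANDOM DTID / TYPE=AR(1) SUBJECT=ID;
  RANDOM ELEVATION SLOPE ROAD HABVALUE / TYPE=VC SUBJECT=ID;
RUN;
```

Rerunning this with one RANDOM statement removed at a time should show which statement contributes most to the size of Z.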
Apparently, you have repeated locations (DTID) on each deer. I imagine the number varies by individual deer; about how many are there for each deer? How many deer did you follow?
Is there a random GPS location paired with each deer location? How is the random location "connected" to the deer location? Are the random and deer locations truly paired?
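A quick way to answer the counting questions above is a per-deer summary. This is only a sketch: it assumes VALUE is stored as a numeric 0/1 variable (1 = deer location, 0 = random location) and that DTID is the date/time variable from your second RANDOM statement.

```sas
/* Per-deer counts of used and random locations and of distinct time stamps */
PROC SQL;
  SELECT ID,
         SUM(VALUE = 1)       AS N_USED,    /* actual deer locations        */
         SUM(VALUE = 0)       AS N_RANDOM,  /* generated random locations   */
         COUNT(DISTINCT DTID) AS N_DTID     /* unique date/time stamps      */
  FROM HABITAT
  GROUP BY ID;
QUIT;
```

If N_USED and N_RANDOM are equal for every deer, that is at least consistent with one-to-one pairing, although it does not by itself show how the pairs are linked. If VALUE is a character variable, the comparisons would need to be VALUE = '1' and VALUE = '0' instead.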
EXPOSURE, TREAT and HABVALUE appear to be experimental or quasi-experimental factors. What is the design unit (for example, ID) with which each of these factors is associated or to which a level of each factor was (randomly) assigned?
TREAT should not be in both MODEL and RANDOM statements. I presume that TREAT is a fixed-effects factor; if so, it should be omitted from the first RANDOM statement.
RANDOM ID(TREAT) implies that a level of TREAT was assigned to each ID. Is that true?
Often, but not necessarily, DTID as a repeated measures factor would be included in the MODEL statement. To be honest, I'm not sure what it means for DTID to be a continuous random effect (due to not being in MODEL) with an AR(1) covariance structure; perhaps someone else can weigh in on this point. I can imagine that you probably have a large number of unique DTID values.
The third RANDOM statement probably is dramatically increasing the size of the Z matrix. Unless you have a lot of repeated measures on each deer, the quality of the estimates of these random effects may be very low. Although you would like to estimate them, in practice it may not be possible.
You might try fitting a bare bones random structure for your model and then adding additional terms to see how far you can get. You can also compare the size of your X and Z matrices to those of your friend's model; yours may appear less complex but could actually be larger.
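As a starting point, here is a sketch of what I mean by a bare bones random structure: the fixed effects are kept as you posted them, and the only random effect is an intercept for each deer. Treating ID alone as the subject is my assumption; adjust it if the design calls for something else.

```sas
/* Step 1: same fixed effects, random intercept per deer only */
PROC GLIMMIX DATA=HABITAT;
  CLASS ID YEAR EXPOSURE TREAT HABVALUE;
  MODEL VALUE (EVENT = '1') = TREAT EXPOSURE HABVALUE ROAD_LOG
        ELEVATION ELEVATION_QUAD SLOPE SLOPE_QUAD
        TREAT*EXPOSURE*HABVALUE TREAT*EXPOSURE*ROAD_LOG
        TREAT*EXPOSURE*ELEVATION TREAT*EXPOSURE*ELEVATION_QUAD
        TREAT*EXPOSURE*SLOPE TREAT*EXPOSURE*SLOPE_QUAD
        / DIST=BINARY LINK=LOGIT SOLUTION;
  /* TREAT stays in MODEL only; it is not repeated as a random effect */
  RANDOM INTERCEPT / SUBJECT=ID;
RUN;
```

If that runs, a next step could be to add one random slope at a time (for example, ELEVATION next to INTERCEPT in the same RANDOM statement) and watch the Dimensions table and the run time as the structure grows.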
Keep in mind that fitting a generalized (binary) linear mixed model is not the same as taking the normal-error version and replacing dist=normal with dist=binary, because the binary mean determines the binary variance whereas the normal mean and variance are separate estimates. This distinction impacts the specifications of the random factors.