BookmarkSubscribeRSS Feed
Buck1480
Calcite | Level 5
Hi, I've been feverishly working to model habitat selection data using PROC GLIMMIX. My original model was as follows:

PROC GLIMMIX DATA=HABITAT;
CLASS ID YEAR EXPOSURE TREAT HABVALUE;
MODEL VALUE (EVENT = '1') = TREAT EXPOSURE HABVALUE ROAD_LOG ELEVATION ELEVATION_QUAD SLOPE SLOPE_QUAD TREAT*EXPOSURE*HABVALUE TREAT*EXPOSURE*ROAD_LOG TREAT*EXPOSURE*ELEVATION TREAT*EXPOSURE*ELEVATION_QUAD TREAT*EXPOSURE*SLOPE TREAT*EXPOSURE*SLOPE_QUAD / DIST=BINARY LINK=LOGIT SOLUTION;
RANDOM ID(TREAT) TREAT YEAR /TYPE=VC;
RANDOM DTID / TYPE = AR(1) SUBJECT=ID;
RANDOM ELEVATION SLOPE ROAD HABVALUE/ TYPE=VC SUBJECT=ID;
RUN;


Unfortunately, this model did not work (error: insufficient memory; read more about this issue on my previous posting

http://support.sas.com/forums/thread.jspa?threadID=14424&tstart=0

Susan commented on this model:

Re: Insufficient Memory
Posted: May 27, 2011 3:51 PM in response to: Buck1480 Reply


You've obviously given thought to the construction of your model. It's possible that the model you would like on theoretical grounds is too optimistic--in other words, you might like it to do more than it might be able to.

I agree with your suspicion: you may be getting a bit carried away with random-effects factors. Take a look at the Dimensions table, in particular the "Columns in Z" entry to get a sense of how big a task you've set for GLIMMIX.

Apparently, you have repeated locations (DTID) on each deer. I imagine the number varies by individual deer; about how many are there for each deer? How many deer did you follow?

Is there a random GPS location paired with each deer location? How is the random location "connected" to the deer location? Are the random and deer locations truly paired?

EXPOSURE, TREAT and HABVALUE appear to be experimental or quasi-experimental factors. What is the design unit (for example, ID) with which each of these factors is associated or to which a level of each factor was (randomly) assigned?

TREAT should not be in both MODEL and RANDOM statements. I presume that TREAT is a fixed-effects factor; if so, it should be omitted from the first RANDOM statement.

RANDOM ID(TREAT) implies that a level of TREAT was assigned to each ID. Is that true?

Often, but not necessarily, DTID as a repeated measures factor would be included in the MODEL statement. To be honest, I'm not sure what it means for DTID to be a continuous random effect (due to not being in MODEL) with an AR(1) covariance structure; perhaps someone else can weigh in on this point. I can imagine that you probably have a large number of unique DTID values.

The third RANDOM statement probably is dramatically increasing the size of the Z matrix. Unless you have a lot of repeated measures on each deer, the quality of the estimates of these random effects may be very low. Although you would like to estimate them, in practice it may not be possible.

You might try fitting a bare bones random structure for your model and then adding additional terms to see how far you can get. You can also compare the size of your X and Z matrices to those of your friend's model; yours may appear less complex but could actually be larger.

Keep in mind that fitting a generalized (binary) linear mixed model is not the same as taking the normal-error version and replacing dist=normal with dist=binary, because the binary mean determines the binary variance whereas the normal mean and variance are separate estimates. This distinction impacts the specifications of the random factors.

____________________________________________________________

Using some of Susan's suggestions, I modified the model to the following:

PROC GLIMMIX DATA=HABITAT; BY DN;
CLASS ID YEAR EXPOSURE TREAT HABVALUE;
MODEL VALUE (EVENT = '1') = EXPOSURE TREAT HABVALUE ROAD_LOG SLOPE SLOPE_QUAD ELEVATION ELEVATION_QUAD TREAT*EXPOSURE*HABVALUE TREAT*EXPOSURE*ROAD_LOG TREAT*EXPOSURE*ELEVATION TREAT*EXPOSURE*ELEVATION_QUAD TREAT*EXPOSURE*SLOPE TREAT*EXPOSURE*SLOPE_QUAD / DIST=BINARY LINK=LOGIT SOLUTION;
RANDOM ID ID(TREAT) TREAT(YEAR) YEAR / TYPE = VC;
RUN;


Unfortunately, I receive a different error message now:

NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
]WARNING: Pseudo-likelihood update fails in outer iteration 3.
NOTE: Did not converge.
NOTE: The above message was for the following BY group:
DN=Diurnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
NOTE: Convergence criterion (PCONV=1.11022E-8) satisfied.
NOTE: Estimated G matrix is not positive definite.
NOTE: The above message was for the following BY group:
DN=Nocturnal
NOTE: PROCEDURE GLIMMIX used (Total process time):
real time 24.95 seconds
cpu time 16.96 seconds

Does anyone have any other suggestions to modify this model so I can effectively address my research questions? I'm not a season statistician so I'm unsure of what the "]WARNING: Pseudo-likelihood update fails in outer iteration 3" actually means and if there is a way to correct for this? In addition, what does "NOTE: Estimated G matrix is not positive definite" mean? How do you correct this problem? Thank you very much!
5 REPLIES 5
Susan
Calcite | Level 5
The "fails in outer iteration" message is one I've not seen before.

"Estimated G matrix is not positive definite" means that one of the variance components has been set to zero. Some simulation work has shown that nominal Type I error rates are better preserved when this is avoided. There are various model modifications that may successfully accomplish that.

However, the first thing is to be sure that your model specification (using MODEL and RANDOM statements) matches the design of your study. The questions that I posed in my email in the other thread were meant to clarify aspects of your design; without answers to those questions, it's not possible to make any suggestions about your model specification.

Your new specification still has several puzzling components. The MODEL statement contains TREAT*EXPOSURE*HABVALUE, but lacks the three 2-way interactions; you must include all lower-order terms for a proper specification. This change alone will not fix your failed iteration problem.

If TREAT, EXPOSURE, and HABITAT are observational, rather than experimental, factors, then it's possible that the TREAT*EXPOSURE*HABVALUE factorial is not complete for DN=Diurnal, meaning that some combinations of TREAT*EXPOSURE*HABVALUE have no data. (It's possible that the 3-way factorial is incomplete if they are experimental factors, but probably less likely.)

The RANDOM statements may have redundancies (e.g., using both ID and ID(TREAT)) but not possible to say without more study design information. I doubt that TREAT is a random factor nested within YEAR, and you're still specifying TREAT as both a fixed factor and a random factor.

You're attempting a complex model, and the best advice I can offer is to seek out a local statistician that you can consult with in person. Although it's possible if need be, it's difficult and inefficient to cover all the details electronically, and I often find that some problems are resolved only by trial and error (like the set-to-zero variance component issue).

I hope this helps some.

Susan With slightly more thought: Given that TREAT is in the MODEL statement and that YEAR is in the RANDOM statement, TREAT(YEAR) may resolve to TREAT*YEAR, which is potentially a valid term in the RANDOM statement. Whether it's appropriate for the study design, I can't say.

Susan



Message was edited by: Susan
Buck1480
Calcite | Level 5
Susan,

Sorry for the confusion about study description to help with the questions that I'm asking! My study was conducted on a 1,861 hectare property and was divided into 3 treatments of approximately equal size and vegetative composition. The three treats are control, no hunters; low density, 1 hunter/101 ha; and high density, 1 hunter/30 ha. I decided to evaluate exposure to hunting pressure (initial, days 1-3; prolonged, days 4-13) to assess differences in habitat selection and by treatment levels. These are the two categorical variables in the model. The other categorical variable is habitat value (1 = mixed, 2 = forest, and 3 = field). To conduct a logistic regression analysis using PROC GLIMMIX, I allocated random GPS locations on the study area and paired them with actual animal locations. I've been working with two biometricians and we're all stumped on the issues that I'm having in this analysis. We decided to take a more simplistic approach to get away from the three-way interactions using a "BY" statement for EXPOSURE, TREAT, AND DN to use more of an estimation approach since with a large daatset we're bound to find something statistically significant. See my new model below:

PROC GLIMMIX DATA=HABITAT; BY EXPOSURE TREAT DN;
CLASS ID TREAT YEAR HABVALUE;
MODEL VALUE (EVENT = '1') = HABVALUE ROAD_LOG ELEVATION ELEVATION_QUAD SLOPE SLOPE_QUAD / DIST=BINARY LINK=LOGIT SOLUTION;
RANDOM ID(TREAT) TREAT YEAR / TYPE=VC;
RANDOM ELEVATION SLOPE ROAD HABVALUE / TYPE=VC SUBJECT=ID;
RUN;

I have two years of data (YEAR), TREAT(control, low, and high), and ID(TREAT) selection of resources made by an animal are more similar or correlated within each treatment. Lastly, RANDOM ELEVATION SLOPE ROAD HABVALUE are added to models the correlation of resource selection within individuals, which means that selection of (e.g., elevation) between intervals is correlated within an individual.

I ran this more simplistic model this morning and here's the errors I received back:

NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
WARNING: Pseudo-likelihood update fails in outer iteration 5.
NOTE: Did not converge.
NOTE: The above message was for the following BY group:
EXPOSURE=Initial TREAT=Control DN=Diurnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
WARNING: Pseudo-likelihood update fails in outer iteration 3.
NOTE: Did not converge.
NOTE: The above message was for the following BY group:
EXPOSURE=Initial TREAT=Control DN=Nocturnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
WARNING: Pseudo-likelihood update fails in outer iteration 3.
NOTE: Did not converge.
NOTE: The above message was for the following BY group:
EXPOSURE=Initial TREAT=High DN=Diurnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
WARNING: Pseudo-likelihood update fails in outer iteration 4.
NOTE: Did not converge.
NOTE: The above message was for the following BY group:
EXPOSURE=Initial TREAT=High DN=Nocturnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
WARNING: Pseudo-likelihood update fails in outer iteration 3.
NOTE: Did not converge.
NOTE: The above message was for the following BY group:
EXPOSURE=Initial TREAT=Low DN=Diurnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
WARNING: Pseudo-likelihood update fails in outer iteration 7.
NOTE: Did not converge.
NOTE: The above message was for the following BY group:
EXPOSURE=Initial TREAT=Low DN=Nocturnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
WARNING: Pseudo-likelihood update fails in outer iteration 2.
NOTE: Did not converge.
NOTE: The above message was for the following BY group:
EXPOSURE=Prolong TREAT=Control DN=Diurnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
WARNING: Pseudo-likelihood update fails in outer iteration 4.
NOTE: Did not converge.
NOTE: The above message was for the following BY group:
EXPOSURE=Prolong TREAT=Control DN=Nocturnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
WARNING: Pseudo-likelihood update fails in outer iteration 2.
NOTE: Did not converge.
NOTE: The above message was for the following BY group:
EXPOSURE=Prolong TREAT=High DN=Diurnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
WARNING: Pseudo-likelihood update fails in outer iteration 2.
NOTE: Did not converge.
NOTE: The above message was for the following BY group:
EXPOSURE=Prolong TREAT=High DN=Nocturnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
WARNING: Pseudo-likelihood update fails in outer iteration 3.
NOTE: Did not converge.
NOTE: The above message was for the following BY group:
EXPOSURE=Prolong TREAT=Low DN=Diurnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
NOTE: Did not converge.
NOTE: The above message was for the following BY group:
EXPOSURE=Prolong TREAT=Low DN=Nocturnal
NOTE: PROCEDURE GLIMMIX used (Total process time):
real time 49.51 seconds
cpu time 40.78 seconds


So I decided to remove the last RANDOM statment and used the following model:

PROC GLIMMIX DATA=HABITAT; BY EXPOSURE TREAT DN;
CLASS ID YEAR TREAT HABVALUE;
MODEL VALUE (EVENT = '1') = HABVALUE ROAD_LOG ELEVATION ELEVATION_QUAD SLOPE SLOPE_QUAD / DIST=BINARY LINK=LOGIT SOLUTION;
RANDOM ID ID(TREAT) TREAT YEAR / TYPE=VC;
RUN;


I received the message: "Estimated G Matrx is not positive definite" again and you mentioned that there are some ways to tweek the model to eliminate this error. Additionally, I only had one Warning: Pseudo-likelihood update fails in outer iteration 3. NOTE: Did not converge. See rest of LOG Output below:


NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
NOTE: Convergence criterion (PCONV=1.11022E-8) satisfied.
NOTE: The above message was for the following BY group:
EXPOSURE=Initial TREAT=Control DN=Diurnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
NOTE: Convergence criterion (PCONV=1.11022E-8) satisfied.
NOTE: Estimated G matrix is not positive definite.
NOTE: The above message was for the following BY group:
EXPOSURE=Initial TREAT=Control DN=Nocturnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
WARNING: Pseudo-likelihood update fails in outer iteration 3.
NOTE: Did not converge.
NOTE: The above message was for the following BY group:
EXPOSURE=Initial TREAT=High DN=Diurnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
NOTE: Convergence criterion (PCONV=1.11022E-8) satisfied.
NOTE: Estimated G matrix is not positive definite.
NOTE: The above message was for the following BY group:
EXPOSURE=Initial TREAT=High DN=Nocturnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
NOTE: Convergence criterion (PCONV=1.11022E-8) satisfied.
NOTE: The above message was for the following BY group:
EXPOSURE=Initial TREAT=Low DN=Diurnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
NOTE: Convergence criterion (PCONV=1.11022E-8) satisfied.
NOTE: Estimated G matrix is not positive definite.
NOTE: The above message was for the following BY group:
EXPOSURE=Initial TREAT=Low DN=Nocturnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
NOTE: Convergence criterion (PCONV=1.11022E-8) satisfied.
NOTE: Estimated G matrix is not positive definite.
NOTE: The above message was for the following BY group:
EXPOSURE=Prolong TREAT=Control DN=Diurnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
NOTE: Convergence criterion (PCONV=1.11022E-8) satisfied.
NOTE: Estimated G matrix is not positive definite.
NOTE: The above message was for the following BY group:
EXPOSURE=Prolong TREAT=Control DN=Nocturnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
NOTE: Convergence criterion (PCONV=1.11022E-8) satisfied.
NOTE: Estimated G matrix is not positive definite.
NOTE: The above message was for the following BY group:
EXPOSURE=Prolong TREAT=High DN=Diurnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
NOTE: Convergence criterion (PCONV=1.11022E-8) satisfied.
NOTE: Estimated G matrix is not positive definite.
NOTE: The above message was for the following BY group:
EXPOSURE=Prolong TREAT=High DN=Nocturnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
NOTE: Convergence criterion (PCONV=1.11022E-8) satisfied.
NOTE: Estimated G matrix is not positive definite.
NOTE: The above message was for the following BY group:
EXPOSURE=Prolong TREAT=Low DN=Diurnal
NOTE: The GLIMMIX procedure is modeling the probability that Value='1'.
NOTE: Convergence criterion (PCONV=1.11022E-8) satisfied.
NOTE: Estimated G matrix is not positive definite.
NOTE: The above message was for the following BY group:
EXPOSURE=Prolong TREAT=Low DN=Nocturnal
NOTE: PROCEDURE GLIMMIX used (Total process time):
real time 14.18 seconds
cpu time 10.98 seconds


Based on the new SAS code using a more simplistic approach to get away from 3-way interactions, I think I'm getting closer. Overall, I want to estimate the probability of animals selecting different resources within each treatment. Additionally, I'm not missing any data in the entire dataset. Do you have any additional suggestions based on where I'm at now? Thank you very much for all of your input!
Buck1480
Calcite | Level 5
Sorry Susan, I forgot to post my first code that I ran this morning that gave me the first LOG Output! See below:

PROC GLIMMIX DATA=HABITAT; BY EXPOSURE TREAT DN;
CLASS ID TREAT YEAR HABVALUE;
MODEL VALUE (EVENT = '1') = HABVALUE ROAD_LOG ELEVATION ELEVATION_QUAD SLOPE SLOPE_QUAD / DIST=BINARY LINK=LOGIT SOLUTION;
RANDOM ID(TREAT) TREAT YEAR / TYPE=VC;
RANDOM ELEVATION SLOPE ROAD HABVALUE / TYPE=VC SUBJECT=ID;
RUN;
Susan
Calcite | Level 5
From your study description, it does not seem to me that a particular deer location is truly paired with a random location. Instead, it appears that you have a set of deer locations and a second set of random locations that you haphazardly paired together. Is that true?

If your primary objective is to determine the effects of hunting pressure on habitat selection by deer, controlling for background habitat distribution, (distance to?) road, slope, and elevation, and if deer locations are not truly paired with random locations, then you might consider a multinomial model where the response is HABVALUE, rather than a binomial model where the response is deer/random and HABVALUE is an explanatory factor.

If the random and deer locations are truly paired, then you need to build that pairing into your model. Pairing in logistic regression is not straightforward; in the literature, look for “conditional logistic regression” or “matched-pairs logistic regression” or “matched-set logistic regression” for discussions and examples. Possibly pertinent to your resource selection question is Duchesne et al 2010, J Applied Ecology 79:548-555 (and references within). Note that the two location types (deer and random) would be matched on some factors (e.g., same YEAR, TREAT, EXPOSURE, DN, DTID although the latter two don’t really make sense for random locations) but not on others (e.g., ROAD_LOG, ELEVATION, SLOPE, HABVALUE).

In addition to the pairing issue, model specification also depends upon the answers to these questions.

Were the deer marked so that you could identify individual deer? (I assume so, but please confirm.) How were they marked (e.g., GPS collars)?

How many deer did you observe? About how many times was each observed and on what schedule?

Does each deer stay within one treated area, or do deer use multiple treated areas? If they use multiple treated areas, does each deer use all three treated areas?

For fixed effects factors you have:

TREAT (hunting pressure with 3 levels); notably TREAT is not truly replicated—you’re using individual deer (ID) as replications of levels of TREAT rather than additional areas, and you’ll need to interpret the results of your study accordingly. Get TREAT out of the RANDOM statement, unless it's in an interaction with a random effects factor.

EXPOSURE with 2 levels; I presume that you have information for both levels of this factor on each ID, although I also assume that you might have only one or the other for some deer—true? Thus, ID is not the experimental/observational unit associated with EXPOSURE; rather EXPOSURE is a repeated measurement on ID (think split-plot design).

DN with 2 levels (diurnal and nocturnal); like EXPOSURE, you probably have information for both levels of this factor on each ID, again possibly missing one or the other for some deer. Again, think split-plot.

YEAR with 2 levels; do you have different deer in different years, or the same deer in both years? YEAR could be a random factor, but keep in mind that you would then be attempting to estimate variance among years based on only two years; the quality of this estimate would be quite poor. In field studies, temporal variability is a given. The nature of year is usually problematic—it’s not random (because the levels of year are not a random sample from the population of years of interest, unless you have access to a time machine), and it’s not fixed because you’re in the field because you are (I presume?) in graduate school those years, and it can’t be truly replicated (unless you can work in parallel universes). I usually think of year as either fixed (unless I have data for a lot of years), or even do a separate analysis for each year to assess “repeatability”.

ROAD_LOG, SLOPE, ELEVATION as continuous-scale factors, which may be correlated with HABVALUE levels

This study is a “quasi-experiment” where "experimental" (explanatory) factors are not randomly assigned (or even able to be assigned at all) to experimental units. Consequently, you could have problems with data distribution: the factorial defined by your categorical fixed effects could be incomplete (meaning that some combinations of factors have no observations), you could have full- or quasi-separation problems with your binomial response (which would be my first guess at the reason for convergence failure), or you could just be spread too thin in some regions of the explanatory variable space for good estimation. Or your model specification may be wrong.

As you note, it’s complex. I would start REALLY simple: Resolve the pairing issue, and decide whether to go the resource selection function route. Sort out a model without YEAR, EXPOSURE, DN, ROAD_LOG, ELEVATION terms, and SLOPE terms—I would just drop these terms from the model, rather than using them to specify a subset of the data as you’ve done with the BY statement—with a bare minimum RANDOM statement. Get that working, then build up.

HTH,
Susan
Buck1480
Calcite | Level 5
Thanks Susan! I greatly appreciate your input!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 6110 views
  • 0 likes
  • 2 in conversation