About Susan

Susan · ‎06-10-2011

From your study description, it does not seem to me that a particular deer location is truly paired with a random location. Instead, it appears that you have a set of deer locations and a second set of random locations that you haphazardly paired together. Is that true?
If your primary objective is to determine the effects of hunting pressure on habitat selection by deer, controlling for background habitat distribution, (distance to?) road, slope, and elevation, and if deer locations are not truly paired with random locations, then you might consider a multinomial model where the response is HABVALUE, rather than a binomial model where the response is deer/random and HABVALUE is an explanatory factor.
If the random and deer locations are truly paired, then you need to build that pairing into your model. Pairing in logistic regression is not straightforward; in the literature, look for “conditional logistic regression” or “matched-pairs logistic regression” or “matched-set logistic regression” for discussions and examples. Possibly pertinent to your resource selection question is Duchesne et al. 2010, J Animal Ecology 79:548-555 (and references within). Note that the two location types (deer and random) would be matched on some factors (e.g., same YEAR, TREAT, EXPOSURE, DN, DTID, although the latter two don’t really make sense for random locations) but not on others (e.g., ROAD_LOG, ELEVATION, SLOPE, HABVALUE).
In addition to the pairing issue, model specification also depends upon the answers to these questions. Were the deer marked so that you could identify individual deer? (I assume so, but please confirm.) How were they marked (e.g., GPS collars)? How many deer did you observe? About how many times was each observed and on what schedule? Does each deer stay within one treated area, or do deer use multiple treated areas? If they use multiple treated areas, does each deer use all three treated areas?
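If the locations do turn out to be truly paired, one way to build the pairing into a logistic model in SAS is the STRATA statement of PROC LOGISTIC, which fits a conditional logistic regression. The following is only a minimal sketch under stated assumptions: the data set name `locs`, the matched-set identifier `pair_id`, and the 0/1 response `used` (1 = deer location, 0 = random location) are hypothetical names, not taken from the original code, and the sketch ignores the repeated measurements on individual deer.

```sas
/* Sketch only: locs, pair_id, and used are hypothetical names */
proc logistic data=locs;
   class habvalue / param=ref;
   strata pair_id;                      /* one stratum per matched deer/random set */
   model used(event='1') = habvalue road_log slope elevation;
run;
```

Factors that are constant within a matched set (such as YEAR or TREAT) cannot be estimated as main effects in a conditional model; they can enter only through interactions with factors that vary within the set.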
For fixed effects factors you have:
TREAT (hunting pressure with 3 levels). Notably, TREAT is not truly replicated: you’re using individual deer (ID) as replications of levels of TREAT rather than additional areas, and you’ll need to interpret the results of your study accordingly. Get TREAT out of the RANDOM statement, unless it's in an interaction with a random effects factor.
EXPOSURE with 2 levels. I presume that you have information for both levels of this factor on each ID, although I also assume that you might have only one or the other for some deer. Is that true? Thus, ID is not the experimental/observational unit associated with EXPOSURE; rather, EXPOSURE is a repeated measurement on ID (think split-plot design).
DN with 2 levels (diurnal and nocturnal). Like EXPOSURE, you probably have information for both levels of this factor on each ID, again possibly missing one or the other for some deer. Again, think split-plot.
YEAR with 2 levels. Do you have different deer in different years, or the same deer in both years? YEAR could be a random factor, but keep in mind that you would then be attempting to estimate variance among years based on only two years; the quality of this estimate would be quite poor. In field studies, temporal variability is a given. The nature of year is usually problematic: it’s not random (because the levels of year are not a random sample from the population of years of interest, unless you have access to a time machine), it’s not fixed (because you’re in the field in those years only because, I presume, that is when you are in graduate school), and it can’t be truly replicated (unless you can work in parallel universes). I usually either treat year as fixed (unless I have data for a lot of years) or do a separate analysis for each year to assess “repeatability”.
ROAD_LOG, SLOPE, ELEVATION as continuous-scale factors, which may be correlated with HABVALUE levels.
This study is a “quasi-experiment” where “experimental” (explanatory) factors are not randomly assigned (or even able to be assigned at all) to experimental units. Consequently, you could have problems with data distribution: the factorial defined by your categorical fixed effects could be incomplete (meaning that some combinations of factors have no observations), you could have full- or quasi-separation problems with your binomial response (which would be my first guess at the reason for convergence failure), or you could just be spread too thin in some regions of the explanatory variable space for good estimation. Or your model specification may be wrong. As you note, it’s complex.
I would start REALLY simple: Resolve the pairing issue, and decide whether to go the resource selection function route. Sort out a model without YEAR, EXPOSURE, DN, ROAD_LOG, ELEVATION, and SLOPE terms, and with a bare minimum RANDOM statement. (I would just drop these terms from the model, rather than using them to specify a subset of the data as you’ve done with the BY statement.) Get that working, then build up. HTH, Susan
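One quick way to look for an incomplete factorial, or for cells where only one response level occurs (a hint of separation), is to tabulate the categorical factors before fitting anything. This is a sketch under assumptions: the data set name `deer` and the 0/1 response `used` are hypothetical. The SPARSE option asks PROC FREQ to list every combination of levels, including those with a count of zero.

```sas
/* Sketch only: deer and used are hypothetical names */
proc freq data=deer;
   /* Empty cells in the fixed-effects factorial show up with Frequency = 0 */
   tables treat*exposure*dn*habvalue / list sparse;
   /* Crossing with the response shows cells where only one outcome occurs */
   tables used*treat*habvalue / list sparse;
run;
```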

Susan · ‎06-07-2011

The "fails in outer iteration" message is one I've not seen before. "Estimated G matrix is not positive definite" typically means that one of the variance components has been set to zero. Some simulation work has shown that nominal Type I error rates are better preserved when this is avoided. There are various model modifications that may successfully accomplish that.
However, the first thing is to be sure that your model specification (using MODEL and RANDOM statements) matches the design of your study. The questions that I posed in my email in the other thread were meant to clarify aspects of your design; without answers to those questions, it's not possible to make any suggestions about your model specification.
Your new specification still has several puzzling components. The MODEL statement contains TREAT*EXPOSURE*HABVALUE, but lacks the three 2-way interactions; you must include all lower-order terms for a proper specification. This change alone will not fix your failed iteration problem. If TREAT, EXPOSURE, and HABVALUE are observational, rather than experimental, factors, then it's possible that the TREAT*EXPOSURE*HABVALUE factorial is not complete for DN=Diurnal, meaning that some combinations of TREAT*EXPOSURE*HABVALUE have no data. (It's possible that the 3-way factorial is incomplete if they are experimental factors, but probably less likely.)
The RANDOM statements may have redundancies (e.g., using both ID and ID(TREAT)), but it's not possible to say without more study design information. I doubt that TREAT is a random factor nested within YEAR, and you're still specifying TREAT as both a fixed factor and a random factor. You're attempting a complex model, and the best advice I can offer is to seek out a local statistician that you can consult with in person.
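One convenient way to make sure all lower-order terms accompany the 3-way interaction is the bar operator in the MODEL statement, which expands to the main effects and every interaction among the listed factors. The sketch below is an illustration only: the data set name `deer` and the response `used` are hypothetical, and the RANDOM statements are deliberately left out because they depend on the unanswered design questions.

```sas
/* Sketch only: deer and used are hypothetical names; no RANDOM statement shown */
proc glimmix data=deer;
   class treat exposure habvalue;
   /* treat|exposure|habvalue expands to:
      treat exposure treat*exposure habvalue treat*habvalue
      exposure*habvalue treat*exposure*habvalue */
   model used(event='1') = treat|exposure|habvalue / dist=binary solution;
run;
```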
Although it's possible if need be, it's difficult and inefficient to cover all the details electronically, and I often find that some problems are resolved only by trial and error (like the set-to-zero variance component issue). I hope this helps some. Susan
With slightly more thought: Given that TREAT is in the MODEL statement and that YEAR is in the RANDOM statement, TREAT(YEAR) may resolve to TREAT*YEAR, which is potentially a valid term in the RANDOM statement. Whether it's appropriate for the study design, I can't say. Susan

Susan · ‎05-27-2011

You've obviously given thought to the construction of your model. It's possible that the model you would like on theoretical grounds is too optimistic--in other words, you might like it to do more than it might be able to. I agree with your suspicion: you may be getting a bit carried away with random-effects factors. Take a look at the Dimensions table, in particular the "Columns in Z" entry to get a sense of how big a task you've set for GLIMMIX. Apparently, you have repeated locations (DTID) on each deer. I imagine the number varies by individual deer; about how many are there for each deer? How many deer did you follow? Is there a random GPS location paired with each deer location? How is the random location "connected" to the deer location? Are the random and deer locations truly paired? EXPOSURE, TREAT and HABVALUE appear to be experimental or quasi-experimental factors. What is the design unit (for example, ID) with which each of these factors is associated or to which a level of each factor was (randomly) assigned? TREAT should not be in both MODEL and RANDOM statements. I presume that TREAT is a fixed-effects factor; if so, it should be omitted from the first RANDOM statement. RANDOM ID(TREAT) implies that a level of TREAT was assigned to each ID. Is that true? Often, but not necessarily, DTID as a repeated measures factor would be included in the MODEL statement. To be honest, I'm not sure what it means for DTID to be a continuous random effect (due to not being in MODEL) with an AR(1) covariance structure; perhaps someone else can weigh in on this point. I can imagine that you probably have a large number of unique DTID values. The third RANDOM statement probably is dramatically increasing the size of the Z matrix. Unless you have a lot of repeated measures on each deer, the quality of the estimates of these random effects may be very low. Although you would like to estimate them, in practice it may not be possible. 
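To see how large a problem a given specification hands to GLIMMIX, the Dimensions table can be requested on its own with ODS SELECT. The sketch below pairs it with a deliberately small random structure; the data set name `deer`, the response `used`, and the choice of a single random intercept per deer are assumptions for illustration, not a recommended model for this study.

```sas
/* Sketch only: deer and used are hypothetical names */
ods select Dimensions CovParms;   /* show only the size summary and variance estimates */
proc glimmix data=deer;
   class id treat habvalue;
   model used(event='1') = treat|habvalue / dist=binary solution;
   random intercept / subject=id;  /* one random intercept per deer */
run;
```

Re-running with each additional RANDOM term and comparing the Z-matrix entries in the Dimensions table gives a rough sense of how much each term adds.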
You might try fitting a bare bones random structure for your model and then adding additional terms to see how far you can get. You can also compare the size of your X and Z matrices to those of your friend's model; yours may appear less complex but could actually be larger. Keep in mind that fitting a generalized (binary) linear mixed model is not the same as taking the normal-error version and replacing dist=normal with dist=binary, because the binary mean determines the binary variance whereas the normal mean and variance are separate estimates. This distinction impacts the specifications of the random factors. Good luck! Susan

Susan · ‎05-17-2011

Dale, if your offer is open to people other than the OP, I'm interested in trying your macro. GLIMMIX is quirky with the beta; it would be swell if NLMIXED was better behaved, and the 4-parameter option is appealing. Thank you, Susan susan.durham@usu.edu

Susan · ‎05-09-2011

You can reparameterize the model to easily get what you want.
Let y be the response. Let a be the categorical predictor with 3 levels. Let x be the continuous predictor.
In this parameterization, the interaction a*x gives you a test of whether slopes are equal:
proc glimmix data=your_data;
   class a;
   model y = a x a*x / solution;
run;
This parameterization reports the intercept and slope estimates for the linear regression for each level of a:
proc glimmix data=your_data;
   class a;
   model y = a a*x / noint solution;
   /* Pairwise comparison among slopes, with stepdown Bonferroni adjustment */
   estimate "Slope A1 versus A2" a*x 1 -1 0,
            "Slope A1 versus A3" a*x 1 0 -1,
            "Slope A2 versus A3" a*x 0 1 -1 / adjust=bon stepdown;
run;
Check the GLIMMIX documentation for details, including ADJUST= alternatives. I haven't tested this code so there could be syntax errors.
The book by Milliken and Johnson (Analysis of Messy Data, Vol III: Analysis of Covariance) covers ANCOVA extensively. I think there's also info in Littell et al. (SAS System for Mixed Models, 2nd ed). Have fun! Susan
In the second parameterization, the fixed-effects solutions will report tests of whether estimates are zero--in other words, whether each intercept or slope is equal to zero.

Susan · ‎04-28-2011

Another approach is to rescale the predictor variable. For example, if the variable is in units of grams, rescale to kilograms. Or use the UNITS statement, which accomplishes the same thing with more flexibility. HTH, Susan
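As an illustration of the UNITS statement, the sketch below asks PROC LOGISTIC for the odds ratio associated with a 1000-unit change in a predictor recorded in grams, which corresponds to a 1 kg change. The data set `mydata`, the response `y`, and the predictor `weight_g` are hypothetical names.

```sas
/* Sketch only: mydata, y, and weight_g are hypothetical names */
proc logistic data=mydata;
   model y(event='1') = weight_g;
   units weight_g = 1000;   /* odds ratio per 1000 g instead of per 1 g */
run;
```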

Susan · ‎03-02-2011

You can use an analysis of covariance to test whether the slopes of the two lines are equal.

Susan · ‎01-12-2011

Hi Bhupinder, It's called an incomplete factorial design. "Unbalanced" generally means that sample sizes for different treatment combinations are unequal; "incomplete" means that sample sizes for some treatment combinations are zero. There is a literature out there about estimation of effects and contrasts for incomplete designs, but I think much of that literature is based on a carefully planned (a priori) pattern of incompleteness. In that sense, it's like fractional factorial design in that the actual design must be carefully chosen so that you are able to estimate the effects that are of the most interest to you. If you give SAS procedures like MIXED or GLIMMIX a factorial design for which the data are incomplete, it generally will return results for overall tests of A, B, Year and their interactions. You'll note that degrees of freedom will reflect the missing combinations. Certain lsmeans will be reported as non-estimable. Although people have brought me incomplete designs, I can't say that anyone has brought me one that was planned. So typically I suggest analyzing "slices" through the design space so that each slice is a complete factorial. For example, I might analyze the complete 2x2x2 for AxBxYear in one analysis and the 4x4 for AxB for just the first year, and then pull results together in interpretation. Not very elegant, but functional. HTH, Susan
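The "slices" idea can be sketched with WHERE statements that restrict each analysis to a complete factorial. Everything specific here is an assumption for illustration: the data set `trial`, the response `y`, numeric coding of A, B, and Year, and in particular which two levels of A and B were observed in both years.

```sas
/* Sketch only: trial, y, and the level codes are hypothetical */

/* Slice 1: the complete 2x2x2 for A x B x Year */
proc mixed data=trial;
   where a in (1, 2) and b in (1, 2);   /* levels assumed present in both years */
   class a b year;
   model y = a|b|year;
run;

/* Slice 2: the complete 4x4 for A x B in the first year only */
proc mixed data=trial;
   where year = 1;
   class a b;
   model y = a|b;
run;
```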

Susan · ‎10-29-2010

Dale, thanks for the lovely detailed discussion. Susan

Susan · ‎10-29-2010

Perhaps it would be useful to explicitly think about this analysis as a regression. You are regressing SALES on VAR1, and you are assuming that the slopes of the regression for STORE_NUMBERs are essentially the same with no random variability. Consequently, the model estimates a single, fixed-effect slope parameter.
I believe you also are interested in regressing SALES on VAR2, but you think that the slopes of the regression for STORE_NUMBERs are not constant and have some appreciable level of variability. Do you intend for your model to include this regression? If so, you must include VAR2 as a predictor in the MODEL statement. The presence of VAR2 in the RANDOM statement but not the MODEL statement fails to specify a regression.
As you say, the intercept in the RANDOM statement will allow different STORE_NUMBERs to have different intercepts. But again I recommend that you remove the NOINT option from the MODEL statement. The models with and without the NOINT option are obviously different--they are not re-parameterizations of the same model. And as I showed in the example in a previous message, the results for the model without the NOINT option appear much more reliable.
Given your model, an equation for "hand-calculating" the prediction for the jth observation on the ith STORE_NUMBER would be
sales_ij = RandomIntercept_i + FixedEffectSlopeForVar1*Var1_ij + RandomVar2_i
RandomVar2 is not multiplied by Var2 because your model fails to specify the regression of SALES on Var2. The reason the result from the model differs from that for your original hand-calculation equation is that your equation does not match your model. HTH, Susan

Susan · ‎10-28-2010

About 30 minutes after I posted my last message I thought, Wait a minute...maybe the presence or absence of NOINT *does* matter. So this morning, I experimented with NOINT using the random coefficients model example from the MIXED documentation.
/* Example 56.5 in MIXED chapter */
data rc;
   input Batch Month @@;
   Monthc = Month;
   do i = 1 to 6;
      input Y @@;
      output;
   end;
   datalines;
1 0 101.2 103.3 103.3 102.1 104.4 102.4
1 1 98.8 99.4 99.7 99.5 . .
1 3 98.4 99.0 97.3 99.8 . .
1 6 101.5 100.2 101.7 102.7 . .
1 9 96.3 97.2 97.2 96.3 . .
1 12 97.3 97.9 96.8 97.7 97.7 96.7
2 0 102.6 102.7 102.4 102.1 102.9 102.6
2 1 99.1 99.0 99.9 100.6 . .
2 3 105.7 103.3 103.4 104.0 . .
2 6 101.3 101.5 100.9 101.4 . .
2 9 94.1 96.5 97.2 95.6 . .
2 12 93.1 92.8 95.4 92.2 92.2 93.0
3 0 105.1 103.9 106.1 104.1 103.7 104.6
3 1 102.2 102.0 100.8 99.8 . .
3 3 101.2 101.8 100.8 102.6 . .
3 6 101.1 102.0 100.1 100.2 . .
3 9 100.9 99.5 102.2 100.8 . .
3 12 97.8 98.3 96.9 98.4 96.9 96.5
;
run;
/* Random coefficients model WITH fixed-effect intercept */
proc mixed data=rc;
   class Batch;
   model Y = Month / s;
   random Int Month / type=un sub=Batch s;
run;
/* Random coefficients model WITHOUT fixed-effect intercept */
proc mixed data=rc;
   class Batch;
   model Y = Month / s noint;
   random Int Month / type=un sub=Batch s;
run;
The results for intercept and slope estimates for BATCHs are different--not a lot for this example, but definitely different. The estimates of variances and the covariance are VERY different, and those for the model with NOINT do not look good. So I suggest that you remove NOINT from your MODEL statement.
The use of Var1 in the MODEL statement but not in the RANDOM statement, and the use of Var2 in the RANDOM statement but not in the MODEL statement is a puzzle to me.
I think (but do not know for sure about (3) below) that this model implies (1) that the slope for the linear regression of SALES on VAR1 is the same for all STORE_NUMBERs, (2) that SALES does not vary systematically with VAR2, and (3) that there is something like a random "block" effect induced by VAR2, which would tend to affect the intercept and not the slope. If you use this model, then you should know for sure what your model is doing. Is there a reason why you are not using a random coefficients model here, with model sales = var1 var2; random int var1 var2 / subject=store_number type=<>; ? Susan
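For reference, the random coefficients model suggested above (fixed intercept, Var1 and Var2 in the MODEL statement, and intercept, Var1 and Var2 in the RANDOM statement with STORE_NUMBER as subject) can be written out as follows. The symbols are notation introduced here for illustration: the betas are fixed effects, the b terms are store-level random deviations, and the structure of G is whatever the TYPE= option specifies.

```latex
\text{sales}_{ij} = (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\,\text{Var1}_{ij}
                  + (\beta_2 + b_{2i})\,\text{Var2}_{ij} + e_{ij},
\qquad
(b_{0i}, b_{1i}, b_{2i})^{\top} \sim N(\mathbf{0}, \mathbf{G}),
\quad e_{ij} \sim N(0, \sigma^2)
```

Here i indexes STORE_NUMBER and j indexes observations within a store, so each store gets its own intercept and its own slopes, centered on the fixed-effect estimates.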

Susan · ‎10-28-2010

I have some questions/comments about your code.
1. The MODEL statement specifies Var1, but the RANDOM statement specifies Var2. Do you mean to have Var2 rather than Var1 in the RANDOM statement?
2. Do you mean to have the NOINT option in the MODEL statement? Generally, I think you would want a fixed-effect intercept, which would be the estimated mean of the random intercepts for Store_Numbers. ****Using NOINT is not wrong, it's just a different parameterization (which affects the "hand-calculation" you are attempting).**** I'm retracting this; see my next post.
3. Store_Number is not in a CLASS statement, which is acceptable as long as the dataset is sorted appropriately.
If the model you use for "hand-calculation" with the parameter estimates matches the model you specify with the MIXED procedure, then you should get the same result (with perhaps a touch of rounding error). Susan

Susan · ‎10-27-2010

The answer to your question depends on the specific model that you fit with MIXED. If you provide the code, someone might be able to suggest an answer. Depending on what you are trying to do, you may not need to calculate predictions yourself. Instead (more easily and with less risk of calculation error) you can obtain conditional (BLUP) predicted values using the OUTP= option on the MODEL statement in the MIXED procedure. See the documentation for details. If you have SAS/STAT 9.22, you could use the PLM procedure to produce predictions for the observed data or for a new data set. For an introduction to PLM see http://support.sas.com/resources/papers/proceedings10/258-2010.pdf Hope this helps, Susan
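A minimal sketch of the OUTP= approach, assuming a hypothetical data set `stores` and a simple random-intercept model (the actual model had not been posted at this point in the thread). OUTP= writes predictions that include the estimated random effects, while OUTPM= writes the marginal predictions based on the fixed effects only; in both output data sets the predicted value is in the variable Pred.

```sas
/* Sketch only: stores and the model terms are hypothetical */
proc mixed data=stores;
   class store_number;
   model sales = var1 / solution
                        outp=pred_blup       /* conditional (BLUP) predictions */
                        outpm=pred_marginal; /* fixed-effects-only predictions */
   random intercept / subject=store_number;
run;
```

Comparing Pred in `pred_blup` with a hand-calculated value is a direct check on whether the hand-calculation equation matches the fitted model.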

Susan · ‎10-25-2010

It could be that you are in fact having numeric precision problems, as you suspect, given the large number of animals and what is probably a small number of vets. You could read up on numeric precision in SAS (you get lots of hits with Google), but it's even easier to just divide your animal count variables by 10,000 and see what happens. Let us know! Susan
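The rescaling suggested above takes only a short DATA step. The data set `clinics` and the variable `animal_count` are hypothetical names standing in for the real animal count variables.

```sas
/* Sketch only: clinics and animal_count are hypothetical names */
data clinics_scaled;
   set clinics;
   animal_count_10k = animal_count / 10000;   /* counts in units of 10,000 animals */
run;
```

The rescaled variable then replaces the original in the model; a coefficient on it is interpreted per 10,000 animals.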

Susan · ‎10-21-2010

Dale, It is rather convoluted, isn't it?! The more I learn about GLIMMIX the more I know I don't know. I'm pleased you found it useful. I certainly have benefited from your postings on SAS-L. Susan

Re: WARNING: Pseudo-likelihood update fails in outer iteration 3

Re: WARNING: Pseudo-likelihood update fails in outer iteration 3

Re: Insufficient Memory

Re: Choosing Procedure for non-normal, correlated data

Re: pairwise comparison of interaction term in ancova (unequal slopes)

Re: Odds Ratio and CL all Equal 1

Re: Graph with two regression lines

Re: confusion in data naming

Re: Proc Mixed - Running the model and using equation gives different ...

Re: Proc Mixed - Running the model and using equation gives different ...

Re: standard error equal to 0 using glimmix procedure

Re: Variance Testing Problem