PROC GLIMMIX Issue with Residuals


03-09-2011 09:17 AM

Hi, my name is Andy and I'm analyzing a large dataset using PROC GLIMMIX in SAS. My dataset contains over 20,000 GPS records. I'm trying to evaluate why certain deer were observed during hunting season, so I've coded the deer that were observed with a "1" and those not observed with a "0." I coded the entire hour in which the deer was observed, to allow for any hunter recording errors. My model is shown below:

PROC GLIMMIX DATA=OBS METHOD=LAPLACE;
   CLASS ID YEAR EXPOSURE HABITAT_VALUE;
   MODEL OBSERVED (EVENT = '1') = EXPOSURE STEPLENGTH HABITAT_VALUE ELEVATION
         DIST_NEAREST_ROAD / DIST=BINARY LINK=LOGIT SOLUTION;
   RANDOM ID YEAR;
RUN;

I want to see if the different independent variables influence the observation of deer throughout the hunting season. My question is: what are the assumptions that I need to adhere to with logistic regression? I read that the data do not need to be normally distributed. I know "steplength" is extremely right-skewed, with a mean of 48 meters and a maximum of 1,400 meters. If normality is not an issue, then I assumed the next step would be to at least examine the residuals and remove some of those extreme movements. I added the PLOT=RESIDUALPANEL option to my model with ODS GRAPHICS and plotted the residuals. The residuals looked very different from what I'd see in a PROC MIXED model, and I was unable to interpret the plots to determine whether I need to remove any outliers. Will I not receive a normal residual plot, similar to PROC MIXED? If so, how do you interpret residual plots from PROC GLIMMIX? Thank you very much!


Posted in reply to Buck1480

03-09-2011 11:07 AM

Andy,

You need to repost this in the Statistical forum. There are readers there that may be able to help.

My first thought is that a residual in a logistic regression is going to be bounded on the probability scale, so you probably want to plot using something like:

proc glimmix plots=(ResidualPanel(marginal)
                    ResidualPanel(unpack conditional));

This will give the residuals both using the random effect predictors (conditional) and averaging over the random effects (marginal). I don't know if influence statistics (Cook's D, DFFITS) are available for GLIMMIX.
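Put together with the model from the original post, the plot request might look like the sketch below. It assumes the dataset and variable names from that post and that ODS Graphics is available; requesting the same panel type twice in one PLOTS= list is also an assumption here, so if only one panel appears, the two requests could be made in separate runs.

```sas
/* Sketch only: the model from the original post plus the PLOTS= request. */
ods graphics on;

proc glimmix data=obs method=laplace
             plots=(residualpanel(marginal)              /* averaged over random effects */
                    residualpanel(unpack conditional));  /* uses random-effect predictors */
   class id year exposure habitat_value;
   model observed(event='1') = exposure steplength habitat_value
                               elevation dist_nearest_road
         / dist=binary link=logit solution;
   random id year;
run;

ods graphics off;
```

The UNPACK suboption draws each plot of the panel as a separate graph, which can be easier to read with 20,000+ records.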

I have one question about the variable ID--does it refer to an individual deer, and if so are there repeated observations on that animal? Then some spatial modeling might be in order as well, or grouping variances by animal, or, well, a whole bundle of things, but probably not relevant to your question about the plots.

Good luck,

SteveDenham


Posted in reply to Buck1480

03-09-2011 11:17 AM

Steve,

Yes, ID refers to an individual deer. I tried running the model with different covariance structures: VC (default), CS, AR(1), and UN. The default covariance structure (VC) gave me the best-fitting model based on AICc. I also tried the spatial power covariance structure in MIXED when I was analyzing movement data, but received an error message stating that it stopped because of an infinite likelihood. I determined that the error was due to having multiple lines of data for one individual deer. Unfortunately, I wasn't sure how to overcome this and was told by a statistician to use another covariance structure. Thank you for your help!


Posted in reply to Buck1480

03-10-2011 06:50 AM

Aha! The infinite likelihood caused by multiple lines per subject problem.

You can fix this by respecifying the subject, so that instead of subject=ID, you use subject=ID*. Something makes each line unique, and it should be included on the CLASS statement. A good guess would be one of the fixed effects, say exposure (just a guess, not sure at all). If that is the situation then subject=ID*EXPOSURE might fix the infinite likelihood. It might get more complex to the point that subject=ID*EXPOSURE*HABITAT_VALUE may be needed.
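A rough sketch of that respecification in PROC MIXED follows. RESPONSE, X_COORD, and Y_COORD are hypothetical placeholder names (the movement model was not shown in this thread), and EXPOSURE is only the guess made above. The first step lists any rows that share the same subject and coordinates, since duplicate locations within a subject are one common way to end up with an infinite likelihood under a spatial structure.

```sas
/* Hypothetical names: RESPONSE, X_COORD, Y_COORD are placeholders. */

/* Step 1: look for rows that are not unique within the candidate subject. */
proc sql;
   select id, exposure, x_coord, y_coord, count(*) as n_rows
   from obs
   group by id, exposure, x_coord, y_coord
   having count(*) > 1;
quit;

/* Step 2: spatial power structure with the finer subject definition. */
proc mixed data=obs;
   class id exposure;
   model response = exposure / solution;
   repeated / subject=id*exposure type=sp(pow)(x_coord y_coord);
run;
```

If the query in step 1 still returns rows, the subject definition probably needs another factor, such as HABITAT_VALUE, as described above.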

This still doesn't address the residual plot problem. I keep hoping someone will drop a hint in here.

SteveDenham


Posted in reply to SteveDenham

03-10-2011 11:29 AM

As indicated elsewhere, you must have multiple observations for each ID (individual) if you have ID as a random effect. A statement such as plots=residualpanel should give you four graphs: a 'normal' quantile plot (residual vs. quantile on a normal scale), residual vs. linear predictor (which is the estimated logit here), a histogram of residuals, and a boxplot. I would suggest using plots=studentpanel instead to get the studentized residuals (actually conditional studentized residuals here), which make outliers easier to spot.
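As a sketch of how that could look with the model from the original post: the PLOTS=STUDENTPANEL request draws the panel, and an OUTPUT statement saves the same residuals for inspection. The dataset name GLMM_RESID, the variable names P_HAT and STUD_RESID, and the cutoff of 3 are arbitrary choices for illustration, not recommendations.

```sas
/* Sketch only: studentized residual panel plus saved residuals. */
ods graphics on;

proc glimmix data=obs method=laplace plots=studentpanel(conditional);
   class id year exposure habitat_value;
   model observed(event='1') = exposure steplength habitat_value
                               elevation dist_nearest_road
         / dist=binary link=logit solution;
   random id year;
   /* Predicted probability and conditional studentized residual per record */
   output out=glmm_resid pred(blup ilink)=p_hat
                         student(blup)=stud_resid;
run;

ods graphics off;

/* List records with large residuals; the cutoff of 3 is an assumption. */
proc print data=glmm_resid;
   where abs(stud_resid) > 3;
   var id year steplength p_hat stud_resid;
run;
```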

Since your response is binary (0/1), these diagnostic plots are challenging. You can produce all the standard residual plots, but as David Collett states in Modelling Binary Data, "some of them become difficult to interpret." You can get strange-looking residual plots. The Collett book has an excellent chapter on GLM diagnostics, although he does not deal with random effects (in that chapter).
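A short worked example shows where the strange shapes come from. With a logit link, write the fitted probability as a function of the estimated linear predictor; because the response can only be 0 or 1, every residual must fall on one of two curves. The notation below is introduced here for illustration and is not taken from Collett.

```latex
\hat{p}_i = \frac{1}{1 + e^{-\hat{\eta}_i}}, \qquad y_i \in \{0, 1\}

% Raw residual: two possible values for any given fitted probability
r_i = y_i - \hat{p}_i =
\begin{cases}
  1 - \hat{p}_i, & y_i = 1 \\
  -\hat{p}_i,    & y_i = 0
\end{cases}

% Pearson residual: the two branches are exponential curves in \hat{\eta}_i
r_i^{P} = \frac{y_i - \hat{p}_i}{\sqrt{\hat{p}_i (1 - \hat{p}_i)}} =
\begin{cases}
  \phantom{-}e^{-\hat{\eta}_i / 2}, & y_i = 1 \\
  -e^{\hat{\eta}_i / 2},            & y_i = 0
\end{cases}
```

So a plot of residuals against the linear predictor shows two bands, one for the observed deer and one for the unobserved deer, rather than the patternless cloud expected from PROC MIXED, and the normal quantile plot is correspondingly hard to read.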

GLIMMIX does not (yet) have formal influence diagnostics (as found in MIXED).
