06-28-2016 11:06 PM
Hello,
I am using PROC GLIMMIX for the first time and do not have any expertise in spatial analyses. I have Zip Code level data with positive spatial autocorrelation. I am interested in running a regression model to examine the association of some area level predictors with the number of people with the illness of interest in an area. I am using PROC GLIMMIX to account for clustering at Zip Code level and for spatial autocorrelation. When I use the following statements as Step 1 (to simply account for clustering at Zip Code level), PROC GLIMMIX model converges and I get an output that makes practical sense however with overdispersion (the model does not converge when I use ‘random _residual_;’ statement for overdispersion, or any other DDFM method).
Count is the number of people with the illness. All predictors are continuous variables. White, black, adults, older_adults are proportions. Offset is the log of population. There are about 12,000 observations at ZC level in the ZC1 dataset.
Step 1:
Proc Glimmix data = library.ZC1 NOCLPRINT ;
class ZC;
model Count = white black adults older_adults averageHHsize/ dist = poisson offset = lnPop solution DDFM=BW;
random intercept / Subject = ZC ;
run;
In Step 2, I run the same statements but this time adding the ZC centroid information (X and Y represent the latitude and longitude of the ZC). Model converges once again but I get an output exactly the same as in Step 1 and still have overdispersion. Here are the SAS statements from Step 2. Can someone please tell me what am I doing wrong here? Thanks a lot.
Step 2:
Proc Glimmix data = library.ZC1 NOCLPRINT ;
class ZC;
model Count = white black adults older_adults averageHHsize/ dist = poisson offset = lnPop solution DDFM=BW;
random intercept / Subject = ZC type=sp(exp)(X Y);
run;
06-29-2016 03:14 PM
Try running these as conditional models, using method=laplace in the PROC GLIMMIX statement. Just a hunch. Also, you may want to arbitrarily 'center' your coordinates, so that you don't have extreme values. Lastly, I would almost certainly suspect that the spatial correlation is anisotropic. If the anisotropic exponential is available in GLIMMIX, you might want to try that. I know it is available in PROC MIXED, but the only anisotropic structure I see in the GLIMMIX documentation is anisotropic spatial power.
Oh yeah, if latitude and longitude are decimal representations, you should be good to go; otherwise you may have to convert to actual distances in the X and Y plane. If the locations are substantially separated, so that at higher latitude, the same difference in longitude values would lead to a major change in the physical distance between the centroids, you really might need to change to distances.
Steve Denham
06-29-2016 11:34 PM
Thanks a lot Steve for taking time out and responding. Firstly, I tried to look up SAS GLIMMIX documentation for anisotropic spatial correlation and I don’t think PROC GLIMMIX supports it. Secondly, I checked my data and looks like X (longitude degrees of the center of ZC) and Y (latitude degrees of the center of ZC) are decimal representations. Lastly, thanks for suggesting method=laplace. When I try to run the model with method=laplace, I get a SAS warning that “Obtaining minimum variance quadratic unbiased estimates as starting values for the covariance parameters failed. So I added ‘parms (0.4) (0.4)/ noiter’ and the model converges and I get an output that looks right.
I just have two more questions – I specified parms (0.4) since the initial GLIMMIX model (that simple accounted for clustering at ZC level) showed a Covariance Parameter Estimate of 0.4. Is this a right approach?
And with the following codes, I get no estimate for standard error of the spatial covariance parameter. Is this common or is my code wrong?
Proc Glimmix data = library.ZC1 NOCLPRINT METHOD=LAPLACE ;
class ZC;
model Count = white black adults older_adults averageHHsize/ dist = poisson offset = lnPop solution DDFM=BW;
random intercept / Subject = ZC type=sp(exp)(X Y);
parms (0.4) (0.4)/ noiter;
run;
Covariance Parameter Estimates | |||
Cov Parm | Subject | Estimate | SE |
Variance | ZC | 0.4000 | 0.01039 |
SP(EXP) | ZC | 0.4000 | . |
Thanks a lot!
07-01-2016 01:50 PM
With the noiter option, the variance components are fixed and should have no standard error. What happens if you remove this, and just specify the starting values?
Steve Denham
07-01-2016 02:09 PM
Thanks for your response Steve. I did just that. I ran the model after removing noiter as below.
The output now shows Cov Parameter Estimate for ZC (estimate = 0.2970, SE = 0.007635) but the estimate for SP(EXP) is fixed at 0.1 and has no SE. Is that a correct output or I should be getting an estimate and SE for SP(EXP) as well?
Also overdispersion (Pearson Chi-Square / DF) in this model is 1.593E22 which is unusual. Is this common with conditional distribution (since I used METHOD = LAPLACE)? Thanks a lot!
Proc Glimmix data = library.ZC1 NOCLPRINT METHOD=LAPLACE ;
class ZC;
model Count = white black adults older_adults averageHHsize/ dist = poisson offset = lnPop solution DDFM=BW;
random intercept / Subject = ZC type=sp(exp)(X Y);
parms (0.1) (0.1);
run;
Covariance Parameter Estimates | |||||
Cov Parm | Subject | Estimate | Standard Error | Z Value | Pr > Z |
Variance | ZC | 0.2968 | 0.007630 | 38.90 | <.0001 |
SP(EXP) | ZC | 0.1000 | 0 | . | . |
07-11-2016 12:49 PM
This really looks like "nothing happened". The spatial correlation is still at 0.1, and that makes me think that the values of X and Y are such that the Euclidean distance between observations is huge, and consequently, no correlation is fit. Ridging might help here, so try adding:
NLOPTIONS tech=nrridg;
as a line.
Also, the documentation says c-list has the names of the numeric variables (so check that X and Y aren't somehow still character values), for the ith vector and the jth vector, which correspond to the ith and jth observations. That implies a dependence on the sort order of the observations, so it would probably be good to presort the data on X and Y.
And last, "The practical range of a (second-order stationary) spatial process is the distance at whcih the correlations fall below 0.05. For the SP(EXP) structure this disance is 3*alpha." This alpha is the denominator in the covariance equation, so maybe by setting the initial guess to 0.1, it gets stuck. You might try (hopefully I have these in the correct order):
parms (0.3) (0.1 to 0.9 by 0.1);
This should lead to a grid search across various correlation values, and maybe from there something will get "unstuck".
Steve Denham