BookmarkSubscribeRSS Feed
patelam9
Calcite | Level 5

Hello,

I am using PROC GLIMMIX for the first time and do not have any expertise in spatial analyses. I have Zip Code level data with positive spatial autocorrelation. I am interested in running a regression model to examine the association of some area level predictors with the number of people with the illness of interest in an area. I am using PROC GLIMMIX to account for clustering at Zip Code level and for spatial autocorrelation. When I use the following statements as Step 1 (to simply account for clustering at Zip Code level), PROC GLIMMIX model converges and I get an output that makes practical sense however with overdispersion (the model does not converge when I use ‘random _residual_;’ statement for overdispersion, or any other DDFM method).

 

Count is the number of people with the illness. All predictors are continuous variables. White, black, adults, older_adults   are proportions. Offset is the log of population. There are about 12,000 observations at ZC level in the ZC1 dataset.

 

Step 1:

Proc Glimmix data =  library.ZC1  NOCLPRINT ;

class  ZC;

model  Count =  white  black   adults  older_adults   averageHHsize/  dist = poisson offset = lnPop solution  DDFM=BW;             

random intercept / Subject = ZC ;

run

 

In Step 2, I run the same statements but this time adding the ZC centroid information (X and Y represent the latitude and longitude of the ZC). Model converges once again but I get an output exactly the same as in Step 1 and still have overdispersion. Here are the SAS statements from Step 2. Can someone please tell me what am I doing wrong here? Thanks a lot.

 

Step 2:

Proc Glimmix data =  library.ZC1  NOCLPRINT ;

class  ZC;

model  Count =  white  black   adults  older_adults   averageHHsize/  dist = poisson offset = lnPop solution  DDFM=BW;             

random intercept / Subject = ZC type=sp(exp)(X Y);

run

 

5 REPLIES 5
SteveDenham
Jade | Level 19

Try running these as conditional models, using method=laplace in the PROC GLIMMIX statement.  Just a hunch.  Also, you may want to arbitrarily 'center' your coordinates, so that you don't have extreme values.  Lastly, I would almost certainly suspect that the spatial correlation is anisotropic.  If the anisotropic exponential is available in GLIMMIX, you might want to try that. I know it is available in PROC MIXED, but the only anisotropic structure I see in the GLIMMIX documentation is anisotropic spatial power.

 

Oh yeah, if latitude and longitude are decimal representations, you should be good to go; otherwise you may have to convert to actual distances in the X and Y plane.  If the locations are substantially separated, so that at higher latitude, the same difference in longitude values would lead to a major change in the physical distance between the centroids, you really might need to change to distances.

 

Steve Denham

patelam9
Calcite | Level 5

Thanks a lot Steve for taking time out and responding. Firstly, I tried to look up SAS GLIMMIX documentation for anisotropic spatial correlation and I don’t think PROC GLIMMIX supports it. Secondly, I checked my data and looks like X (longitude degrees of the center of ZC) and Y (latitude degrees of the center of ZC) are decimal representations. Lastly, thanks for suggesting method=laplace. When I try to run the model with method=laplace, I get a SAS warning that “Obtaining minimum variance quadratic unbiased estimates as starting values for the covariance parameters failed. So I added ‘parms (0.4) (0.4)/ noiter’ and the model converges and I get an output that looks right.

 

I just have two more questions – I specified parms (0.4) since the initial GLIMMIX model (that simple accounted for clustering at ZC level) showed a Covariance Parameter Estimate of 0.4. Is this a right approach?

And with the following codes, I get no estimate for standard error of the spatial covariance parameter. Is this common or is my code wrong?

 

Proc Glimmix data =  library.ZC1  NOCLPRINT METHOD=LAPLACE ;

class  ZC;

model  Count =  white  black   adults  older_adults   averageHHsize/  dist = poisson offset = lnPop solution  DDFM=BW;             

random intercept / Subject = ZC type=sp(exp)(X Y);

parms (0.4) (0.4)/ noiter;

run

 

Covariance Parameter Estimates

Cov Parm

  Subject

   Estimate

SE

Variance

   ZC

0.4000

     0.01039

SP(EXP)

    ZC

0.4000

.

 

 

Thanks a lot!

SteveDenham
Jade | Level 19

With the noiter option, the variance components are fixed and should have no standard error.  What happens if you remove this, and just specify the starting values?

 

Steve Denham

patelam9
Calcite | Level 5

Thanks for your response Steve. I did just that. I ran the model after removing noiter as below.

 

The output now shows Cov Parameter Estimate for ZC (estimate = 0.2970, SE = 0.007635) but the estimate for SP(EXP) is fixed at 0.1 and has no SE. Is that a correct output or I should be getting an estimate and SE for SP(EXP) as well?

 

Also overdispersion (Pearson Chi-Square / DF) in this model is 1.593E22 which is unusual. Is this common with conditional distribution (since I used METHOD = LAPLACE)? Thanks a lot!

 

Proc Glimmix data =  library.ZC1  NOCLPRINT METHOD=LAPLACE ;

class  ZC;

model  Count =  white  black   adults  older_adults   averageHHsize/  dist = poisson offset = lnPop solution  DDFM=BW;             

random intercept / Subject = ZC type=sp(exp)(X Y);

parms (0.1) (0.1);

run

 

Covariance Parameter Estimates

Cov Parm

Subject

Estimate

Standard Error

Z Value

Pr > Z

Variance

ZC

0.2968

0.007630

38.90

<.0001

SP(EXP)

ZC

0.1000

0

.

.

 

SteveDenham
Jade | Level 19

This really looks like "nothing happened".  The spatial correlation is still at 0.1, and that makes me think that the values of X and Y are such that the Euclidean distance between observations is huge, and consequently, no correlation is fit.  Ridging might help here, so try adding:

 

NLOPTIONS tech=nrridg;

 

as a line.

 

Also, the documentation says c-list has the names of the numeric variables (so check that X and Y aren't somehow still character values), for the ith vector and the jth vector, which correspond to the ith and jth observations.  That implies a dependence on the sort order of the observations, so it would probably be good to presort the data on X and Y.

 

And last, "The practical range of a (second-order stationary) spatial process is the distance at whcih the correlations fall below 0.05.  For the SP(EXP) structure this disance is 3*alpha." This alpha is the denominator in the covariance equation, so maybe by setting the initial guess to 0.1, it gets stuck.  You might try (hopefully I have these in the correct order):

 

parms (0.3) (0.1 to 0.9 by 0.1);

 

This should lead to a grid search across various correlation values, and maybe from there something will get "unstuck".

 

Steve Denham

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 2861 views
  • 1 like
  • 2 in conversation