Solved: Re: GLIMMIX spatial autocorrelation, syntax

tdlang · Posted 12-10-2013 06:30 PM

Hello all,

First question I have posted, so please let me know if addition information is needed beyond that provided below:

I am looking at a dataset of fish density values, recorded over 3 years at 3 different locations.

These density values are obtained during surveys that are composed of multiple transect lines at a single site
Transect "curtains" are further divided into grids (distance x depth), of which each cell is assigned a density and coordinate
Additional parameters (Bottom habitat, salinity, temperature, etc) are assigned to the cells
The density values have a very very skewed distribution (many small, few very large), hence the use of GLIMMIX to attempt to address this.

I am interested in the effects of these "additional parameters" on density, though I cannot ignore the spatial nature of the data (fish group together, nearby cells will tend to have more similar densities).

I am assuming that spatial correlations would only exist within a single survey.

My model currently looks like this:

proc glimmix data=Data.BankPresent pconv =1e-5 maxopt=30;

class Site Habitat Season Survey;

model Density= Relief DShallow MeanSalinity MeanTemp Site|Habitat|Season

/ dist=gamma link=log;

random Survey / subject=intercept type=sp(exp) (Easting Northing);

run;

I am unsure of where to put the spatial autocorrelation component [type=sp(exp) (Easting Northing)], or if this model is doing what I have attempted to describe above.

If needed/of interest, the variables included are:

Relief: continuous, vertical relief of the seafloor below cell i
DShallow: continuous, distance from cell i to the shallowest point at the site
MeanSalinity: continuous, salinity of cell i
MeanTemp: continuous, temperature of cell i
Site: categorical, site at which the survey was conducted (1,2,3)
Habitat: categorical, habitat classification of seafloor below cell i (1,2,3,4,5)
Season: categorical, season in which the survey was conducted (1,2,3,4)

To reiterate, I want to run the above model with a consideration for spatial autocorrelation on a survey-by-survey basis.

Thank you for any information you may be able to provide, applicable references, or questions of clarification.

SteveDenham · Posted 12-11-2013 11:11 AM

Very close. Rearrange the random statement to:

random intercept/subject=survey type=sp(exp)(easting northing);

And I think you will be able to fit the model (of course, this supposes sufficient data...). This does kind of depend on the easting, northing values being unique within survey. If you have exact duplicates, you may need to specify the subject with greater detail (say survey(season), if the measurements were taken at exactly the same locations in each season).

Once you get that, you may want to consider that season induces autocorrelation, and so a second random term that addresses that may be worth a look.

Steve Denham

View solution in original post

SteveDenham · Posted 12-11-2013 11:11 AM

Very close. Rearrange the random statement to:

random intercept/subject=survey type=sp(exp)(easting northing);

And I think you will be able to fit the model (of course, this supposes sufficient data...). This does kind of depend on the easting, northing values being unique within survey. If you have exact duplicates, you may need to specify the subject with greater detail (say survey(season), if the measurements were taken at exactly the same locations in each season).

Once you get that, you may want to consider that season induces autocorrelation, and so a second random term that addresses that may be worth a look.

Steve Denham

tdlang · Posted 12-11-2013 12:57 PM

Thank you very much Steve.

I gave that a shot and it does run, however in the resulting "Covariance Parameter Estimates," the spatial term comes back with an estimate of 1.000 and no standard error.

Cov Parm Subject Estimate Standard Error

Variance Name 0.1279 0.04371

SP(EXP) Name 1.0000 .

Residual 8.4454 0.06207

This yields a model that is no different from one that was run without the type= option included.

Could this be due to the easting/northing duplication you mentioned?

It is possible that there is duplication of easting/northing pairs:

Within a Survey (since grid cells are in multiple layers, stacked cells would receive the same easting/northing)
- Can the type=sp option handle three dimensions? I think that would solve this problem.
Between surveys (since surveys were conducted in the same locations multiple times)

I appreciate the response very much, one step closer!

SteveDenham · Posted 12-11-2013 02:55 PM

I would think the duplication may well be the problem--you are seeing perfect correlation within each survey, which may not be surprising if we are dealing with a marginal over all the fixed effects. I have not seen three dimensional spatial covariances, and I don't think GLIMMIX allows them. What might work is if you included layer in the model, and then would try subject=survey group=layer in the random statement. That should fit separate 2 dimensional errors for each layer by survey combination. Hopefully, your data are rich enough in both detail and N to fit such a model.

Steve Denham

tdlang · Posted 12-12-2013 12:25 PM

Thanks again Steve.

I have given this a run as well, looks to increase computing time dramatically so I am still waiting on output (11hrs and counting!) but I will report back once it finishes.

From reading about the options again, this seems to make sense.

I appreciate your help putting these things in a context I can understand.

SteveDenham · Posted 12-12-2013 12:53 PM

Think about how many more covariance parameters you are now fitting simultaneously. CPU time for this goes up on the order of n log n, I think.

Steve Denham

tdlang · Posted 12-16-2013 05:28 PM

Not much progress here. Even after including the group=layer option, the sp(exp) term for each group is still 1.

I double checked that there are no duplicated coordinates within the Name*Layer combinations.

As for data richness in N, each Name*Layer combination has at least 1 observation (usually many, many more, dataset is 37,000 observations).

I can see how those groups with a single observation would cause a problem, but I would think that since I have tried to "partition" the calculations, the problem would be confined to only those groups with a single observation. I may be mistaken, as I don't know exactly how SAS is computing this giant covariance matrix.

Even when I restrict the analysis to those Name*Layer combinations with at least 3 observations, I have the same problem.

Besides this, I cannot think of anything else that might be causing the problem. I am open to suggestions and, as always, appreciative of your comments.

SteveDenham · Posted 12-17-2013 03:04 PM

Have you tried other spatial autocorrelation types? Anisotropic spatial power, maybe? You might also want to look at Radial Smoothing Based on Mixed Models in the documentation for a multidimensional approach.

This is the part where I grasp at straws.

Steve Denham

Registration is open

SAS Training: Just a Click Away