- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello all,
First question I have posted, so please let me know if addition information is needed beyond that provided below:
I am looking at a dataset of fish density values, recorded over 3 years at 3 different locations.
- These density values are obtained during surveys that are composed of multiple transect lines at a single site
- Transect "curtains" are further divided into grids (distance x depth), of which each cell is assigned a density and coordinate
- Additional parameters (Bottom habitat, salinity, temperature, etc) are assigned to the cells
- The density values have a very very skewed distribution (many small, few very large), hence the use of GLIMMIX to attempt to address this.
I am interested in the effects of these "additional parameters" on density, though I cannot ignore the spatial nature of the data (fish group together, nearby cells will tend to have more similar densities).
I am assuming that spatial correlations would only exist within a single survey.
My model currently looks like this:
proc glimmix data=Data.BankPresent pconv =1e-5 maxopt=30;
class Site Habitat Season Survey;
model Density= Relief DShallow MeanSalinity MeanTemp Site|Habitat|Season
/ dist=gamma link=log;
random Survey / subject=intercept type=sp(exp) (Easting Northing);
run;
I am unsure of where to put the spatial autocorrelation component [type=sp(exp) (Easting Northing)], or if this model is doing what I have attempted to describe above.
If needed/of interest, the variables included are:
- Relief: continuous, vertical relief of the seafloor below cell i
- DShallow: continuous, distance from cell i to the shallowest point at the site
- MeanSalinity: continuous, salinity of cell i
- MeanTemp: continuous, temperature of cell i
- Site: categorical, site at which the survey was conducted (1,2,3)
- Habitat: categorical, habitat classification of seafloor below cell i (1,2,3,4,5)
- Season: categorical, season in which the survey was conducted (1,2,3,4)
To reiterate, I want to run the above model with a consideration for spatial autocorrelation on a survey-by-survey basis.
Thank you for any information you may be able to provide, applicable references, or questions of clarification.
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Very close. Rearrange the random statement to:
random intercept/subject=survey type=sp(exp)(easting northing);
And I think you will be able to fit the model (of course, this supposes sufficient data...). This does kind of depend on the easting, northing values being unique within survey. If you have exact duplicates, you may need to specify the subject with greater detail (say survey(season), if the measurements were taken at exactly the same locations in each season).
Once you get that, you may want to consider that season induces autocorrelation, and so a second random term that addresses that may be worth a look.
Steve Denham
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Very close. Rearrange the random statement to:
random intercept/subject=survey type=sp(exp)(easting northing);
And I think you will be able to fit the model (of course, this supposes sufficient data...). This does kind of depend on the easting, northing values being unique within survey. If you have exact duplicates, you may need to specify the subject with greater detail (say survey(season), if the measurements were taken at exactly the same locations in each season).
Once you get that, you may want to consider that season induces autocorrelation, and so a second random term that addresses that may be worth a look.
Steve Denham
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you very much Steve.
I gave that a shot and it does run, however in the resulting "Covariance Parameter Estimates," the spatial term comes back with an estimate of 1.000 and no standard error.
Cov Parm Subject Estimate Standard Error
Variance Name 0.1279 0.04371
SP(EXP) Name 1.0000 .
Residual 8.4454 0.06207
This yields a model that is no different from one that was run without the type= option included.
Could this be due to the easting/northing duplication you mentioned?
It is possible that there is duplication of easting/northing pairs:
- Within a Survey (since grid cells are in multiple layers, stacked cells would receive the same easting/northing)
- Can the type=sp option handle three dimensions? I think that would solve this problem.
- Between surveys (since surveys were conducted in the same locations multiple times)
I appreciate the response very much, one step closer!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I would think the duplication may well be the problem--you are seeing perfect correlation within each survey, which may not be surprising if we are dealing with a marginal over all the fixed effects. I have not seen three dimensional spatial covariances, and I don't think GLIMMIX allows them. What might work is if you included layer in the model, and then would try subject=survey group=layer in the random statement. That should fit separate 2 dimensional errors for each layer by survey combination. Hopefully, your data are rich enough in both detail and N to fit such a model.
Steve Denham
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks again Steve.
I have given this a run as well, looks to increase computing time dramatically so I am still waiting on output (11hrs and counting!) but I will report back once it finishes.
From reading about the options again, this seems to make sense.
I appreciate your help putting these things in a context I can understand.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Think about how many more covariance parameters you are now fitting simultaneously. CPU time for this goes up on the order of n log n, I think.
Steve Denham
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Not much progress here. Even after including the group=layer option, the sp(exp) term for each group is still 1.
I double checked that there are no duplicated coordinates within the Name*Layer combinations.
As for data richness in N, each Name*Layer combination has at least 1 observation (usually many, many more, dataset is 37,000 observations).
I can see how those groups with a single observation would cause a problem, but I would think that since I have tried to "partition" the calculations, the problem would be confined to only those groups with a single observation. I may be mistaken, as I don't know exactly how SAS is computing this giant covariance matrix.
Even when I restrict the analysis to those Name*Layer combinations with at least 3 observations, I have the same problem.
Besides this, I cannot think of anything else that might be causing the problem. I am open to suggestions and, as always, appreciative of your comments.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Have you tried other spatial autocorrelation types? Anisotropic spatial power, maybe? You might also want to look at Radial Smoothing Based on Mixed Models in the documentation for a multidimensional approach.
This is the part where I grasp at straws.
Steve Denham